RESEARCH PAPER
Genetic-based approach for cue phrase selection in dialogue act recognition
Anwar Ali Yahya · Abd Rahman Ramli
Received: 15 August 2008 / Revised: 3 December 2008 / Accepted: 8 December 2008 / Published online: 16 January 2009
© Springer-Verlag 2009
Abstract Automatic cue phrase selection is a crucial step
for designing a dialogue act recognition model using
machine learning techniques. The approaches currently in use are based on a specific type of feature selection approach, called ranking approaches. Despite their computational efficiency in high-dimensional domains, they are not optimal with respect to relevance and redundancy. In this paper we propose a genetic-based approach
for cue phrase selection which is, essentially, a variable
length genetic algorithm developed to cope with the high
dimensionality of the domain. We evaluate the perfor-
mance of the proposed approach against several ranking
approaches. Additionally, we assess its performance for the selection of cue phrases enriched by the phrase's type and position. The results provide experimental evidence of the ability of the genetic-based approach to handle the drawbacks of the ranking approaches and to exploit cue type and position information to improve the selection. Furthermore, we validate the use of
the genetic-based approach for machine learning applica-
tions. We use selected sets of cue phrases for building a
dynamic Bayesian networks model for dialogue act rec-
ognition. The results show its usefulness for machine
learning applications.
Keywords Genetic algorithm · Feature selection · Cue phrase selection · Ranking feature selection · Dialogue act recognition
1 Introduction
A dialogue act (DA) is defined as a concise abstraction of a speaker's intention in an utterance. It has roots in several
language theories of meaning, particularly speech act the-
ory [4], which interprets any utterance as a kind of action,
called a speech act, and categorises speech acts into speech act categories [60]. DA, however, extends speech act by
taking into account the context of the utterance [7]. Fig-
ure 1 is a hypothetical dialogue annotated with DAs.
The automatic recognition of DA, dialogue act recog-
nition (DAR), is a task of crucial importance for the
processing of natural language at discourse level. It is
defined as follows: given an utterance with its preceding
context, how to determine the DA it realises. Formally, it is
a classification task in which the goal is to assign a suitable
DA to the given utterance. Due to its importance for var-
ious applications such as dialogue systems, machine
translation, speech recognition, and meeting summarisa-
tion, it has received a considerable amount of attention
[33]. For example, in dialogue systems it conditions the
successful interpretation of the user’s utterance and con-
sequently the generation of an appropriate response by the
system. Inspired by their successful applications to many
natural language processing problems, machine learning
(ML) techniques have become the current trend for tack-
ling the DAR problem [19]. In this regard, various ML
techniques have been investigated and the resulting models
have become known as cue-based models [33]. As depicted
in Fig. 2, an ML technique builds a cue-based model of DAR
A. A. Yahya (corresponding author) · A. R. Ramli
Intelligent System and Robotics Laboratory,
Institute of Advanced Technology,
University Putra Malaysia,
43400 UPM Serdang, Selangor, Malaysia
e-mail: [email protected]
A. R. Ramli
e-mail: [email protected]
Evol. Intel. (2009) 1:253–269
DOI 10.1007/s12065-008-0016-6
by learning from utterances of a dialogue corpus the
association rules between surface linguistic features of
utterances and the set of DAs. In doing so, ML exploits
various types of linguistic features such as cue phrases,
syntactic features, prosodic features, etc.
Among different types of linguistic features, cue phrases
are the strongest [34]. They are defined by Hirschberg and
Litman [29] as linguistic expressions that function as
explicit indicators of the structure of a discourse. Since not
all phrases are relevant to the DAR, prior to applying a ML
technique, the selection of relevant cue phrases is of crucial
importance. A successful selection of cue phrases would
speed up the learning process, reduce the required training
data, and improve the classification accuracy [6].
One cue-based model, which has been proposed recently
and used as a context of the current research, is dynamic
Bayesian network (DBN) model [1]. As depicted in Fig. 3,
the DBN model of DAR consists of T time slices, in which
each slice is a Bayesian network (BN) composed of a
number of random variables. The DBN models a sequence
of utterances over time in such a way that each BN cor-
responds to a single utterance. In this sense the DBN is time invariant, meaning that the structure and parameters of the BN are the same for all time slices. Moreover, in each BN there is a hidden random variable, which represents the DA that needs to be recognised, and a set of observation variables
extracted from the linguistic features of the corresponding
utterance. In this model, dynamic Bayesian ML algorithms
have been employed to construct the DBN model from a
dialogue corpus. An essential issue that arose while building the DBN model for DAR is the specification of the observation variables. For this model, it has been suggested that the number of the random variables in each BN should be equal to the number of DAs that the model recognises. Moreover, each variable is defined as a logical rule in disjunctive normal form (DNF), consisting of a set of cue phrases which are informative to one and only one DA, and expressed as follows:

DNF = if (p1 ∨ p2 ∨ … ∨ pm) then DA
where DA is the target DA and pi is a cue phrase selected
for that DA. In doing so, each variable works as a binary
classifier for the target DA.
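Such a DNF rule fires whenever the utterance contains at least one of the selected cue phrases. A minimal sketch of this binary classifier, assuming simple lower-cased substring matching (the paper does not specify the matching procedure) and hypothetical cue phrases:

```python
def dnf_classifier(cue_phrases, utterance):
    """Binary DNF classifier: returns True if any selected cue phrase
    for the target DA occurs in the utterance."""
    text = utterance.lower()
    return any(p in text for p in cue_phrases)

# Hypothetical cue phrases selected for a "Request" DA
request_cues = ["could you", "please", "would you"]
print(dnf_classifier(request_cues, "Could you send the price list?"))  # True
```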
Obviously, the cue phrase selection in the context of
DBN model of DAR is an instance of feature selection in
high dimensional domains, which are characterised by a huge number of features, numerous irrelevant features, and high correlation between the features [18]. A very similar
task is the feature selection in the context of text categor-
isation [61], in which the goal is to select the most relevant
phrases for each category of documents. In general, the
selection of relevant cue phrases can be carried out either
manually, by the field expert, or automatically via feature
selection approaches. The problem with the manual
approach is that it generates general cues which cannot be
used for all domains, and therefore the automatic selection
of cue phrase becomes a practical choice [58].
The literature on feature selection reveals a vast number of approaches developed in various domains [11, 45].
Typically, a feature selection approach is composed of three components: a search algorithm to search the feature space, an evaluation function to evaluate feature sets, and a performance function to which the resulting set is applied [11]. Based on the evaluation function, feature selection
approaches can be classified as filters or wrappers. The
wrapper approaches [38] employ ML algorithm as evalu-
ation function to estimate the relevance of the selected
features. On the other hand, the filter approaches pose
feature selection as a separate process from the ML algo-
rithm, therefore the evaluation and the performance
functions are different entities. The filter approaches
themselves can be further categorised into two groups,
namely ranking approaches and subset search approaches,
based on whether they evaluate the relevance of features
individually or as a subset. Ranking approaches evaluate each phrase individually according to a particular metric (ranging from information-theoretic metrics such as Mutual Information (MI) and Information Gain (IG) to statistical metrics such as Chi Square (χ2) and Odds Ratio (OR) [50, 55]) and then pick out the best k phrases. On the other hand,
subset search approaches employ search strategy (exhaus-
tive, heuristic, random search) to search through candidate
feature subsets [12]. The search is guided by a certain
Fig. 1 Hypothetical dialogue annotated with DAs

Fig. 2 Cue-based DAR model
evaluation measure, which captures the goodness of each
subset, and an optimal or near optimal subset is selected
when the search stops [44]. In addition to filter and wrapper approaches, hybrid approaches were also proposed to take advantage of both [9, 77]. A typical hybrid approach makes use of both an independent filter metric and an ML algorithm to evaluate feature subsets: it uses the filter metric to decide the best subsets for a given dimension, and uses the ML algorithm to select the final best subset among the best subsets across different cardinalities. To recap, filters are more computationally efficient than wrappers; however, filters may result in poor performance because the evaluation function does not match the performance function well, even if the selected features are optimal for the evaluation function. Within filters, although the ranking approaches are much less resource intensive than subset search, they have a severe problem with respect to the relevance and redundancy of the selected features.
With regard to cue phrase selection in DAR, the litera-
ture indicates that only the ranking approaches have been
investigated [35, 42, 58, 68, 70] due to their computational
efficiency, regardless of their inefficiency with respect to
the relevance and redundancy of the selection. More pre-
cisely, while the selection of the features should depend on
their subsequent use, in the ranking approaches the eval-
uation of the selected features is based on the intrinsic
properties of the data rather than their subsequent use [38].
The potential effect is the inclusion of some irrelevant
features or the exclusion of relevant ones. Moreover, the evaluation function assumes that the relevance of
the selected features is equal to the summation of the
individual relevance of each feature. In other words, the
evaluation function does not account for the correlation
among the features; therefore the evaluation function
always overestimates the relevance of the selected features
and results in a redundant selection [20].
Based on the above discussion, we propose a subset
search approach for cue phrase selection to avoid the
problems of the ranking approaches. More specifically, we
propose a genetic algorithm (GA) due to its advantage over many traditional local search heuristics, particularly when the search space is highly multimodal, discontinuous, or highly constrained [79]. As will be seen in the next
section, GA was applied previously for feature selection as
wrapper and filter in many works. In these works, due to
the low dimensionality of the search space, the application
of GA was straightforward. For cue phrase selection, the
straightforward application of GA is hindered by the high
dimensionality of the search space; therefore we propose
several modifications to adapt GA for this task.
The rest of the paper is organised as follows. Section 2
reviews the previous works of using GA for feature
selection. Section 3 elaborates on how the standard GA is
adapted for cue phrase selection. Section 4 presents the
obtained results and discussions from several cases of
experiments: baseline approaches, genetic-based approach,
genetic-based approach with negative cue phrase, genetic-
based approach with cue’s positional information, and
validation experiments. Finally Sect. 5 concludes the cur-
rent work on cue phrase selection.
2 Genetic-based feature selection
GA is a biologically inspired search algorithm, which is
loosely based on molecular genetics and natural selection.
The basic principles of GA were stated by John Holland
[30]. Subsequently, GA has been reviewed in a number of works [23, 28, 51, 69]. In the standard GA the candidate
Fig. 3 Example of the DBN model of DAR
solutions are represented as bit strings (referred to as
chromosomes) whose interpretation depends on the appli-
cation. The search for an optimal solution begins with a
random population of initial solutions. The chromosomes of the current population are evaluated relative to a given measure of fitness, with the fitter chromosomes selected probabilistically as seeds for the next population by means of genetic operations such as random mutation and crossover.
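The standard GA cycle described above can be sketched as follows; this is a minimal illustration, with truncation-based parent selection, one-point crossover, and bit-flip mutation as placeholder operator choices, not a reconstruction of any specific system from the literature:

```python
import random

def standard_ga(fitness, n_bits=20, pop_size=30, generations=50,
                p_crossover=0.9, p_mutation=0.01):
    """Minimal standard GA over fixed-length bit strings."""
    pop = [[random.randint(0, 1) for _ in range(n_bits)] for _ in range(pop_size)]
    for _ in range(generations):
        scored = sorted(pop, key=fitness, reverse=True)
        new_pop = []
        while len(new_pop) < pop_size:
            # Select two fit parents from the top half of the population
            p1, p2 = random.sample(scored[:pop_size // 2], 2)
            if random.random() < p_crossover:        # one-point crossover
                cut = random.randint(1, n_bits - 1)
                child = p1[:cut] + p2[cut:]
            else:
                child = p1[:]
            for i in range(n_bits):                  # bit-flip mutation
                if random.random() < p_mutation:
                    child[i] = 1 - child[i]
            new_pop.append(child)
        pop = new_pop
    return max(pop, key=fitness)

# Toy fitness: maximise the number of ones ("OneMax")
best = standard_ga(sum)
print(sum(best))
```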
The seminal work on using GA for feature selection
goes back to [62]. Since then, there have been numerous
works on using GA for feature selection in different modes
(wrapper, filter, or hybrid) and various contexts [45]. As
previously mentioned, in the wrapper mode the ML algo-
rithm is used as the evaluation function, therefore a brief
review of the works of wrapper GA feature selection can be
carried out based on the employed ML algorithm. k-nearest-neighbor was the first ML algorithm employed as a GA fitness function in the seminal work of Siedlecki and Sklansky [62].
In [37], GA with k-nearest neighbor was used to find a
vector of weightings of the features to reduce the effects of
irrelevant or misleading features. Similarly, GA was
combined with k-nearest neighbor to find an optimal fea-
ture weighting to optimise a classification task [57]. This
approach has proven especially useful with large data sets,
where standard feature selection techniques are computa-
tionally expensive. GA with fitness function based on the
classification accuracy of k-nearest neighbor and features
subset complexity was used to improve the performance of
image annotation system [49]. GA and k-nearest neighbor
were combined to select feature (genes) that can jointly
discriminate between different classes of samples (e.g.
normal versus tumor) [43]. This approach is capable of
selecting a subset of predictive genes from a large noisy
data for sample classification.
Artificial neural networks were employed as GA fitness
function for feature selection for pattern classification in
several works. For example, GA was combined with neural networks for feature selection in pattern classification and knowledge discovery [74]. It was also used with neural
networks for selecting features for defect classification of
wood boards [8]. In [31], GA with neural network was
proposed to select feature subset to get high accuracy for
classification. Similarly GA with neural network was pro-
posed for feature selection for the classification of different
types of small breast abnormalities [78].
Another ML algorithm that was used as a fitness func-
tion of GA is support vector machine. To cite examples,
[16] used GA with support vector machine for feature
selection in time series classification. In [22], GA with
support vector machine was investigated and compared
with other existing algorithms for feature selection. Also,
Morariu [52] presented GA with a fitness function based on
the support vector machine for feature selection which has
proven to be efficient for nonlinearly separable input data.
For the classification of hyperspectral data, GA with sup-
port vector machine was proposed in [80]. Yu and Cho [75]
proposed a feature selection approach, in which GA was
employed to implement a randomised search and support
vector machine was employed as a base learner for key-
stroke dynamics identity verification.
GA with decision trees (e.g. ID3, C4.5) was explored in
[65, 66] to find the best feature set to be used by the
induction system on difficult texture classification prob-
lems. William [72] designed a generic fitness function for
validation of input specification, and then used it to
develop GA wrapper for feature selection for decision tree
inducers. The effectiveness of GA for feature selection in
the automatic text summarisation task was investigated in
[63] where the decision tree was used as a fitness function.
With regard to GA in the filter mode, the filter mode is computationally more attractive, as the computational time of wrapper GA is quite high. The combination of GA and ML algorithms is therefore not so efficient, particularly if we consider that a run of the ML algorithm is needed every time a chromosome in the GA population is evaluated. Some examples of using GA in the filter mode include [47], in which the MI between classes and features was used as the evaluation function. Based on the experimental results of handwritten
digit recognition, this method can reduce the number of
features needed in the recognition process without
impairing the performance of the classifier significantly. In
[56], GA was used for feature selection by minimising a
cost function derived from the correlation matrix between
the features and the activity of interest that is being mod-
eled. From a dataset with 160 features, GA selected a
feature subset (40 features) which built a better predictive
model than with full feature set. Another example is [10],
in which GA was used with MI to evolve a near optimal
input feature subset for neural networks. A fast filter GA
approach for feature selection which improves previous
results presented in the literature of feature selection was
described in [41].
As a hybrid approach for feature selection, GA was also
investigated. In [9], a hybrid approach based on GA and a method based on class separability was applied to the selection of feature subsets for classification problems. This
approach is able to find compact feature subsets that give
the most accurate results, while beating the execution time
of some wrappers. A feature selection approach named
ReliefF-GA-Wrapper was proposed in [77] to combine the
advantages of filter and wrapper. In this approach, the
original features are evaluated by the Relief filter approach,
and the resulting estimation is embedded into the GA to
search for the optimal feature subset using the training accuracy of an ML algorithm on a handwritten Chinese characters dataset.
Additionally, in [17] a two-stage feature selection approach was proposed. The first stage employs MI to filter
out the least discriminant features, resulting in a reduced
feature space. Then a GA is applied to the reduced feature
space to further reduce its dimensionality and select the
best set of features.
An important fact with regard to the previous applica-
tions of GA for feature selection is that the fixed length
binary representation scheme is used to represent each
chromosome of the population as a feature subset. For an n-dimensional feature space, each chromosome is encoded by an n-bit binary string b1…bn, where bi = 1 if the i-th feature is present in the feature subset represented by the chromosome, and bi = 0 otherwise. Figure 4 is a hypothetical
chromosome represented using the fixed length binary
representation scheme of the standard GA.
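Under this scheme, decoding a chromosome back into a feature subset is a direct index lookup; a minimal sketch (the feature names are illustrative):

```python
def decode(chromosome, features):
    """Map a fixed-length bit string to the subset of selected features."""
    return [f for bit, f in zip(chromosome, features) if bit == 1]

features = ["the price", "who", "good", "would you like", "do you"]
print(decode([1, 0, 0, 1, 0], features))  # ['the price', 'would you like']
```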
The advantage of this representation is that the standard GA can be applied straightforwardly without any modification. Unfortunately, the fixed length binary representation is only appropriate when the dimension of the feature space is not high. If the feature space dimension is high, the computational resources required for GA evolution are very expensive, and the situation worsens when only a small number of these features are relevant. Several attempts have been made to adapt GA for feature selection in high dimensional domains. For example, Moser and Murty [53] maintained the use of fixed length binary representation but increased the computational resources by using a distributed GA to implement wrapper genetic feature selection. Other
works [26, 31, 46] adapted GA by pre-specifying the
number of features that must be selected and representing
each chromosome by the indices of randomly selected
features rather than representing the presence or absence of
all features. While the former attempt is based on
increasing the computational resources, which are expen-
sive, the latter poses a simplification assumption which is
not true.
Based on that, we propose a new approach to adapt GA for feature selection in high dimensional domains in general and cue phrase selection in particular. The proposed approach is based on two intuitions: first, the chromosome should represent the candidate features rather than the presence or absence of each feature in the space; second, the number of selected features is determined during the selection and is not known a priori. To realise these intuitions, a variable length, non-binary representation is a viable choice. However, choosing a variable length representation calls for modifying the genetic operators or devising new operators to cope with the new representation scheme, as described in the subsequent section.
3 Genetic-based cue phrase selection
In this section we describe the proposed genetic-based
approach for cue phrase selection in DAR. Due to the high
dimensionality of the phrase space, the standard GA is
adapted by using the variable length representation scheme,
modifying crossover and mutation operators, and propos-
ing a new genetic operator. Before diving into the details of
the proposed approach, two points are worth mentioning.
First, the proposed genetic-based approach for cue phrase
selection is another application of GA in natural language
processing domain. GA has been applied before to solve
the problem of word categorisation [40] and word seg-
mentation [36]. In [5], GA was used to construct a finite-
state automaton to deal with Russian and German pho-
netics. For grammar induction, GA was applied to evolve
grammars with a direct application in information retrieval
[48]. For dialogue systems, GA was incorporated into a
dialogue system to optimise the system’s parameter set-
tings [54]. A variant of Brill’s implementation that uses GA
to generate the instantiated rules and provide an adaptive
ranking for the part-of-speech problem was described in
[73]. A variant of GA, the messy GA, was applied to the part-of-speech tagging problem in [3]. Even for feature selection in
natural language processing, GA was investigated for fea-
ture selection in automatic text summarisation [63] and text
categorisation [74].
The second point is that the idea of using variable length
representation in the context of evolutionary algorithms is
as old as the algorithms themselves. Fogel and Walsh [21] seem to have been among the first to experiment with variable length representation. In their work, they evolved finite state machines with a varying number of states, making use of operators like addition and deletion. Holland [30] proposed the concepts of gene duplication and gene deletion in order to raise the computational power of
evolutionary algorithms. Smith [64] departed from the
early fixed-length character strings by introducing variable
length strings, including strings whose elements were if-
then rules (rather than single characters). Since the first
attempts of using variable length representations, many
researchers made use of the idea under different motiva-
tions such as engineering applications [14, 15] or raising
the computational power of evolutionary algorithms [59].
With regard to GA as a paradigm of evolutionary
algorithms, the use of variable length representation has
Fig. 4 Fixed length binary chromosome for feature selection
been proposed in several versions. Well known versions
are messy GA [24, 25], genetic programming [39], and
Species Adaptation GA [27]. These versions differ in the
specification of the representation scheme and conse-
quently the genetic operators. Messy GA uses binary
representation in which each gene is represented by a pair
of numbers that are the gene position and the gene value.
Messy GA uses the mutation operator as with standard GA.
Instead of crossover, messy GA uses the splice and cut
operators. The splice operator joins one chromosome to the
end of the other. The cut operator splits one chromosome
into two smaller chromosomes. Genetic programming is an
extension of GA with variable length representation
scheme in the form of hierarchical tree representing com-
puter program. The aim of genetic programming is to find
the best tree (computer program) that solves a given
problem. It adapts genetic operators of the standard GA to
cope with the tree representation scheme. Species Adap-
tation GA uses a variable length binary representation
scheme. It differs from the standard GA subtly but significantly. Evolution is directed by selection exploiting differences in fitness caused by variations in the genetic makeup of the population. While the mutation operator in the standard GA and genetic programming is considered a background operator, and crossover is usually assumed to be the primary operator, in Species Adaptation GA the reverse is true: of these two genetic operators, mutation is primary, and crossover, though useful, is secondary.
Besides that, researchers may opt to develop a domain-
specific version of variable length representation GA to
better meet the requirements of the domain, rather than
using the existing ones. For example, [76] investigated the
application of GA in the field of evolutionary electronics,
in which a special variable length GA was proposed to
cope with the main issues of variable length evolutionary
systems. Following this trend, we developed a special
version of GA specifically for cue phrase selection in DAR
and similar high dimensional domains as elaborated in the
following subsections.
3.1 Representation scheme
The representation scheme of the genetic-based approach for cue phrase selection is a variable length, non-binary representation scheme, in which each chromosome represents a candidate set of phrases. Figure 5 shows an example of a chromosome represented using this scheme. It is position independent, meaning that the gene position has no role in determining the aspects of the chromosome at the phenotype level. This aspect is of particular importance for the design of the genetic operators, as elaborated in the subsequent sections.
3.2 Phrases space mask
Technically speaking, GA explores the promising points in
the search space via genetic operations. Therefore, the
representation scheme and the genetic operators should
give rise to an effective exploration of the search space.
Using the proposed representation scheme directly does not
assist the genetic operators to explore new points in the
search space. Therefore, to ensure a good exploration of the
phrase space, we propose phrases space mask. The phrase
space mask is a binary string with length equal to the size
of phrase space, in which each bit marks the status of a
single phrase in the phrase space. Accordingly, the value 1
indicates that the phrase is being used by the current
population and the value 0 indicates that the phrase is not
in use. Figure 6 shows a fragment of the phrases space
mask adopted for cue phrase selection. It shows that the
phrases ‘‘the price’’, ‘‘who’’, and ‘‘information’’ are par-
ticipating in the current GA population, whereas the
phrases ‘‘good’’, ‘‘would you like’’, and ‘‘do you’’ are not.
As will be described in the subsequent sections, the status of the phrases in the phrases space mask is updated either immediately after performing the genetic operation, or through a rebuilding step of the phrases space mask, which takes place during the transition from generation t to generation t + 1.
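The mask and its rebuilding step can be sketched as follows; the phrase inventory and population here are illustrative, and the rebuild simply re-derives the mask from the phrases currently present in the population:

```python
def build_mask(phrase_space, population):
    """Rebuild the phrases space mask: bit i is 1 iff phrase i appears
    in some chromosome of the current population, 0 otherwise."""
    in_use = set(p for chromosome in population for p in chromosome)
    return [1 if phrase in in_use else 0 for phrase in phrase_space]

phrase_space = ["the price", "who", "good", "would you like", "do you", "information"]
population = [["the price", "who"], ["information", "the price"]]
print(build_mask(phrase_space, population))  # [1, 1, 0, 0, 0, 1]
```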
3.3 Fitness function
The fitness function is the driving force for the evolution in
GA. It performs the same role as the evaluation function in
feature selection. Since the general aim of feature selection
is to find a minimum number of features with maximum
relevance, the fitness function of a candidate set of phrases,
P, is a combination of two measures, namely relevance and
complexity as follows [53].
F(P) = Relev(P) − pf · L(P)/N    (1)

In the above formula, Relev(P) denotes the estimated relevance of the phrases set P, and L(P) is a measure of the complexity of the phrases set, usually the number of utilised phrases. Furthermore, N is the phrases space dimension, and pf is a punishment factor to weigh the multiple objectives of the fitness function (in this research pf = 1).
In general, the estimation of Relev(P) should be based
on the subsequent use of cue phrases by the ML algorithm.
Fig. 5 Variable length chromosome for cue phrase selection
For cue phrase selection in the DBN context described
earlier, each variable works as a binary classifier for the
given DA, therefore Relev(P) can be estimated by the
following accuracy measure:

Relev(P) = (TP + TN) / NU    (2)

where TP is the number of times the DNF, constituted from the selected phrases P, returns true when the utterance belongs to the target DA, TN is the number of times the DNF returns false when the utterance does not belong to the target DA, and NU is the total number of utterances in the dialogue corpus.
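Equations (1) and (2) can be combined into a single fitness routine; the sketch below assumes a corpus represented as (utterance, is_target_DA) pairs and substring matching for the DNF, with pf = 1 as in the paper:

```python
def relevance(phrases, corpus):
    """Eq. (2): accuracy of the DNF built from `phrases` over the corpus.
    `corpus` is a list of (utterance, is_target_da) pairs."""
    correct = 0
    for utterance, is_target in corpus:
        fired = any(p in utterance.lower() for p in phrases)
        if fired == is_target:
            correct += 1  # counts both true positives and true negatives
    return correct / len(corpus)

def fitness(phrases, corpus, space_size, pf=1.0):
    """Eq. (1): relevance minus the complexity penalty pf * L(P)/N."""
    return relevance(phrases, corpus) - pf * len(phrases) / space_size

corpus = [("what is the price", True), ("hello there", False),
          ("the price please", True), ("goodbye", False)]
print(fitness(["the price"], corpus, space_size=1000))
```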
3.4 Selection scheme
The selection scheme of the genetic-based approach for cue phrase selection is (k, q) tournament selection. It randomly chooses k chromosomes from the current population and, with a certain probability q, returns the best chromosome; otherwise it returns the worst chromosome.
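A sketch of (k, q) tournament selection as described; the fitness function is passed in as a callable, and the values of k and q are placeholders:

```python
import random

def tournament_select(population, fitness, k=3, q=0.9):
    """(k, q) tournament: sample k chromosomes; with probability q
    return the best of them, otherwise return the worst."""
    contestants = random.sample(population, k)
    contestants.sort(key=fitness, reverse=True)
    return contestants[0] if random.random() < q else contestants[-1]

pop = [["who"], ["the price", "who"], ["good"]]
winner = tournament_select(pop, fitness=len, k=2)
print(winner)
```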
3.5 Genetic operators
The proposed genetic-based approach modifies the three
genetic operators of the standard GA to cope with the
variable length representation scheme. Furthermore, it
introduces a new operator called AlterLength.
3.5.1 Reproduction
The reproduction operator of the genetic-based approach is
similar to the reproduction operator of the standard GA.
With a reproduction probability, Pr, a chromosome is
randomly selected from the current generation, and copied
into the new generation without any modification.
3.5.2 Crossover
To cope with the variable length representation of the
proposed genetic-based approach, the uniform crossover of
the standard GA has been adapted. The uniform crossover
[51] is an operator that decides with a probability, Pc,
which parent will contribute to each of the gene values in
the offspring chromosomes. The uniform crossover has
been modified as follows. First, two parents (chromosomes) are selected from the current population. Then, with probability 0.5, the length of the offspring is chosen to be either the length of the short parent or that of the long parent. If the length of the short parent is chosen, a uniform crossover is performed between the short parent and an equal length segment of the long parent. If the length of the long parent is chosen, a uniform crossover is likewise performed between the short parent and an equal length segment of the long parent, and the remaining parts of the long parent are appended to the beginning and the end of the offspring.

Fig. 6 Fragment of phrase space mask

Fig. 7 Example of the adapted uniform crossover

Figure 7 is an example of the adapted uniform crossover.
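The adapted uniform crossover can be sketched as follows, under our reading of the description; the position of the segment within the long parent is chosen at random here, which the paper does not pin down:

```python
import random

def adapted_uniform_crossover(p1, p2):
    """Variable-length uniform crossover: the offspring takes the length
    of either parent (probability 0.5 each); genes inside the overlapping
    segment are inherited uniformly from either parent."""
    short, long_ = sorted((p1, p2), key=len)
    start = random.randint(0, len(long_) - len(short))   # align a segment
    segment = long_[start:start + len(short)]
    mixed = [a if random.random() < 0.5 else b for a, b in zip(short, segment)]
    if random.random() < 0.5:
        return mixed                                      # short-parent length
    return long_[:start] + mixed + long_[start + len(short):]  # long length

child = adapted_uniform_crossover(["who", "good"],
                                  ["the price", "do you", "information"])
print(child)
```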
3.5.3 Mutation
The proposed approach for mutation is to replace the val-
ues of some randomly selected genes with new values from
the phrase space which are not participating in the current
population. The mutation operator is applied with proba-
bility, Pm, to each chromosome generated from the
crossover operation. This operator is performed with the
assistance of the phrase space mask. More specifically, for
each gene in the chromosome, if it is selected for mutation
then it is replaced by a randomly selected phrase from the
phrases space, which has its status in the phrases space
mask marked inactive, and then the status of the selected
phrase in the phrase space mask is set to active immedi-
ately. With regard to the mutated phrase (gene), its status in
the phrases space is not set to inactive immediately,
because this phrase is still in use by the parents (members
of the current GA population). Setting the status of the
mutated gene to inactive is performed after the compilation
of all genetic operations on the current population and
during a rebuilding step of the phrases space mask. Fig-
ure 8 shows an example of mutation operator.
3.5.4 AlterLength
The crossover and mutation operators were designed to
introduce variation to the content of the chromosome. To
introduce a variation to the length of the chromosome, the
AlterLength operator is proposed. It randomly expands
(shrinks) the chromosome by inserting (deleting) a single
phrase into (from) it. In case of insertion, an inactive phrase
is randomly selected from the phrases space and inserted in
a randomly selected position of the chromosome. In case of
deletion the selected phrase is removed from the chromo-
some and its status in the phrases space mask remains
active until the rebuilding step of the phrases space mask.
The AlterLength operator is performed with a probability,
Pal, as shown in Fig. 9.
3.6 Stopping criteria and control parameters
The proposed genetic-based approach makes use of one of
the stopping criteria of the standard GA, namely to stop when
the evolution does not introduce any significant change. As
for the control parameters, the genetic-based approach uses
the following parameters: Population Size (PopSize),
tournament selection parameters (k, q), reproduction
probability (Pr), crossover probability (Pc), mutation
probability (Pm), and AlterLength probability (Pal).

Fig. 8 Example of mutation operator

Fig. 9 Example of AlterLength operator
4 Results and discussions
To evaluate the performance of the genetic-based approach
on the selection of cue phrases, we conducted several
cases of experiments. These cases are named as follows:
baseline approaches, genetic-based approach, genetic-based
approach with negative cue phrases, genetic-based
approach with cue's positional information, and validation
experiments. Before diving into the details of the experi-
ments, a brief description of the dialogue corpus used and the
preprocessing steps applied to generate the phrase space is
useful.
4.1 Setting of the experiments
We used the SCHISMA dialogue corpus, which belongs to the
SCHISMA project (SCHouwburg Informatie Systeem) at the
University of Twente. The project involves various activities to
realise a theatre information and booking system [67]. The
dialogues are mixed-initiative, meaning that the initiative
may switch between the participants within a single dia-
logue. The task domain concerns information
exchange and reservation transactions in a theatre. Users are
able to make inquiries about scheduled theatre perform-
ances and, if they desire, make ticket reservations. Figure 10
shows a fragment of the SCHISMA dialogue corpus.
The SCHISMA corpus was annotated with the DAMSL coding
scheme [2]. In this process, each utterance is subdivided
into one or more segments, and the DAs are assigned to
each segment. In this research, we focus on the DAs given
in Table 1.
After annotating the corpus with DAs, the following
processes were performed to generate the phrase space.
1. Tokenisation: Tokenisation occurs at the utterance level,
and a token is defined as a sequence of letters or
digits separated by a separator (e.g., ".", ":", ";"). In this
process, all punctuation is discarded except "?",
which is treated as a token.
2. Removing morphological variations: It has been
noticed that most morphological variations in the
SCHISMA corpus are plural and tense variations,
which are not relevant for the recognition process.
3. Semantic clustering: This step clusters certain words into
semantic classes based on their semantic relatedness
and then replaces each occurrence of the words with
the cluster name. For the SCHISMA corpus, the following
semantic clusters were identified:
(a) ShowName: Any show name appearing in the
corpus.
(b) PlayerName: Any player name appearing in the
corpus.
(c) Number: Any sequence of digits (0…9) or
alphabetic numbers (one, two, …)
(d) Day: Any occurrence of a day name (Monday,
Tuesday, …, Sunday)
(e) Month: Any occurrence of a month name (Jan-
uary, …, December)
(f) Price: A Number cluster preceded by a currency
symbol (f, ff)
(g) Date: Any of the following sequences: <Number
Month Number>, <Month Number>, <Number Month>
(h) Time: A Number cluster preceded by the
preposition "at"
4. N-gram phrase generation: In this process, all phrases
that consist of one, two, and three words were
generated from each utterance in the corpus.
5. Removing less frequent phrases: To reduce the dimen-
sion of the phrase space, phrases occurring fewer times
than a frequency threshold were removed. Based on
the experiments of [71], the chosen frequency thresh-
old was 3.
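Steps 4 and 5 can be sketched together as follows. This is a simplified illustration that treats utterances as whitespace-tokenised strings (the earlier tokenisation and clustering steps are assumed to have run already):

```python
from collections import Counter

def build_phrase_space(utterances, max_n=3, min_freq=3):
    """Generate all 1-, 2-, and 3-word phrases from each utterance
    and keep only those meeting the frequency threshold."""
    counts = Counter()
    for utterance in utterances:
        tokens = utterance.split()
        for n in range(1, max_n + 1):
            for i in range(len(tokens) - n + 1):
                counts[" ".join(tokens[i:i + n])] += 1
    # the frequency cutoff shrinks the dimension of the phrase space
    return sorted(p for p, c in counts.items() if c >= min_freq)
```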
U: When is Sweeny Todd on?
S: You can see 'Sweeny Todd' in the period December 28 to 30
U: What about under a blue roof?
S: You can see 'Under a blue roof' in the 'Grote Zaal' on May
U: Can I order a ticket for that
S: Do you have a reduction card?
U: I dont have a reduction card.
U: Four tickets please.
Fig. 10 Fragment of SCHISMA dialogue corpus
Table 1 Experimented DA and their frequencies in SCHISMA
corpus
DA Meaning Frequency
Statement The speaker makes a claim about the
world
817
Query-if The speaker asks the hearer whether
something is the case or not
108
Query-ref The speaker asks the hearer for
information in the form of
references that satisfy some
specification given by the speaker
598
Positive-answer The speaker answers in the positive 561
Negative-answer The speaker answers in the negative 72
No-blf The utterance is not tagged with
any blf dialogue act
968
The above preprocessing steps resulted in a phrase
space of 1,336 phrases. This phrase space was used in the
first two cases of experiments. However, in the subsequent
cases, further preprocessing steps were introduced which
give the phrase space of each DA a different size.
4.2 Baseline approaches
The baseline approaches are a selected set of ranking
approaches, which are commonly applied for cue phrase
selection. As representatives of these approaches, we
choose the approaches shown in Table 2. Each approach
has a metric expressed in the corresponding formula. In
each formula, f denotes the feature (phrase) and c denotes
the class (DA). Each of these approaches was tested
on the selection of cue phrases for each DA. More spe-
cifically, for each DA, each ranking approach ranked the
phrases using its own metric. Then, the fitness value, F(P),
along with the relevance value, Relev(P), and complexity
value, L(P)/N, of the top k phrases (k = 1, 2, …, n) in the
ranked list were calculated, and the top k phrases which
give the maximum fitness value, F(P), are the selected set of
phrases for that DA, as shown in Table 3.
The results of the baseline approach experiment are
given in Table 3. From these results, it is clear that there is
a similarity between the performance of MI and OR on
one side and the performance of IG and χ² on the other
side, in three aspects. First, from the complexity values,
L(P)/N, it is clear that MI and OR tend to select a larger
number of phrases than IG and χ². Second, the Relev(P)
values of MI and OR are higher than those of IG and χ². Third, as a
direct result of the similarity in Relev(P) and L(P)/N values
within each group, (MI, OR) and (IG, χ²), the pattern of the
fitness values is similar within each group, though between
the two groups the comparison of fitness values is not
conclusive.
The similarity within each group, (MI, OR) and
(IG, χ²), can be understood through the following facts. For
each DA, each phrase has two sides, positive and negative.
The positive side depends on the presence of the phrase in
the utterances labeled with the target DA and the absence
of the phrase from the utterances labeled with other DAs.
The negative side depends on the absence of the phrase
from the utterances labeled with the target DA and the
presence of the phrase in the utterances labeled with other
DAs. Based on that, a ranking approach is classified as
either a one-sided or a two-sided metric, depending on
whether its metric accounts for the negative side of the
phrase or not [79]. One-sided metrics rank the phrases
according to their positive side; therefore the top k phrases
in the ranked list are phrases with the highest positive sides
and, definitely, the lowest negative sides. With regard to
the two-sided metrics, they rank the phrases according to a
combination of both positive and negative sides. Therefore
the top k phrases in the list are phrases with the highest
negative or positive sides.
From Table 2, it is clear that MI and OR are one-sided
metrics, whereas IG and χ² are two-sided metrics. It is also
obvious that the fitness measure, Eq. 1, which was used for
the selection of cue phrases from the ranked list, has its
Relev(P) subpart depends on the positive side of the
phrases rather than the negative side. Therefore, the rank-
ing of the one-sided metrics is more appropriate for the
fitness measure than that of the two-sided metrics. This
explains the higher Relev(P) values of the cue phrases
selected by MI and OR. However, the inability of these
approaches to account for the correlation between cue
phrases leads to the selection of a large number of cue
phrases in the case of MI and OR. In other words, the ranking
approaches assume that the relevance of a set of phrases is
equal to the summation of the individual relevances of each
phrase, which leads to redundant selection.
Table 2 Experimented ranking approaches

Metric  Formula
MI      MI(f, c) = log [ P(f, c) / (P(f) · P(c)) ]
OR      OR(f, c) = [ P(f|c) · (1 − P(f|c̄)) ] / [ (1 − P(f|c)) · P(f|c̄) ]
IG      IG(f, c) = Σ_{c' ∈ {c, c̄}} Σ_{f' ∈ {f, f̄}} P(f', c') · log [ P(f', c') / (P(f') · P(c')) ]
χ²      χ²(f, c) = N · [ P(f, c) · P(f̄, c̄) − P(f, c̄) · P(f̄, c) ]² / [ P(f) · P(f̄) · P(c) · P(c̄) ]
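For concreteness, the one-sided MI metric and the two-sided χ² metric can be computed from a phrase/DA contingency table as follows (a sketch; the count-variable names are ours):

```python
import math

def ranking_metrics(n11, n10, n01, n00):
    """MI and chi-square from a phrase/class contingency table.

    n11: utterances containing the phrase, labeled with the target DA
    n10: containing the phrase, labeled with other DAs
    n01: without the phrase, labeled with the target DA
    n00: without the phrase, labeled with other DAs
    """
    n = n11 + n10 + n01 + n00
    p_f, p_c, p_fc = (n11 + n10) / n, (n11 + n01) / n, n11 / n
    mi = math.log(p_fc / (p_f * p_c))             # one-sided: positive side only
    chi2 = n * (n11 * n00 - n10 * n01) ** 2 / (   # two-sided: both sides combined
        (n11 + n10) * (n01 + n00) * (n11 + n01) * (n10 + n00))
    return mi, chi2
```

MI grows only with the positive association between phrase and DA, while χ² is large whenever the phrase and the DA are strongly associated in either direction, which is why the two groups of metrics rank phrases so differently.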
Table 3 Results of the ranking approaches experiments
                 MI                         IG                         χ²                         OR
                 Relev(P)  L(P)/N  F(P)    Relev(P)  L(P)/N  F(P)    Relev(P)  L(P)/N  F(P)    Relev(P)  L(P)/N  F(P)
Statement 0.8403 0.1055 0.7347 0.6385 0.0052 0.6385 0.6619 0.0007 0.6612 0.7675 0.0509 0.7166
Query-if 0.9634 0.0045 0.9589 0.9580 0.0007 0.9580 0.9580 0.0007 0.9572 0.9599 0.0030 0.9569
Query-ref 0.8505 0.0404 0.8101 0.8149 0.0060 0.8149 0.8149 0.0060 0.8089 0.8803 0.0412 0.8391
Positive-answer 0.8217 0.0636 0.7581 0.7333 0.0015 0.7333 0.7860 0.0007 0.7853 0.8105 0.0464 0.7640
Negative-answer 0.9687 0.0015 0.9672 0.9595 0.0015 0.9595 0.9595 0.0015 0.9580 0.9687 0.0022 0.9665
No-blf 0.8036 0.1198 0.6839 0.7333 0.0015 0.7333 0.7333 0.0015 0.7318 0.7851 0.0786 0.7065
The general conclusion that can be drawn from this case
of experiments is that the ranking approaches are not able
to maintain a tradeoff between the two subparts of the
fitness function. They tend to optimise one subpart at the
expense of the other.
4.3 Genetic-based approach
The aim of this case of experiments is to evaluate the
genetic-based approach on the selection of cues phrases for
each DA given in Table 1. The settings of the control
parameters are as follows: PopSize = 500, q = 10, k = 0.7,
r = 0.3, Pc = 0.7, Pr = 0.1, Pal = 0.2, Pm = 0.1, and the
stopping criterion is to stop if there is no significant
improvement within 10 generations. Table 4 summarises
the results obtained from this case of experiments. It should
be mentioned that in each genetic-based approach experi-
mental case, the GA was run five times and the results of the
best run were reported.
Figure 11 is an example of the GA evolution during the
selection of the cue phrases for statement DA. The curves
correspond to the best evolutionary trends. In general, it
can be noticed that there is a rapid growth at the early
generations followed by a long period of slow evolution
until meeting the stopping criterion. This reflects the nature
of the search space of cue phrases which is hugely multi-
modal and contains a lot of peaks. An interesting aspect of
the average population fitness curve is that despite the
fluctuations, an overall look at the curve shows a general
tendency of improvement, particularly at the early
generations.
A comparative look at the results of Tables 3 and 4
shows that the genetic-based approach outperforms the
ranking approaches for cue phrase selection. This is obvious
from the difference in the fitness values, F(P). It is also
clear that the relevance values, Relev(P), of the genetic-
based approach are higher than the Relev(P) values of all
the ranking approaches. With regard to the complexity of
the selected cue phrases, L(P)/N, the genetic-based
approach obviously tends to select a smaller number of phrases
than the MI and OR ranking approaches, yet a larger number than IG and
χ².
From the above observations, it can be concluded that
the genetic-based approach manages to maintain a tradeoff
between relevance and complexity, and therefore achieves
higher fitness values. There are two reasons behind that.
First, in the genetic-based approach, the evaluation and the
selection processes are based on the fitness measure which
depends on the subsequent use of the selected phrases.
Conversely, in the ranking approach the evaluation of the
phrases is based on the intrinsic properties of the phrases,
whereas the selection depends on the fitness measure. The
second reason is the ability of the genetic-based approach
to account for the correlation between the selected cue
phrases. Unlike the ranking approaches, the genetic-based
approach evaluates the selected cue phrases as a whole rather
than evaluating each phrase individually and assuming that the
relevance of the phrase set is equal to the summation of
the individual relevances of each phrase, which leads to
redundant selection.
To confirm the above conclusion using statistical
inference tools, a paired t test of the statistical significance
of the difference between the F(P) values for both MI
ranking approach and genetic-based approach was con-
ducted at level P < 0.05 with 5 degrees of freedom. The
obtained t value (t = 3.1123) shows that the difference is
statistically significant.
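The paired t test can be reproduced directly from the reported F(P) values. The sketch below recovers the reported t = 3.1123 up to rounding of the tabulated values (df = 6 − 1 = 5, matching the 5 degrees of freedom used in the paper):

```python
import math

def paired_t(xs, ys):
    """Paired t statistic for two matched samples (df = n - 1)."""
    diffs = [y - x for x, y in zip(xs, ys)]
    n = len(diffs)
    mean = sum(diffs) / n
    var = sum((d - mean) ** 2 for d in diffs) / (n - 1)  # sample variance
    return mean / math.sqrt(var / n)
```

Feeding it the per-DA F(P) columns of Table 3 (MI) and Table 4 yields t ≈ 3.11.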
4.4 Genetic-based approach with negative cue phrases
It has been pointed out earlier that for each DA, each
phrase has two sides, positive and negative, and accord-
ingly a phrase is classified as either positive or negative,
depending on the dominant side. An efficient way to
exploit negative phrases, which was described in [79], is to
select positive and negative phrases independently based
on their subsequent use. For cue phrase selection in DAR,
the positive phrases can be used to indicate the membership
of an instance to the target DA and the negative phrases can
be used to help in increasing the relevance of the positive
phrases by confidently rejecting instances which do not
belong to the target DA, yet contain the positive phrases.
For example, in SCHISMA dialogue corpus, the positive
phrase ‘‘ticket’’ is relevant for both the statement and
query-ref DAs. To increase the relevance of this cue phrase
for the statement DA, negative cue phrases such as ‘‘how
much’’ and ‘‘?’’, which are relevant to the query-ref DA,
yet not to the statement DA, might be selected and con-
joined with "ticket" to accept only the utterances that
contain "ticket" and do not contain "how much" or
"?".
Unfortunately, the ranking approaches do not help in
exploiting the negative phrases efficiently. More specifi-
cally, one sided metrics rank the phrases according to their
positive side and ignore the selection of the negative
Table 4 Results of the genetic-based approach experiments
DA Relev(P) L(P)/N F(P)
Statement 0.8710 0.0322 0.8388
Query-if 0.9643 0.0037 0.9606
Query-ref 0.9033 0.0060 0.8973
Positive-answer 0.8525 0.0195 0.8330
Negative-answer 0.9692 0.0015 0.9677
No-blf 0.8114 0.0165 0.7950
Evol. Intel. (2009) 1:253–269 263
123
phrases. Moreover, two sided metrics rank phrases
according to the combination of their negative and positive
sides. To assess the ability of the genetic-based approach in
exploiting the negative cue phrases, we conducted this case
of experiments in which the aim is to select positive and
negative cue phrases to satisfy the following DNF
expression.
DNF = if (pp1 ∧ ¬np1 ∧ … ∧ ¬npk1) ∨ … ∨ (ppj ∧ ¬npj ∧ … ∧ ¬npkj)
      ∨ … ∨ (ppm ∧ ¬npm ∧ … ∧ ¬npkm) then DA

where ppj is a positive cue phrase and npj, …, npkj are the neg-
ative phrases associated with ppj.
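Evaluating such a DNF rule against an utterance is straightforward. The sketch below simplifies phrase matching to set membership and represents each clause as a positive phrase paired with its negative phrases:

```python
def matches_dnf(utterance_phrases, clauses):
    """Fire the DA if some clause's positive phrase is present in the
    utterance and none of that clause's negative phrases are."""
    present = set(utterance_phrases)
    return any(pp in present and all(np not in present for np in negs)
               for pp, negs in clauses)
```

Using the SCHISMA example from the text, the negatives "how much" and "?" block the positive phrase "ticket" on query-ref-like utterances: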
To account for the negative cue phrases, each phrase
occurring within an utterance that belongs to the target DA is
marked positive, and each phrase occurring within an utter-
ance that does not belong to the target DA is marked
negative. It can happen that some phrases occur both in
utterances that are labeled with the target DA and in
utterances not labeled with the target DA; hence it is
possible to find two identical phrases marked negative and
positive. Table 5 summarises the results obtained from this
case of experiment. Figure 12 is a sample of GA evolution
for statement DA in this case of experiments.
It is obvious from the above results that there is an
improvement in the Relev(P) values at the expense of L(P)/
N. Definitely the inclusion of negative cue phrases is the
direct reason behind this improvement. The negative cue
phrases enlarge the search space by a different amount for
each DA, as shown in Table 5, and consequently alleviate
the constraint imposed by the complexity subpart, L(P)/N,
of the fitness function on the number of selected
phrases. As a result, more relevant phrases are
included in the selection, which ultimately increases the
Relev(P) value. The test of the statistical significance of the
differences between Relev(P) values and L(P)/N values in
Tables 4 and 5 confirms the above conclusion. Using a
paired t test at level p < 0.05 with 5 degrees of freedom,
the obtained t values, 3.2618 and 2.7288, show that they
are statistically significant. With regard to the fitness
values, F(P), the difference between the corresponding
values in Tables 4 and 5 is not conclusive. This is con-
firmed by the analysis of the statistical significance of the
difference between them using paired t test (at level
p < 0.05 and 5 degrees of freedom), which shows that they
are not statistically significant (the obtained t value is
0.4041). Recalling that F(P) is a combination of two con-
flicting objectives, relevance and complexity, it is
obvious that the improvement of Relev(P), which
results from the incorporation of the negative cue phrases, is
offset by the increase in the complexity of the selected cue
phrases.

Fig. 11 GA evolution of cue phrase selection for statement DA (last generation = 183; curves: informativeness, 1 − complexity, fitness, and average population fitness)

Table 5 Results of the genetic-based approach with negative cue phrases experiments

DA               Phrase space size  Relev(P)  L(P)/N  F(P)
Statement        2,000              0.8896    0.0360  0.8536
Query-if         1,594              0.9682    0.0088  0.9595
Query-ref        1,848              0.9174    0.0319  0.8855
Positive-answer  2,115              0.8696    0.0457  0.8239
Negative-answer  1,542              0.9673    0.0007  0.9665
No-blf           2,227              0.8324    0.0382  0.7943
On the other hand, a comparison between the results of
Tables 3 and 5 confirms the superiority of the genetic-
based approach over the ranking approaches. Despite the
slight increase of the L(P)/N values, they are still low if
compared with the corresponding values of the ranking
approaches. Definitely, the ability of the genetic-based
approach to account for the correlation between the
selected phrases, and the inclusion of negative phrases, are
the direct reasons behind the better achievement of the
genetic-based approach. This conclusion has been con-
firmed by the analysis of the statistical significance of the
difference between the corresponding values of F(P) in
Table 3 (MI) and Table 5 using paired t test at level
p < 0.05 and 5 degrees of freedom. The obtained t value
(t = 2.9127) shows that the difference is statistically
significant.
4.5 Genetic-based approach with cue’s positional
information
The information on a phrase's position within an utterance is
useful to increase its relevance for a given DA. We con-
ducted this case of experiments to investigate the ability of
the genetic-based approach to select cue phrases after
incorporating the phrase's positional information. To do
that, each positive phrase in the phrase space was marked
with one of three possible positional labels, which repre-
sent the position of the phrase within the utterance. These
labels are: Begin, if the phrase occurs at the beginning of
any utterance labeled with the target DA, End, if the phrase
occurs at the end of any utterance labeled with the target
DA, and Contain, if the phrase occurs elsewhere. It might
happen that a certain phrase occurs in different positions
within utterances. In this case, multiple instances of the
phrase, each with a different label, are created. Conse-
quently, each DA has a different-size phrase space, as
shown in Table 6. The genetic-based approach was applied
with the same parameters specified in the previous cases
and the results are shown in Table 6. Figure 13 is a sample
of GA evolution for statement DA.
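The positional labelling can be sketched as follows (an illustration; the token-level matching of a phrase inside an utterance is our assumption):

```python
def positional_instances(utterance_tokens, phrase_tokens):
    """Collect Begin / End / Contain labels for every occurrence of a
    phrase within an utterance; multiple labels yield multiple
    instances of the phrase in the phrase space."""
    labels = set()
    n, m = len(utterance_tokens), len(phrase_tokens)
    for i in range(n - m + 1):
        if utterance_tokens[i:i + m] == phrase_tokens:
            if i == 0:
                labels.add("Begin")        # phrase starts the utterance
            elif i + m == n:
                labels.add("End")          # phrase ends the utterance
            else:
                labels.add("Contain")      # phrase occurs elsewhere
    return labels
```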
Fig. 12 GA evolution of cue phrase selection for statement DA (last generation = 224; curves: informativeness, 1 − complexity, fitness, and average population fitness)

Table 6 Results of genetic-based approach with cue's positional information experiments

DA               Phrase space size  Relev(P)  L(P)/N  F(P)
Statement        2,329              0.9003    0.0369  0.8634
Query-if         1,625              0.9805    0.0086  0.9718
Query-ref        2,047              0.9262    0.0259  0.9003
Positive-answer  2,377              0.9008    0.0366  0.8642
Negative-answer  1,581              0.9888    0.0089  0.9799
No-blf           2,354              0.8447    0.0297  0.8149

It appears from the results in Table 6 that there is an
improvement in the fitness values, F(P), of the selected
cues for each DA. This can be attributed to the improvement
of the corresponding Relev(P) values after the
incorporation of the cue's positional information. In terms
of complexity, L(P)/N, there is a slight increase in its
values for some DAs; however, there are cases where the
L(P)/N values are similar or even better. For instance, for the
positive-answer DA, there is an obvious improvement in
both values, Relev(P) and L(P)/N. This is empirical evi-
dence on the ability of the genetic-based approach and on
the role of the positional information. The analysis of the
statistical significance of the difference between the fitness
values, F(P), of Tables 5 and 6 using paired t test at level
p \ 0.05 and 5 degree of freedom confirms this conclusion.
The obtained t value (t = 4.0410) shows that the difference
is highly statistically significant.
Finally, to confirm the findings obtained from the four
experimental cases, a repeated measures ANOVA with
Greenhouse-Geisser correction was conducted to assess
whether there are differences between the fitness values,
F(P). Results indicated that the values are statistically
different, F(1.095, 5.475) = 10.580, p = 0.019, η² = 0.68.
On the other hand, the polynomial contrasts test indicates
that there is a significant linear trend, F(1,5) = 12.597,
p = 0.016, η² = 0.72. However, this finding was
qualified by a significant cubic trend, F(1,5) = 13.729,
p = 0.0141, η² = 0.73, reflecting the lower fitness of case
3 than case 2. Overall, there is a linear trend in the fitness values
from case 1 to case 4, reflecting the improvement of the
fitness values. However, case 2 has a somewhat higher
mean than case 3, producing the cubic trend.
4.6 Validation experiments
The aim of this case of experiments is to validate the use of
the proposed genetic-based approach for ML applica-
tions. More specifically, the cue phrases generated from
the above experiments were used to build a DBN model for
DAR. The hypothesis is that the better selection of cue
phrases, the more accurate DAR. First the sets of cue
phrases generated by the genetic-based approach in each of
the previous cases were used to specify the DBN random
variables as described earlier, so that each random variable
is a binary classifier for a single DA. Then DBN ML
algorithms were used with 10-fold cross-validation to
construct the structure of the DBN model, assess its
parameters, and estimate its recognition accuracy, using the
probabilistic networks library [32], which is freely avail-
able from http://www.intel.com/research/mrl/pnl. The
same experiment was repeated using the sets of cue phrases
generated by MI, and the results of these experiments are
summarised in Table 7.

Fig. 13 GA evolution of cue phrase selection for statement DA (last generation = 275; curves: predictivity, aggressivity, fitness, and average population fitness)

Table 7 Accuracies of the DAR models

Selection approach                               Recognition accuracy
                                                 Min.    Max.    Avrg.
Ranking approach (MI)                            76.74   78.48   77.72
Genetic-based approach                           76.89   79.19   78.34
Genetic-based with negative cues                 77.72   79.89   78.48
Genetic-based with cue's positional information  77.61   81.09   79.67
From the above results, it is clear that there are differ-
ences between the final recognition accuracies of the DBN
models of DAR. Although the differences between these
values are limited to 1 or 2 percentage points, they are
adequate to demonstrate the efficiency of the genetic-based
approach over the ranking approaches. This conclusion can
be interpreted by the fact that the effect of the feature
selection approach on the performance of a ML model is
highly dependent on the interaction between the feature
selection approach and the ML algorithm. This interaction
can be specified by the awareness of the feature selection
approach of what features are relevant to the ML algorithm
and how sensitive is the ML algorithm to the existence of
redundant features. Recalling that both MI and the genetic-
based approach are filter approaches, and that one aspect of the
filter approach is that the selection of features is performed
based on the intrinsic properties of the data and indepen-
dently of the ML algorithm, the above results are expected.
In other words, because the selection of cue phrases was
performed independently of the DBN ML algorithm, it is
not expected that differences between the recognition
accuracies of DBN models will reflect the efficiency of the
different approaches accurately. In addition, as has been
pointed out before, the main advantage of the genetic-based
approach over the ranking approaches is its ability to
remove redundant features; however, ML algorithms vary
in their sensitivity to redundant features [13]. In the
above experiments, it seems that the DBN is not highly sensi-
tive to redundant features.
Finally, to understand the influence of the cue phrase
selection approaches on the recognition accuracy, it should
be borne in mind that the construction of the DBN
models of DAR is based on the binary representation of the
datasets, which results from the extraction of the random
variables' values from the utterances. In this representation,
utterances that belong to a certain DA should have a distinct
pattern, ideally composed of n − 1 bits with value 0 and a
single bit with value 1 (n is the number of random variables)
that corresponds to the random variable of this DA. It is
obvious also that the quality of the representation depends
on the relevance of the selected cue phrases that form the
random variables. In other words, the better the cue selection
approach, the better the data representation, and consequently
the better the constructed DBN models.
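The binary representation can be sketched as follows (an illustration under an assumption: each random variable is taken to fire when any of its DA's selected cue phrases occurs in the utterance):

```python
def binary_representation(utterance_phrases, cue_sets):
    """One bit per DA random variable: 1 if any of that DA's selected
    cue phrases occurs in the utterance, 0 otherwise."""
    present = set(utterance_phrases)
    return [1 if any(cue in present for cue in cues) else 0
            for cues in cue_sets]
```

With relevant, non-redundant cue sets, an utterance of a given DA approaches the ideal pattern of a single 1 bit among 0s, which is exactly why better cue selection yields better-constructed DBN models.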
5 Conclusion
In this paper, a genetic-based approach for cue phrases
selection in the context of DAR was introduced. The pro-
posed approach is a variable length GA with specialized
genetic operators developed specifically for this task. Sev-
eral cases of experiment were conducted and the obtained
results suggest a number of important conclusions. Firstly,
the results confirm that the ranking approaches are not
optimal for cue phrase selection in DAR and similar high
dimensional domains. The selection in these approaches is
independent of the subsequent use, and they are not able to
account for the correlation between the selected features.
Secondly, the results of the proposed genetic-based
approach evidence its ability to account for the correlation
between the candidate cues, which enables it to select a
minimal number of relevant phrases. This is apparent from the
large reduction in the number of the selected cues. Thirdly,
in contrast to the ranking approaches, the proposed genetic-
based approach shows its ability to exploit the negative
phrases to increase the relevance of the selected cue
phrases. Fourthly, the results confirm that the cue’s posi-
tional information is useful to improve the relevance of the
selected cue phrases. In general the proposed genetic-based
approach has proved its efficiency for the selection of
useful cue phrases for DAR. Finally, although the genetic-
based approach was applied to cue phrase selection, it can
be applied similarly to feature selection in any similar
high dimensional domain.
References
1. Ali A, Mahmod R, Ahmad F, Sullaiman N (2006) Dynamic
bayesian networks for intention recognition in conversational
agent. In: Proceedings of the 3rd international conference on arti-
ficial intelligence in engineering and technology (iCAiET2006).
Universiti Malaysia Sabah, Sabah, Malaysia
2. Allen J, Core M (1997) Draft of DAMSL: dialog act markup in
several layers. The Multiparty Discourse Group, University of
Rochester, Rochester, USA. Available from http://www.cs.
rochester.edu/research/cisd/resources/damsl
3. Araujo L (2002) Part-of-speech tagging with evolutionary algo-
rithms. In: Proceedings of the international conference on
intelligent text processing and computational linguistics, lecture
notes in computer science, vol 2276. Springer-Verlag, Berlin, pp
230–239
4. Austin JL (1962) How to do things with words. Oxford Univer-
sity Press, Oxford
5. Belz A, Eskikaya B (1998). A genetic algorithm for finite-state
automata induction with an application to phonotactics. In: Pro-
ceedings of ESSLLI-98 workshop on automated acquisition of
syntax and parsing
6. Blum A, Langley P (1997) Selection of relevant features and
examples in machine learning. Artif Intell 97:245–271
7. Bunt H (1994) Context and dialogue control. Think 3(1):19–31
8. Caballero RE, Estevez PA (1998) A niching genetic algorithm for
selecting features for neural network classifiers. In: Proceedings
of the 8th international conference of artificial neural networks.
Springer-Verlag, pp 311–316
9. Cantu-Paz E (2004) Feature subset selection, class separability,
and genetic algorithms. In: Proceedings of genetic and evolu-
tionary computation conference-GECCO 2004, Deb K, (ed) et al.
pp 959–970
10. Chunkai K, Zhang HH (2005), An effective feature selection
scheme via genetic algorithm using mutual information. In:
Proceedings of 2nd international conference on fuzzy systems
and knowledge discovery, pp 73–80
11. Dash M, Liu H (1997) Feature selection for classification. Int
Data Anal Int J 1(3):131–156
12. Dash M, Liu H (2003) Consistency-based search in feature
selection. Artif Intell 151(1/2):155–176
13. David WA (1992) Tolerating noisy, irrelevant and novel attri-
butes in instance-based learning algorithms. Int J Man Mach Stud
36(1):267–287
14. Davidor Y (1991) A genetic algorithm applied to robot trajectory
generation. In: Davis L (ed) Handbook of genetic algorithms. Van
Nostrand Reinhold, pp 144–165
15. Davidor Y (1991) Genetic algorithms and robotics: a heuristic strategy for optimisation. World Scientific series in robotics and automated systems, vol 1. World Scientific
16. Eads D, Hill D, Davis S, Perkins S, Ma J, Porter R, Theiler J
(2002) Genetic algorithms and support vector machines for time
series classification. In: Proceedings of 5th conference on the
application and science of neural networks, fuzzy systems and
evolutionary computation, symposium on optical science and
technology of the 2002 SPIE annual meeting, pp 74–85
17. Fatourechi M, Birch GE, Ward RK (2007) Application of a
hybrid wavelet feature selection method in the design of a self-
paced brain interface system. J Neuroeng Rehabil 4:11
18. Filho B (2000) Feature selection from huge feature sets in the context of computer vision. Master’s thesis, Colorado State University, Fort Collins, Colorado
19. Fishel M (2007) Machine learning techniques in dialogue act recognition. In: Estonian papers in applied linguistics 3, pp 117–134
20. Fleuret F (2004) Fast binary feature selection with conditional
mutual information. J Mach Learn Res 5:1531–1555
21. Fogel LJ, Owens AJ, Walsh MJ (1966) Artificial intelligence
through simulated evolution. Wiley, New York
22. Frohlich H, Chapelle O, Scholkopf B (2004) Feature selection for
support vector machines using genetic algorithms. Int J Artif
Intell Tools 13(4):791–800
23. Goldberg DE (1989) Genetic algorithms in search, optimisation,
and machine learning. Addison-Wesley, New York
24. Goldberg DE, Korb B, Deb K (1990) Messy genetic algorithms:
motivation, analysis, and first results. Complex Syst 3:493–530
25. Goldberg DE, Deb K, Korb B (1990) Messy genetic algorithms
revisited: studies in mixed size and scale. Complex Syst
4(4):415–444
26. Gonzalo V, Sanchez-Ferrero J, Arribas I (2007) A statistical-
genetic algorithm to select the most significant features in
mammograms. In: Proceedings of the 12th international confer-
ence on computer analysis of images and patterns, pp 189–196
27. Harvey I (1995) The artificial evolution of adaptive behaviour. DPhil thesis, School of cognitive and computing sciences, University of Sussex
28. Haupt RL, Haupt SE (2004) Practical genetic algorithms, 2nd
edn. Wiley, New York
29. Hirschberg J, Litman D (1993) Empirical studies on the disam-
biguation of cue phrases. Comput Linguist 19(3):501–530
30. Holland JH (1975) Adaptation in natural and artificial systems.
The University of Michigan Press, Ann Arbor
31. Hong JH, Cho SB (2006) Efficient huge-scale feature selection with speciated genetic algorithm. Pattern Recognit Lett 27(2):143–150
32. Intel Corporation (2004) Probabilistic network library—user guide and reference manual
33. Jurafsky D (2004) Pragmatics and computational linguistics. In: Horn L, Ward G (eds) The handbook of pragmatics. Blackwell, Oxford, pp 578–604
34. Jurafsky D, Shriberg E, Fox B, Traci C (1998) Lexical, prosodic,
and syntactic cues for dialog acts. In: Proceedings of ACL/coling
‘98 workshop on discourse relations and discourse markers,
Montreal, Quebec, Canada pp 114–120
35. Kats H (2006) Classification of user utterances in question
answering dialogues. Master’s thesis, University of Twente,
Netherlands
36. Kazakov D (1998) Genetic algorithms and MDL bias for word segmentation. In: Proceedings of ESSLLI-97
37. Kelly JD, Davis L (1991) Hybridising the genetic algorithm and the k-nearest neighbors classification algorithm. In: ICGA, pp 377–383
38. Kohavi R, John GH (1997) Wrappers for feature subset selection.
Artif Intell J 97(1/2):273–324
39. Koza JR (1992) Genetic programming: on the programming of
computers by means of natural selection. MIT Press, Cambridge
40. Lankhorst MM (1994) Automatic word categorisation with
genetic algorithms: computer science report CS-R9405. Univer-
sity of Groningen, The Netherlands
41. Lanzi P (1997) Fast feature selection with genetic algorithms: a
filter approach. In: Proceedings of IEEE international conference
on evolutionary computation, pp 537–540
42. Lesch S (2005) Classification of multidimensional dialogue acts
using maximum entropy. Diploma Thesis, Saarland University,
Postfach 151150, D-66041 Saarbrucken, Germany
43. Li L, Weinberg CR, Darden TA, Pedersen LG (2001) Gene
selection for sample classification based on gene expression data:
study of sensitivity to choice of parameters of the GA/KNN
method. Bioinformatics 17:1131–1142
44. Liu H, Motoda H (1998) Feature selection for knowledge dis-
covery and data mining. Kluwer, Boston
45. Liu H, Yu L (2005) Toward integrating feature selection algo-
rithms for classification and clustering. IEEE Trans Knowl Data
Eng 17:491–502
46. Liu JJ, Cutler G, Li W, Pan Z, Peng S, Hoey T, Chen L, Ling X
(2005) Multiclass cancer classification and biomarker discovery
using GA-based algorithms. Bioinformatics 21:2691–2697
47. Liu W, Wang M, Zhong Y (1995) Selecting features with genetic
algorithm in handwritten digits recognition. In: Proceedings of
the international IEEE conference on evolutionary computation,
pp 396–399
48. Losee RM (1995) Learning syntactic rules and tags with genetic
algorithms for information retrieval and filtering: an empirical
basis for grammatical rules. Inf Process Manag 32:185–197
49. Lu J, Zhao T, Zhang Y (2008) Feature selection based-on genetic
algorithm for image annotation. Knowl Based Syst 21(8):887–891
50. Manning C, Schutze H (1999) Foundations of statistical natural language processing. MIT Press, Cambridge
51. Mitchell M (1996) An introduction to genetic algorithms. MIT
Press, Cambridge
52. Morariu D, Vintan L, Tresp V (2006) Evolutionary feature selection for text documents using the SVM. In: Proceedings of 3rd international conference on machine learning and pattern recognition (MLPR 2006), Barcelona, ISSN 1305-5313, vol 15, pp 215–221
53. Moser A, Murty M (2000) On the scalability of genetic algorithms to very large-scale feature selection. In: EvoWorkshops 2000, pp 77–86
54. Nettleton DJ, Garigliano R (1994) Evolutionary algorithms and dialogue. In: Practical handbook of genetic algorithms. CRC Press, New York
55. Oakes M (1997) Statistics for corpus linguistics. Edinburgh
University Press, Edinburgh
56. Ozdemir M, Embrechts MJ, Arciniegas F, Breneman CM, Lockwood L, Bennett KP (2001) Feature selection for in-silico drug design using genetic algorithms and neural networks. In: IEEE mountain workshop on soft computing in industrial applications, pp 53–57
57. Punch WF, Goodman ED, Pei M, Chia-Shun L, Hovland P, Enbody R (1993) Further research on feature selection and classification using genetic algorithms. In: Proceedings of the fifth international conference on genetic algorithms, Champaign, IL, pp 557–564
58. Samuel K, Carberry S, Vijay-Shanker K (1999) Automatically selecting useful phrases for dialogue act tagging. In: Proceedings of PACLING ‘99 (Fourth conference of the Pacific Association for Computational Linguistics), Waterloo, Ontario, Canada
59. Schutz M (1997) Other operators: gene duplication and deletion. In: Back T, Fogel DB, Michalewicz Z (eds) Handbook of evolutionary computation, C3.4:8–15. Oxford University Press, New York, and Institute of Physics Publishing, Bristol
60. Searle JR (1975) A taxonomy of illocutionary acts. In: Gunderson K (ed) Language, mind and knowledge. Minnesota studies in the philosophy of science, vol 7. University of Minnesota Press, pp 344–369
61. Sebastiani F (2002) Machine learning in automated text cate-
gorisation. ACM Comput Surv 34(1):1–47
62. Siedlecki W, Sklansky J (1988) On automatic feature selection.
Int J Patt Recognit Artif Intell 2(2):197–220
63. Silla CN, Pappa GL, Freitas AA, Kaestner CAA (2004) Auto-
matic text summarisation with genetic algorithm-based attribute
selection. In: IX IBERAMIA—Ibero-American conference on
artificial intelligence, Puebla
64. Smith SF (1980) A learning system based on genetic adaptive algorithms. PhD thesis, University of Pittsburgh, PA, USA
65. Vafaie H, De Jong K (1995) Genetic algorithms as a tool for restructuring feature space representations. In: Proceedings of the seventh international conference on tools with artificial intelligence, Herndon
66. Vafaie H, De Jong K (1992) Genetic algorithms as a tool for feature selection in machine learning. In: Proceedings of the 4th international conference on tools with artificial intelligence, Arlington
67. Van de Burgt SP, Schaake J, Nijholt A (1995) Language analysis for dialogue management in a theatre information and booking system. In: Language engineering, AI 95, 15th international conference, Montpellier, pp 351–362
68. Verbree AT, Rienks RJ, Heylen DKJ (2006) Dialogue act tagging
using smart feature selection: results on multiple corpora. In:
Proceedings of the first international IEEE workshop on spoken
language technology SLT, pp 10–13
69. Vose MD (1999) The simple genetic algorithm: foundation and
theory. MIT Press, Cambridge
70. Webb N, Hepple M, Wilks Y (2005) Dialogue act classification based on intra-utterance features. In: Proceedings of AAAI-05
71. Webb N, Hepple M, Wilks Y (2005) Empirical determination of thresholds for optimal dialogue act classification. In: Proceedings of the ninth workshop on the semantics and pragmatics of dialogue
72. William H (2004) Genetic wrappers for feature selection in
decision tree induction and variable ordering in Bayesian network
structure learning. Inf Sci Int J 163(1–3):103–122
73. Wilson GC, Heywood MI (2005) Use of a genetic algorithm in Brill’s transformation-based part-of-speech tagger. In: Proceedings of the genetic and evolutionary computation conference (GECCO 2005), June 25–29, 2005, Washington, DC, USA. ACM Press, ISBN 1-59593-010-8, pp 2067–2073
74. Yang J, Honavar V (1998) Feature subset selection using a
genetic algorithm. IEEE Intell Syst 13(2):44–49
75. Yu E, Cho S (2003) GA-SVM wrapper approach for feature subset selection in keystroke dynamics identity verification. In: Proceedings of the IEEE international joint conference on neural networks, vol 3, pp 2253–2257
76. Zebulum RS, Pacheco MA, Vellasco M (2000) Variable length
representation in evolutionary electronics. Evol Comput J
8(1):93–120
77. Zhang L, Wang J, Zhao Y, Yang Z (2003) A novel hybrid feature
selection algorithm: using Relief estimation for GA-wrapper
search. In: Proceedings of IEEE international conference on
machine learning and cybernetics
78. Zhang P, Verma B, Kumar K (2004) A neural-genetic algorithm
for feature selection and breast abnormality classification in
digital mammography. In: Proceedings of IEEE international
joint conference on neural networks, vol 3, pp 2303–2308
79. Zheng Z, Wu X, Srihari R (2004) Feature selection for text cat-
egorization on imbalanced data. SIGKDD 6(1):80–89
80. Zhuo L, Zheng J, Wang F, Li X, Ai B, Qian J (2008) A genetic algorithm based wrapper feature selection method for classification of hyperspectral images using support vector machine. The international archives of the photogrammetry, remote sensing and spatial information sciences, vol XXXVII, part B7, Beijing