RESEARCH PAPER

Genetic-based approach for cue phrase selection in dialogue act recognition

Anwar Ali Yahya · Abd Rahman Ramli

Received: 15 August 2008 / Revised: 3 December 2008 / Accepted: 8 December 2008 / Published online: 16 January 2009

© Springer-Verlag 2009

Abstract Automatic cue phrase selection is a crucial step in designing a dialogue act recognition model using machine learning techniques. The approaches currently in use are based on a specific type of feature selection approach, called ranking approaches. Despite their computational efficiency in high dimensional domains, they are not optimal with respect to relevance and redundancy. In this paper we propose a genetic-based approach for cue phrase selection which is, essentially, a variable length genetic algorithm developed to cope with the high dimensionality of the domain. We evaluate the performance of the proposed approach against several ranking approaches. Additionally, we assess its performance for the selection of cue phrases enriched by the phrase's type and the phrase's position. The results provide experimental evidence of the ability of the genetic-based approach to overcome the drawbacks of the ranking approaches and to exploit cue type and cue position information to improve the selection. Furthermore, we validate the use of the genetic-based approach for machine learning applications: we use selected sets of cue phrases to build a dynamic Bayesian network model for dialogue act recognition. The results show its usefulness for machine learning applications.

Keywords Genetic algorithm · Feature selection · Cue phrase selection · Ranking feature selection · Dialogue act recognition

1 Introduction

A dialogue act (DA) is defined as a concise abstraction of a speaker's intention in an utterance. It has roots in several language theories of meaning, particularly speech act theory [4], which interprets any utterance as a kind of action, called a speech act, and categorises speech acts into speech act categories [60]. DA, however, extends the speech act by taking into account the context of the utterance [7]. Figure 1 is a hypothetical dialogue annotated with DAs.

The automatic recognition of DAs, dialogue act recognition (DAR), is a task of crucial importance for the processing of natural language at the discourse level. It is defined as follows: given an utterance with its preceding context, determine the DA it realises. Formally, it is a classification task in which the goal is to assign a suitable DA to the given utterance. Due to its importance for various applications such as dialogue systems, machine translation, speech recognition, and meeting summarisation, it has received a considerable amount of attention [33]. For example, in dialogue systems it conditions the successful interpretation of the user's utterance and consequently the generation of an appropriate response by the system. Inspired by their successful applications to many natural language processing problems, machine learning (ML) techniques have become the current trend for tackling the DAR problem [19]. In this regard, various ML techniques have been investigated and the resulting models

have become known as cue-based models [33]. As depicted in Fig. 2, a ML technique builds a cue-based model of DAR by learning, from the utterances of a dialogue corpus, the association rules between surface linguistic features of the utterances and the set of DAs. In doing so, ML exploits various types of linguistic features such as cue phrases, syntactic features, prosodic features, etc.

A. A. Yahya (✉) · A. R. Ramli
Intelligent System and Robotics Laboratory,
Institute of Advanced Technology,
University Putra Malaysia,
43400 UPM Serdang, Selangor, Malaysia
e-mail: [email protected]

A. R. Ramli
e-mail: [email protected]

Evol. Intel. (2009) 1:253–269
DOI 10.1007/s12065-008-0016-6

Among the different types of linguistic features, cue phrases are the strongest [34]. They are defined by Hirschberg and Litman [29] as linguistic expressions that function as explicit indicators of the structure of a discourse. Since not all phrases are relevant to DAR, the selection of relevant cue phrases prior to applying a ML technique is of crucial importance. A successful selection of cue phrases speeds up the learning process, reduces the required training data, and improves the classification accuracy [6].

One cue-based model, which has been proposed recently and is used as the context of the current research, is the dynamic Bayesian network (DBN) model [1]. As depicted in Fig. 3, the DBN model of DAR consists of T time slices, in which each slice is a Bayesian network (BN) composed of a number of random variables. The DBN models a sequence of utterances over time in such a way that each BN corresponds to a single utterance. In this sense the DBN is time invariant, meaning that the structure and parameters of the BN are the same for all time slices. Moreover, in each BN there is a hidden random variable, which represents the DA that needs to be recognised, and a set of observation variables extracted from the linguistic features of the corresponding utterance. In this model, dynamic Bayesian ML algorithms have been employed to construct the DBN model from a dialogue corpus. An essential issue that arose while building the DBN model for DAR is the specification of the observation variables. For this model, it has been suggested that the number of random variables in each BN should be equal to the number of DAs that the model recognises. Moreover, each variable is defined as a logical rule in disjunctive normal form (DNF), consisting of a set of cue phrases which are informative to one and only one DA, and expressed as follows:

DNF = if (p1 ∨ p2 ∨ … ∨ pm) then DA

where DA is the target DA and pi is a cue phrase selected for that DA. In doing so, each variable works as a binary classifier for the target DA.
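As a rough sketch (the phrase list and the matching-by-substring test are illustrative assumptions, not the paper's implementation), such a DNF observation variable can be evaluated as:

```python
def dnf_classifier(utterance, cue_phrases):
    """Return True iff any selected cue phrase occurs in the utterance,
    i.e. evaluate: if (p1 or p2 or ... or pm) then DA."""
    text = utterance.lower()
    return any(phrase in text for phrase in cue_phrases)

# Hypothetical cue phrases selected for a "Request" dialogue act:
request_cues = ["could you", "please", "would you"]
dnf_classifier("Could you send me the price list?", request_cues)  # True
```

Each observation variable of the BN is one such binary test, fired by the cue phrases selected for its target DA.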

Obviously, cue phrase selection in the context of the DBN model of DAR is an instance of feature selection in high dimensional domains, which are characterised by a huge number of features, numerous irrelevant features, and high correlation between the features [18]. A very similar task is feature selection in the context of text categorisation [61], in which the goal is to select the most relevant phrases for each category of documents. In general, the selection of relevant cue phrases can be carried out either manually, by a field expert, or automatically via feature selection approaches. The problem with the manual approach is that it generates general cues which cannot be used for all domains; therefore the automatic selection of cue phrases becomes the practical choice [58].

Fig. 1 Hypothetical dialogue annotated with DAs

Fig. 2 Cue-based DAR model

The literature of feature selection reveals a vast number of approaches developed in various domains [11, 45]. Typically, a feature selection approach is composed of three components: a search algorithm to search the space of features, an evaluation function to evaluate feature sets, and a performance function to which the resulting set is applied [11]. Based on the evaluation function, feature selection approaches can be classified as filters or wrappers. The wrapper approaches [38] employ the ML algorithm as the evaluation function to estimate the relevance of the selected features. On the other hand, the filter approaches pose feature selection as a process separate from the ML algorithm; therefore the evaluation and performance functions are different entities. The filter approaches themselves can be further categorised into two groups, namely ranking approaches and subset search approaches, based on whether they evaluate the relevance of features individually or as a subset. Ranking approaches evaluate each phrase individually according to a particular metric (ranging from information theoretic metrics such as Mutual Information (MI) and Information Gain (IG) to statistical metrics such as Chi Square (χ²) and Odds Ratio (OR) [50, 55]) and then pick out the best k phrases. On the other hand, subset search approaches employ a search strategy (exhaustive, heuristic, or random search) to search through candidate feature subsets [12]. The search is guided by a certain evaluation measure, which captures the goodness of each subset, and an optimal or near-optimal subset is selected when the search stops [44]. In addition to filter and wrapper approaches, hybrid approaches were also proposed to take advantage of both filter and wrapper [9, 77]. A typical hybrid approach makes use of both an independent filter metric and a ML algorithm to evaluate feature subsets: it uses the filter metric to decide the best subset for a given dimension and uses the ML algorithm to select the final best subset among the best subsets across different cardinalities. To recap, filters are more computationally efficient than wrappers; however, filters may result in poor performance because the evaluation function does not match the performance function well, even if the selected features are optimal for the evaluation function. Within filters, although the ranking approaches are much less resource intensive than subset search, they have a severe problem with respect to the relevance and redundancy of the selected features.
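A ranking approach of the kind described above can be sketched as follows, using Information Gain as the metric; the toy corpus, labels, and candidate phrases are hypothetical illustrations, not data from the paper:

```python
from math import log2

def entropy(pos, neg):
    """Binary entropy of a pos/neg label split."""
    total = pos + neg
    h = 0.0
    for count in (pos, neg):
        if count:
            p = count / total
            h -= p * log2(p)
    return h

def information_gain(utterances, labels, phrase):
    """IG of the binary label given presence/absence of the phrase."""
    has = [lab for utt, lab in zip(utterances, labels) if phrase in utt]
    no = [lab for utt, lab in zip(utterances, labels) if phrase not in utt]
    def h(labs):
        return entropy(sum(labs), len(labs) - sum(labs))
    n = len(labels)
    return h(labels) - (len(has) / n) * h(has) - (len(no) / n) * h(no)

def rank_select(utterances, labels, phrases, k):
    """Score each phrase independently, then keep the top k."""
    return sorted(phrases,
                  key=lambda p: information_gain(utterances, labels, p),
                  reverse=True)[:k]
```

Note that each phrase is scored in isolation, which is exactly why such a selector cannot detect that two top-ranked phrases may be redundant with each other.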

With regard to cue phrase selection in DAR, the literature indicates that only the ranking approaches have been investigated [35, 42, 58, 68, 70], owing to their computational efficiency, regardless of their inefficiency with respect to the relevance and redundancy of the selection. More precisely, while the selection of features should depend on their subsequent use, in the ranking approaches the evaluation of the selected features is based on the intrinsic properties of the data rather than their subsequent use [38]. The potential effect is the inclusion of some irrelevant features or the exclusion of relevant ones. On the other side, the evaluation function assumes that the relevance of the selected features equals the sum of the individual relevances of the features. In other words, the evaluation function does not account for the correlation among the features; therefore it always overestimates the relevance of the selected features and results in a redundant selection [20].

Based on the above discussion, we propose a subset search approach for cue phrase selection to avoid the problems of the ranking approaches. More specifically, we propose a genetic algorithm (GA) due to its advantage over many traditional local search heuristic methods, particularly when the search space is highly modal, discontinuous, or highly constrained [79]. As will be seen in the next section, GA has previously been applied for feature selection, as both a wrapper and a filter, in many works. In these works, due to the low dimensionality of the search space, the application of GA was straightforward. For cue phrase selection, the straightforward application of GA is hindered by the high dimensionality of the search space; therefore we propose several modifications to adapt GA to this task.

The rest of the paper is organised as follows. Section 2 reviews previous work on using GA for feature selection. Section 3 elaborates on how the standard GA is adapted for cue phrase selection. Section 4 presents the results and discussions from several cases of experiments: baseline approaches, the genetic-based approach, the genetic-based approach with negative cue phrases, the genetic-based approach with cue positional information, and validation experiments. Finally, Sect. 5 concludes the current work on cue phrase selection.

2 Genetic-based feature selection

Fig. 3 Example of the DBN model of DAR

GA is a biologically inspired search algorithm, loosely based on molecular genetics and natural selection. The basic principles of GA were stated by John Holland [30]; since then, GA has been reviewed in a number of works [23, 28, 51, 69]. In the standard GA the candidate solutions are represented as bit strings (referred to as chromosomes) whose interpretation depends on the application. The search for an optimal solution begins with a random population of initial solutions. The chromosomes of the current population are evaluated relative to a given measure of fitness, with the fit chromosomes selected probabilistically as seeds for the next population by means of genetic operations such as random mutation and crossover.
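The standard GA loop just described can be sketched as follows; the parameter values and the OneMax fitness (count of 1-bits) are illustrative assumptions, not settings from the paper:

```python
import random

def standard_ga(fitness, n_bits=20, pop_size=30, generations=50,
                p_crossover=0.9, p_mutation=0.01, seed=0):
    """Standard GA: fixed-length bit-string chromosomes, 2-way tournament
    selection, one-point crossover, and per-bit flip mutation."""
    rng = random.Random(seed)
    pop = [[rng.randint(0, 1) for _ in range(n_bits)] for _ in range(pop_size)]

    def tournament():
        a, b = rng.sample(pop, 2)
        return max(a, b, key=fitness)

    for _ in range(generations):
        next_pop = []
        while len(next_pop) < pop_size:
            p1, p2 = tournament(), tournament()
            if rng.random() < p_crossover:        # one-point crossover
                cut = rng.randrange(1, n_bits)
                child = p1[:cut] + p2[cut:]
            else:
                child = p1[:]
            # bit-flip mutation: each bit flips with probability p_mutation
            child = [b ^ (rng.random() < p_mutation) for b in child]
            next_pop.append(child)
        pop = next_pop
    return max(pop, key=fitness)

# OneMax: maximise the number of 1-bits in the chromosome.
best = standard_ga(sum)
```

Fifty generations are typically enough for this toy fitness; the evolved chromosome is close to the all-ones optimum.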

The seminal work on using GA for feature selection goes back to [62]. Since then, there have been numerous works on using GA for feature selection in different modes (wrapper, filter, or hybrid) and various contexts [45]. As previously mentioned, in the wrapper mode the ML algorithm is used as the evaluation function; therefore a brief review of the works on wrapper GA feature selection can be organised by the employed ML algorithm. k-nearest-neighbor was the first ML algorithm employed as the GA fitness, in the seminal work of Siedlecki and Sklansky [62]. In [37], GA with k-nearest neighbor was used to find a vector of feature weightings that reduces the effects of irrelevant or misleading features. Similarly, GA was combined with k-nearest neighbor to find an optimal feature weighting to optimise a classification task [57]. This approach has proven especially useful with large data sets, where standard feature selection techniques are computationally expensive. A GA with a fitness function based on the classification accuracy of k-nearest neighbor and feature subset complexity was used to improve the performance of an image annotation system [49]. GA and k-nearest neighbor were also combined to select features (genes) that can jointly discriminate between different classes of samples (e.g. normal versus tumor) [43]. This approach is capable of selecting a subset of predictive genes from large noisy data for sample classification.

Artificial neural networks were employed as the GA fitness function for feature selection in pattern classification in several works. For example, GA was combined with neural networks for feature selection in pattern classification and knowledge discovery [74]. It was also used with neural networks to select features for defect classification of wood boards [8]. In [31], GA with a neural network was proposed to select a feature subset yielding high classification accuracy. Similarly, GA with a neural network was proposed for feature selection for the classification of different types of small breast abnormalities [78].

Another ML algorithm that has been used as a fitness function of GA is the support vector machine. To cite examples, [16] used GA with a support vector machine for feature selection in time series classification. In [22], GA with a support vector machine was investigated and compared with other existing algorithms for feature selection. Also, Morariu [52] presented a GA with a fitness function based on the support vector machine for feature selection, which proved efficient for nonlinearly separable input data. For the classification of hyperspectral data, GA with a support vector machine was proposed in [80]. Yu and Cho [75] proposed a feature selection approach in which GA was employed to implement a randomised search and a support vector machine was employed as the base learner for keystroke dynamics identity verification.

GA with a decision tree (e.g. ID3, C4.5) was explored in [65, 66] to find the best feature set to be used by the induction system on difficult texture classification problems. William [72] designed a generic fitness function for the validation of input specifications, and then used it to develop a GA wrapper for feature selection for decision tree inducers. The effectiveness of GA for feature selection in the automatic text summarisation task was investigated in [63], where a decision tree was used as the fitness function.

With regard to GA in the filter mode, the filter mode seems the more computationally attractive, as the computational time of GA is quite high; the combination of GA and ML algorithms is therefore not so efficient, particularly considering that a run of the ML algorithm is needed every time a chromosome in the GA population is evaluated. Some examples of using GA in the filter mode include [47], in which the MI measurement between classes and features was used as the evaluation function. Based on the experimental results of handwritten digit recognition, this method can reduce the number of features needed in the recognition process without significantly impairing the performance of the classifier. In [56], GA was used for feature selection by minimising a cost function derived from the correlation matrix between the features and the activity of interest being modeled. From a dataset with 160 features, GA selected a feature subset (40 features) which built a better predictive model than the full feature set. Another example is [10], in which GA was used with MI to evolve a near optimal input feature subset for neural networks. A fast filter GA approach for feature selection, which improves on previous results presented in the feature selection literature, was described in [41].

GA has also been investigated as a hybrid approach for feature selection. In [9], a hybrid approach based on GA and a method based on class separability was applied to the selection of feature subsets for classification problems. This approach is able to find compact feature subsets that give the most accurate results, while beating the execution time of some wrappers. A feature selection approach named ReliefF-GA-Wrapper was proposed in [77] to combine the advantages of filter and wrapper. In this approach, the original features are evaluated by the ReliefF filter approach, and the resulting estimation is embedded into the GA to search for the optimal feature subset with the training accuracy of a ML algorithm on a handwritten Chinese characters dataset. Additionally, in [17] a two-stage feature selection approach was proposed. The first stage employs MI to filter out the least discriminant features, resulting in a reduced feature space. Then a GA is applied to the reduced feature space to further reduce its dimensionality and select the best set of features.

An important fact with regard to the previous applications of GA for feature selection is that a fixed length binary representation scheme is used to represent each chromosome of the population as a feature subset. For an n-dimensional feature space, each chromosome is encoded by an n-bit binary string b1…bn, where bi = 1 if the i-th feature is present in the feature subset represented by the chromosome, and bi = 0 otherwise. Figure 4 shows a hypothetical chromosome represented using the fixed length binary representation scheme of the standard GA.
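A minimal sketch of this encoding, with hypothetical feature names:

```python
def decode(chromosome, feature_names):
    """Map an n-bit chromosome b1..bn to the feature subset it encodes:
    feature i is selected iff bit bi == 1."""
    return [name for bit, name in zip(chromosome, feature_names) if bit == 1]

# Hypothetical 5-dimensional feature space:
features = ["f1", "f2", "f3", "f4", "f5"]
decode([1, 0, 1, 0, 0], features)  # → ["f1", "f3"]
```

The chromosome length is always n, regardless of how few features are actually selected, which is the source of the scalability problem discussed next.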

The advantage of this representation is that the standard GA can be applied straightforwardly without any modification. Unfortunately, the fixed length binary representation is appropriate only when the dimension of the feature space is not high. If the feature space dimension is high, the computational resources required for GA evolution are very expensive, and the case worsens when only a small number of these features are relevant. Several attempts have been made to adapt GA for feature selection in high dimensional domains. For example, Moser and Murty [53] maintained the use of the fixed length binary representation but increased the computational resources by using a distributed GA to implement wrapper genetic feature selection. Other works [26, 31, 46] adapted GA by pre-specifying the number of features that must be selected and representing each chromosome by the indices of randomly selected features rather than by the presence or absence of all features. While the former attempt is based on increasing the computational resources, which are expensive, the latter imposes a simplifying assumption which does not hold in general.

On that basis, we propose a new approach to adapt GA for feature selection in high dimensional domains in general and for cue phrase selection in particular. The proposed approach is based on two intuitions: first, the chromosome should represent the candidate features rather than the presence or absence of each feature in the space; second, the number of selected features is determined during the selection and is not known a priori. To realise this, it turns out that a variable length, non-binary representation is a viable choice. However, choosing a variable length representation calls for modifying the genetic operators, or devising new operators, to cope with the new representation scheme, as described in the subsequent section.

3 Genetic-based cue phrase selection

In this section we describe the proposed genetic-based approach for cue phrase selection in DAR. Due to the high dimensionality of the phrase space, the standard GA is adapted by using a variable length representation scheme, modifying the crossover and mutation operators, and proposing a new genetic operator. Before diving into the details of the proposed approach, two points are worth mentioning. First, the proposed genetic-based approach for cue phrase selection is another application of GA in the natural language processing domain. GA has been applied before to solve the problems of word categorisation [40] and word segmentation [36]. In [5], GA was used to construct a finite-state automaton to deal with Russian and German phonetics. For grammar induction, GA was applied to evolve grammars with a direct application in information retrieval [48]. For dialogue systems, GA was incorporated into a dialogue system to optimise the system's parameter settings [54]. A variant of Brill's implementation that uses GA to generate the instantiated rules and provide an adaptive ranking for the part-of-speech problem was described in [73]. A variety of GA, the messy GA, was applied to the part-of-speech tagging problem in [3]. Even for feature selection in natural language processing, GA has been investigated for feature selection in automatic text summarisation [63] and text categorisation [74].

The second point is that the idea of using variable length representations in the context of evolutionary algorithms is as old as the algorithms themselves. Fogel and Walsh [21] seem to be among the first to experiment with variable length representations. In their work, they evolved finite state machines with a varying number of states, thereby making use of operators like addition and deletion. Holland [30] proposed the concepts of gene duplication and gene deletion in order to raise the computational power of evolutionary algorithms. Smith [64] departed from the early fixed-length character strings by introducing variable length strings, including strings whose elements were if-then rules (rather than single characters). Since these first attempts at variable length representations, many researchers have made use of the idea under different motivations, such as engineering applications [14, 15] or raising the computational power of evolutionary algorithms [59].

Fig. 4 Fixed length binary chromosome for feature selection

With regard to GA as a paradigm of evolutionary algorithms, the use of variable length representations has been proposed in several versions. Well known versions are the messy GA [24, 25], genetic programming [39], and the Species Adaptation GA [27]. These versions differ in the specification of the representation scheme and consequently the genetic operators. The messy GA uses a binary representation in which each gene is represented by a pair of numbers: the gene position and the gene value. The messy GA uses the mutation operator as in the standard GA; instead of crossover, it uses the splice and cut operators. The splice operator joins one chromosome to the end of another, and the cut operator splits one chromosome into two smaller chromosomes. Genetic programming is an extension of GA with a variable length representation scheme in the form of a hierarchical tree representing a computer program. The aim of genetic programming is to find the best tree (computer program) that solves a given problem, and it adapts the genetic operators of the standard GA to cope with the tree representation scheme. The Species Adaptation GA uses a variable length binary representation scheme. It differs from the standard GA subtly but significantly: evolution is directed by selection exploiting differences in fitness caused by variations in the genetic makeup of the population. While the mutation operator in the standard GA and genetic programming is considered a background operator, and crossover is usually assumed to be the primary operator, in the Species Adaptation GA the reverse is true: of these two genetic operators, mutation is primary, and crossover, though useful, is secondary.

Besides these, researchers may opt to develop a domain-specific version of a variable length representation GA to better meet the requirements of the domain, rather than using the existing ones. For example, [76] investigated the application of GA in the field of evolutionary electronics, in which a special variable length GA was proposed to cope with the main issues of variable length evolutionary systems. Following this trend, we have developed a special version of GA specifically for cue phrase selection in DAR and similar high dimensional domains, as elaborated in the following subsections.

3.1 Representation scheme

The representation scheme of the genetic-based approach for cue phrase selection is a variable length, non-binary representation scheme, in which each chromosome represents a candidate set of phrases. Figure 5 shows an example of a chromosome represented using this scheme. The scheme is position independent, meaning that the gene position has no role in determining the aspects of the chromosome at the phenotype level. This aspect is of particular importance for the design of the genetic operators, as elaborated in the subsequent sections.
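Under this scheme a chromosome can be sketched simply as a list of candidate phrases (the phrases below are hypothetical examples), with position independence meaning that reorderings decode to the same phenotype:

```python
# Two chromosomes of the same population may have different lengths:
chrom_a = ["the price", "who", "information"]
chrom_b = ["would you like", "do you"]

def same_phenotype(c1, c2):
    """Position independence: two chromosomes containing the same phrases,
    in any order, decode to the same candidate cue-phrase set."""
    return sorted(c1) == sorted(c2)

same_phenotype(["who", "the price"], ["the price", "who"])  # True
```

Because only the set of phrases matters, genetic operators are free to append, delete, or swap genes at any position.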

3.2 Phrases space mask

Technically speaking, GA explores promising points in the search space via genetic operations. Therefore, the representation scheme and the genetic operators should give rise to an effective exploration of the search space. Using the proposed representation scheme directly does not assist the genetic operators in exploring new points in the search space. Therefore, to ensure a good exploration of the phrase space, we propose the phrases space mask. The phrases space mask is a binary string with length equal to the size of the phrase space, in which each bit marks the status of a single phrase in the phrase space. Accordingly, the value 1 indicates that the phrase is being used by the current population, and the value 0 indicates that the phrase is not in use. Figure 6 shows a fragment of the phrases space mask adopted for cue phrase selection. It shows that the phrases "the price", "who", and "information" are participating in the current GA population, whereas the phrases "good", "would you like", and "do you" are not. As will be described in the subsequent sections, the status of the phrases in the phrases space mask is updated either immediately, after performing a genetic operation, or through a rebuilding step of the phrases space mask, which takes place during the transition from generation t to generation t + 1.
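The rebuilding step can be sketched as follows; the phrase space fragment and the population are hypothetical examples mirroring the description of Fig. 6:

```python
def rebuild_mask(phrase_space, population):
    """Rebuild the phrases space mask for generation t+1: bit i is 1 iff
    phrase i appears in at least one chromosome of the population."""
    in_use = {p for chrom in population for p in chrom}
    return [1 if phrase in in_use else 0 for phrase in phrase_space]

# Hypothetical fragment of the phrase space:
space = ["the price", "good", "who", "would you like", "information", "do you"]
population = [["the price", "who"], ["information", "the price"]]
rebuild_mask(space, population)  # → [1, 0, 1, 0, 1, 0]
```

Genetic operators can then consult the 0-bits of the mask to draw phrases that the current population has not yet explored.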

3.3 Fitness function

The fitness function is the driving force of the evolution in a GA. It plays the same role as the evaluation function in feature selection. Since the general aim of feature selection is to find a minimum number of features with maximum relevance, the fitness of a candidate set of phrases, P, combines two measures, relevance and complexity, as follows [53]:

F(P) = Relev(P) − pf · L(P)/N    (1)

In the above formula, Relev(P) denotes the estimated relevance of the phrase set P, and L(P) is a measure of the complexity of the phrase set, usually the number of utilised phrases. Furthermore, N is the phrase space dimension, and pf is a punishment factor that weighs the multiple objectives of the fitness function (in this research pf = 1).
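Eq. 1 translates directly into code (argument names are ours):

```python
def fitness(relevance, num_phrases, space_size, pf=1.0):
    """F(P) = Relev(P) - pf * L(P)/N: reward relevance, penalise the fraction
    of the phrase space that the candidate set occupies."""
    return relevance - pf * num_phrases / space_size
```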

In general, the estimation of Relev(P) should be based on the subsequent use of the cue phrases by the ML algorithm.

Fig. 5 Variable length chromosome for cue phrase selection


For cue phrase selection in the DBN context described earlier, each variable acts as a binary classifier for the given DA; Relev(P) can therefore be estimated by the following accuracy measure:

Relev(P) = (TP + TN) / NU    (2)

where TP is the number of times the DNF constituted from the selected phrases P returns true when the utterance belongs to the target DA, TN is the number of times the DNF returns false when the utterance does not belong to the target DA, and NU is the total number of utterances in the dialogue corpus.
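Eq. 2 can be computed as follows, treating the DNF built from P as an opaque predicate over an utterance (the callable and data layout are assumptions of this sketch):

```python
def relevance(dnf, utterances, labels, target_da):
    """Relev(P) = (TP + TN) / NU: accuracy of the DNF as a binary classifier
    for the target DA. Each utterance is a set of phrases."""
    tp = sum(1 for u, l in zip(utterances, labels) if l == target_da and dnf(u))
    tn = sum(1 for u, l in zip(utterances, labels) if l != target_da and not dnf(u))
    return (tp + tn) / len(utterances)
```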

3.4 Selection scheme

The selection scheme of the genetic-based approach for cue phrase selection is (k, q) tournament selection. It randomly chooses k chromosomes from the current population and, with a certain probability q, returns the best chromosome; otherwise it returns the worst chromosome.
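A minimal sketch of (k, q) tournament selection as described (function name and signature are ours):

```python
import random

def tournament_select(population, fitnesses, k, q):
    """Sample k chromosomes uniformly at random; with probability q return
    the best of the sample, otherwise return the worst."""
    sample = random.sample(range(len(population)), k)
    sample.sort(key=lambda i: fitnesses[i], reverse=True)
    return population[sample[0] if random.random() < q else sample[-1]]
```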

3.5 Genetic operators

The proposed genetic-based approach modifies the three genetic operators of the standard GA to cope with the variable-length representation scheme. Furthermore, it introduces a new operator called AlterLength.

3.5.1 Reproduction

The reproduction operator of the genetic-based approach is similar to that of the standard GA. With a reproduction probability, Pr, a chromosome is randomly selected from the current generation and copied into the new generation without any modification.

3.5.2 Crossover

To cope with the variable-length representation of the proposed genetic-based approach, the uniform crossover of the standard GA has been adapted. Uniform crossover [51] decides, with a probability Pc, which parent will contribute each gene value in the offspring chromosomes. It has been modified as follows. First, two parents (chromosomes) are selected from the current population. Then, with probability 0.5, the length of the offspring is chosen to be the length of either the short or the long parent. If the length of the short parent is chosen, a uniform crossover is performed between the short parent and an equal-length segment of the long parent. If the length of the long parent is chosen, a uniform crossover is likewise performed between the short parent and an equal-length segment of the long parent, and the remaining parts of the long parent are appended to the beginning and the end of the offspring. Figure 7 is an example of the adapted uniform crossover.

Fig. 6 Fragment of phrase space mask

Fig. 7 Example of the adapted uniform crossover
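The adapted crossover can be sketched as follows. The paper does not specify how the equal-length segment of the long parent is located, so the random offset below is our assumption:

```python
import random

def adapted_uniform_crossover(p1, p2):
    """Variable-length uniform crossover: the offspring takes the short or the
    long parent's length (coin flip); inside the aligned segment each gene
    comes from either parent with probability 0.5."""
    short, long_ = sorted((p1, p2), key=len)
    start = random.randrange(len(long_) - len(short) + 1)  # align a segment of the long parent
    segment = long_[start:start + len(short)]
    mixed = [a if random.random() < 0.5 else b for a, b in zip(short, segment)]
    if random.random() < 0.5:
        return mixed  # short-parent length
    # long-parent length: re-attach the long parent's leading and trailing parts
    return long_[:start] + mixed + long_[start + len(short):]
```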

3.5.3 Mutation

The proposed approach to mutation is to replace the values of some randomly selected genes with new values from the phrase space that are not participating in the current population. The mutation operator is applied with probability Pm to each chromosome generated by the crossover operation, with the assistance of the phrase space mask. More specifically, if a gene is selected for mutation, it is replaced by a phrase randomly selected from the phrase space whose status in the mask is inactive, and the status of the selected phrase is then set to active immediately. The status of the mutated (replaced) phrase is not set to inactive immediately, because it may still be in use by members of the current GA population; it is deactivated after the completion of all genetic operations on the current population, during the rebuilding step of the phrase space mask. Figure 8 shows an example of the mutation operator.
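The mask-assisted mutation might look like the following; the per-gene mutation rate `pm_gene` is our assumption (the paper applies Pm per chromosome and selects genes within it):

```python
import random

def mutate(chromosome, mask, pm_gene=0.1):
    """Replace randomly selected genes with phrases that are inactive in the
    phrase space mask. New phrases are activated immediately; replaced ones
    stay active until the mask-rebuilding step."""
    inactive = [i for i, bit in enumerate(mask) if bit == 0]
    child = list(chromosome)
    for pos in range(len(child)):
        if inactive and random.random() < pm_gene:
            child[pos] = inactive.pop(random.randrange(len(inactive)))
            mask[child[pos]] = 1  # activate now; the old gene is deactivated later
    return child
```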

3.5.4 AlterLength

The crossover and mutation operators were designed to introduce variation into the content of the chromosome. To introduce variation into its length, the AlterLength operator is proposed. It randomly expands (shrinks) the chromosome by inserting (deleting) a single phrase. In the case of insertion, an inactive phrase is randomly selected from the phrase space and inserted at a randomly selected position in the chromosome. In the case of deletion, the selected phrase is removed from the chromosome, and its status in the phrase space mask remains active until the rebuilding step. The AlterLength operator is performed with a probability Pal, as shown in Fig. 9.
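AlterLength might be sketched as below; the coin flip between insertion and deletion and the guard against emptying the chromosome are our assumptions:

```python
import random

def alter_length(chromosome, mask):
    """Insert one inactive phrase at a random position (activating it), or
    delete one phrase (its mask bit stays active until the rebuilding step)."""
    child = list(chromosome)
    inactive = [i for i, bit in enumerate(mask) if bit == 0]
    if inactive and (len(child) <= 1 or random.random() < 0.5):
        gene = random.choice(inactive)
        child.insert(random.randrange(len(child) + 1), gene)
        mask[gene] = 1
    elif len(child) > 1:
        del child[random.randrange(len(child))]
    return child
```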

3.6 Stopping criteria and control parameters

The proposed genetic-based approach uses one of the stopping criteria of the standard GA: stop when the evolution does not introduce any significant change. As for the control parameters, the approach uses the following: population size (PopSize), tournament selection parameters (k, q), reproduction probability (Pr), crossover probability (Pc), mutation probability (Pm), and AlterLength probability (Pal).

Fig. 8 Example of mutation operator

Fig. 9 Example of AlterLength operator

4 Results and discussions

To evaluate the performance of the genetic-based approach on the selection of cue phrases, we conducted several experimental cases, named as follows: baseline approaches, genetic-based approach, genetic-based approach with negative cue phrases, genetic-based approach with cue's positional information, and validation experiments. Before diving into the details of the experiments, a brief description of the dialogue corpus used and of the preprocessing steps applied to generate the phrase space is useful.

4.1 Setting of the experiments

We used the SCHISMA dialogue corpus, which belongs to the SCHISMA project (SCHouwburg Informatie Systeem) at the University of Twente. The project involves various activities to realise a theatre information and booking system [67]. The dialogues are mixed-initiative, meaning that the initiative may switch between the participants within a single dialogue. The task domain concerns information exchange and reservation transactions in a theatre: users can make inquiries about scheduled theatre performances and, if they desire, make ticket reservations. Figure 10 shows a fragment of the SCHISMA dialogue corpus.

The SCHISMA corpus was annotated with the DAMSL coding scheme [2]. In this process, each utterance is subdivided into one or more segments, and DAs are assigned to each segment. In this research, we focus on the DAs given in Table 1.

After annotating the corpus with DAs, the following processes were performed to generate the phrase space.

1. Tokenisation: Tokenisation occurs at the utterance level, and a token is defined as a sequence of letters or digits delimited by separators (e.g., ".", ":", ";"). All punctuation is discarded except "?", which is treated as a token.

2. Removing morphological variations: Most morphological variations in the SCHISMA corpus are plural and tense variations, which are not relevant to the recognition process.

3. Semantic clustering: Certain words are clustered into semantic classes based on their semantic relatedness, and each occurrence of a word is replaced with its cluster name. For the SCHISMA corpus, the following semantic clusters were identified:

   (a) ShowName: any show name appearing in the corpus.
   (b) PlayerName: any player name appearing in the corpus.
   (c) Number: any sequence of digits (0...9) or alphabetic numbers (one, two, ...).
   (d) Day: any occurrence of a day name (Monday, Tuesday, ..., Sunday).
   (e) Month: any occurrence of a month name (January, ..., December).
   (f) Price: a Number cluster preceded by a currency symbol (f, ff).
   (g) Date: any of the sequences <Number Month Number>, <Month Number>, <Number Month>.
   (h) Time: a Number cluster preceded by the preposition "at".

4. N-gram phrase generation: All phrases consisting of one, two, and three words were generated from each utterance in the corpus.

5. Removing less frequent phrases: To reduce the dimension of the phrase space, phrases occurring fewer times than a frequency threshold were removed. Based on the experiments of [71], the chosen frequency threshold was 3.
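Steps 4 and 5 amount to n-gram counting with a frequency cut-off, roughly as follows (tokenisation and clustering are assumed already done; names are ours):

```python
from collections import Counter

def build_phrase_space(utterances, max_n=3, min_freq=3):
    """Generate all 1-, 2- and 3-word phrases from tokenised utterances and
    drop those occurring fewer than min_freq times (3, following the paper)."""
    counts = Counter()
    for tokens in utterances:
        for n in range(1, max_n + 1):
            for i in range(len(tokens) - n + 1):
                counts[tuple(tokens[i:i + n])] += 1
    return sorted(p for p, c in counts.items() if c >= min_freq)
```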

U: When is Sweeny Todd on?
S: You can see 'Sweeny Todd' in the period December 28 to 30
U: What about under a blue roof?
S: You can see 'Under a blue roof' in the 'Grote Zaal' on May
U: Can I order a ticket for that
S: Do you have a reduction card?
U: I dont have a reduction card.
U: Four tickets please.

Fig. 10 Fragment of SCHISMA dialogue corpus

Table 1 Experimented DAs and their frequencies in the SCHISMA corpus

DA               Meaning                                                            Frequency
Statement        The speaker makes a claim about the world                          817
Query-if         The speaker asks the hearer whether something is the case or not   108
Query-ref        The speaker asks the hearer for information in the form of
                 references that satisfy some specification given by the speaker    598
Positive-answer  The speaker answers in the positive                                561
Negative-answer  The speaker answers in the negative                                72
No-blf           The utterance is not tagged with any blf dialogue act              968


The above preprocessing steps resulted in a phrase space of 1,336 phrases, which was used in the first two cases of experiments. In the subsequent cases, however, further preprocessing steps were introduced, giving the phrase space a different size for each DA.

4.2 Baseline approaches

The baseline approaches are a selected set of ranking approaches commonly applied to cue phrase selection. As representatives, we chose the approaches shown in Table 2. Each approach has a metric expressed by the corresponding formula, in which f denotes the feature (phrase) and c the class (DA). Each approach was applied to the selection of cue phrases for each DA. More specifically, for each DA, each ranking approach ranked the phrases using its own metric. Then the fitness value, F(P), along with the relevance value, Relev(P), and complexity value, L(P)/N, of the top k phrases (k = 1, 2, ..., n) in the ranked list were calculated, and the top k phrases giving the maximum fitness value were taken as the selected set of phrases for that DA, as shown in Table 3.
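The prefix scan over the ranked list can be sketched as follows; `relev_of` stands in for the DNF-based relevance estimate of Eq. 2 and is an assumption of this sketch:

```python
def select_from_ranking(ranked_phrases, relev_of, space_size, pf=1.0):
    """For k = 1..n evaluate F(P) = Relev(P) - pf * k/N on the top-k prefix of
    the ranked list and return the prefix with maximum fitness."""
    best_set, best_f = [], float("-inf")
    for k in range(1, len(ranked_phrases) + 1):
        f = relev_of(ranked_phrases[:k]) - pf * k / space_size
        if f > best_f:
            best_set, best_f = ranked_phrases[:k], f
    return best_set, best_f
```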

The results of the baseline approach experiments are given in Table 3. From these results, it is clear that the performance of MI and OR on one side and of IG and χ² on the other are similar in three respects. First, from the complexity values, L(P)/N, MI and OR tend to select a larger number of phrases than IG and χ². Second, the Relev(P) values of MI and OR are higher than those of IG and χ². Third, as a direct result of the similarity in Relev(P) and L(P)/N values within each group, (MI, OR) and (IG, χ²), the pattern of the fitness values is similar within each group, though between the two groups the comparison of fitness values is not conclusive.

The similarity within the two groups, (MI, OR) and (IG, χ²), can be understood through the following facts. For each DA, each phrase has two sides, positive and negative. The positive side depends on the presence of the phrase in utterances labeled with the target DA and its absence from utterances labeled with other DAs. The negative side depends on the absence of the phrase from utterances labeled with the target DA and its presence in utterances labeled with other DAs. On this basis, a ranking approach is classified as a one-sided or a two-sided metric, depending on whether its metric accounts for the negative side of the phrase [79]. One-sided metrics rank phrases according to their positive side; the top k phrases in the ranked list are therefore those with the highest positive sides and, consequently, the lowest negative sides. Two-sided metrics rank phrases according to a combination of both sides, so the top k phrases are those with the highest negative or positive sides.

From Table 2, it is clear that MI and OR are one-sided metrics, whereas IG and χ² are two-sided metrics. It is also evident that the Relev(P) subpart of the fitness measure, Eq. 1, which was used for the selection of cue phrases from the ranked list, depends on the positive side of the phrases rather than the negative side. Therefore, the ranking produced by the one-sided metrics is more appropriate for the fitness measure than that of the two-sided metrics, which explains the higher Relev(P) values of the cue phrases selected by MI and OR. However, the inability of these approaches to account for the correlation between cue phrases leads to the selection of a large number of cue phrases in the case of MI and OR. In other words, the ranking approaches assume that the relevance of a set of phrases equals the sum of the individual relevances of its phrases, an assumption that leads to redundant selection.

Table 2 Experimented ranking approaches

Metric  Formula
MI      MI(f, c) = log [P(f, c) / (P(f) · P(c))]
OR      OR(f, c) = [P(f|c) · (1 − P(f|c̄))] / [(1 − P(f|c)) · P(f|c̄)]
IG      IG(f, c) = Σ_{c′∈{c, c̄}} Σ_{f′∈{f, f̄}} P(f′, c′) · log [P(f′, c′) / (P(f′) · P(c′))]
χ²      χ²(f, c) = N · [P(f, c) · P(f̄, c̄) − P(f, c̄) · P(f̄, c)]² / [P(f) · P(f̄) · P(c) · P(c̄)]

Table 3 Results of the ranking approach experiments

                 MI                        IG                        χ²                        OR
                 Relev(P) L(P)/N  F(P)    Relev(P) L(P)/N  F(P)    Relev(P) L(P)/N  F(P)    Relev(P) L(P)/N  F(P)
Statement        0.8403   0.1055  0.7347  0.6385   0.0052  0.6385  0.6619   0.0007  0.6612  0.7675   0.0509  0.7166
Query-if         0.9634   0.0045  0.9589  0.9580   0.0007  0.9580  0.9580   0.0007  0.9572  0.9599   0.0030  0.9569
Query-ref        0.8505   0.0404  0.8101  0.8149   0.0060  0.8149  0.8149   0.0060  0.8089  0.8803   0.0412  0.8391
Positive-answer  0.8217   0.0636  0.7581  0.7333   0.0015  0.7333  0.7860   0.0007  0.7853  0.8105   0.0464  0.7640
Negative-answer  0.9687   0.0015  0.9672  0.9595   0.0015  0.9595  0.9595   0.0015  0.9580  0.9687   0.0022  0.9665
No-blf           0.8036   0.1198  0.6839  0.7333   0.0015  0.7333  0.7333   0.0015  0.7318  0.7851   0.0786  0.7065


The general conclusion that can be drawn from this case of experiments is that the ranking approaches are not able to maintain a tradeoff between the two subparts of the fitness function: they tend to optimise one subpart at the expense of the other.

4.3 Genetic-based approach

The aim of this case of experiments is to evaluate the genetic-based approach on the selection of cue phrases for each DA given in Table 1. The control parameters were set as follows: PopSize = 500, k = 10, q = 0.7, r = 0.3, Pc = 0.7, Pr = 0.1, Pal = 0.2, Pm = 0.1, and the stopping criterion is to stop if there is no significant improvement within 10 generations. Table 4 summarises the results obtained. It should be mentioned that in each genetic-based experimental case, the GA was run five times and the results of the best run are reported.

Figure 11 is an example of the GA evolution during the selection of cue phrases for the statement DA; the curves correspond to the best evolutionary trends. In general, there is rapid growth in the early generations, followed by a long period of slow evolution until the stopping criterion is met. This reflects the nature of the cue phrase search space, which is hugely multimodal and contains many peaks. An interesting aspect of the average population fitness curve is that, despite fluctuations, it shows a general tendency of improvement, particularly in the early generations.

A comparative look at Tables 3 and 4 shows that the genetic-based approach outperforms the ranking approaches for cue phrase selection, as is obvious from the difference in fitness values, F(P). The relevance values, Relev(P), of the genetic-based approach are also higher than those of all the ranking approaches. With regard to the complexity of the selected cue phrases, L(P)/N, the genetic-based approach tends to select fewer phrases than the MI and OR ranking approaches, yet more than IG and χ².

From the above observations, it can be concluded that the genetic-based approach manages to maintain a tradeoff between relevance and complexity and therefore achieves higher fitness values. There are two reasons for this. First, in the genetic-based approach, the evaluation and selection processes are both based on the fitness measure, which depends on the subsequent use of the selected phrases; in the ranking approaches, the evaluation of the phrases is based on their intrinsic properties, whereas only the selection depends on the fitness measure. Second, the genetic-based approach accounts for the correlation between the selected cue phrases: unlike the ranking approaches, it evaluates the selected cue phrases as a whole, rather than evaluating each phrase individually and assuming the relevance of the phrase set equals the sum of the individual relevances, an assumption that leads to redundant selection.

To confirm the above conclusion with statistical inference tools, a paired t test of the significance of the difference between the F(P) values of the MI ranking approach and the genetic-based approach was conducted at the level p < 0.05 with 5 degrees of freedom. The obtained t value (t = 3.1123) shows that the difference is statistically significant.

4.4 Genetic-based approach with negative cue phrases

It was pointed out earlier that for each DA, each phrase has two sides, positive and negative, and a phrase is accordingly classified as positive or negative depending on the dominant side. An efficient way to exploit negative phrases, described in [79], is to select positive and negative phrases independently based on their subsequent use. For cue phrase selection in DAR, the positive phrases can be used to indicate the membership of an instance in the target DA, and the negative phrases can help increase the relevance of the positive phrases by confidently rejecting instances that do not belong to the target DA yet contain the positive phrases. For example, in the SCHISMA dialogue corpus, the positive phrase "ticket" is relevant to both the statement and query-ref DAs. To increase the relevance of this cue phrase for the statement DA, negative cue phrases such as "how much" and "?", which are relevant to the query-ref DA but not to the statement DA, might be selected and conjoined with "ticket" to accept only utterances that contain "ticket" and contain neither "how much" nor "?".

Unfortunately, the ranking approaches do not help exploit the negative phrases efficiently. More specifically, one-sided metrics rank the phrases according to their positive side and ignore the selection of the negative

Table 4 Results of the genetic-based approach experiments

DA               Relev(P)  L(P)/N  F(P)
Statement        0.8710    0.0322  0.8388
Query-if         0.9643    0.0037  0.9606
Query-ref        0.9033    0.0060  0.8973
Positive-answer  0.8525    0.0195  0.8330
Negative-answer  0.9692    0.0015  0.9677
No-blf           0.8114    0.0165  0.7950


phrases. Moreover, two-sided metrics rank phrases according to the combination of their negative and positive sides. To assess the ability of the genetic-based approach to exploit the negative cue phrases, we conducted this case of experiments, in which the aim is to select positive and negative cue phrases satisfying the following DNF expression:

DNF = if (pp1 ∧ ¬np11 ∧ ... ∧ ¬npk1) ∨ ... ∨ (ppj ∧ ¬np1j ∧ ... ∧ ¬npkj) ∨ ... ∨ (ppm ∧ ¬np1m ∧ ... ∧ ¬npkm) then DA

where ppj is a positive cue phrase and np1j, ..., npkj are the negative phrases associated with ppj.
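Evaluating such a DNF over an utterance is straightforward; the rule representation as (positive phrase, negative phrases) pairs is our own:

```python
def dnf_match(utterance, rules):
    """True iff some positive phrase occurs in the utterance with all of its
    associated negative phrases absent. `utterance` is a set of phrases."""
    return any(pp in utterance and not any(np in utterance for np in negs)
               for pp, negs in rules)
```

With the "ticket" example above, the rule ("ticket", ["how much", "?"]) accepts an utterance containing "ticket" alone but rejects one that also contains "?".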

To account for the negative cue phrases, each phrase occurring within an utterance that belongs to the target DA is marked positive, and each phrase occurring within an utterance that does not belong to the target DA is marked negative. Some phrases occur both in utterances labeled with the target DA and in utterances not labeled with it, so it is possible to find two identical phrases, one marked positive and one marked negative. Table 5 summarises the results obtained from this case of experiments. Figure 12 is a sample of the GA evolution for the statement DA in this case.

It is obvious from the above results that there is an improvement in the Relev(P) values at the expense of L(P)/N. The inclusion of negative cue phrases is the direct reason for this improvement: the negative cue phrases enlarge the search space by a different amount for each DA, as shown in Table 5, and consequently relax the constraint that the complexity subpart, L(P)/N, of the fitness function imposes on the number of selected phrases. As a result, more relevant phrases are included in the selection, which ultimately increases the Relev(P) value. Tests of the statistical significance of the differences between the Relev(P) values and the L(P)/N values in Tables 4 and 5 confirm this conclusion: using a paired t test at the level p < 0.05 with 5 degrees of freedom, the obtained t values, 3.2618 and 2.7288, show that the differences are statistically significant. With regard to the fitness values, F(P), the difference between the corresponding values in Tables 4 and 5 is not conclusive. This is confirmed by the analysis of the statistical significance of the difference between them using a paired t test (at the level

Fig. 11 GA evolution of cue phrase selection for the statement DA (last generation = 183; curves: informativeness, 1 − complexity, fitness, and average population fitness)

Table 5 Results of the genetic-based approach with negative cue phrases

DA               Phrase space size  Relev(P)  L(P)/N  F(P)
Statement        2,000              0.8896    0.0360  0.8536
Query-if         1,594              0.9682    0.0088  0.9595
Query-ref        1,848              0.9174    0.0319  0.8855
Positive-answer  2,115              0.8696    0.0457  0.8239
Negative-answer  1,542              0.9673    0.0007  0.9665
No-blf           2,227              0.8324    0.0382  0.7943


p < 0.05 with 5 degrees of freedom), which shows that they are not statistically significant (the obtained t value is 0.4041). Recalling that F(P) combines two contradictory objectives, relevance and complexity, it is obvious that the improvement in Relev(P) resulting from the incorporation of the negative cue phrases is offset by the increase in the complexity of the selected cue phrases.

On the other hand, a comparison between the results of Tables 3 and 5 confirms the superiority of the genetic-based approach over the ranking approaches. Despite the slight increase in the L(P)/N values, they are still low compared with the corresponding values of the ranking approaches. The ability of the genetic-based approach to account for the correlation between the selected phrases, together with the inclusion of negative phrases, is the direct reason for this better performance. This conclusion is confirmed by the analysis of the statistical significance of the difference between the corresponding F(P) values of Table 3 (MI) and Table 5, using a paired t test at the level p < 0.05 with 5 degrees of freedom. The obtained t value (t = 2.9127) shows that the difference is statistically significant.

4.5 Genetic-based approach with cue's positional information

Information on a phrase's position within an utterance is useful for increasing its relevance for a given DA. We conducted this case of experiments to investigate the ability of the genetic-based approach to select cue phrases after incorporating the phrase's positional information. To do so, each positive phrase in the phrase space was marked with one of three positional labels representing the position of the phrase within the utterance: Begin, if the phrase occurs at the beginning of any utterance labeled with the target DA; End, if it occurs at the end of any such utterance; and Contain, if it occurs elsewhere. A phrase may occur in different positions in different utterances; in this case, multiple instances of the phrase, each with a different label, are created. Consequently, each DA has a phrase space of different size, as shown in Table 6. The genetic-based approach was applied with the same parameters specified in the previous cases, and the results are shown in Table 6. Figure 13 is a sample of the GA evolution for the statement DA.

It appears from the results in Table 6 that there is an improvement in the fitness values, F(P), of the selected

Fig. 12 GA evolution of cue phrase selection for the statement DA (last generation = 224; curves: informativeness, 1 − complexity, fitness, and average population fitness)

Table 6 Results of the genetic-based approach with cue's positional information

DA               Phrase space size  Relev(P)  L(P)/N  F(P)
Statement        2,329              0.9003    0.0369  0.8634
Query-if         1,625              0.9805    0.0086  0.9718
Query-ref        2,047              0.9262    0.0259  0.9003
Positive-answer  2,377              0.9008    0.0366  0.8642
Negative-answer  1,581              0.9888    0.0089  0.9799
No-blf           2,354              0.8447    0.0297  0.8149


cues for each DA. This can be attributed to the improvement of the corresponding Relev(P) values after the incorporation of the cue's positional information. In terms of complexity, L(P)/N, there is a slight increase in the values for some DAs; however, there are cases where the values are similar or even better. For instance, for the positive-answer DA there is an obvious improvement in both Relev(P) and L(P)/N. This is empirical evidence of the ability of the genetic-based approach and of the role of the positional information. The analysis of the statistical significance of the difference between the fitness values, F(P), of Tables 5 and 6, using a paired t test at the level p < 0.05 with 5 degrees of freedom, confirms this conclusion: the obtained t value (t = 4.0410) shows that the difference is statistically significant.

Finally, to confirm the findings obtained from the four experimental cases, a repeated-measures ANOVA with Greenhouse–Geisser correction was conducted to assess whether there are differences between the fitness values, F(P). The results indicate that the values are statistically different, F(1.095, 5.475) = 10.580, p = 0.019, η² = 0.68. The polynomial contrasts test indicates a significant linear trend, F(1, 5) = 12.597, p = 0.016, η² = 0.72. However, this finding was qualified by a significant cubic trend, F(1, 5) = 13.729, p = 0.0141, η² = 0.73, reflecting the lower fitness of case 3 relative to case 2. Overall, there is a linear trend in the fitness values from case 1 to case 4, reflecting their improvement; however, case 2 has a somewhat higher mean than case 3, producing the cubic trend.

4.6 Validation experiments

The aim of this case of experiments is to validate the use of the proposed genetic-based approach for ML applications. More specifically, the cue phrases generated in the above experiments were used to build a DBN model for DAR, under the hypothesis that better selection of cue phrases yields more accurate DAR. First, the sets of cue phrases generated by the genetic-based approach in each of the previous cases were used to specify the DBN random variables as described earlier, so that each random variable is a binary classifier for a single DA. Then DBN ML algorithms were used with 10-fold cross-validation to construct the structure of the DBN model, estimate its parameters, and estimate its recognition accuracy, using the probabilistic networks library [32], which is freely available from http://www.intel.com/research/mrl/pnl. The same experiment was repeated using the sets of cue phrases

Fig. 13 GA evolution of cue phrase selection for the statement DA (last generation = 275; curves: predictivity, aggressivity, fitness, and average population fitness)

Table 7 Accuracies of the DAR models

Selection approach                               Recognition accuracy
                                                 Min.   Max.   Avrg.
Ranking approach (MI)                            76.74  78.48  77.72
Genetic-based approach                           76.89  79.19  78.34
Genetic-based with negative cues                 77.72  79.89  78.48
Genetic-based with cue's positional information  77.61  81.09  79.67


generated by MI, and the results of these experiments are summarised in Table 7.

From the above results, it is clear that there are differences between the final recognition accuracies of the DBN models of DAR. Although the differences are limited to 1–2 percentage points, they are adequate to evidence the efficiency of the genetic-based approach over the ranking approaches. This can be interpreted by the fact that the effect of a feature selection approach on the performance of an ML model depends heavily on the interaction between the feature selection approach and the ML algorithm. This interaction is characterised by how aware the feature selection approach is of which features are relevant to the ML algorithm, and by how sensitive the ML algorithm is to the existence of redundant features. Recalling that both MI and the genetic-based approach are filter approaches, in which features are selected based on the intrinsic properties of the data and independently of the ML algorithm, the above results are expected. In other words, because the selection of cue phrases was performed independently of the DBN ML algorithm, the differences between the recognition accuracies of the DBN models cannot be expected to reflect the efficiency of the different approaches accurately. In addition, as pointed out before, the main advantage of the genetic-based approach over the ranking approaches is its ability to remove redundant features; however, ML algorithms vary in their sensitivity to redundant features [13], and in the above experiments DBN appears not to be highly sensitive to them.

Finally, to understand the influence of the cue phrase selection approaches on the recognition accuracy, it should be borne in mind that the construction of the DBN models of DAR is based on the binary representation of the datasets, which results from the extraction of the random variables' values from the utterances. In this representation, utterances that belong to a certain DA should have a distinct pattern, ideally composed of n - 1 bits with value 0 and a single bit with value 1 (where n is the number of random variables), the 1 bit corresponding to the random variable of that DA. It is also clear that the quality of the representation depends on the relevance of the selected cue phrases that form the random variables. In other words, the better the cue selection approach, the better the data representation and, consequently, the better the constructed DBN models.
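The binary representation described above can be sketched as follows. Grouping the selected cue phrases by DA and matching them by naive substring containment are assumptions made for illustration; the paper's extraction procedure may differ.

```python
def to_binary_pattern(utterance, cue_sets):
    """Binary representation fed to the DBN: one random variable
    per DA, set to 1 when any of that DA's selected cue phrases
    occurs in the utterance (naive substring match, for sketch
    purposes only)."""
    return [int(any(cue in utterance for cue in cues))
            for cues in cue_sets.values()]

cue_sets = {                       # hypothetical selected cues
    "Request": ["can you", "could you"],
    "Accept":  ["yes", "sure"],
    "Reject":  ["no", "sorry"],
}

to_binary_pattern("can you open the window", cue_sets)  # -> [1, 0, 0]
```

An ideally represented Request utterance, as above, fires exactly one random variable; irrelevant or redundant cue phrases blur these patterns, which is why the quality of the selection carries through to the constructed DBN models.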

5 Conclusion

In this paper, a genetic-based approach for cue phrase selection in the context of DAR was introduced. The proposed approach is a variable length GA with specialised genetic operators developed specifically for this task. Several sets of experiments were conducted, and the results suggest a number of important conclusions. Firstly, they confirm that the ranking approaches are not optimal for cue phrase selection in DAR and similar high-dimensional domains: the selection in these approaches is independent of the subsequent use, and they are unable to account for the correlation between the selected features. Secondly, the results of the proposed genetic-based approach evidence its ability to account for the correlation between the candidate cues, which enables it to select a minimal number of relevant phrases; this is apparent from the large reduction in the number of selected cues. Thirdly, in contrast to the ranking approaches, the proposed genetic-based approach shows its ability to exploit the negative phrases to increase the relevance of the selected cue phrases. Fourthly, the results confirm that the cue's positional information is useful for improving the relevance of the selected cue phrases. In general, the proposed genetic-based approach has proved its efficiency for the selection of useful cue phrases for DAR. Finally, although the genetic-based approach was applied here to cue phrase selection, it can be applied similarly to feature selection in other high-dimensional domains.

References

1. Ali A, Mahmod R, Ahmad F, Sullaiman N (2006) Dynamic

bayesian networks for intention recognition in conversational

agent. In: Proceedings of the 3rd international conference on arti-

ficial intelligence in engineering and technology (iCAiET2006).

Universiti Malaysia Sabah, Sabah, Malaysia

2. Allen J, Core M (1997) Draft of DAMSL: dialog act markup in

several layers. The Multiparty Discourse Group, University of

Rochester, Rochester, USA. Available from http://www.cs.

rochester.edu/research/cisd/resources/damsl

3. Araujo L (2002) Part-of-speech tagging with evolutionary algo-

rithms. In: Proceedings of the international conference on

intelligent text processing and computational linguistics, lecture

notes in computer science, vol 2276. Springer-Verlag, Berlin, pp

230–239

4. Austin JL (1962) How to do things with words. Oxford Univer-

sity Press, Oxford

5. Belz A, Eskikaya B (1998) A genetic algorithm for finite-state

automata induction with an application to phonotactics. In: Pro-

ceedings of ESSLLI-98 workshop on automated acquisition of

syntax and parsing

6. Blum A, Langley P (1997) Selection of relevant features and

examples in machine learning. Artif Intell 97:245–271

7. Bunt H (1994) Context and dialogue control. Think 3(1):19–31

8. Caballero RE, Estevez PA (1998) A niching genetic algorithm for

selecting features for neural network classifiers. In: Proceedings

of the 8th international conference of artificial neural networks.

Springer-Verlag, pp 311–316

9. Cantu-Paz E (2004) Feature subset selection, class separability,

and genetic algorithms. In: Proceedings of genetic and evolu-

tionary computation conference-GECCO 2004, Deb K, (ed) et al.

pp 959–970


10. Chunkai K, Zhang HH (2005) An effective feature selection

scheme via genetic algorithm using mutual information. In:

Proceedings of 2nd international conference on fuzzy systems

and knowledge discovery, pp 73–80

11. Dash M, Liu H (1997) Feature selection for classification. Int

Data Anal Int J 1(3):131–156

12. Dash M, Liu H (2003) Consistency-based search in feature

selection. Artif Intell 151(1/2):155–176

13. David WA (1992) Tolerating noisy, irrelevant and novel attri-

butes in instance-based learning algorithms. Int J Man Mach Stud

36(1):267–287

14. Davidor Y (1991) A genetic algorithm applied to robot trajectory

generation. In: Davis L (ed) Handbook of genetic algorithms. Van

Nostrand Reinhold, pp 144–165

15. Davidor Y (1991) Genetic algorithms and robotics: a heuristic strategy for optimisation, vol 1 of World Scientific series in robotics and automated systems. World Scientific

16. Eads D, Hill D, Davis S, Perkins S, Ma J, Porter R, Theiler J

(2002) Genetic algorithms and support vector machines for time

series classification. In: Proceedings of 5th conference on the

application and science of neural networks, fuzzy systems and

evolutionary computation, symposium on optical science and

technology of the 2002 SPIE annual meeting, pp 74–85

17. Fatourechi M, Birch GE, Ward RK (2007) Application of a

hybrid wavelet feature selection method in the design of a self-

paced brain interface system. J Neuroeng Rehabil 4:11

18. Filho B (2000) Feature selection from huge feature sets in the

context of computer vision. Master’s thesis, Colorado State

University Fort Collins, Colorado

19. Fishel M (2007) Machine learning techniques in dialogue act

recognition. In: Proceedings of estonian papers in applied lin-

guistics 3, pp 117–134

20. Fleuret F (2004) Fast binary feature selection with conditional

mutual information. J Mach Learn Res 5:1531–1555

21. Fogel LJ, Owens AJ, Walsh MJ (1966) Artificial intelligence

through simulated evolution. Wiley, New York

22. Frohlich H, Chapelle O, Scholkopf B (2004) Feature selection for

support vector machines using genetic algorithms. Int J Artif

Intell Tools 13(4):791–800

23. Goldberg DE (1989) Genetic algorithms in search, optimisation,

and machine learning. Addison-Wesley, New York

24. Goldberg DE, Korb B, Deb K (1990) Messy genetic algorithms:

motivation, analysis, and first results. Complex Syst 3:493–530

25. Goldberg DE, Deb K, Korb B (1990) Messy genetic algorithms

revisited: studies in mixed size and scale. Complex Syst

4(4):415–444

26. Gonzalo V, Sanchez-Ferrero J, Arribas I (2007) A statistical-

genetic algorithm to select the most significant features in

mammograms. In: Proceedings of the 12th international confer-

ence on computer analysis of images and patterns, pp 189–196

27. Harvey I (1995) The artificial evolution of adaptive behaviour, D.

Phil. thesis, School of cognitive and computing sciences. Uni-

versity of Sussex

28. Haupt RL, Haupt SE (2004) Practical genetic algorithms, 2nd

edn. Wiley, New York

29. Hirschberg J, Litman D (1993) Empirical studies on the disam-

biguation of cue phrases. Comput Linguist 19(3):501–530

30. Holland JH (1975) Adaptation in natural and artificial systems.

The University of Michigan Press, Ann Arbor

31. Hong JH, Cho SB (2006) Efficient huge-scale feature selection with

speciated genetic algorithm. Pattern Recognit Lett 2(27):143–150

32. Intel Corporation (2004). Probabilistic network library—user

guide and reference manual

33. Jurafsky D (2004) Pragmatics and computational linguistics. In:

Horn L, Ward G (eds) The Handbook of pragmatics. Oxford,

Blackwell, pp 578–604

34. Jurafsky D, Shriberg E, Fox B, Traci C (1998) Lexical, prosodic,

and syntactic cues for dialog acts. In: Proceedings of ACL/coling

‘98 workshop on discourse relations and discourse markers,

Montreal, Quebec, Canada pp 114–120

35. Kats H (2006) Classification of user utterances in question

answering dialogues. Master’s thesis, University of Twente,

Netherlands

36. Kazakov D (1998) Genetic algorithms and MDL bias for word

segmentation. In: Proceeding of ESSLLI-97

37. Kelly JD, Davis L (1991) Hybridising the genetic algorithm and

the K nearest neighbors classification algorithm. In: ICGA pp

377–383

38. Kohavi R, John GH (1997) Wrappers for feature subset selection.

Artif Intell J 97(1/2):273–324

39. Koza JR (1992) Genetic programming: on the programming of

computers by means of natural selection. MIT Press, Cambridge

40. Lankhorst MM (1994) Automatic word categorisation with

genetic algorithms: computer science report CS-R9405. Univer-

sity of Groningen, The Netherlands

41. Lanzi P (1997) Fast feature selection with genetic algorithms: a

filter approach. In: Proceedings of IEEE international conference

on evolutionary computation, pp 537–540

42. Lesch S (2005) Classification of multidimensional dialogue acts

using maximum entropy. Diploma Thesis, Saarland University,

Postfach 151150, D-66041 Saarbrucken, Germany

43. Li L, Weinberg CR, Darden TA, Pedersen LG (2001) Gene

selection for sample classification based on gene expression data:

study of sensitivity to choice of parameters of the GA/KNN

method. Bioinformatics 17:1131–1142

44. Liu H, Motoda H (1998) Feature selection for knowledge dis-

covery and data mining. Kluwer, Boston

45. Liu H, Yu L (2005) Toward integrating feature selection algo-

rithms for classification and clustering. IEEE Trans Knowl Data

Eng 17:491–502

46. Liu JJ, Cutler G, Li W, Pan Z, Peng S, Hoey T, Chen L, Ling X

(2005) Multiclass cancer classification and biomarker discovery

using GA-based algorithms. Bioinformatics 21:2691–2697

47. Liu W, Wang M, Zhong Y (1995) Selecting features with genetic

algorithm in handwritten digits recognition. In: Proceedings of

the international IEEE conference on evolutionary computation,

pp 396–399

48. Losee RM (1995) Learning syntactic rules and tags with genetic

algorithms for information retrieval and filtering: an empirical

basis for grammatical rules. Inf Process Manag 32:185–197

49. Lu J, Zhao T, Zhang Y (2008) Feature selection based-on genetic

algorithm for image annotation. Knowl Based Syst 21(8):887–891

50. Manning C, Schutze H (1999) Foundation of statistical natural

language processing. MIT Press, Cambridge

51. Mitchell M (1996) An introduction to genetic algorithms. MIT

Press, Cambridge

52. Morariu D, Vintan L, Tresp V (2006) Evolutionary feature

selection for text documents using the SVM. In: Proceedings of

3rd international conference on machine learning and pattern

recognition (MLPR 2006), ISSN 1305–5313 vol 15, pp 215–221,

Barcelona

53. Moser A, Murty M (2000) On the scalability of genetic algo-

rithms to very large-scale feature selection. EvoWorkshops, pp

77–86

54. Nettleton DJ, Gargliano R (1994) Evolutionary algorithms and

dialogue. In: Practical handbook of genetic algorithms. CRC

Press, New York

55. Oakes M (1997) Statistics for corpus linguistics. Edinburgh

University Press, Edinburgh

56. Ozdemir M, Embrechts MJ, Arciniegas F, Breneman CM, Lock-

wood L, Bennett KP (2001) Feature selection for in-silico drug

design using genetic algorithms and neural networks. IEEE


mountain workshop on soft computing in industrial applications,

pp 53–57

57. Punch WF, Goodman ED, Pei M, Chia-Shun L, Hovland P, En-

body R (1993), Further research on feature selection and

classification using genetic algorithms. In: Proceedings of the

fifth international conference on genetic algorithms, champaign,

Ill: pp 557–564

58. Samuel K, Carberry S, Vijay-Shanker K (1999) Automatically

selecting useful phrases for dialogue act tagging, In: Proceedings

of PACLING ‘99 (fourth Conference of the Pacific Association

for Computational Linguistics). Waterloo, Ontario, Canada

59. Schutz M (1997) Other operators: gene duplication and deletion.

In: Back TH, Fogel DB, Michalewicz Z Hrsg., Handbook of

evolutionary computation C3.4:8–15. Oxford University Press,

New York, und Institute of Physics Publishing, Bristol

60. Searle JR (1975) A taxonomy of illocutionary acts. In: Gunderson

K, (eds), Language, mind and knowledge, Minnesota studies in the

philosophy of science. University of Minnesota Press 7:344–369

61. Sebastiani F (2002) Machine learning in automated text cate-

gorisation. ACM Comput Surv 34(1):1–47

62. Siedlecki W, Sklansky J (1988) On automatic feature selection.

Int J Patt Recognit Artif Intell 2(2):197–220

63. Silla CN, Pappa GL, Freitas AA, Kaestner CAA (2004) Auto-

matic text summarisation with genetic algorithm-based attribute

selection. In: IX IBERAMIA—Ibero-American conference on

artificial intelligence, Puebla

64. Smith SF (1980) A learning system based on genetic adaptive

algorithms. Ph. D. Thesis. University of Pittsburgh, PA, USA

65. Vafaie H, De Jong K (1995) Genetic algorithms as a tool for

restructuring feature space representations. In: Proceedings of the

seventh international conference on tools with artificial intelli-

gence. Henidon

66. Vafaie H, De Jong K (1992) Genetic algorithms as a tool for feature

selection in machine learning. In: Proceeding of the 4th interna-

tional conference on tools with artificial intelligence, Arlington

67. Van de Burgt SP, Schaake J, Nijholt A (1995) Language analysis

for dialogue management in theatre information and booking

system, language engineering, AI 95, 15th international confer-

ence, Montpellier, pp 351–362

68. Verbree AT, Rienks RJ, Heylen DKJ (2006) Dialogue act tagging

using smart feature selection: results on multiple corpora. In:

Proceedings of the first international IEEE workshop on spoken

language technology SLT, pp 10–13

69. Vose MD (1999) The simple genetic algorithm: foundation and

theory. MIT Press, Cambridge

70. Webb N, Hepple M, Wilks Y (2005) Dialogue act classification based on intra-utterance features. In: Proceedings of the AAAI 05

71. Webb N, Hepple M, Wilks Y (2005) Empirical determination of

thresholds for optimal dialogue act classification. In: Proceeding of

the ninth workshop on the semantics and pragmatics of dialogue

72. William H (2004) Genetic wrappers for feature selection in

decision tree induction and variable ordering in Bayesian network

structure learning. Inf Sci Int J 163(1–3):103–122

73. Wilson GC, Heywood MI (2005) Use of a genetic algorithm in

brill’s transformation-based part-of-speech tagger. In: Proceed-

ings of the genetic and evolutionary computation conference

(GECCO 2005), June 25–29, 2005, Washington, DC, USA, ACM

Press, ISBN 1-59593-010-8, pp 2067–2073

74. Yang J, Honavar V (1998) Feature subset selection using a

genetic algorithm. IEEE Intell Syst 13(2):44–49

75. Yu E, Cho S (2003) GA-SVM wrapper approach for feature

subset selection in keystroke dynamics identity verification. In:

Proceedings of the IEEE international joint conference on neural

networks 3:2253–2257

76. Zebulum RS, Pacheco MA, Vellasco M (2000) Variable length

representation in evolutionary electronics. Evol Comput J

8(1):93–120

77. Zhang L, Wang J, Zhao Y, Yang Z (2003) A novel hybrid feature

selection algorithm: using Relief estimation for GA-wrapper

search. In: Proceedings of IEEE international conference on

machine learning and cybernetics

78. Zhang P, Verma B, Kumar K (2004) A neural-genetic algorithm

for feature selection and breast abnormality classification in

digital mammography. In: Proceedings of IEEE international

joint conference on neural networks, vol 3, pp 2303–2308

79. Zheng Z, Wu X, Srihari R (2004) Feature selection for text cat-

egorization on imbalanced data. SIGKDD 6(1):80–89

80. Zhuo L, Zheng J, Wang F, Li X, Ai B, Qian J (2008) A genetic

algorithm based wrapper feature selection method for classifica-

tion of hyperspectral images using support vector machine. The

international archives of the photogrammetry, remote sensing and

spatial information sciences. vol XXXVII. Part B7. Beijing
