PISA: A framework for multiagent classification using argumentation


Maya Wardeh⁎, Frans Coenen, Trevor Bench-Capon
Department of Computer Science, The University of Liverpool, Liverpool L69 3BX, UK
⁎ Corresponding author. E-mail addresses: [email protected] (M. Wardeh), [email protected] (F. Coenen), [email protected] (T.B. Capon).
Data & Knowledge Engineering 75 (2012) 34–57. doi:10.1016/j.datak.2012.03.001


Article history: Received 28 June 2010; Received in revised form 12 March 2012; Accepted 13 March 2012; Available online 21 March 2012.

This paper describes an approach to multi-agent classification using an argumentation from experience paradigm whereby individual agents argue for a given example to be classified with a particular label according to their local data. Arguments are expressed in the form of classification rules which are generated dynamically. As such each local database can be conceptualised as an experience repository; and the individual classification rules, generated from this repository, as describing generalisations drawn from this experience. The argumentation process and the supporting data structures are fully described. The process has been implemented in the PISA (Pooling Information from Several Agents) multi-agent framework which is fully described. Experiments indicate that the operation of PISA is comparable with other classification approaches and that, when operating in groups or in the presence of noise, PISA outperforms such comparable approaches.

© 2012 Elsevier B.V. All rights reserved.

Keywords: Classification; Argumentation; Multiagent (data mining); Classification Association Rules

1. Introduction

Argumentation is concerned with the dialogical reasoning processes required to arrive at a conclusion given two or more alternative viewpoints. Argumentation is an increasingly influential reasoning mechanism [13], particularly in the context of Multiagent Systems (MAS). One area where argumentation has attracted recent attention is in machine learning (e.g. [38,44]). Machine learning algorithms allow for the detection and extraction of interesting data patterns with respect to a variety of problems; yet most of these algorithms provide an output based on quantitative evidence so that the inference process which led to this output is often opaque [29]. By integrating argumentation with existing machine learning techniques the inference model for the latter can be made explicit. The process of Multiagent Argumentation can be conceptualised as a discussion, about some issue that requires a solution, between a group of software agents with different points of view; where each agent attempts to persuade the others that its point of view, and the consequent solution, is the correct one. The discussion can be conducted using a variety of argumentation schemes. Particular schemes and the characteristic attacks relevant to a specific task are often, as here, embodied in an agent dialogue protocol. Computer automation and modelling of the argumentation process have found applications in legal reasoning [30], e-democracy [14], decision support [8] and so on. In this paper the authors investigate the possibility of applying the argumentation process to facilitate the classification of unseen data instances. One particular model of argumentation, Arguing from Experience [57,56], is well suited to the classification task.

Much work on argumentation has been related to the deductive paradigm, where axioms are used to license the move from premises to conclusion in a series of truth preserving steps. Argumentation differs from deduction, however, in that it is inherently defeasible: the rules used to derive each next step are typically defeasible, and defeasible inference is not transitive. Thus the steps of an argument are typically not guaranteed to be truth preserving, but may be attacked in a variety of ways. See [49] for a recent comprehensive account of this style of argumentation. Such argumentation does, however, rely on the existence of a theory which supplies the defeasible rules to be used to move from premises to conclusions. While this is natural in some domains, especially those which rely heavily on definitions, or on concepts capable of sharp delineation in terms of necessary and sufficient conditions, it is less suitable for many other domains in which classification is a central task. For example, in legal domains legal concepts tend to be open-textured [60], and so no necessary and sufficient conditions are, even in principle, available. In other domains there are necessary and sufficient conditions, but these cannot reasonably be used in classification. This can be illustrated if we consider that there is a definitive test for whether a mushroom is poisonous or not, but we would not wish to use it before we had classified the fungus! Such concepts are usually taught not through definitions, but ostensively, through showing the learner a number of examples.

For such concepts argumentation based on a series of rules is inappropriate: while we may form a theory based on the examples, the examples themselves retain primary authority and any theory is just a convenience, and a matter of questionable interpretation. There are many other forms of argumentation: a recent book on argumentation schemes [61] lists well over 50 distinct stereotypical patterns of argumentation. Many of these patterns do not involve a chain of reasoning, but step immediately from a particular set of circumstances (e.g. most egg laying creatures are birds) to a presumptive conclusion (e.g. this egg laying creature is a bird). Arguing from experience is such a pattern:

1. In my experience, most things with features F are of kind K.
2. This thing of unknown kind has features F.
3. Thus, I have reason to think that this thing is of kind K.

Like any argument based on a presumptive argumentation scheme, such an argument is defeasible: in particular we need to consider the strength of the reasons, and the possible existence of reasons to think that the thing is not of kind K. One much explored instance of this type of reasoning is Case Based Reasoning (CBR) applied to Law. As noted above, legal concepts often are open textured and so cannot be defined in terms of necessary and sufficient conditions. Legal argument, in common law jurisdictions, takes the form of the presentation of cases similar to a current case, and taking the features in common as reasons to decide the current case in the same way as the precedent (e.g. [6]). This style of reasoning was established by the HYPO system [7] in terms of an adversarial three ply exchange.

1. Proponent cites a case supporting his side.
2. Opponent points to features distinguishing the current case, or supplies counter examples supporting her side.
3. Proponent cites a stronger case or distinguishes the current case.

The dialogue moves used for our proposed argumentation from experience schema are closely related to this style of argument. A more formal account of reasoning with legal cases was given in [12]. There the reasoning was characterised in terms of features of cases providing reasons to decide for one party or the other. This emphasised the single step nature of the process: the left hand side of the rules used in [12] are features and the right hand side is the party that should win on the basis of those features. Consideration of [12] also shows a difference between legal reasoning with cases and our arguing from experience. Legal reasoning typically concerns a small number of precedent cases, perhaps even a single case, and conflicts between reasons are resolved through a consideration of value preferences as shown in past decisions. For a recent attempt to capture the logic of such reasoning see [33]. In arguing from experience, the authority of a reason is given by its evidential backing in experience, what we will call, using the data mining terms, its support and confidence.1 Thus the quantity, as well as the quality, of the examples must be considered.

We would suggest therefore that arguing from experience is particularly suited to classification problems where necessary and sufficient conditions do not exist, or are unknown, or are inapplicable for any reason. Such classification problems often relate to concepts introduced by ostensive definition, and it is well known that ostensive definition can give rise to misunderstandings and misconceptions (e.g. [62]), because it is based on a necessarily partial set of examples. Therefore it can be helpful to have available a range of different experiences, which can then act as correctives for deficiencies in the experience of particular agents. Such experiences can then be shared to reach consensus. We therefore contend that multi-agent arguing from experience is a promising and natural way to approach classification of such concepts.

The proposed model allows a number of agents to draw directly from past examples to find reasons for coming to a decision about the classification of an instance case from some domain. Agents formulate arguments, for or against a classification, from a background dataset of past examples which is mined as required. Each agent's database can be considered to encapsulate that agent's experience. The exchange of arguments between agents represents a dialogue which continues until an agent poses an argument for a particular classification that no other agent can deny. The overall arguing from experience process is directed at producing a reasoned consensus obtained through argumentation (dialogue), as opposed to some other mechanism such as (say) voting [10]. It is suggested that this dialogue process increases the acceptability of the outcome to all parties. This classification by argumentation concept has been built into an argumentation framework called PISA (Pooling Information from Several Agents), which has been implemented.

1 Note that confidence is not a subjective term, but one which is precisely defined in terms of the source dataset. Thus when we say that the association with the most confidence is preferred, we are adjudicating on the basis of which association is objectively better justified by the experience of the agent, and which association can command a sufficient level of justification from the experiences of all the agents concerned.


The promoted Argumentation-Based Classification is an interesting Multiagent Classification technique that offers a number of practical advantages:

1. It enjoys general applicability as it does not require the generation of specialised knowledge or case bases.
2. It employs an automatic rule generation process using a tried and tested data mining technique and consequently avoids the need for reference to a domain expert.
3. It generates knowledge on the fly according to the requirements of the participant in question; it is able to do this because it operates directly with the raw case data.
4. It avoids the need for any knowledge re-engineering, again because it works directly with the case data.
5. It provides an easy-to-understand explanation, in the form of dialogues, concerning a particular classification.
6. It is particularly good at handling noisy data.

In addition the approach provides for a natural representation of the participants' experience as a set of records, and the arguments as Classification Association Rules (CARs). The advocated approach also preserves the privacy of the information that each participant knows; it can therefore be used in domains which involve sensitive data. Typical applications where arguing from experience based multi-agent classification may be applied include: (i) welfare benefits monitoring, (ii) cross-border policing and (iii) academic adjudication. Such applications typically operate using geographically distributed data sets which, for a variety of reasons, cannot be readily combined into a single "data warehouse".

The application of multi-agent classification based on argumentation requires the resolution of two high level issues:

1. The nature of the framework that will allow the envisioned argumentation from experience process whereby a collection of agents can reach an agreement about the classification of cases in some domain.
2. The required algorithms and data structures that these agents will need to adopt to be able to engage in this process with regard to classification.

These issues are all addressed by the PISA framework which forms the focus of the remainder of this paper.

The rest of this paper is organised as follows. Section 2 gives an overview of the concept of Classification Association Rule Mining (CARM). Section 3 provides an overview of the basic structure of the PISA framework. Section 4 then reviews the strategies open to PISA agents; followed, in Section 5, by a review of the role of two central PISA artefacts, namely the Chair Person Agent (CPA) and the argumentation tree. Section 6 gives an overview of the PISA dialogue protocol and includes discussion on arguing in groups, and then Section 7 details the nature of the CARM algorithms used to support the proposed dialogue. Section 8 provides a "real-life" example of the application of PISA to the classification problem. In Section 9 a detailed analysis of PISA is presented, together with a comparison of the operation of PISA with respect to other classification paradigms using both clean and noisy data. Section 10 then discusses some related previous work, and finally we conclude with a summary of the main findings identified from the reported investigation and some suggested future directions.

2. Classification Association Rule Mining

Central to the operation of PISA is the concept of Classification Association Rule Mining (CARM), which is a special form of Association Rule Mining (ARM). ARM is a well established technique within the realm of data mining. The basic idea is to find interesting patterns in a binary valued tabular data set D={d1, d2, …, dn} where each record di (0<i≤n) comprises some subset of a global set of attributes A={a1, a2, …, am}. Subsets of A are referred to as itemsets. The patterns are described in terms of Association Rules (ARs) of the form P→Q where P⊂A, Q⊂A and P∩Q=∅. An AR is deemed to be interesting if its confidence is greater than a user supplied threshold μ. The confidence value for an AR is calculated in terms of "support". The support for an itemset P, supp(P), is the number of records in which P occurs with respect to the total number of records in D, expressed as a percentage:

supp(P) = (occurrence count of P / n) × 100.

The confidence of a rule is then given by:

conf(P→Q) = (supp(P∪Q) / supp(P)) × 100.

(A discussion on the use of support and confidence thresholds in CARM can be found in [21]; further discussion on CARM performance metrics can be found in [53].)
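To make these definitions concrete, the following minimal Java sketch computes supp and conf over a toy dataset of attribute sets. PISA itself is implemented in Java (Section 8), but this code is purely illustrative and not part of the PISA framework:

import java.util.*;

public class SupportConfidence {
    // Toy dataset D: each record is a set of attribute identifiers.
    static final List<Set<String>> D = List.of(
            Set.of("a", "b", "x"),
            Set.of("a", "x"),
            Set.of("a", "b", "y"),
            Set.of("b", "y"));

    // supp(P): percentage of records in D containing every item of P.
    static double supp(Set<String> p) {
        long hits = D.stream().filter(r -> r.containsAll(p)).count();
        return 100.0 * hits / D.size();
    }

    // conf(P -> Q) = supp(P u Q) / supp(P) x 100.
    static double conf(Set<String> p, Set<String> q) {
        Set<String> pq = new HashSet<>(p);
        pq.addAll(q);
        return 100.0 * supp(pq) / supp(p);
    }

    public static void main(String[] args) {
        System.out.println(supp(Set.of("a")));               // 75.0: 'a' occurs in 3 of 4 records
        System.out.println(conf(Set.of("a"), Set.of("x")));  // 66.67: 2 of the 3 'a' records contain 'x'
    }
}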

The standard mechanism for determining all the ARs in a data set D is to first identify all the itemsets that are contained in D, and then to generate the desired ARs. Given an attribute set A there are 2^|A|−1 possible combinations (excluding the empty set). Clearly the number of combinations to be considered needs to be limited in some way. This is typically done by introducing a support threshold, σ, to identify frequent itemsets; an itemset P is deemed to be frequent if supp(P)≥σ (we say the itemset is supported). Thus if an itemset is not supported (infrequent) none of its supersets will be supported (this is known as the downward closure property of itemsets). This property can be used to limit the search space as adopted in the classic Apriori ARM algorithm [2]. Apriori operates using a generate-test-prune loop. We commence with the K=1 candidate itemsets (where K is the number of attributes in an itemset), calculate the support for each K itemset and then prune those whose support is less than σ. From the supported K-itemsets we identify the K+1 itemsets and continue in this manner till there are no more candidate itemsets. Once all the frequent itemsets have been identified we can generate the desired ARs. The computationally expensive part of ARM is the identification of the frequent itemsets. The complexity of ARM algorithms, such as the Apriori algorithm, is exponential with respect to |A|. ARM is well documented (see for example [9]).
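The generate-test-prune loop can be sketched as follows. This is a simplified illustration of the Apriori idea, not the TFPC implementation used by PISA; in particular, the candidate join here is cruder than full Apriori, which also checks that every K-subset of each K+1 candidate is frequent:

import java.util.*;

public class AprioriSketch {
    // Returns all itemsets whose support meets sigmaPct (a percentage),
    // searched level-wise: generate candidates, test support, prune.
    static List<Set<Integer>> frequentItemsets(List<Set<Integer>> D, double sigmaPct) {
        int minCount = (int) Math.ceil(sigmaPct / 100.0 * D.size());
        // K = 1 candidates: every item occurring anywhere in D.
        Set<Set<Integer>> candidates = new HashSet<>();
        for (Set<Integer> r : D) for (Integer a : r) candidates.add(Set.of(a));
        List<Set<Integer>> frequent = new ArrayList<>();
        while (!candidates.isEmpty()) {
            List<Set<Integer>> level = new ArrayList<>();
            for (Set<Integer> c : candidates) {                 // test
                long count = D.stream().filter(r -> r.containsAll(c)).count();
                if (count >= minCount) level.add(c);            // keep supported sets only
            }
            frequent.addAll(level);
            // Generate K+1 candidates by unioning pairs of supported K-itemsets.
            // Downward closure: supersets of pruned itemsets are never built.
            Set<Set<Integer>> next = new HashSet<>();
            for (Set<Integer> p : level)
                for (Set<Integer> q : level) {
                    Set<Integer> u = new TreeSet<>(p);
                    u.addAll(q);
                    if (u.size() == p.size() + 1) next.add(u);
                }
            candidates = next;
        }
        return frequent;
    }
}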

Classification rules are rules whose consequent is a class label; such rules are used in some types of classifier to categorise "unseen" data. Classification rules can be generated using a number of mechanisms such as decision trees and inductive rule learning. They can also be generated using ARM techniques. In this case the records in D are usually organised so that the last attribute in each record is a class attribute taken from a set of classes C; thus a record di has the form {a1, …, a|di|−1, c|di|}, where {a1, …, a|di|−1}⊂I⊂A and c|di|∈C (I∪C=A and I∩C=∅). The knowledge that the consequent of the ARs of interest must include a class label can then be used to further limit the search space. The process of using ARM techniques to generate classification rules is called Classification Association Rule Mining (CARM) and the generated rules Classification Association Rules (CARs).

Many CARM techniques have been proposed in the literature (see for example [52]); that used to support the proposed PISA arguing from experience framework is founded on TFPC [20]. TFPC uses two tree structures, a Partial support tree (P-tree) and a Total support tree (T-tree). The P-tree results from a single pass of the dataset during which a partial itemset count is generated. Each node in the P-tree represents a leading sub-set that appears in the input data. The number of nodes in the P-tree is typically much fewer than |D|. In the best case it will be one (where every record in D is identical); in the worst case it will be |D| (where every record in D is unique, i.e. the intersection of all the records is the empty set); both extremes are very unlikely. The likelihood of common leading subsets increases as |D| increases. The P-tree is therefore well suited to large data sets.

The T-tree is a reverse set enumeration tree structure intended to hold all the frequently occurring itemsets (one itemset per node) in a given input dataset. The nodes at the same level of any sub-branch are organised in one dimensional arrays; in other words the T-tree is a trie. This enables direct indexing according to attribute number, therefore computational efficiency gains are achieved over other prefix tree structures (such as FP-Trees [32]). The nodes are also organised using a reverse lexicographic ordering. This offers the advantage, with respect to PISA, that each class attribute is the root of the sub-tree containing all frequent sets associated with that class. Consequently CARs for a particular class label can be generated by concentrating only on this branch.2 The T-tree is constructed in an Apriori manner. The time complexity of this generation approximates to O(|Ptree|×|S|), where |Ptree| is the number of nodes in the P-tree and S is the set of candidate itemsets that are required to be generated to identify the complete set of supported (frequent) itemsets (remember that by using the downward closure property we do not have to generate all combinations of A). The makeup of S is very much dependent on the nature of D. Given an equally distributed dataset, the maximum size (Kmax) of a frequent itemset will be equivalent to ⌊log_P σ⌋ where P is the average probability that an item exists in D. In which case, to identify the set of frequent itemsets, the maximum size of an itemset in S will be Kmax+1. In which case:

|S| = Σ_{n=1}^{Kmax+1} [ ∏_{i=0}^{n−1} (|A| − i) ] / n!

(each term being the number of ways of choosing an n-itemset from A).

However, this is a gross simplification as it is highly unlikely that individual items will be evenly distributed. Further details on the nature of T-trees and P-trees can be found in [22].
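The direct-indexing property can be conveyed by a minimal node structure of the following kind; field and method names are illustrative assumptions and do not correspond to the actual TFPC/PISA code:

// Sketch of a T-tree node: each node records a support count and holds its
// children in an array indexed directly by attribute number, so moving one
// level down is a single array access.
class TtreeNode {
    int support;             // occurrence count of the itemset this node denotes
    TtreeNode[] children;    // children[i]: this itemset extended with attribute i

    TtreeNode child(int attr) {
        return (children == null || attr >= children.length) ? null : children[attr];
    }
}

Looking up the support of a K-itemset therefore costs at most K array index operations (one per level), a fact used in the complexity discussion of Section 7.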

3. The PISA framework

In this section the PISA framework is described. Throughout this section the reader may find it helpful to refer to the symbol table presented in Table 1. The framework is designed to allow any number of software agents to engage in a dialogue process, the aim of which is to classify an unseen instance from some domain. Each PISA agent is responsible for advocating a particular class. Agents formulate arguments, for their own advocated classification or against the classification advocated by other agents, presented as CARs, generated from local datasets of past examples which are mined as required. For this reason PISA agents require both "positive" and "negative" examples in their data repository. The authors refer to the overall process as arguing from experience in the sense that each agent's database can be considered to encapsulate that agent's experience. More formally the PISA framework can be described as follows:

PISA = (AM, G, D, φ, σ, μ, CPA)    (1)

where: (i) AM (Argumentation Mechanism) is some argumentation from experience mechanism, (ii) G is a set of participant agents (players), (iii) D is the collected experience of the participants divided into |G| mutually exclusive subsets (D={E1, E2, …, E|G|}) describing each participant's experience (the records are of the form described in Section 2 such that the last attribute is always a class attribute), (iv) φ is the case to be considered (φ={ai, …, ar}⊂A), (v) σ is the global support threshold, (vi) μ is the global confidence threshold, and (vii) CPA is the "Chair Person Agent" (the role of the CPA is described in more detail in Section 5).

2 Although confidence value calculation will necessitate reference to other parts of the T-tree.


The AM comprises:

AM = (R, τ, Strat)    (2)

where: (i) R is a set of CAR generation algorithms (see Section 7 for further detail), (ii) τ is some local confidence threshold (specified by individual agents) such that τ≥μ, and (iii) Strat is a set of argumentation from experience strategies. A full discussion of the argumentation strategies available to PISA is presented in [58]; the strategies available to PISA, in its simplest form, can be summarised as follows:

Strat = {aggressive, defensive}.    (3)

The first is an aggressive (attacking) strategy where participant agents aim to win the game by favouring rules that support their own advocated class. The second is a defensive strategy where players try to win by favouring rules that undermine their adversaries' arguments. Alternative strategies, not discussed further in this paper, include the strategy where participant agents "play a waiting game" and allow other agents to first argue amongst themselves. PISA strategy, in the context of this paper, is discussed in more detail in Section 4 below.

The set G comprises the set of participant agents; in its simplest form PISA requires |G|=|C|. Each participant agent gi has the following form:

gi = (Ei, strati, ci)    (4)

where: (i) Ei is the agent's local data set (∀Ei, Ej∈D | Ei∩Ej=∅), (ii) strati is the arguing from experience strategy adopted by the agent (strati∈Strat) and (iii) ci is the class advocated by the agent (ci∈C). Note that Ei contains examples featuring all the classes in C, because the agent needs to be able to argue against the classes advocated by other agents as well as for its own advocated class. Participant agents produce reasons for and against classifications by mining CARs from their local dataset Ei using algorithms selected from R. The antecedent of every CAR represents a set of reasons for believing the consequent, which will include a class attribute. In other words, given a CAR P→Q where c∈Q, this should be read as: P are reasons to believe that the case should be classified as c, as long as the additional attributes in Q also exist in the case.

The exchange of arguments between agents represents a dialogue which continues until one agent poses an argument for a particular classification that no other agent can refute. The dialogue is mediated by the CPA (Chair Person Agent), a neutral agent that does not advocate any particular class. For turn taking, a structure with rounds is adopted, rather than a linear structure where a given agent is selected as the next speaker (such as in the case of the turn taking protocol described in [11] where the current "speaker" chooses who will speak next). In each round, any agent which can make a legal move may do so. There is no limitation on the number of agents that can participate in any round. However, to simplify the process, each participant is limited to one move per round. The dialogue terminates when no participant makes a contribution for two rounds,3 or after some limiting number of rounds have been played (see Section 5 for further discussion); the termination of the dialogue is thus guaranteed. Cycles are avoided by preventing participant agents from "playing" the same rule twice (in fact there is no need to play a rule twice; a rule that is played remains "in force" until it is directly or indirectly defeated).

4. Strategy

Each participant agent can employ one of six different types of move to generate arguments. The six available moves are categorised under three headings: (i) Proposing Moves, (ii) Attacking (Aggressive) Moves and (iii) Refining (Defensive) Moves, as follows:

Proposing Moves. There are two kinds of proposing moves:

1. Propose Rule: Allows a new CAR (argument), with a confidence higher than μ, to be proposed. This move requires the participant agent to mine CARs that support its advocated classification. All PISA dialogues commence with an initial Propose Rule move.

Table 1. Symbol table.

D — A data set comprising n records
Ei — The experience of agent i (Ei⊂D)
A — A set of m attributes that feature in D
supp(X) — The occurrence count for an itemset X
conf(X) — The confidence value for an itemset X
σ — A predefined global support threshold
μ — A predefined global confidence threshold
C — The set of class attributes
G — The set of participant agents, G={g1, g2, …}
τ — Some local confidence threshold specific to an agent (τ≥μ)

3 Although not detailed in this paper, as already noted, an agent may adopt a waiting strategy; where this is the case such agents should be given an opportunity to pose an argument if the alternative is that the game will end.


2. Counter Rule: Similar to the Propose Rule move but used to cite an alternative classification with higher confidence than those currently under consideration (as defined by a local confidence threshold τ).

Attacking Moves. Moves intended to show that a CAR (argument) proposed by some other agent should not be considered decisive with respect to the current case. Three sub-types of attacking moves are available: (i) Distinguish, (ii) Unwanted Consequent, and (iii) Counter Rule, as follows:

3. Distinguish: Move that allows an agent to add some new attributes (premises) to a previously proposed CAR so that the confidence of the new rule is lower than μ, thus rendering the original classification inadmissible.

4. Unwanted Consequent: Move that allows an agent to demonstrate that a CAR previously proposed by some other agent does not apply in the current case as it has attributes not included in the case in its conclusion. Consequently the conclusion (classification) of this CAR is erroneous.

Refining Moves. Moves that enable a CAR (argument), proposed by the current agent, to be refined to meet a "counter attack":

5. Increase Confidence: Move that allows the addition of one or more attribute(s) to the premise associated with a previously proposed CAR so as to increase the confidence of the rule, thus increasing the confidence that the case should be classified as proposed by the CAR.

6. Withdraw Unwanted Consequent: Move to generate a rule that omits Unwanted Consequents from a CAR previously proposed by another agent, while maintaining a certain local level of confidence τ. In other words, demonstrating that it is still possible to classify the case as proposed by this CAR without the need to associate the class attribute with the additional attributes that do not match the case under consideration.

At first glance it would appear that moves 5 and 3 contradict one another; however it is easily possible to both increase and decrease the confidence of a CAR by adding different attributes to its premise. If we consider the dataset D={{a, z}, {a, c, x}, {b, c, y}, {b, c, d, z}, {b, d, y}} (where I={a, b, c, d} and C={x, y, z}) and a rule r1.1 = a→x then conf(r1.1)=1/2=50%. If we now add c to the premise to give r1.2 = a, c→x the confidence value will be increased to 1/1=100%. Now, considering the same data set again and the rule r2.1 = b→y, the confidence value will be 2/3=67%; if we now add c to give r2.2 = b, c→y the confidence value will be decreased to 1/2=50%. Thus the same mechanism has been used to both increase and decrease the confidence value.

Each of the above six moves has a set of legal next moves; these are itemised in Table 2. Note that move 5 (Increase Confidence) is only "played" if a Distinguish move has been played, and move 6 (Withdraw Unwanted Consequents) only if an Unwanted Consequent move has been played. Section 8 provides an example dialogue using the above moves, and illustrates how they can be utilised to reach an agreement about the classification of a given case.
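The arithmetic of the example can be checked with the throwaway sketch below. Note it assumes the third record is {b, c, y}; the original printing gives {b, c, t}, but t appears in neither I nor C, and conf(b→y)=2/3 requires y:

import java.util.*;

public class MoveContrast {
    public static void main(String[] args) {
        List<Set<String>> D = List.of(
                Set.of("a", "z"), Set.of("a", "c", "x"), Set.of("b", "c", "y"),
                Set.of("b", "c", "d", "z"), Set.of("b", "d", "y"));
        System.out.println(conf(D, Set.of("a"), "x"));       // 50.0  (r1.1)
        System.out.println(conf(D, Set.of("a", "c"), "x"));  // 100.0 (r1.2: adding c raises it)
        System.out.println(conf(D, Set.of("b"), "y"));       // 66.67 (r2.1)
        System.out.println(conf(D, Set.of("b", "c"), "y"));  // 50.0  (r2.2: adding c lowers it)
    }

    static long count(List<Set<String>> D, Set<String> items) {
        return D.stream().filter(r -> r.containsAll(items)).count();
    }

    static double conf(List<Set<String>> D, Set<String> ant, String cls) {
        Set<String> both = new HashSet<>(ant);
        both.add(cls);
        return 100.0 * count(D, both) / count(D, ant);
    }
}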

From the previous section (Section 3), PISA, at its simplest, provides support for two argumentation strategies: aggressive and defensive. The Distinguish and Unwanted Consequent moves are associated with the aggressive (attacking) strategy where a participant agent continuously attempts to undermine an opponent's arguments (and thus indirectly advocate their own class). The Increase Confidence and Withdraw Unwanted Consequent moves are associated with the defensive (refining) strategy where a participant agent continuously attempts to find arguments to support its own advocated class. Note that in both cases an agent that cannot propose any other move will attempt to play a Counter Rule move. Participant agents select a player to direct a move at according to their strategy and according to which other player presents the greatest threat as defined by the confidence associated with the current "winning" rule. Information concerning the latter is obtained from a data storage artefact called the argumentation tree which is described in the following section.

5. The CPA and the argumentation tree

As already noted above, the CPA is a neutral agent which administers a variety of tasks aimed at facilitating the desired multi-party arguing from experience dialogues. The function of the chairperson agent resembles that of the mediator artefact suggested in [41]. The CPA has the following specific responsibilities:

• Starting a dialogue.
• Terminating a dialogue when a termination condition is satisfied.
• Announcing the resulting classification for the given case (once the dialogue has terminated).

Note that in this model, the chairperson is not a referee, of the form described in [34,35], but more a mentor facilitating the dialogue.

Table 2. Legal next moves in PISA.

Move  Label                          Next move
1     Propose Rule                   2, 3, 4
2     Counter Rule                   1, 3, 4
3     Distinguish                    1, 4, 5
4     Unwanted Consequent            1, 6
5     Increase Confidence            2, 3, 4
6     Withdraw Unwanted Consequent   2, 3, 4
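Table 2 amounts to a simple transition relation. One way to encode it, with the kind of legality check an implementation might apply before admitting a move, is shown below; this is an illustrative sketch, not PISA's own API:

import java.util.*;

public class LegalMoves {
    // Table 2 as a lookup from a move to the moves that may legally answer it.
    static final Map<Integer, Set<Integer>> NEXT = Map.of(
            1, Set.of(2, 3, 4),   // Propose Rule
            2, Set.of(1, 3, 4),   // Counter Rule
            3, Set.of(1, 4, 5),   // Distinguish
            4, Set.of(1, 6),      // Unwanted Consequent
            5, Set.of(2, 3, 4),   // Increase Confidence
            6, Set.of(2, 3, 4));  // Withdraw Unwanted Consequent

    static boolean isLegal(int answeredMove, int proposedMove) {
        return NEXT.getOrDefault(answeredMove, Set.of()).contains(proposedMove);
    }
}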

39M. Wardeh et al. / Data & Knowledge Engineering 75 (2012) 34–57

The CARs exchanged, via the moves described in Section 4, are stored in a central data structure, called the argumentation tree, which is maintained by the Chairperson. This is defined as:

Ψ = (V, E, LV, LE, λ)    (5)

where: (i) V is a set of vertices such that |V|≤η where η is a predefined threshold, (ii) E is a set of directed edges (E⊂V×V), (iii) LV is a set of vertex labels, (iv) LE is a set of edge labels and (v) λ is a label function that defines the mappings V→LV and E→LE. The set of vertex labels, LV, is defined as:

LV = (CAR, Colour)    (6)

where CAR is an argument from experience and Colour is a set of colour codings, Colour={red, blue, green, purple}, which have the following interpretation:

Green: New CAR.
Blue: New CAR that undermines an existing CAR.
Red: Existing CAR that is under direct attack.
Purple: Existing CAR that is under indirect attack.

The colour coding for nodes is updated at the end of each round as shown in Table 3. Nodes are connected by two types of link: explicit links representing direct attacks, and implicit links representing indirect attacks. Referring to Table 2, moves 1, 2, 5 and 6 are indirect attacks in that they implicitly attack all other 1, 2, 5 and 6 moves played by other participants which have content with lower confidence; while moves 3 and 4 are direct attacks. The set of edge labels, LE, is thus defined as:

LE = {→, ↪}    (7)

where → indicates a direct attack, and ↪ an indirect attack.

The argumentation tree has a role not unlike the dialogue mediating artefact described in [42]. All participant agents in the set G have access to the tree and use it to influence their choice of next move. The moves available to agents at a given turn are those that change the colouring of the argumentation tree. For example it would not make sense for an agent to attack red nodes as these are nodes representing CARs that are already defeated; however, it does make sense for participants to attack purple nodes as direct attacks against purple nodes will change their colouring to red. Agents select nodes to direct their next move at according to their strategy (aggressive or defensive). If there are a number of competing nodes (given an agent's strategy) that a move may be directed at, the node that presents the greatest threat is selected (i.e. the node representing the rule with the highest confidence). Note also that agents cannot play moves identical to moves they have played previously; agents therefore have a finite set of moves (this prevents "deadlocks" and cycling).

6. The PISA dialogue protocol

Assuming that we have a case to predict a class value for, and a number of PISA participant agents, each promoting one of the possible classes in the domain from which this case was drawn, then the PISA dialogue protocol operates as follows:

1. Before the start of the dialogue the chairperson arbitrarily selects one participant (g1∈G) to start the dialogue.

2. At the first round g1 proposes a new CAR, CAR1, such that conf(CAR1)>μ. The CPA establishes a new argumentation tree, whose root node represents CAR1. If g1 fails to play an opening move, the CPA selects another participant to commence the dialogue. If all the participants fail to propose an opening argument, the dialogue terminates with failure.

Table 3. Argumentation tree colour encoding.

Green — An undefeated CAR (up to the given round) generated using moves 1, 4, 5 or 6. Changes to: Red, if directly attacked by at least one undefeated CAR; Purple, if indirectly attacked by at least one higher confidence undefeated CAR.

Red — A defeated CAR. Changes to: Green, if all attacks against it are successfully defeated and the previous node colour was green; Blue, if all attacks against it are successfully defeated and the previous node colour was blue.

Blue — An undefeated CAR (up to the given round) generated using moves 2 or 3. Changes to: Red, if attacked by at least one undefeated CAR.

Purple — A CAR generated using moves 1, 4, 5 or 6 that is indirectly attacked by a higher confidence CAR, represented by a green node, played by some other participant. Changes to: Green, if all (direct and indirect) attacks against it are successfully defeated; Red, if attacked by at least one undefeated CAR.
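One reading of Table 3 as code: a node's colour at the end of a round can be recomputed from whether it is currently directly attacked by an undefeated CAR, or indirectly attacked by an undefeated higher confidence CAR. The sketch below compresses the table's transitions into a single recomputation; it is an interpretation, not PISA's actual update routine:

enum Colour { GREEN, BLUE, RED, PURPLE }

class ColourUpdate {
    // blueFamily: the node was created by moves 2 or 3 (blue family) rather
    // than moves 1, 4, 5 or 6 (green family). The attack flags count
    // undefeated attackers only; an indirect attacker must also have higher
    // confidence, per Table 3.
    static Colour update(boolean blueFamily, boolean directlyAttacked,
                         boolean indirectlyAttacked) {
        if (directlyAttacked) return Colour.RED;
        if (indirectlyAttacked) return blueFamily ? Colour.RED : Colour.PURPLE;
        return blueFamily ? Colour.BLUE : Colour.GREEN;   // all attacks defeated
    }
}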

40 M. Wardeh et al. / Data & Knowledge Engineering 75 (2012) 34–57

3. In the second round the other remaining participants attempt to attack CAR1 using any of the moves (2, 3, 4) discussed previously. If all the participants fail to attack CAR1, the dialogue terminates, and the case is classified according to the class promoted by g1. Otherwise, the argumentation tree data structure is updated with the submitted attacks.

4. Before the beginning of each of the subsequent rounds, the CPA removes participant agents that have not taken part in the last m rounds from the dialogue and updates the argumentation tree. This is to exclude non-participating agents (agents who are no longer able to generate further moves); m is usually set to 2 (the reason for this was given at the end of Section 3). If only one agent remains in the game then the dialogue is terminated, and the case is classified according to the class promoted by this agent. Otherwise, any participant who can play a legal move, according to the protocol highlighted in Table 2, can do so, and the argumentation tree data structure is updated with the submitted attacks.

5. If two subsequent rounds pass without any new moves being submitted to the argumentation tree (i.e. no change in the joint set of CARs), or if |V|>η rounds have passed without reaching an agreement, the chairperson terminates the dialogue. Extensive experiments were undertaken and it was found that the most appropriate setting for η was |G|×50; in other words the dialogue would continue for at most 50 rounds. The experiments conducted by the authors, using many different data sets, indicated that a dialogue never required more than 40 rounds to complete (even with very large data sets featuring many classes); the maximum number of rounds in most cases was nearer 25.
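Taken together, steps 1–5 suggest a control loop of roughly the following shape. Everything here (the Agent interface, the ArgTree stub, method names) is an illustrative assumption rather than the PISA implementation, and the opening-move selection and colour updates are elided:

import java.util.*;

class DialogueSketch {
    interface Agent { Optional<Move> nextMove(ArgTree tree); } // at most one move per round
    record Move(Agent by, String car, double confidence) {}
    static class ArgTree {
        List<Move> nodes = new ArrayList<>();
        void add(Move m) { nodes.add(m); }                     // colour update omitted
        int size() { return nodes.size(); }
    }

    // agents must be a mutable list; eta bounds the tree size, m is the
    // number of rounds an agent may stay silent before the CPA removes it.
    static ArgTree run(List<Agent> agents, int eta, int m) {
        ArgTree tree = new ArgTree();
        Map<Agent, Integer> lastActive = new HashMap<>();
        int idleRounds = 0;
        for (int round = 0; tree.size() <= eta && idleRounds < 2 && agents.size() > 1; round++) {
            boolean progress = false;
            for (Agent g : agents) {
                Optional<Move> mv = g.nextMove(tree);
                if (mv.isPresent()) { tree.add(mv.get()); lastActive.put(g, round); progress = true; }
            }
            idleRounds = progress ? 0 : idleRounds + 1;
            final int r = round;                               // drop agents silent for m rounds
            agents.removeIf(g -> r - lastActive.getOrDefault(g, -1) >= m);
        }
        return tree;  // the winner is read off the tree colouring, as described next
    }
}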

Once a dialogue has terminated the status of the argumentation tree will indicate the “winning” class as follows:

1. If all green nodes within the argumentation tree belong to the same participant, that participant is the winner. This condition is realised only when no other participant has played an undefeated move with higher or similar confidence.

2. If there are no green nodes, and all the blue nodes were played by the same participant, that participant wins.

If no winner can be identified a tie-break resolution mechanism is adopted. Otherwise, the case under discussion is classified according to the winner's promoted class. Note that the dialogues produced by PISA also explain the resulting classifications. This is an essential feature of the proposed argumentation from experience based classification. Where a tie situation exists, PISA implements a tie resolution mechanism. We have identified various possible tie resolution mechanisms [58]. For example we can repeat the argument process but with only the tied parties, adopt a voting strategy or simply adopt a random resolution. The latter is akin to using a default rule and has, for simplicity, been used in the remainder of this paper.

6.1. Groups of agents advocating the same class

In PISA, individual participant agents advocating the same possible classification (in a given domain) are required to join forces and act as a single Group of Participants. Every group is allowed only one move per round; in effect it behaves like a single agent. The proposed notion of groups prevents individual agents sharing the same objective from arguing against one another. In each group, one member is selected to be the leader of the group, to act as its representative and guide the inter-group dialogue. The agent with the greatest experience (number of records in E) is selected; if two or more agents have identical experience a random selection is made.

Once a group leader has been selected, this agent will have authority over other members of the group. This authority entitles the leader to select the move to be played (if any) at each round. The selection is made from moves suggested by the group's members. The inter-group dialogue model is as follows. At the start of a new round the group leader instructs other members of the group to suggest moves. The following dialogue then occurs:

1. All the group members who are able to suggest moves (according to their strategies and experience) do so.
2. If all the group members fail to suggest any move, the leader passes the round and submits no moves.
3. Otherwise the leader compares the members' moves, identifies the best move (i.e. the move with the best confidence) and submits this move (as sketched below).
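The leader's choice rule in step 3 is essentially an arg-max over the suggested moves' confidence values, as in this sketch (the names are illustrative, not PISA's):

import java.util.*;

class GroupLeader {
    record Suggestion(String car, double confidence) {}

    // Gather what the members can suggest; pass (empty) if nobody can move,
    // otherwise submit the suggestion with the highest confidence.
    static Optional<Suggestion> chooseMove(List<Optional<Suggestion>> memberSuggestions) {
        return memberSuggestions.stream()
                .flatMap(Optional::stream)
                .max(Comparator.comparingDouble(Suggestion::confidence));
    }
}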

The authors have conducted work to support more sophisticated group argumentation, which takes into consideration the individual strategies of the group members, but because of space limitations this is not reported here.

7. The classification rule mining algorithms

Having introduced, in the foregoing, the legal moves in PISA dialogues, the realisation of these moves using CARM techniques is described in this section. A suite of five algorithms was developed, although some of these are variations of one another. The algorithms are as follows:

Algorithm A. Find a rule given a particular case to be classified and some local confidence threshold τ. The algorithm is used with respect to the Propose Rule and Counter Rule moves (Moves 1 and 2). In the first case τ=μ, in the second case τ>μ.

Algorithm B. Distinguish a given rule by adding additional attributes so as to decrease its associated confidence so that it falls below some local threshold τ. The algorithm is used with respect to the Distinguish move (Move 3). A variation of this algorithm (Algorithm B′) is used with respect to the Increase Confidence move (Move 5) where attributes are added to the antecedent so as to increase the confidence to above some local threshold τ.

Algorithm C. Generalise a given rule by removing attributes from the consequent. The algorithm is used in relation to the Withdraw Unwanted Consequent move (Move 6). A similar algorithm (Algorithm C′) is used with respect to the add Unwanted Consequent move (Move 4). The first seeks to increase the confidence value above τ while the second seeks to reduce the confidence value to below τ (preferably below μ).

Note that local confidence thresholds (τ) are used because, although all acceptable rules must have a confidence value above the global threshold μ, during the dialogue there will be a "winning" confidence threshold τ (τ≥μ) which the participant agents will seek to address. Agents will wish to present rules with confidence values above τ if they are seeking to advocate their own position, or below τ if they are seeking to attack some other agent's advocated position.

As noted in Section 2, the P-tree and T-tree data storage structures are used. Recall that each participant agent's experience (data set) is stored in a P-tree (one per agent). Each agent also has a T-tree, but at the start of a dialogue this will be empty; the T-tree is only populated as required (so as to avoid unwanted overheads). In other words the CARM is dynamic.

The complexity of the algorithms is difficult to express succinctly. Much depends on the nature of the agent's data set (experience), namely the distribution of attributes within it and its density. The support threshold used will also have an influence. If we assume a 50% density and a normal distribution, each single item will occur in 50% of the records (P1=0.5), the probability of each 2-itemset existing will be 0.25 (P2=0.25), P3=0.125, P4=0.0625, P5=0.03125, P6=0.015625, and so on. In this case, if σ=0.01 (a fairly standard value) the likelihood is that there will be no supported itemsets of size 7 or more; in other words the likely maximum size of a CAR's antecedent is 5. However, the data will not be normally distributed and the density will typically be less than 50%. Given that a T-tree is essentially a trie we can index through each level to determine if a single K itemset exists, therefore a maximum of K indexing operations will be required. The maximum complexity to determine whether a particular K itemset, Ii, exists in the T-tree is thus O(K)4; but if the itemset does not exist this will not normally require an exhaustive search, thus the complexity can more accurately be expressed as O(K)×p(Ii) (where p(Ii) is the probability that Ii exists). The number of combinations in an itemset I is 2^|I|−1 (excluding the null set). An exhaustive search of the T-tree covering all combinations of I (assuming all these combinations exist) will thus require 2^|I| indexing operations (O(2^|I|)). However, in practice many of these combinations will not exist because they are not supported. To give some upper bound on the complexity of the proposed algorithms we will assume that an exhaustive search is required. Each of the algorithms is considered in more detail in the remainder of this section.
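The Kmax estimate can be verified numerically; the snippet below is a throwaway check (not PISA code) that reproduces the figures quoted above:

public class KmaxCheck {
    public static void main(String[] args) {
        double p = 0.5, sigma = 0.01;
        // Kmax = floor(log_P sigma): the largest K with P^K >= sigma.
        int kmax = (int) Math.floor(Math.log(sigma) / Math.log(p));
        System.out.println(kmax);              // 6: no supported itemsets of size 7 or more
        System.out.println(Math.pow(p, 6));    // 0.015625 >= 0.01, still supported
        System.out.println(Math.pow(p, 7));    // 0.0078... <  0.01, not supported
    }
}

Since a 6-itemset includes the class attribute, this corresponds to the CAR antecedent of at most 5 attributes mentioned in the text.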

Algorithm A finds a new classification rule (CAR) given: (i) an instance Inst (a set of attributes from the problem domain), (ii) a desired class attribute c, (iii) a desired local confidence threshold τ (τ≥μ) and (iv) a maximum T-tree level Max at which the search should stop so as to limit the search (note that the maximum T-tree level equates to the maximum size of the CAR we are looking for). The algorithm is presented in Fig. 1. From the figure, the algorithm commences by creating a set of candidate two itemsets I2 by combining each single element in Itemp (Itemp=Inst) with c. We then start walking the T-tree by indexing first to the "top level" node representing c. If the next level is missing we generate the next level so as to populate the level with the frequent 2-itemsets (recall that level one in the T-tree contains 1-itemsets, level two 2-itemsets, and so on). For each I2 candidate set, if it is present in the T-tree, we generate a CAR. If the confidence of the CAR is above the threshold we return the CAR (regardless of whether a more exhaustive search might generate a higher confidence rule). If no appropriate CARs can be generated from I2 we generate the set of I3 itemsets from the I2 itemsets and continue the search (generating additional levels in branches of the T-tree as required). The process continues until either: (i) an appropriate CAR is found, (ii) Itemp is empty (we cannot generate any more candidate itemsets), or (iii) K=Max (the maximum T-tree level has been reached).

4 Using “Big O” notation.

Fig. 1. Pseudo code for Algorithm A (Propose and Counter Rule moves).


In the last two cases no rule is returned and thus the agent will either have to attempt a different strategy or not make a move at all in the current round. Note that the algorithm favours rules with few attributes in that it commences the search with 2-itemsets (one of which is the class attribute) and then builds up to larger itemsets. Assuming that all candidate itemsets Ik are present in the T-tree, but all generated rules have a confidence value of less than μ (i.e. an exhaustive search of the relevant T-tree branch is required resulting in no appropriate CARs being discovered), the complexity of Algorithm A will be O(2^|Inst|+1) (the plus one for the initial indexing operation to the "top level" node representing c). The value of τ also has an effect: we are less likely to find an appropriate CAR if the value of τ is high.
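A compressed reading of this level-wise search might look as follows. The SupportOracle stands in for the T-tree lookup, and all names here are illustrative assumptions; Fig. 1 remains the authoritative description:

import java.util.*;

class AlgorithmASketch {
    interface SupportOracle { double supp(Set<Integer> itemset); } // T-tree stand-in

    // Seed with single attributes of the case; test each antecedent with the
    // class c appended; return the first CAR clearing tau, otherwise extend
    // antecedents by one case attribute and search the next level.
    static Optional<Set<Integer>> findRule(Set<Integer> inst, int c, double tau,
                                           int max, SupportOracle t) {
        Set<Set<Integer>> level = new LinkedHashSet<>();
        for (Integer a : inst) level.add(Set.of(a));
        for (int k = 1; k <= max && !level.isEmpty(); k++) {
            Set<Set<Integer>> next = new LinkedHashSet<>();
            for (Set<Integer> ant : level) {
                Set<Integer> withClass = new HashSet<>(ant);
                withClass.add(c);
                double suppBoth = t.supp(withClass);
                if (suppBoth == 0) continue;                      // not frequent: prune branch
                if (100.0 * suppBoth / t.supp(ant) >= tau)
                    return Optional.of(withClass);                // smallest qualifying CAR wins
                for (Integer a : inst)
                    if (!ant.contains(a)) {
                        Set<Integer> u = new HashSet<>(ant);
                        u.add(a);
                        next.add(u);                              // grow antecedent by one
                    }
            }
            level = next;
        }
        return Optional.empty();   // no rule: try another move or pass this round
    }
}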

Algorithm B is applied to distinguish a rule Ant→c (represented by a specific frequent K itemset) proposed by some other participant agent, by adding additional conditions to the rule antecedent so as to reduce the associated confidence. The algorithm operates as follows. First it generates a set of candidate K+1 itemsets, by adding new attributes from the current instance I that are not yet contained in Ant. For each K+1 itemset the algorithm attempts to find the itemset in the T-tree generated so far (building additional levels in sub-branches as required). If found (i.e. it is supported) the algorithm generates a new rule. If the confidence of the new rule is below τ the rule is returned. Otherwise the algorithm continues its search until an appropriate rule is found or no more candidate attribute sets can be generated (in which case null is returned). Fig. 2 presents the pseudo code for Algorithm B. A variation of this algorithm, Algorithm B′, is used to generate rules for the Increase Confidence move (Move 5). This alternative algorithm proceeds in a similar manner, but instead of returning rules whose confidence is below τ (but higher than μ), it returns rules with a confidence higher than or equal to τ. As for Algorithm A, the complexity of Algorithm B (and Algorithm B′) is difficult to define. However, to conduct an exhaustive search will require 2^|Inst\Ant| indexing operations,5 thus we can say that the maximum complexity is O(2^|Inst\Ant|).

Algorithm C is used in order to withdraw some unwanted consequents (Cons) from an input rule (Ant→c∪Cons) where the participant agent is attempting to defend its advocated position. The pseudo code for the algorithm is presented in Fig. 3. The algorithm first tries to produce a rule Ant→c. If such a rule satisfies the local confidence value τ, then the algorithm returns this rule; otherwise, alternative candidate attribute sets are generated and further rules produced and tested, in a similar manner to Algorithm B, to produce CARs of the form Ant→c∪Cons′ where the set Cons′ comprises only attributes that feature in the input case. A similar algorithm can be used (Algorithm C′) to generate rules where the participant agent wishes to introduce Unwanted Consequents and reduce the confidence value. In this case the generated CARs must have a confidence value such that conf(rule)≤μ. The worst case complexity will be O(|Ant∪{c}|+2^|Inst\Ant|) (the number of indexing operations required to find the support for the itemset {Ant∪c} plus the number of indexing operations required to perform an exhaustive search of the T-tree with respect to potential attributes to include in the new consequent).

8. Explanatory example

PISA has been fully implemented in Java, and uses the algorithms described in Section 7 above to mine CARs as appropriate. This section illustrates the operation of PISA by considering its application to the process of allocating nursery school places in Ljubljana [39]. The dataset was obtained from the UCI data repository [15]. The original dataset consisted of 12,960 records classified into five levels of recommendation: notRecommended, recommended, highlyRecommended, priority and specialPriority. The distribution of the classes in terms of records was as follows: 33.33%, 0.02%, 2.53%, 32.92% and 31.20% (corresponding to 4320, 2, 328, 4266 and 4044 records respectively). Note that the recommended and highlyRecommended classifications are rare. For the purpose of the example described here the two records associated with the recommended class were removed from the data set because the number of records (experience) was not adequate to support a PISA dialogue.

Fig. 2. Pseudo code for Algorithm B (Distinguish move).

5 We use the \ operator to indicate the complement of the prefix operand with respect to the postfix operand.


The dataset also has eight attributes other than the class attribute:

1. ParentsOccupation = {usual | pretentious | ofGreatPretension}.
2. ChildsNursery = {proper | lessProper | improper | critical | veryCritical}.
3. FormOfTheCFamily = {complete | completed | incomplete | foster}.
4. NumberOfChildren = {1 | 2 | 3 | moreThan3}.
5. HousingConditions = {convenient | lessConvenient | critical}.
6. FinancialStandingOfTheFamily = {convenient | inconvenient}.
7. SocialConditions = {nonProblematic | slightlyProblematic | problematic}.
8. HealthConditions = {recommended | priority | notRecommended}.

(Some of the translations for the value labels are somewhat eccentric, but this is how they are listed in the UCI repository.)

For the example PISA was configured so that it featured four agents, each representing one of the four possible classifications.

The agents are referred to as NR (notRecommended), HR (highlyRecommended), PR (priority) and SP (specialPriority). The available records were distributed evenly amongst the four participants so that each had equal experience with respect to each of the four classes. Note that HR is disadvantaged because it only has 82 (328/4=82) records from which to generate CARs advocating its class. We will therefore consider a highlyRecommended case as the subject for the classification (discussion), as this is the classification most likely to be in dispute. Specifically the case has the following attribute values {usual, lessProper, complete, 2, convenient, inconvenient, nonProblematic, recommended} (with respect to the above attribute list). The support and confidence thresholds were set to σ=1% and μ=50% respectively; these values were chosen as they are well established as the default thresholds within the data mining community (see for example [20]).

The dialogue commences when the chairperson invites the HR player agent to propose the opening argument; HR proposes the following CAR (R1):

HR - Propose a New Rule: ParentsOccupation=usual, ChildsNursery=lessProper, HousingConditions=convenient, HealthConditions=recommended --> class=highlyRecommended. C=52.38%.

(where C is the confidence value). The rule was selected, using Algorithm A, because it was the simplest rule (smallest antecedent) whose confidence exceeded μ. This rule is attacked by the other three agents in the second round (R2, R3 and R4 respectively) as follows (the reader might find it helpful to refer to the completed argument tree shown in Fig. 4 as the debate develops):

NR - Counter Rule: ParentsOccupation=usual, FormOfTheCFamily=complete, NumberOfChildren=2, HousingConditions=convenient, FinancialStandingOfTheFamily=inconvenient --> class=notRecommended. C=55.55%.

PR - Counter Rule: HealthConditions=recommended --> class=priority. C=55.72%.

SP - Distinguishes the previous rule: the case has the following additional feature, FormOfTheCFamily=complete, giving class=highlyRecommended with C=20% only.

Fig. 3. Pseudo code for Algorithm C (Withdraw Unwanted Consequent move).


Note that SP does not propose a rule advocating its own class but instead attacks the rule proposed by HR. Since the case falls into the narrow band of highlyRecommended we might expect to find reasons for the classifications on either side, but not for the very different specialPriority. SP was thus unable to find a rule with appropriate confidence to support its own advocated class. Nonetheless, SP could play a useful role in critiquing the arguments of the other players. At this stage PR has the winning argument as it has the best unattacked rule.

In round three all four players make moves:

• HR: proposes a new rule to attack the current best rule, a rule which will turn out to be the winning rule (R5):

HR - Propose a New Rule: ParentsOccupation=usual, ChildsNursery=lessProper, FormOfTheCFamily=complete, HousingConditions=convenient, HealthConditions=recommended --> class=highlyRecommended. C=85.71%.

• NR: distinguishes PR's argument by demonstrating that ParentsOccupation=usual and HealthConditions=recommended only gives priority with 18.64% confidence (R6).

• PR: proposes a Counter Rule against NR's rule from round two (R7):

PR - Proposes Counter Rule: ParentsOccupation=usual, ChildsNursery=lessProper, HealthConditions=recommended --> class=priority. C=61.16%.

• SP: distinguishes PR's rule from round two by demonstrating that by adding ParentsOccupation=usual, using its data, the modified rule has 19.9% confidence (R8).

Now HR is back in the “lead”. Note that the proposed rule is the same as the rule modified by SP in round two; however there is a difference in confidence because of the nature of the two different data sets used by HR and SP, and because both datasets are relatively small.

In round four SP has no more moves it can make. The other two agents can, however, make moves:

• NR: distinguishes PR's rule from round three, by demonstrating that HealthConditions=recommended reduces the confidence of class=priority to only 20% (R9).

• PR: proposes a Counter Rule against HR's rule of round three (R10):

PR - Counter Rule: ChildsNursery=lessProper, FormOfTheCFamily=complete, FinancialStandingOfTheFamily=inconvenient, HealthConditions=recommended --> class=priority. C=86.95%.

Fig. 4. The argumentation tree for the nursery example. (Dark grey = green nodes, light grey = blue nodes, double lined = purple nodes, single lined = red nodes; solid arrows = direct attacks, dashed arrows = indirect attacks).


Now PR is winning, but in the fifth round this rule can be distinguished by NR, since the addition of the attribute SocialConditions=nonProblematic reduces the confidence to just 20% (R11). In the sixth round PR proposes another rule (R12):

PR - Proposes a Counter Rule: ParentsOccupation=usual, ChildsNursery=lessProper, HealthConditions=recommended --> class=priority. C=65.95%.

This, however, can be distinguished by HR since adding SocialConditions=nonProblematic reduces the confidence to 20% (R13). This reinstates the argument HR made in round 3. No more arguments are now possible, and so the final classification is highlyRecommended (i.e. the correct classification).

9. Experimental evaluation

This section reports on the empirical evaluation of the PISA framework as described in this paper. For the evaluation a number of real-world datasets were used from the UCI repository [15]; where appropriate, continuous values were discretised into ranges. The chosen datasets (Table 4) display a variety of characteristics with respect to number of records (|D|), number of classes (|C|) and number of non-class attributes (|I|) before and after discretisation. Importantly, the datasets include a diverse number of class labels, distributed in a different manner in each dataset (balanced and unbalanced), thus providing the desired variation in the experience assigned to individual PISA participants.

The experiments were designed to evaluate the following:

1. The hypothesis that the proposed classification using argumentation process produces results at least comparable to those obtained using traditional classification paradigms; in particular, ensemble classification methods, which PISA can be argued to resemble in that both approaches combine the results of the application of more than one classifier to produce a final classification.

2. The effectiveness of PISA when agents work (collaborate) in groups.
3. The performance of PISA, with respect to alternative classification mechanisms, in the presence of very “noisy” data.

9.1. Methodology

The results presented throughout this section were obtained using standard Tenfold Cross Validation (TCV). For each TCV the data set was first partitioned into 10 equal sized sets, then each set was used in turn as the test set while the “arguments” were conducted using the remaining nine sets. For the purposes of running PISA, each training dataset was equally divided among the participant agents so that each participant was given a random share of the dataset under consideration. Then a number of PISA dialogues were executed to classify cases in the test sets. For each reported evaluation μ=50% and σ=1%. With respect to the other classifiers used in the evaluation (see below), each operated on the entire dataset (i.e. the union of the participants' datasets).
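The following Java sketch shows the fold and share construction just described, under stated assumptions: Record is a placeholder type, and the round-robin dealing after a seeded shuffle is one simple way of producing equal random shares; the real PISA harness operates on its own data structures.

import java.util.*;

// Tenfold cross-validation set-up: each fold in turn is the test set; the
// remaining nine folds form the training set, which is divided evenly and at
// random among the participant agents.
public final class TenfoldSetup {

    record Fold<R>(List<R> test, List<List<R>> agentShares) {}

    static <R> List<Fold<R>> tenfold(List<R> data, int numAgents, long seed) {
        List<R> shuffled = new ArrayList<>(data);
        Collections.shuffle(shuffled, new Random(seed));
        int n = shuffled.size();
        List<Fold<R>> folds = new ArrayList<>();
        for (int f = 0; f < 10; f++) {
            int from = f * n / 10, to = (f + 1) * n / 10;
            List<R> test = new ArrayList<>(shuffled.subList(from, to));
            List<R> train = new ArrayList<>(shuffled.subList(0, from));
            train.addAll(shuffled.subList(to, n));
            // deal training records round-robin: equal random shares per agent
            List<List<R>> shares = new ArrayList<>();
            for (int a = 0; a < numAgents; a++) shares.add(new ArrayList<>());
            for (int i = 0; i < train.size(); i++) shares.get(i % numAgents).add(train.get(i));
            folds.add(new Fold<>(test, shares));
        }
        return folds;
    }
}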

In order to fully assess its operation, PISA was compared against a range of classification paradigms, namely: (i) decision trees, (ii) association rule classifiers, (iii) SVMs and (iv) ensemble classifiers. For the decision tree paradigm C4.5 [51], as implemented in the Weka environment [31], and the Random Decision Tree (RDT) paradigm [19] were used. For the association rule classifiers the TFPC (Total From Partial Classification) algorithm [20] was adopted because this algorithm utilises the same P-tree and T-tree data structures [22] as PISA. TFPC was applied with the same support and confidence thresholds as PISA. For the SVM the Sequential Minimal Optimisation (SMO) Weka implementation [31] was used. For the ensemble classifiers Table 5 provides a summary of the techniques used. We chose to apply Boosting and Bagging, combined with decision trees (C4.5 and RDT), because previous work demonstrated that such combinations are very effective [10,46].

Table 4
Summary of data sets.

Name                   |D|      |C|   |I| (discretised)   Bal (yes/no)
Breast                 699      2     11 (20)             Yes
Congressional voting   435      2     17 (34)             Yes
Mushrooms              8124     2     23 (90)             Yes
Pima (diabetes)        768      2     9 (38)              Yes
Waveform               5000     3     22 (101)            Yes
Car Evaluation         1728     4     7 (25)              No
Nursery                12,960   5     9 (32)              No
Page Blocks            5473     7     11 (46)             No
Led7                   3200     10    8 (24)              Yes
Pen digits             10,992   10    17 (89)             Yes


9.2. Evaluation of PISA and other classification paradigms

This sub-section describes experiments conducted to compare the operation of PISA, using the datasets listed in Table 4, against a number of other classification algorithms (see Table 5). For each of the included methods three values were calculated for each dataset: (i) classification Error Rate (ER), (ii) Balanced Error Rate (BER), calculated from the confusion matrix obtained from each TCV (see footnote 6), and (iii) the execution time required for each experiment. These three values then provided the criteria for assessing and comparing the classification paradigms included in the evaluation. BER was used because it is akin to the precision measure frequently used to evaluate the operation of binary classifiers.
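Footnote 6 gives the BER formula; the short Java sketch below computes it from a confusion matrix, assuming the usual convention that rows index the true class and columns the predicted class (the class and method names are illustrative).

// BER from a confusion matrix: T_ci is the diagonal entry for class i, F_ci is
// the rest of row i (cases of true class c_i classified as something else);
// BER averages the per-class error rates F/(F+T).
final class Ber {
    static double balancedErrorRate(int[][] confusion) {
        int numClasses = confusion.length;
        double sum = 0.0;
        for (int i = 0; i < numClasses; i++) {
            long t = confusion[i][i]; // correctly classified as c_i
            long f = 0;               // true c_i, classified otherwise
            for (int j = 0; j < numClasses; j++) if (j != i) f += confusion[i][j];
            sum += (t + f) == 0 ? 0.0 : (double) f / (f + t);
        }
        return sum / numClasses;
    }
}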

The results are presented in Tables 6 to 9. Tables 6 to 8 compare the operation of PISA, according to ER, BER and execution time, with respect to the ensemble methods, decision trees and TFPC. Table 9 compares the operation of PISA with the SMO implementation of an SVM.

From Table 6 it can be seen that, in the context of ER, PISA performs consistently well; outperforming the other association rule classifier, and giving comparable results to the decision tree methods. It can also be seen that PISA produced results better than or comparable to those produced by the ensemble methods. PISA produces the best overall performance in that the lowest ER results were associated with PISA with respect to four out of the ten data sets tested (better than all the other paradigms considered). From Table 6 the ensemble methods tended to perform worse than PISA in domains that contain large numbers of classes (the domains to which arguing from experience is most applicable), although the ensemble methods tended to outperform PISA in two-class domains. The results also show that PISA performs well with imbalanced class domains (e.g. Car Evaluation, Nursery and Page Blocks).

Table 7 shows the BER for each of the given datasets. From the table it can be seen that PISA produced good results overall, producing the best result in five out of the ten data sets tested. Fig. 5 illustrates the difference between the error-rate and balanced error-rate for the three imbalanced datasets.

Table 8 gives the execution times (in milliseconds) for each of the methods. As expected, because of the communication overhead, PISA is not the quickest classification method. However, the recorded performance is by no means the slowest (for instance Decorate runs slower than PISA with respect to the majority of the datasets). Additionally it is worth remarking that PISA runs faster than Bagging and ADABoost with some datasets. Fig. 6 shows that, on average, the time complexity associated with the operation of PISA is comparable to that of the other methods considered.

6 Balanced Error Rates (BER) were calculated, for each dataset, as follows: $BER = \frac{1}{C} \sum_{i=1}^{C} \frac{F_{c_i}}{F_{c_i} + T_{c_i}}$, where C = the number of classes in the dataset, T_{c_i} = the number of cases which are correctly classified as class c_i, and F_{c_i} = the number of cases which should have been classified as c_i but were classified incorrectly.

Table 6
ER (%), lowest values shown in bold for each dataset.

Dataset      PISA    Bagging         ADABoost.M1     MultiBoost      Decorate   Decision trees   CAR
                     C4.5    RDT     C4.5    RDT     C4.5    RDT     (C4.5)     C4.5    RDT      TFPC
Breast       3.91    5.01    4.86    4.86    4.86    4.86    4.86    5.43       4.86    5.07     10.00
Congress     1.78    3.01    2.31    2.08    3.01    2.08    3.01    2.77       4.16    0.00     9.30
Mushrooms    0.41    0.00    0.00    0.00    0.00    0.00    0.00    0.00       0.00    0.06     1.05
Pima         14.47   27.21   25.26   25.26   23.83   25.13   24.87   25.66      26.69   16.18    25.92
Waveform     2.16    17.97   11.98   21.48   21.48   13.62   11.96   21.48      21.48   2.42     33.32
Car Eval     6.16    4.51    1.24    2.43    6.25    2.60    6.25    4.28       5.09    5.90     30.00
Nursery      4.55    2.08    3.09    0.38    3.09    0.35    3.09    1.91       2.62    3.72     22.25
Page Block   2.67    6.93    6.93    7.02    6.93    7.02    6.93    6.93       7.02    6.94     9.95
Pen digit    2.75    4.47    1.35    1.58    2.51    5.07    1.87    2.51       5.65    1.08     18.24
Led7         12.00   24.81   24.16   24.84   24.28   24.91   24.34   24.75      24.84   24.25    31.03

Table 5
Summary of the ensemble methods used (σ=1%).

Ensemble            Ensemble technique   Generated classifiers   Base-method    Bag size %/Threshold weight
Bagging-C4.5        Bagging [16]         25                      C4.5           100
Bagging-RDT         Bagging [16]         25                      Random trees   100
ADABoost.M1-C4.5    ADABoost.M1 [26]     25                      C4.5           100
ADABoost.M1-RDT     ADABoost.M1 [26]     25                      Random trees   100
MultiBoostAB-C4.5   MultiBoosting [59]   25                      C4.5           100
MultiBoostAB-RDT    MultiBoosting [59]   25                      Random trees   100
Decorate            Decorate [36]        25                      C4.5           –


Table 9 provides a comparison between the above results for PISA and SMO. The SMO implementation was run with its default settings utilising two kernels: Poly Kernel and Normalised Poly Kernel. The results show that PISA produced higher accuracies in general than the SMO implementation; additionally, PISA performed faster than SMO with respect to most of the datasets considered.

9.3. The role of groups in increasing the accuracy of PISA

This sub-section reports on the investigation conducted to support the hypothesis that by further splitting the data available to each agent among a number of agents supporting the same classification (i.e. groups of agents) the quality of the resulting classifications will increase, with some trade-off in execution time. For the evaluation the above reported experiments were run again using group sizes of 2, 4 and 5 agents supporting each possible class. Fig. 7 shows the reduction in the error-rates of

Table 7
BER (%), lowest values shown in bold for each dataset.

Dataset      PISA    Bagging         ADABoost.M1     MultiBoost      Decorate   Decision trees   CAR
                     C4.5    RDT     C4.5    RDT     C4.5    RDT     (C4.5)     C4.5    RDT      TFPC
Breast       4.75    6.03    6.20    6.20    6.20    6.07    6.20    6.71       6.20    4.71     12.89
Congress     2.35    3.43    2.66    2.27    3.19    2.27    2.27    3.05       4.69    0.00     9.71
Mushrooms    0.59    0.00    0.00    0.00    0.00    0.00    0.00    0.00       0.00    0.06     1.04
Pima         13.94   28.88   26.86   26.86   25.18   26.72   26.12   27.16      28.34   24.47    33.67
Waveform     3.93    18.00   11.99   21.51   21.51   13.64   11.97   21.51      21.51   2.39     33.35
Car Eval     8.21    11.25   6.77    4.79    10.43   5.29    10.43   10.24      16.55   10.67    75.00
Nursery      5.47    4.14    2.28    1.05    5.82    0.76    5.78    4.55       5.98    5.74     40.10
Page Block   9.45    21.46   22.85   27.89   21.46   27.87   21.46   22.85      27.89   21.48    19.89
Pen digit    3.47    4.48    1.51    1.59    2.23    4.92    1.89    2.23       5.57    3.73     18.38
Led7         11.84   24.56   24.07   24.87   24.31   25.09   24.23   24.92      24.72   24.36    31.39

Table 8
Execution times (milliseconds).

Dataset                 PISA    Bagging         ADABoost.M1     MultiBoost      Decorate   Decision trees   CAR
                                C4.5    RDT     C4.5    RDT     C4.5    RDT     (C4.5)     C4.5    RDT      TFPC
Congress (435)          34      50      20      20      140     130     20      590        30      15       154
Breast (699)            31      110     110     140     110     170     170     330        8.1     8        11
Pima (768)              75      160     90      80      130     80      110     500        20      21       11
Car Evaluation (1728)   74      300     110     370     20      20      310     1580       80      24       17
Led7 (3200)             78      730     360     260     130     1150    480     3380       110     90       25
Waveform (5000)         1243    1840    380     4400    830     1650    560     4730       200     102      862
Page Blocks (5473)      159     130     430     430     130     280     130     430        120     55       60
Mushrooms (8124)        313     750     380     110     50      60      50      6400       80      117      630
Pen digits (10,992)     1345    2300    460     5810    820     2790    800     2300       290     80       1606
Nursery (12,960)        965     1790    720     3130    60      3760    10      1449       110     139      204

Table 9
Comparing PISA to the WEKA SMO implementation of an SVM.

              ER (%)                       BER (%)                      Exec time (ms)
Dataset       PISA    Poly    Normal-poly  PISA    Poly    Normal-poly  PISA   Poly     Normal-poly
Breast        3.91    6.29    5.58         4.75    7.64    6.69         31     70       250
Congress      1.98    2.08    3.93         2.34    1.99    4.65         34     340      310
Mushrooms     0.41    0.00    0.00         0.59    0.00    0.00         313    52,178   53,195
Pima          14.47   26.69   26.17        13.94   28.34   27.72        75     990      1320
Waveform      2.16    21.40   19.60        3.93    21.32   19.44        243    5830     6240
Car Eval      6.16    7.29    3.13         8.21    9.71    8.91         15     1500     6700
Nursery       4.55    6.87    6.01         5.47    6.26    5.77         965    1606     1222
Page Blocks   2.67    6.96    7.53         9.45    24.80   29.83        159    3730     6602
Pen digits    2.75    2.42    2.39         3.47    2.38    2.36         1345   1032     3144
Led7          12.00   24.22   24.38        11.84   23.05   24.29        78     819      1356


PISA. Interestingly PISA performs better using groups than individual agents: it seems that the greater the number of separate databases the greater the number of arguments found, enabling a more thorough exploration of the problem. Note, however, that with small(er) datasets (Congressional Voting, Breast, Pima and Led7) dividing the data among 4 or 5 agents increases the error rate, as here each participant is allocated a very small dataset that is not sufficient for a meaningful application of the argumentation process. However, halving the data between two agents (supporting the same classification) always produced better results than using one agent. These results apply also to the BER, as can be seen from Fig. 8.

Another issue is the communication overhead resulting from the PISA decision making process within each group. For each round of the dialogue, each member of the group attempts to suggest a move designed to promote their advocated classification. The group's leader has to choose one of these moves to present in the ongoing dialogue (sketched below). The experiments reported in Sub-section 9.2 provided information about execution time when groups are not used. Fig. 9 shows the increase in execution times (in milliseconds) when using groups of 2, 4 and 5 agents compared to the execution time recorded when using PISA with one agent per class. The increase in execution time remains acceptable.
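The intra-group step can be pictured with the minimal Java sketch below. Choosing the highest-confidence candidate is an assumption made purely for illustration (the selection criterion used by the leader is not fixed here), and the type names are hypothetical.

import java.util.*;

// Each group member proposes a candidate move for the round; the leader
// forwards one of them to the ongoing dialogue.
final class GroupLeader {
    record CandidateMove(String agentId, String moveType, double confidence) {}

    static Optional<CandidateMove> select(List<CandidateMove> candidates) {
        // assumption: prefer the candidate move with the highest confidence
        return candidates.stream()
                .max(Comparator.comparingDouble(CandidateMove::confidence));
    }
}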

9.4. Experimenting with noisy data

One of the generic challenges of classification is how to deal with (very) noisy data, as most real world data is affected by varying levels of noise. This may be the result of data entry errors during data capture, or transcription errors when rendering the data into some appropriate computer readable form. Alternatively noise may be introduced during the discretisation and/or normalisation of data; for example it is sometimes necessary to map numeric data into ranges. Noise may also be introduced as a result of

Fig. 5. Difference between ER and BER with respect to imbalanced datasets.

Fig. 6. Average execution time (milliseconds) per method.


the application of mechanisms to handle missing data, or as a result of data perturbation for the purpose of privacy and security handling. Whatever the case, the effectiveness of any classification paradigm will be influenced by the presence of noise in the data: noisy data results in classification inaccuracies, typically caused by the over-fitting of the classifier to the (noisy) data. Various techniques have been proposed to deal with the influence of noise on classifier generation. These include: (i) the post-pruning of rules in the generated classifier so as to make the classifier less specific (e.g. [17,54]), (ii) developing robust systems that allow for noise by avoiding over-fitting of the model to the data [1,18], and (iii) the combination of the results of weak classifiers to produce a better classifier (i.e. ensemble methods [16,26]). In this section, we provide empirical evidence that PISA presents an alternative and effective, agent based, approach to the problem of classifying noisy data.

To assess the robustness of PISA with respect to noise a sequence of experiments was conducted using the datasets from Table 4. In each case random noise was introduced into the training set (but not the test set) using the following model: for an N% noise level in a dataset of I records, (N/100)×I records were randomly selected and the class label changed to some other randomly selected value (with equal probability) from the set of available classes. The noise levels used in this study were: 2%, 5%, 10%, 20% and 40%. The operation of PISA in the presence of noise was also compared with its operation on noise-free data (as reported above).
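The noise model lends itself to a direct implementation; the Java sketch below follows the description above (the class name and the label-map representation are illustrative).

import java.util.*;

// For an N% noise level over a training set, pick round(N/100 * |train|)
// records at random and replace each one's class label with a different label
// drawn uniformly from the remaining classes.
final class NoiseInjector {
    static <R> void inject(List<R> train, double noisePct, List<String> classes,
                           Map<R, String> labels, Random rnd) {
        int toCorrupt = (int) Math.round(noisePct / 100.0 * train.size());
        List<R> pool = new ArrayList<>(train);
        Collections.shuffle(pool, rnd);
        for (R rec : pool.subList(0, toCorrupt)) {
            List<String> others = new ArrayList<>(classes);
            others.remove(labels.get(rec));            // never reassign the same label
            labels.put(rec, others.get(rnd.nextInt(others.size())));
        }
    }
}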

The results obtained demonstrated, as expected, that the overall level of accuracy decreased as the noise percentage was increased. However, PISA still maintained a good level of accuracy even with high noise. The results are presented in Figs. 10, 11 and 12, where the percentage of noise is given on the X-axis and percentage error rate on the Y-axis (TCV was again applied).

Fig. 7. Reduction in ER for PISA, when using groups of 2, 4 and 5 agents, as a percentage of the original error rate (one agent per group).

Fig. 8. Reduction in BER for PISA, when using groups of 2, 4 and 5 agents, as a percentage of the original error rate (one agent per group).


From these figures it is evident that as noise levels increase, PISA (and to some extent RDT) starts to outperform all the other classifiers; when the noise level hits 40% the difference in performance between these two classifiers and the rest of the classifiers included in this study becomes considerable. With higher levels of noise PISA produces the best results with respect to most of the datasets. Note that the datasets where PISA was outperformed when high levels of noise were introduced are those with only a small number of records per class, so that each PISA player had only a limited number of cases from which to mine their arguments. Overall PISA performed consistently well, demonstrating good noise tolerance across the whole collection of datasets.

10. Related work

With respect to the application of agent techniques to classification, Modi and Shen [37] defined the distributed classification task. Their technique involves a number of agents, each provided with a local data set (or database) that has access to a subset of the attributes in the problem domain. However, Modi and Shen assumed that all the agents in the system had the same data, whereas PISA (and other subsequent meta-learning systems) assumes that the data are either distributed amongst the system agents or that each agent has its own local data, which may differ from that of the other agents in the system (different experience). Moreover, the system in [37] assumes that each example in the dataset of any agent has a unique identifier shared by all the agents; PISA has no such constraint. Finally, [37] presents two algorithms to solve the distributed classification task: DVS (Distributed Version Space) and DDT (Distributed Decision Trees). The only information that both algorithms broadcast to the other agents is the unique example identifiers, thus preserving the privacy of the information in the local databases. PISA also keeps the local information of each agent private. However, instead of sharing unique example identifiers (which is a strong assumption in many domains), the agents broadcast the CARs that form the content of the moves exchanged in PISA dialogues. These rules present generalisations of the agents' data, and thus do not compromise the privacy of the underlying data. Gorodetsky et al. [28] consider that the main issue in multiagent classification tasks (and data mining in general) is not the classifier generation algorithms themselves, but the most appropriate mechanisms to allow agents to collaborate. PISA provides such a mechanism for agent collaboration, based on argumentation. Additionally, the literature provides further evidence that multi-agent approaches to classification yield better results than single-agent approaches (e.g. [47]).

Agent technology has also been employed in meta-learning (classification). This is a technique that deals with the problem of computing a global classifier from large and inherently distributed databases [3]. Meta-learning aims to compute a number of independent classifiers (concepts or models) by applying learning programs to a collection of independent and inherently distributed databases in parallel. The base classifiers computed are then collected and combined by another learning process. Here meta-learning seeks to compute a meta-classifier that integrates, in some principled fashion, the separately learned classifiers to boost overall predictive accuracy (see for example [50]).

One particular example of a meta-learning technique is ensemble learning (a survey can be found in [23]; an experimental comparison of three main techniques in [24]). Ensemble techniques have been shown to achieve good performance, especially in fields where the development of a powerful single learning system requires considerable effort [55]. Ensemble learning has recently been applied to multi-agent systems, so that several learning agents collaborate in a distributed environment. For example, in [43] the authors propose several ensemble schemes for cooperative case-based learners. PISA, as

Fig. 9. Increase in the execution times (milliseconds) for the datasets, when using groups of 2, 4 and 5 agents, compared with the execution time when each groupcomprises one agent only (Table 8).


Fig. 10. Increase in the error rate with respect to noise (1) (percentage noise on X-axis, error rate on Y-axis).


Fig. 11. Increase in the error rate with respect to noise (2) (percentage noise on X-axis, error rate on Y-axis).


Fig. 12. Increase in the error rate with respect to noise (3) (percentage noise on X-axis, error rate on Y-axis).


already noted, can be considered to be a multiagent ensemble system. Generally, multiagent ensemble learning can be divided into two categories:

1. Competitive ensemble learning, where agents work asynchronously on the same problem and the decision of the best agent is the group decision.

2. Cooperative ensemble learning, where the group decision is a fusion or aggregation of the individual decisions of all agents.

PISA supports both techniques; the overall argumentation process presents a competitive ensemble approach, whereas the intra-group decision making procedure is more like the cooperative ensemble approach. Olmeda and Fernandez [40] have shown that an effective ensemble learning system may benefit from the combination of both techniques. These findings were also supported by the results of the experiments described in Section 9.

A foundational account of argumentation on which much contemporary work is based was given by Dung [25]. There arguments were entirely abstract, with no structure at all, and related only by a defeat relation. Our approach uses arguments with a particular structure, but at the appropriate level of abstraction our argumentation tree can be seen as a Dung framework using grounded semantics, in that an argument is considered acceptable only if each of its attackers is itself defeated by some acceptable argument. Most approaches adding structure to Dung rely on a set of defeasible rules to generate arguments. An excellent example of providing structure in this way is [49], in which the semantics are grounded in a knowledge base comprising facts and strict and defeasible rules. Persuasion dialogues relying on argumentation structured around rules are well reviewed in [48]. We do not, however, start from a set of rules, but from a set of raw, unanalysed examples, so our approach is necessarily different: the dialogue moves in [48] are intimately related to knowledge structured as defeasible rules. The same is true of other proposals for legal dialogue systems based on legislation represented as rules, e.g. [27]. As discussed in the introduction to this paper, we look for inspiration rather to Legal CBR, in particular to the style of argumentation found in [7,4,12], adapting the moves to make them appropriate for use with a large set of examples rather than a small set of analysed cases.
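For readers unfamiliar with grounded semantics, the Java sketch below computes the grounded extension of an abstract framework by iterating the characteristic function; it is a textbook construction offered for orientation, not part of the PISA implementation.

import java.util.*;

// Grounded extension of a Dung framework: starting from the unattacked
// arguments, repeatedly accept every argument all of whose attackers are
// attacked by an already-accepted argument, until nothing changes.
final class Grounded {
    static <A> Set<A> extension(Set<A> args, Map<A, Set<A>> attackers) {
        Set<A> accepted = new HashSet<>();
        boolean changed = true;
        while (changed) {
            changed = false;
            for (A a : args) {
                if (accepted.contains(a)) continue;
                // a is acceptable if every attacker of a is itself attacked
                // by some already-accepted argument
                boolean acceptable = attackers.getOrDefault(a, Set.of()).stream()
                        .allMatch(b -> attackers.getOrDefault(b, Set.of()).stream()
                                .anyMatch(accepted::contains));
                if (acceptable) { accepted.add(a); changed = true; }
            }
        }
        return accepted;
    }
}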

Concerning the application of argumentation techniques to classification, [44] articulates an argumentation framework for learning agents. This framework is similar to that proposed here in taking the experience, in the form of past cases, of agents into consideration and focusing on the argument generation process. However, the work in [44] uses CBR techniques to generate arguments from past examples and to provide a means to define the preference relation between the generated arguments. PISA, on the other hand, implements more straightforward CARM techniques to produce arguments and applies a preference relation based upon the support/confidence measures. [44] also presents a framework for multiparty argumentation to enable a committee of agents to jointly deliberate about given cases. However, unlike PISA, the communication between the arguing agents is direct (i.e. there is no mediator agent), and turn taking is tokenised. The protocol consists of a series of rounds. In the initial round, the agents state their individual predictions for the discussed case, and broadcast these predictions to the agents involved. Then, in the subsequent rounds, a token passing mechanism is used so that agents (one at a time) can attack other agents (also one at a time) if they disagree with their prediction. When an agent is attacked, it informs the attacker whether it accepts the counter argument (and changes its prediction) or not. Also, when an agent has the token it can answer such attacks by generating counter attacks. When all the agents have had the token once, the token returns to the first agent, and so on. The communication between the agents continues in the same manner until they all agree on a prediction, or until a given number of rounds has passed and no agent has generated any counter argument. Moreover, if at the end of the argumentation the agents have not reached an agreement, then a voting mechanism, which uses the confidence of each prediction as weights, is used to decide the final solution. PISA differs from the framework of [44] in that it relies on an argumentation artefact (the argumentation tree), and a mediator agent (the chairperson), to facilitate the argumentation process between a number of agents. Thus, PISA agents can focus on generating the best arguments rather than on the communication amongst themselves. Additionally, instead of voting, PISA applies a tie-resolution mechanism when agreement cannot be reached in a given number of rounds.

Additional work on the use of argumentation to facilitate classification can be found in [5,38,45]. In [5] Amgoud and Serrurier propose a formal argumentation framework for classification, in which arguments can be constructed for and against each possible classification of a given example. Similarly to PISA, these arguments can then be evaluated and a plausible classification of the example suggested. However, the work in [5] assumes that a set of hypotheses is associated with the learning examples, whereas PISA has no such assumption. Additionally, the strength of each argument is identified such that the arguments derived from the set of training examples are stronger than the arguments built from the set of hypotheses, whereas PISA applies support and confidence measures to evaluate the strength of the generated arguments. Other work has tried to improve the performance of classification methods by combining them with argumentation techniques. In [38] the idea of argumented examples is introduced to improve the results of the CN2 machine learning algorithm [18]. However, this approach requires expert consultation to generate the arguments associated with each example. Additionally, the argumentation process in [38] involves two options only (two agents), whilst PISA can handle any number of options. In [45], the concept of learning from communication was introduced to show that the accuracy of the learning process can benefit from communication (via argumentation) among a number of agents.

11. Conclusion

The PISA arguing from experience framework has been described. The framework allows a collection of participating agents to conduct a dialogue concerning the classification of an example. The system progresses in a round-by-round manner. During each


round agents can elect to propose an argument advocating their own position or attack another agent's position (and thus indirectly also promote their own position). The process is managed by a chairperson agent who is not an advocate of any position. Arguments are dynamically generated (mined) from each participant agent's data set, which is viewed as a repository of that agent's experience. The arguments are mined and expressed in the form of Classification Association Rules (CARs), which are viewed as generalisations of the individual agent's experience. The dialogue is stored, as it progresses, in a dedicated structure, the argumentation tree. The tree indicates the final “winner” and provides a record of the decision making process.

With respect to argumentation PISA provides the following advantages: (i) it obviates the need to build a static knowledge base (which often entails a knowledge acquisition bottleneck), (ii) it allows the data (experience) of individual agents to be continuously added to without requiring any additional incremental processing, (iii) it provides for multi-party argumentation (most existing systems are directed at two party argumentation scenarios), and (iv) it allows agents to cooperate in groups (most existing systems do not support this).

In the context of classification the framework provides a “distributed” classification mechanism that harnesses all the advantages offered by Multiagent Systems. The effectiveness of PISA, in terms of classification accuracy, is at least comparable with that of other classification paradigms; PISA outperforms the other paradigms considered when PISA agents operate in groups and in the presence of high levels of noise. PISA also provides an explanation (with reference to the argumentation tree) of how the final classification was arrived at. Furthermore the PISA approach to classification can operate with temporally evolving data.

References

[1] R. Agrawal, R. Srikant, Privacy-preserving data mining, Proc. SIGMOD'00, ACM Press, 2000, pp. 439–450.
[2] R. Agrawal, R. Srikant, Fast algorithms for mining association rules in large databases, Proc. 20th International Conference on Very Large Data Bases, VLDB'94, 1994, pp. 487–499.
[3] K.A. Albashiri, F. Coenen, P. Leng, EMADS: an extendible multi-agent data miner, Knowledge Based Systems 22 (7) (2009) 523–528.
[4] V. Aleven, Teaching Case Based Argumentation Through an Example and Models, PhD Thesis, University of Pittsburgh, 1997.
[5] L. Amgoud, M. Serrurier, Arguing and explaining classifications, Argumentation in Multi-Agent Systems, LNCS 4946, Springer, Heidelberg, 2008, pp. 164–177.
[6] K.D. Ashley, E.L. Rissland, A case-based approach to modeling legal expertise, IEEE Expert 3 (3) (1988) 70–77.
[7] K.D. Ashley, Modeling Legal Argument, MIT Press, Cambridge, MA, USA, 1990.
[8] P. Baroni, F. Cerutti, M. Giacomin, G. Guida, AFRA: argumentation framework with recursive attacks, International Journal of Approximate Reasoning 52 (1) (2011) 19–37.
[9] F. Berzal, J.-C. Cubero, N. Marín, J.-M. Serrano, TBAR: an efficient method for association rule mining in relational databases, Journal of Data and Knowledge Engineering 37 (1) (2001) 47–64.
[10] E. Bauer, R. Kohavi, An empirical comparison of voting classification algorithms: bagging, boosting and variants, Machine Learning 36 (1999) 105–139.
[11] G. Bel-Enguix, D.J. López, Membranes as multi-agent systems: an application to dialogue modelling, Professional Practice in AI, Springer, 2006, pp. 31–40.
[12] T.J.M. Bench-Capon, G. Sartor, A model of legal reasoning with cases incorporating theories and values, Artificial Intelligence 150 (1–2) (2003) 97–143.
[13] T.J.M. Bench-Capon, P.E. Dunne, Argumentation in artificial intelligence, Artificial Intelligence 171 (10–15) (2007) 619–641.
[14] N. Benn, A. Macintosh, Argument visualization for eParticipation: towards a research agenda and prototype tool, Proceedings of the 3rd ePart Conference, LNCS 6847, Springer, 2011, pp. 60–73.
[15] C.L. Blake, C.J. Merz, UCI Repository of Machine Learning Databases, http://www.ics.uci.edu/mlearn/MLRepository.html, University of California, Irvine, Department of Information and Computer Science, 1998.
[16] L. Breiman, Bagging predictors, Machine Learning 24 (1996) 123–140.
[17] C. Brodley, M. Friedl, Identifying and eliminating mislabelled training instances, Journal of Artificial Intelligence Research 11 (1999) 131–167.
[18] P. Clark, T. Niblett, The CN2 induction algorithm, Machine Learning 3 (4) (1989) 261–283.
[19] F. Coenen, The LUCS-KDD Decision Tree Classifier Software, http://www.csc.liv.ac.uk/frans/KDD/Software/DecisionTrees/decisionTree.html, Department of Computer Science, The University of Liverpool, UK, 2007.
[20] F. Coenen, P. Leng, Obtaining best parameter values for accurate classification, Proc. 5th Int. Conf. on Data Mining, ICDM'05, IEEE, 2005, pp. 597–600.
[21] F. Coenen, P. Leng, The effect of threshold values on association rule based classification accuracy, Journal of Data and Knowledge Engineering 60 (2) (2007) 345–360.
[22] F. Coenen, P.H. Leng, S. Ahmed, Data structure for association rule mining: T-trees and P-trees, IEEE Transactions on Knowledge and Data Engineering 16 (6) (2004) 774–778.
[23] T.G. Dietterich, Ensemble methods in machine learning, Proc. Int. Workshop on Multiple Classifier Systems, Springer, Berlin, 2000, pp. 1–15.
[24] T.G. Dietterich, An experimental comparison of three methods for constructing ensembles of decision trees: bagging, boosting, and randomization, Machine Learning 40 (2000) 139–157.
[25] P.M. Dung, On the acceptability of arguments and its fundamental role in nonmonotonic reasoning, logic programming and n-person games, Artificial Intelligence 77 (2) (1995) 321–358.
[26] Y. Freund, R. Schapire, Experiments with a new boosting algorithm, Proc. 13th Int. Conf. on Machine Learning, Bari, Italy, 1996, pp. 148–156.
[27] T.F. Gordon, The pleadings game: formalizing procedural justice, Proc. 4th International Conference on AI and Law, 1993, pp. 10–19.
[28] V. Gorodetsky, O. Karsaeyv, V. Samoilov, Multi-agent technology for distributed data mining and classification, Proc. Int. Conf. on Intelligent Agent Technology, IEEE/WIC, 2003, pp. 438–441.
[29] S.A. Gómez, C.I. Chesñevar, Integrating defeasible argumentation and machine learning techniques, Technical Report, Universidad Nacional del Sur, Bahía Blanca, 2004.
[30] N.L. Green, Causal argumentation schemes to support sense-making in clinical genetics and law, Proc. 13th International Conference on AI and Law, 2011, pp. 56–60.
[31] M. Hall, E. Frank, G. Holmes, B. Pfahringer, P. Reutemann, I.H. Witten, The WEKA data mining software: an update, SIGKDD Explorations 11 (1) (2009).
[32] J. Han, J. Pei, Y. Yin, Mining frequent patterns without candidate generation, Proc. SIGMOD'00, ACM Press, 2000, pp. 1–12.
[33] J.F. Horty, Reasons and precedent, Proc. 13th International Conference on AI and Law, 2011, pp. 41–50.
[34] R. Jacobs, M. Jordan, S.J. Nowlan, G.E. Hinton, Adaptive mixture of local experts, Neural Computation 3 (1) (1991) 79–87.
[35] E.B. Kong, T. Dietterich, Error-correcting output coding corrects bias and variance, Proc. 12th Int. Conf. on Machine Learning, 1995, pp. 213–221.
[36] P. Melville, R.J. Mooney, Constructing diverse classifier ensembles using artificial training examples, Proc. 18th Int. Joint Conf. on AI, 2003, pp. 505–510.
[37] P.J. Modi, W.M. Shen, Collaborative multiagent learning for classification tasks, Proc. 5th Int. Conf. on Autonomous Agents, Montreal, Canada, ACM Press, 2001, pp. 37–38.
[38] M. Mozina, J. Zabkar, T. Bench-Capon, I. Bratko, Argument based machine learning applied to law, Artificial Intelligence and Law 13 (1) (2005) 53–73.
[39] M. Olave, V. Rajkovič, M. Bohanec, An application for admission in public school systems, Expert Systems in Public Administration (1989) 145–160.
[40] I. Olmeda, E. Fernandez, Hybrid classifiers for financial multicriteria decision making: the case of bankruptcy prediction, Computational Economics 10 (1997) 317–335.
[41] E. Oliva, M. Viroli, A. Omicini, P. McBurney, Argumentation and artifact for dialogue support, Proc. 5th Int. Workshop on Argumentation in Multiagent Systems (ArgMAS'08), Lisbon, Portugal, 2008, pp. 24–39.
[42] E. Oliva, P. McBurney, A. Omicini, Co-argumentation artifact for agent societies, Argumentation in Multi-Agent Systems, Springer, 2007, pp. 31–46.
[43] S. Ontañón, E. Plaza, Cooperative multiagent learning, Proc. AAMAS 2002, LNCS 2636, Springer, Heidelberg, 2003, pp. 1–17.
[44] S. Ontañón, E. Plaza, An argumentation-based framework for deliberation in multi-agent systems, Argumentation in Multi-Agent Systems, Springer, 2008, pp. 178–196.
[45] S. Ontañón, E. Plaza, Learning, information exchange, and joint-deliberation through argumentation in multi-agent systems, On the Move to Meaningful Internet Systems: OTM 2008 Workshops, Springer, 2010, pp. 150–159.
[46] D. Opitz, R. Maclin, Popular ensemble methods: an empirical study, Journal of Artificial Intelligence Research 11 (1999) 169–198.
[47] S. Peng, S. Mukhopadhyay, R. Raje, M. Palakal, J. Mostafa, A comparison between single-agent and multi-agent classification of documents, Proc. 15th Int. Parallel and Distributed Processing Symposium, 2001, pp. 935–944.
[48] H. Prakken, Formal systems for persuasion dialogue, Knowledge Engineering Review 21 (2) (2006) 163–188.
[49] H. Prakken, An abstract framework for argumentation with structured arguments, Argument and Computation 1 (2) (2011) 93–124.
[50] A. Prodromidis, P. Chan, S. Stolfo, Meta-learning in distributed data mining systems: issues and approaches, Advances in Distributed and Parallel Knowledge Discovery, AAAI Press/The MIT Press, 2000, pp. 81–114.
[51] J.R. Quinlan, Combining instance-based and model-based learning, Proc. 10th Int. Conf. on Machine Learning, Amherst, MA, Morgan Kaufmann, 1993, pp. 236–243.
[52] R. Rak, L. Kurgan, M. Reformat, A tree-projection-based algorithm for multi-label recurrent-item associative-classification rule generation, Journal of Data and Knowledge Engineering 64 (1) (2006) 171–197.
[53] G.J. Simon, V. Kumar, P.W. Li, A simple statistical model and association rule filtering for classification, Proc. 17th ACM SIGKDD Int. Conf. on Knowledge Discovery and Data Mining, 2011, pp. 823–831.
[54] C.M. Teng, Correcting noisy data, Proc. 6th Pacific Rim Int. Conf. on AI, Springer-Verlag, 2000, pp. 188–198.
[55] L. Yu, S.Y. Wang, K.K. Lai, Credit risk assessment with a multistage neural network ensemble learning approach, Expert Systems with Applications 34 (2) (2008) 1434–1444.
[56] M. Wardeh, T. Bench-Capon, F.P. Coenen, Multi-party argument from experience, Proc. 6th Int. Workshop on Argumentation in Multiagent Systems (ArgMAS'09), Budapest, Hungary, 2009.
[57] M. Wardeh, T. Bench-Capon, F.P. Coenen, Arguments from experience: the PADUA protocol, Proc. 2nd Conf. on Computational Models of Argument (COMMA'08), Toulouse, France, IOS Press, 2008, pp. 405–416.
[58] M. Wardeh, T. Bench-Capon, F.P. Coenen, Arguing from experience using multiple groups of agents, Argument and Computation 2 (1) (2011) 51–76.
[59] G.I. Webb, MultiBoosting: a technique for combining boosting and wagging, Machine Learning 40 (2) (2000) 159–196.
[60] F. Waismann, The Principles of Linguistic Philosophy, St Martin's Press, NY, 1965.
[61] D. Walton, C. Reed, F. Macagno, Argumentation Schemes, Cambridge University Press, 2008.
[62] L. Wittgenstein, The Blue and Brown Books, Blackwell, Oxford, 1985.

Maya Wardeh is a PhD student at the University of Liverpool under the supervision of Frans Coenen and Trevor Bench-Capon. Her research focuses on integrating data mining and Knowledge Discovery in Data (KDD) with the field of Argumentation in Multi-Agent Systems. She has published a number of papers in the fields of Argumentation, Multi-Agent Systems, Data Mining and AI.

Frans Coenen has a general background in AI, and has been working in the field of data mining and Knowledge Discovery in Data (KDD) for the last 12 years. He is particularly interested in: Social Network Mining; Trend Mining; the mining of non-standard data sets such as graph, image and document bases; and the practical application of data mining in its many forms. He currently leads a small research group (13 PhDs and 2 RAs) working on many aspects of data mining and KDD. He has some 200 refereed publications on KDD and AI related research, and has been on the programme committees for many KDD events. Frans Coenen is currently a senior lecturer within the Department of Computer Science at the University of Liverpool, where he is the director of studies for the department's on-line MSc programmes.

Trevor Bench-Capon read Philosophy and Economics at St John's College Oxford, where he also took a PhD. He worked for 6 years in the Department of Health and Social Security, in policy and computer branches, before going to Imperial College, London to research into logic programming applied to legislation. He was appointed lecturer in computer science at the University of Liverpool in 1987, senior lecturer in 1992, reader in computer science in 1999, and professor of computer science in 2004. His current focus is on dialogue and argument. He has published some seventy journal articles and many conference papers, and is the author of several books.
