
TeLQAS: a Realization of Humanlike Inferences for Knowledge-based Question Answering Systems

E. Darrudi ECE Department, Faculty of Engineering, University of Tehran, [email protected]

F. Oroumchian Wollongong University, Dubai Campus, [email protected]

M. Rahgozar Control and Intelligent Processing Center of Excellence, ECE Department, Faculty of Engineering, University of Tehran, [email protected]

M.S. Mirian, K. Neshatian, B.R. Ofoghi Iran Telecom Research Center, {e_mirian, neshat, br_ofoghi}@itrc.ac.ir

This paper reports on the implementation and evaluation of a knowledge-based, domain-specific question answering system called TeLQAS. The system employs a reasoning engine built on an extended version of the theory of Human Plausible Reasoning. Its knowledge base has been filled manually with logical statements about Fiber Optics. An experiment was conducted by issuing 70 natural language queries to the system and having the answers judged by three domain experts. Overall, the system performed reasonably well, producing 60% correct answers. Moreover, the system’s estimate of certainty on the correct answers it produced differs by only 4% from that of the human experts.

1 Introduction

In recent years, Question Answering (QA) systems have evolved out of the field of Information Retrieval to better meet the needs of information seekers. Unlike simple keyword-based information retrieval systems, they aim to communicate directly with users in natural language. They accept natural language questions and return exact answers, eliminating the burden of query formulation and of reading through many irrelevant documents to reach the answer.

Open-domain QA systems deal with unrestricted questions over large-scale text corpora, typically by means of statistical approaches, whereas restricted-domain systems concentrate on a controlled domain of interest (e.g., weather forecasts or UNIX technical manuals). Open-domain QA systems follow a more or less common structure composed of query formulation, traditional document or passage retrieval, and answer extraction and scoring stages. This architecture is found in all systems competing in the QA track of the TREC [16] conferences, which are held annually for the evaluation of open-domain QA systems. Because these systems rely primarily on traditional keyword-based and statistical methods, they succeed when there is some lexical correspondence between question and answer words.

On the other hand, there is significant variety in the architectures of restricted-domain QA systems, which are developed for specific domains using specific methods. Technical domains generally enjoy a simpler syntax, which allows for more effective use of NLP-intensive techniques and, thus, better ‘understanding’ of the question and the information resources.

There is another conceivable category of QA systems which rely mostly on knowledge bases for question answering. Although the current state of the art in natural language processing (NLP) prevents building knowledge bases automatically from open text, it is possible to realize such knowledge bases for restricted domains through manual or semi-automatic knowledge engineering. In knowledge-based QA systems the domain knowledge is explicitly expressed in formal representation languages, allowing for sound and effective automated reasoning to answer novel (unseen) questions. Moreover, these systems can be equipped with answer explanation and justification modules which produce human-readable reasons for their answers.


These explanations may be very helpful, since users trust a justified answer more than a bare one.

In this paper, we describe our restricted-domain knowledge-based QA system named TeLQAS (for Telecommunication Literature Question Answering System), which exploits the cognitive theory of Human Plausible Reasoning (HPR) [3] at its core. At the end of the project, it is intended to serve beginner and intermediate users in the Fiber Optics area of Telecommunications. Utilizing HPR as a cognitive theory enables TeLQAS to reason in the semantic layer, providing plausible answers and justifications to end users. Additionally, it rates its answers with certainty parameters, which enable users to take the answers with a grain of salt when the underlying data is uncertain, partial or contradictory. All the QA components, except for the underlying knowledge base, are domain independent, so the system can be ported to any potential domain given the appropriate knowledge bases.

The remainder of this paper is structured as follows. In section 2, we review the literature for related work on knowledge-based QA and HPR. A brief introduction to HPR, necessary to comprehend the rest of the paper, is presented in section 3. The architecture of TeLQAS is discussed in section 4. Section 5 describes knowledge representation in our QA system and section 6 explains the question processing unit of TeLQAS. In section 7 the inference engine, the heart of TeLQAS, is presented. Our experiments are discussed in section 8, and section 9 concludes the paper and outlines future work.

2 Related Work

One of the most ambitious recent investments in knowledge-based question answering systems is Project Halo [20], “a staged, long-term research and development initiative toward the development of a ‘Digital Aristotle’ capable of answering novel questions and solving advanced problems in a broad range of scientific disciplines.” In the pilot phase of the project, the state of the art in knowledge representation and reasoning was applied to a limited syllabus in chemistry with promising results (see for example [21]). Phase two of the project is intended to promote technologies that ease the task of knowledge entry and formulation for domain experts, reducing the cost of such knowledge-based systems.

In [19] an application of knowledge-based QA systems is studied for a home agent robot. In [18] the authors present an approach for augmenting online text with knowledge-based question answering capabilities. As they argue, their prototypes have been well received, but none has achieved regular usage, primarily due to the incompleteness of the underlying knowledge bases.

The theory of Human Plausible Reasoning has been employed successfully in several information-based applications. In [6], the authors describe a pilot version of the theory on a problem in the domain of the chemical periodic table. Kelly [7] developed an expert system for grass identification based on HPR. Oroumchian applied HPR in an experimental information retrieval system for open text called PLIR [8]. HPR has also proved beneficial in retrieval of structured data such as XML [26]. In [9], [10] and [11] the authors suggest applications of the theory for adaptive filtering, intelligent tutoring and document clustering, respectively. The theory has also been advantageous in designing intelligent graphical user interfaces which offer advice to their users during interaction [12] [13].

In [24] we present a knowledge-based QA system empowered with HPR for a non-technical general domain. The experiments conducted with the CIA World Factbook (CWFB) knowledge base [22] [23] showed that the system’s answers are analogous to human replies when the required commonsense knowledge is added to the underlying factual data provided by CWFB. In [25] we explain the pilot phase of a project to develop a practical knowledge-based QA system for the technical domain of Telecommunications. Evaluations with a limited number of HPR inferences (four inference types) on a small knowledge base in Fiber Optics and Mobile Technologies reconfirmed the flexibility of HPR for applications involving realistic reasoning about uncertain and incomplete data.

3 HPR as a theory for QA

For 15 years, Collins and his colleagues collected and organized a wide variety of plausible inferences people make in everyday life [1]. Collins’ collaboration with Michalski led to the development of a formal system, based on Michalski’s variable-valued logic [2], that characterizes the different patterns of plausible inference people use in reasoning about the world [3] [4]. They attempted to formalize the plausible inferences that frequently occur in people’s responses to questions for which they do not have ready answers. The HPR theory includes a simple notation to express facts about the world, a group of certainty parameters, and a set of basic plausible inferences which act upon those facts and their corresponding parameters to draw plausible conclusions.


In the formal notation of the theory, a statement like “we are almost sure that ducks quack” is represented as:

Sound (Duck) = {quack}, γ = 0.90

where Sound is a Descriptor, Duck is an Argument, quack is a Referent, and γ is the certainty of the statement (in this case expressing high confidence). There are several other certainty parameters that give descriptive power to plausible expressions. In our implementation of HPR we make use of three of them: dominance (δ), similarity (σ) and conditional likelihood (α).

The δ parameter specifies the dominance of a subset within its superset for hierarchical relations such as ISA and Part-of; for example, the dominance of cats is greater than that of turtles in the set of pet animals. The σ parameter expresses the degree of similarity between two concepts in a particular context. The following statement expresses that ducks are very similar to geese with respect to their physical characteristics.

Ducks SIM Goose, CX: Physical Characteristics, σ = high

In addition to the simple statements above, Dependencies and Implications can also form logical expressions in HPR. A marked dependency shows that an increase on one side of the relation makes the concept on the other side decrease (negative dependency) or increase (positive dependency). For example, the following dependency states that the temperature of air strongly affects its water-holding capacity. The α parameter specifies the conditional likelihood of the consequent being true given that the antecedent holds.

Temperature (air) →+ Water Holding Capacity (air), α = High

Unmarked dependencies just show that one concept has an effect on another, e.g., Terrain (area) → Agricultural Products (area). Implications are bounded dependencies capable of representing simple rules in the knowledge base. The expression below states: “if a country has a tropical climate, then we can expect mangos among its agricultural products with 70% probability.”

Climate (Country) = {Tropical} → Agricultural Product (Country) = {Mangos}, α = 0.7
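Taken together, these expression types are straightforward to encode. The following minimal C# sketch shows one possible encoding of statements, similarities, dependencies and implications with their certainty parameters; the type and member names are illustrative only, not those of our implementation.

using System;
using System.Collections.Generic;

// Sound (Duck) = {quack}, γ = 0.90
public record Statement(string Descriptor, string Argument,
                        List<string> Referents, double Gamma);

// Ducks SIM Goose, CX: Physical Characteristics, σ = high
public record Similarity(string Concept1, string Concept2,
                         string Context, double Gamma, double Sigma);

// Temperature (air) →+ Water Holding Capacity (air), α = High
public record Dependency(string FromDescriptor, string ToDescriptor,
                         int Sign,   // +1, -1, or 0 for an unmarked dependency
                         double Gamma, double Alpha);

// Climate (Country) = {Tropical} → Agricultural Product (Country) = {Mangos}, α = 0.7
public record Implication(Statement Antecedent, Statement Consequent,
                          double Gamma, double Alpha);

public static class NotationDemo
{
    public static void Main()
    {
        var ducksQuack = new Statement("Sound", "Duck", new List<string> { "quack" }, 0.90);
        Console.WriteLine($"{ducksQuack.Descriptor} ({ducksQuack.Argument}) = " +
                          $"{{{string.Join(", ", ducksQuack.Referents)}}}, γ = {ducksQuack.Gamma}");
    }
}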

The theory has a rich set of transforms and inferences for drawing plausible conclusions from uncertain premises. Section 7 explains these in detail, along with some additional inference patterns we added to HPR to boost the reasoning capabilities of TeLQAS.

4 TeLQAS QA Architecture

TeLQAS is composed of two major processes, online and offline, as depicted in figure 1. In brief, the online process interacts with users and answers their questions, whereas the function of the offline process is to provide the information required for answering them. In the offline process, domain experts are trained to extract useful relations from their memory or from relevant documents. These relations are entered into predefined templates which can be easily inserted into the Domain Ontology.

The online process carries out the question answering tasks. Users enter their questions through a web interface, and each question is passed to the Question Processing unit, where it is parsed and transformed into HPR notation intelligible to the Inference Engine. The system then launches a time-constrained reasoning process and, when possible, returns the answer and its justification to the user via the web interface. In case of failure, the Summarization module retrieves a few articles from the Web using the Web Interface and summarizes them for presentation to the user. In recent TREC [16] conferences, answers presented as summarized paragraphs are considered inexact and thus incorrect. Therefore, to benchmark our system we excluded the summarization component, though we found its output useful in real question answering situations.


Figure 1 The Architecture of TeLQAS

The Domain Ontology bridges the two data flows of the offline and online sections. It permanently stores the whole knowledge of the system about the working domain. It is developed and maintained using the Protégé [14] ontology construction tools. Using this standard tool, we can easily change or expand the working domain of the system by plugging in other ontologies developed with Protégé. In the online process, however, the domain knowledge is converted to a temporary semantic network representation optimized for high-speed reasoning. This Online Knowledge Base is reloaded at intervals to ensure its consistency with the underlying domain ontology.

In the following sections we explain the essential components of TeLQAS involved in real-time question answering.

5 Online Knowledge Base

To reason efficiently, one should provide mechanisms to represent the knowledge about the working domain effectively. In TeLQAS, the domain knowledge is loaded at runtime into a particular implementation of semantic networks. In semantic networks, concepts are represented by nodes in the knowledge graph and relations are usually denoted by links between those concepts. We extend this idea by treating both concepts and relations as simple nodes. This way, each node in the semantic network may represent, in terms of HPR, an argument, a referent or a descriptor. This scheme gives extra power to the knowledge representation, letting relations have their own properties and relations to other concepts. Indeed, it is a relaxed implementation of conceptual graphs [15], extended to represent more complex information such as rules and contextual data. Figure 2 depicts some statements about WDM (Wavelength Division Multiplexer) in this representation.
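As a small illustration of this node-based scheme (a C# sketch following the description above; class and label names are illustrative only), a relation such as SIM is itself a node, so a context and a similarity degree can be attached directly to it:

using System;
using System.Collections.Generic;

public class Node
{
    public string Label { get; }
    // Every outgoing link points at another Node; since relations are nodes
    // too, a relation can carry its own properties and further relations.
    public List<(string LinkType, Node Target)> Links { get; } = new();
    public Node(string label) => Label = label;
    public void Link(string type, Node target) => Links.Add((type, target));
}

public static class NetworkDemo
{
    public static void Main()
    {
        var wdm = new Node("WDM");
        var sim = new Node("SIM");                      // the relation, reified as a node
        wdm.Link("relation", sim);
        sim.Link("other", new Node("CWDM"));            // the second end of the similarity
        sim.Link("CX", new Node("Transmission link"));  // context hangs off the relation node
        sim.Link("σ", new Node("0.8"));                 // so does the similarity degree
        Console.WriteLine($"{wdm.Label} –SIM→ CWDM, with CX and σ stored on the SIM node");
    }
}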

6 Question Processing

The question processing unit parses the natural language question posed via the interface and converts it to a set of partial statements called a Plausible Question. Each plausible question is composed of one or more statements, among which one is the main question and the others specify additional information extracted from the question, such as its context or time domain. Table 1 shows several natural language questions and their corresponding plausible representations produced by the question processing unit.


Table 1 Some natural questions converted to plausible questions

Example Natural Language Question / Plausible Question

1. What is fiber channel?
   Definition (Fiber Channel) = {?}

2. What is the intended environment for fiber cable?
   Intended Environment (Fiber Cable) = {?}

3. How is water migration prevented in the cable design?
   1) IS Prevented (Water Migration) = {true}
   2) Domain (1) = {Cable Design}
   3) Method (1) = {?}

4. Which modulation scheme is used in UMTS?
   1) ISA (?) = {Modulation Scheme}
   2) Used In (?) = {UMTS}

5. How will GSM evolve?
   1) Evolve (GSM) = {true}
   2) Method (1) = {?}
   3) Time Domain (1) = {Future}

In Table 1, the first and second questions are simply mapped to plausible questions with missing referents, which are to be filled by the inference engine. For the third inquiry, the composed plausible question states that we know Water Migration is prevented (1) in the Cable Design (2) and we want to know how (3). The fourth plausible question specifies that we are looking for a concept which is a Modulation Scheme (1) and is used in UMTS (2). The fifth question implies that we know the GSM will evolve (1) in the future (3) and we want to know how (2).

The question processing unit employs a rule-based parser to identify the constituent statements and their known descriptors, arguments and referents. The parser endeavors to understand the natural language structure of the question based on a set of predefined regular expressions. At runtime, the parser tries to match the expressions against the string of part-of-speech (POS) tagged question words. Matching expressions fire corresponding templates which create the plausible question from the known parameters. Table 2 enumerates some natural language questions, their POS tagged representations, the corresponding fired regular expressions and PQ builder templates, and finally the resultant plausible questions¹.

¹ In the table, W, R, X, H, N stand for the ‘what’ keyword, preposition, auxiliary verb, ‘how’ keyword and noun, respectively.

Figure 2 Representation of some typical statements in TeLQAS

(Figure content not reproduced; its nodes include WDM, an ISA link to Filtering Device, a Definition node reading “a passive fiber optical device used to separate signals of different wavelengths carried on one fiber.”, Bandwidth = High, a SIM link to CWDM in the context of Transmission link, a DIS link to DWDM in the context of Channel Spacing, and a DEP link to Material Dispersion, annotated with σ, δ and α values.)


Table 2 Converting natural questions to their plausible representations

Sample Question: What is fiber optics?
POS Tagged Question: What “Auxiliary Verb” Noun?
Regular Expression: WX*N
Template: Definition (Noun1) = {?}
Plausible Question: Definition (fiber optics) = {?}

Sample Question: What are the key features of fiber cable?
POS Tagged Question: What “Auxiliary Verb” Noun1 Prep. Noun2?
Regular Expression: WXNR*N
Template: N1 (N2) = {?}
Plausible Question: key feature (fiber cable) = {?}

Sample Question: How do dual loops improve performance?
POS Tagged Question: How “Auxiliary Verb” Noun1 Verb Noun2?
Regular Expression: HXNVR*N
Template: 1) Verb (Noun1) = {Noun2}  2) Method (1) = {?}
Plausible Question: 1) Improve (dual loops) = {performance}  2) Method (1) = {?}
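To make the template-firing step concrete, the following C# sketch implements the first rule of Table 2. It is a simplified illustration that assumes the POS tags have already been packed into a string of the single-letter codes from footnote 1; the real parser handles many more rules and slot fillers.

using System;
using System.Text.RegularExpressions;

public static class QuestionParser
{
    // Rule WX*N from Table 2: "What <auxiliary verbs> <noun>?" fires the
    // template Definition (Noun1) = {?}, filling Noun1 with the final noun.
    public static string Parse(string posCodes, string[] words)
    {
        if (Regex.IsMatch(posCodes, "^WX*N$"))
            return $"Definition ({words[^1]}) = {{?}}";
        return "I don't understand your question";   // no rule fired
    }

    public static void Main()
    {
        // "What is fiber optics?" → tags W X N ("fiber optics" as one noun phrase)
        Console.WriteLine(Parse("WXN", new[] { "what", "is", "fiber optics" }));
        // prints: Definition (fiber optics) = {?}
    }
}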

To extract representative regular expressions, we examined more than 200 representative FAQs in the domain of telecommunications mined from the Web. Question templates with higher frequencies and common structures were chosen to form a basis for enriching the rule base, and the regular expressions and plausible question templates were drawn up accordingly. The use of regular expressions implies that a question may match more than one rule at the same time. In this case, priorities are assigned to the resulting queries and the inference engine answers them in order of priority.

7 Inference Engine

The inference engine is the heart of the online process. It accepts a set of plausible questions from the question processing unit and deduces the answer using the facts contained in the online knowledge base. A high-performance reasoning algorithm based on HPR searches the knowledge space for the missing information by applying basic plausible inferences. Below we first enumerate the basic plausible inferences in the system and then elaborate on the reasoning algorithm built upon them.

7.1 Basic Plausible Inferences

The theory of HPR provides several basic plausible inferences acting on statements to produce new statements. There are 18 inference patterns in TeLQAS, explained in the following subsections.

7.1.1 Generalization and Specification Transforms

This family includes generalization and specification transforms applicable to arguments, referents or descriptors, for a total of six inferences². These transforms traverse inheritance hierarchies such as ‘part of’, ‘is a’ (called Gen & Spec in the original theory of HPR) and ‘instance of’ to produce new statements. Figure 3 demonstrates the ISA-based Argument Specification Transform along with an example from the TeLQAS KB. The generalization transforms for referents and descriptors work in much the same way; specification transforms are similar to generalizations except that they traverse ISA hierarchies downward. In all inferences, the certainty of the conclusion is a function of all the parameters involved, denoted γcon in figure 3. In our implementation it is the plain product of the parameters concerned, allowing uncertainty to propagate from the premises to the conclusions.

² The original theory of HPR supports transforms only on arguments and referents. Nonetheless, we found it useful to extend the idea to descriptors too.


In the original theory of HPR, ISA transforms occur in particular contexts, just like similarity and dissimilarity transforms. Nonetheless, we had to simplify these transforms by removing the contextual data to make the knowledge extraction phase straightforward, as discussed in section 8.

7.1.2 Similarity and Dissimilarity Transforms

Similarity is one of the most important transforms in HPR; it tries to capture the analogical inference made by human beings. As with generalization and specification, it is applicable to arguments, referents and descriptors in TeLQAS. Figure 4 shows the argument similarity transform with an example.

To be valid, the similarity transform has to find a dependency between the question’s descriptor and the context of the similarity (reasoning line 3 in figure 4). When there is no such dependency in the KB, the transform postulates that the dependency holds and continues, decreasing the certainty of the conclusion to reflect this extra uncertainty. These assumptions (of the form ‘concept 1 must depend on concept 2’) are presented to the user for verification along with the answer, and it is explicitly stated that the answer is correct only if the assumption is true (a conditional answer).

Figure 3 Argument Specification Transform

1. Des (Arg1) = {Ref} : γstat
2. Arg1 ISA Arg2 : γisa, δ
---------------------------------------------------------------
3. Des (Arg2) = {Ref} : γcon = Π(γstat, γisa, δ)

Example: What is Metallic Layer?
1. Definition(Armor) = {A protective layer, usually metal, wrapped around a cable.} : [Certainty = .99]
2. Armor is a Metallic layer : [Certainty = .99, Dominance = .7]
:::Argument Specification:
3. Definition(Metallic layer) = {A protective layer, usually metal, wrapped around a cable.} : [Certainty = .69]

Figure 4 Argument Similarity Transform

1. Des1 (Arg1) = {Ref} : γstat
2. Arg1 SIM Arg2, CX: Des2 : γsim, σ
3. Des2 → Des1 : γdep, α
---------------------------------------------------------------
4. Des1 (Arg2) = {Ref} : γcon = Π(γstat, γsim, γdep, σ, α)

Example: How is the dispersion in Single Mode Fiber?
1. Dispersion(NZ_DSF) = {Limited value at operating wavelength} : [Certainty = .99]
2. SMF is similar to NZ_DSF, in the context of their Transmission Medium : [Certainty = .99, Similarity = .7]
3. Dispersion depends on Transmission Medium : [Certainty = ?]
:::Argument Similarity:
4. Dispersion(SMF) = {Limited value at operating wavelength} : [Certainty = .68]
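The certainty arithmetic of these transforms is simple to implement. The sketch below renders the ISA-based argument transform of figure 3 in C# over a toy two-entry KB; the data structures are illustrative, not our production code. γcon is the plain product of the parameters involved.

using System;
using System.Collections.Generic;

public record Answer(string Referent, double Certainty);

public static class IsaTransform
{
    // One ISA link: Armor ISA Metallic Layer, with γisa and δ
    static readonly List<(string Child, string Parent, double GIsa, double Delta)> Isa =
        new() { ("Armor", "Metallic Layer", 0.99, 0.7) };

    // One direct statement: Definition (Armor) = {...} with γstat
    static readonly Dictionary<(string Des, string Arg), Answer> Statements =
        new() { [("Definition", "Armor")] = new Answer(
                "A protective layer, usually metal, wrapped around a cable.", 0.99) };

    // Answer Des(Arg) by borrowing the statement from an ISA-related argument.
    public static Answer? Apply(string des, string arg)
    {
        foreach (var (child, parent, gIsa, delta) in Isa)
            if (parent == arg && Statements.TryGetValue((des, child), out var a))
                return a with { Certainty = a.Certainty * gIsa * delta }; // γcon = Π(γstat, γisa, δ)
        return null;
    }

    public static void Main()
    {
        var ans = Apply("Definition", "Metallic Layer");
        Console.WriteLine($"{ans?.Referent} [Certainty = {ans?.Certainty:F2}]"); // ≈ 0.69
    }
}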


Dissimilarity Transform follows the same pattern; however, it leverages dissimilarities between two concepts to produce evidence against a potential answer. Figure 5 shows this inference together with an example. Similarity and dissimilarity transforms for referents and descriptors are performed analogously.

7.1.3 Derivation from Marked Dependency

These inferences take advantage of positive or negative dependencies to produce qualitative answers whenever possible. Figure 6 shows a Derivation from Positive Dependency inference in conjunction with an example. The negative variety follows the same pattern.

7.1.4 Derivation from Implication

Basically, derivation from implication is the parameterized equivalent of modus ponens in classical logic, as depicted in figure 7.

Figure 5 Argument Dissimilarity Transform

1. Des1 (Arg1) = {Ref} : γstat
2. Arg1 DIS Arg2, CX: Des2 : γdis, σ
3. Des2 → Des1 : γdep, α
---------------------------------------------------------------
4. Des1 (Arg2) ≠ {Ref} : γcon = Π(γstat, γdis, γdep, σ, α)

Example: What is the channel spacing of DWDM?
1. Channel Spacing(CWDM) = {20 nanometer} : [Certainty = .99]
2. DWDM is dissimilar to CWDM, in the context of their Channel Spacing and Application : [Certainty = .99, Similarity = .1]
3. Channel Spacing depends on Channel Spacing : [Certainty = 1.0, Likelihood = 1.0]
:::Argument Dissimilarity:
4. Channel Spacing(DWDM) ≠ {20 nanometer} : [Certainty = .88]

Figure 6 Derivation from Positive Dependency

1. Des1 (Arg1) = {High | Medium | Low} : γstat
2. Des1 (Arg) →+ Des2 (Arg) : γdep, α
---------------------------------------------------------------
3. Des2 (Arg1) = {High | Medium | Low} : γcon = Π(γstat, γdep, α)

Example: How much is the distortion in a Large Effective Area Fiber?
1. Nonlinearity(Large Effective Area Fiber) = {Low} : [Certainty = .99]
2. Nonlinearity positively affects Distortion : [Certainty = .99, Likelihood = .99]
:::Derivation from Positive Dependency:
3. Distortion(Large Effective Area Fiber) = {Low} : [Certainty = .97]


7.1.5 Dependency-based Analogy

Dependency-based Analogy is a relaxation of the similarity transform which allows contexts for similarities to be found at runtime. It is not an original HPR inference and has been adopted from [7]. As figure 8 shows, many premises and parameters are involved in this inference, so the certainty of the conclusion is usually not high.

7.1.6 Comparison

Our initial mining of frequently asked questions in the field of Telecommunications revealed recurring questions asking for the similarities and differences of two concepts, or for the advantages and disadvantages of one product over another. This persuaded us to include a comparison inference in the system even though it is not part of HPR. The inference compares the common properties (direct or inherited) of two concepts and produces explanations accordingly.

Figure 7 Derivation from Implication

1. Des1 (Arg1) = {Ref1} : γstat
2. Des1 (Arg) = {Ref1} → Des2 (Arg) = {Ref2} : γimp, α
3. Arg1 ISA Arg : γisa
---------------------------------------------------------------
4. Des2 (Arg1) = {Ref2} : γcon = Π(γstat, γimp, γisa, α)

Example: Can you tell me about the signal distortion in malfunction systems?
1. Dispersion(Malfunction System) = {High} : [Certainty = .99]
2. If Dispersion(System) = {High} Then Signal Distortion(System) = {excessive} : [Certainty = .99, Likelihood = .7]
3. Malfunction System is a System : [Certainty = .99]
:::Derivation from Implication:
4. Signal Distortion(Malfunction System) = {excessive} : [Certainty = .68]

Figure 8 Dependency-based Analogy

1. Des2 (Arg1) = {Ref2} : γstat1
2. Des1 (Arg) → Des2 (Arg) : γdep, α
3. Des1 (Arg2) = {Ref1} : γstat2
4. Des1 (Arg1) = {Ref3} : γstat3
---------------------------------------------------------------
5. Des2 (Arg2) = {Ref2} : γcon = Π(γstat1, γstat2, γstat3, γdep, α)

Example: Tell me about the transmission distance in water media.
1. Transmission distance(Free Space) = {short} : [Certainty = .65]
2. Energy loss has an effect on Transmission distance : [Certainty = .9, Likelihood = .8]
3. Energy loss(Water media) = {considerable} : [Certainty = .99]
4. Energy loss(Free Space) = {considerable} : [Certainty = .99]
:::Dependency-based Analogy:
5. Transmission distance(Water media) = {short} : [Certainty = .45]


7.1.7 Active/Passive Conversion Inference

Another addition to the original set of HPR inferences is the Active/Passive Conversion inference. It simply converts active questions to passive ones, and vice versa, by changing the descriptor and swapping the argument and referent. It is only applicable to statements whose descriptors represent a verb. The inference is helpful when the form of the question differs from that of the prospective answer available in the KB.

7.2 The Reasoning Algorithm

In general, invoking a single basic inference may not immediately answer the question at hand. The HPR theory does not specify how the basic inferences may be combined in a controlled manner to search the answer space. Here we propose an algorithm that manages the execution of basic inferences, taking into account several confining criteria to keep it tractable. In this algorithm, basic plausible inferences can join together to form a chain of reasoning, as the preconditions of each basic inference can become queries for other inferences. The reasoning stops either when an exact answer is found in the KB (backward-chaining reasoning) or when a confining condition is met, such as exceeding the maximum reasoning depth or the number of answers found up to that point.

The reasoning engine accepts a plausible question converted from a natural language question by the question processing unit. First, it searches for the answer in the KB. If the answer cannot be found explicitly, the reasoning engine tries to infer it by applying all possible basic inferences. Each inference creates a new plausible question to work on; for example, the generalization transforms pose questions with generalized arguments, referents or even descriptors. An inference may, in turn, launch other inferences if the answer to the new question does not exist in the KB, and this process continues recursively. At each depth of recursion, after performing all possible inferences, the reasoning engine combines the available evidence to produce a list of top answers. Evidence combination is done using the Dempster-Shafer theory of evidence (DST) [27] [28]: the certainty (γ) values of the answers are used as ‘support’ values, the ‘plausibility’ values are set to one, and the combination is performed with Dempster’s rule. The answer with its combined certainty is returned to the calling inference at the previous depth of recursion.
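Under this scheme each reasoning line yields a simple support function for its answer (support = γ, plausibility = 1), so for reasoning lines that agree on the same answer, Dempster’s rule reduces to a closed form. A minimal C# sketch under that assumption:

using System;
using System.Collections.Generic;
using System.Linq;

public static class Evidence
{
    // Dempster's rule for simple support functions over the same answer:
    // combined support = 1 - Π(1 - γi). Answers that disagree are grouped
    // and combined separately before ranking.
    public static double Combine(IEnumerable<double> supports) =>
        1.0 - supports.Aggregate(1.0, (acc, g) => acc * (1.0 - g));

    public static void Main()
    {
        // Two independent reasoning lines supporting the same answer:
        Console.WriteLine($"Combined certainty = {Combine(new[] { 0.68, 0.45 }):F2}"); // 0.82
    }
}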

All inferences have been implemented as functions in the C# programming language, so that inferences (functions) execute efficiently using the facilities of the operating system. Figure 9 illustrates the pseudo-code of the Recall function, the entry point and heart of the reasoning algorithm, invoked repeatedly by almost all other inferences. Figure 10 presents the pseudo-code for the argument generalization transform. As shown, after transforming the original plausible question the inference simply invokes the Recall function and computes the certainty values of the answers, if any, based on the formulas described in section 7.1.

Benchmarks on a large knowledge base (WordNet [29], with more than 1 million relations) revealed that the system is capable of examining 40-70 thousand basic inferences per second on a 2.4 GHz P4 CPU. This reasoning speed is substantial considering that each basic plausible inference operates in the semantic layer and its results are comparable to those of humans.

Figure 9 The pseudo-code for the Recall function

Function: Recall
Input: A Plausible Question (PQ)
Output: Answers (including corresponding certainties)
1. Reasoning Depth += 1
2. if Reasoning Depth > MAX_DEPTH then return nil
3. if the answer exists explicitly in the KB then return it
4. Answers = nil
5. Answers += Generalization Transforms(PQ)
6. Answers += Specification Transforms(PQ)
7. Answers += Similarity Transforms(PQ)
8. Answers += Dissimilarity Transforms(PQ)
9. Answers += Derivation from Dependency Inference(PQ)
10. Answers += Derivation from Implication Inference(PQ)
11. Answers += Dependency-based Analogy Inference(PQ)
12. Answers += Active/Passive Conversion Inference(PQ)
13. Answers = Dempster-Shafer Combination(Answers)
14. Return top MAX_ANSWERS Answers
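For readers who prefer code to pseudo-code, the following condensed C# rendering of Recall (a sketch based on figure 9, not our production source; the inference delegates stand in for the eight inference families) shows how the depth limit, the KB lookup and the evidence combination fit together:

using System;
using System.Collections.Generic;
using System.Linq;

public record PlausibleQuestion(string Descriptor, string Argument);
public record ScoredAnswer(string Referent, double Certainty);

public class Reasoner
{
    const int MaxDepth = 4;    // MAX_DEPTH bounds the reasoning chain
    const int MaxAnswers = 3;  // MAX_ANSWERS returned per recursion level

    // Each basic inference maps a question and the current depth to answers,
    // typically by transforming the question and calling Recall again.
    public List<Func<PlausibleQuestion, int, IEnumerable<ScoredAnswer>>> Inferences { get; } = new();
    public Dictionary<PlausibleQuestion, ScoredAnswer> Kb { get; } = new();

    public IReadOnlyList<ScoredAnswer> Recall(PlausibleQuestion pq, int depth = 1)
    {
        if (depth > MaxDepth)
            return Array.Empty<ScoredAnswer>();                 // confine the recursion
        if (Kb.TryGetValue(pq, out var direct))
            return new[] { direct };                            // explicit hit in the KB

        return Inferences
            .SelectMany(inference => inference(pq, depth + 1))  // steps 5-12 of figure 9
            .GroupBy(a => a.Referent)                           // combine evidence per answer
            .Select(g => new ScoredAnswer(g.Key,                // Dempster's rule, as above
                    1.0 - g.Aggregate(1.0, (acc, a) => acc * (1.0 - a.Certainty))))
            .OrderByDescending(a => a.Certainty)
            .Take(MaxAnswers)
            .ToList();
    }
}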


Figure 10 The pseudo-code for the argument generalization transform

Function: Argument Generalization Transform
Input: A Plausible Question (PQ)
Output: Answers (including corresponding certainties)
1. Parents = in the KB, find all ISA parents of PQ.Argument
2. Answers = nil
3. for each Parent in Parents
4.   NewPQ = PQ
5.   NewPQ.Argument = Parent
6.   Answer = Recall(NewPQ)
7.   Answer.Certainty = Π(γanswer, γisa, δisa)
8.   Answers += Answer
9. end of the loop
10. Answers = Dempster-Shafer Combination(Answers)
11. Return top MAX_ANSWERS Answers

8 Experiments

This section describes the experiments conducted to verify the validity of TeLQAS’s responses in real question answering circumstances. First we describe the process by which the underlying QA knowledge base was created and enriched. Then we present our question answering experience with TeLQAS, followed by a failure analysis.

8.1 The Creation of the Domain Ontology

Collecting and encoding the requisite knowledge for knowledge-based systems is a cumbersome task; classical expert systems required years of crafting by highly skilled knowledge engineers. For TeLQAS we were able to eliminate the need for knowledge engineers and let our domain experts encode the knowledge directly, with modest training. This is mainly because the concepts and inferences of HPR are founded on the actual reasoning that takes place in human dialogue, so they are not unfamiliar to lay people.

The domain experts were asked to fill simple templates with triple relations they were familiar with, covering different aspects of Fiber Optics technology. The basics of HPR were explained to them in advance so they would understand what types of relations were needed, and it was emphasized that the main source of information should be their own memory. One difficulty that arose in this phase was the problem of specifying contexts for ISA relations. Unlike similarity relations, people tend to memorize exceptional contexts for ISA relations, that is, to learn those contexts where the ISA relation does not hold. For example, to say puppies are dogs one may consider the context ‘all properties’ except for size, age, etc. Our experience with the experts showed that the set of ‘all properties’ and the list of exceptions are often open-ended and difficult to extract and encode, so we omitted contexts for ISA relations.

Our three part-time domain experts worked for about one month, and the resulting ontology contained about 4,700 concepts (descriptors, arguments or referents) [30]. There are about 100 distinct descriptors (relations) in the ontology, of which ISA has the highest frequency, followed by similarity (SIM), definition, contextual data (CX), abbreviations, cause & effect, and other relations.

8.2 The Evaluation of TeLQAS

Question answering evaluation efforts have largely been tailored to open-domain systems. The TREC QA test collections contain newswire articles, and the accompanying queries cover a wide variety of topics. These shared test beds and the common architecture of the participating systems make systematic evaluation in the TREC QA track possible. Restricted-domain QA systems, on the other hand, work in different domains, which means their knowledge sources are generally dissimilar. This, together with the lack of a common structure among restricted-domain QA systems, forces them to rely on domain-specific evaluations.

To build the test question set for evaluating TeLQAS, we mined several FAQ lists pertaining to Telecommunications on the Internet. Though the FAQs were helpful for capturing the type of questions asked by users, which served the development of the Question Processing unit, they were not suitable for the evaluation, since the range of concepts involved in answering them was too broad and often outside the field of Fiber Optics.



Only questions about which TeLQAS knew at least something were used in the test bed. Besides those, new qualitative questions which did not exist in the original FAQs were added. The final question set included 70 questions covering various aspects of Fiber Optics. These questions were fed to the system and the answers analyzed. Three domain experts (the same ones who had created the knowledge base) judged the answers. We used a majority vote of the experts to decide whether an answer was correct, and we also asked the judges to express the degree of correctness when the answers were true. For conditional answers (answers with unsatisfied suppositions made by similarity transforms), the conditions were checked too. In cases of failure, the referees were asked to examine the system’s justifications and discover the reasoning errors; as the justifications read rather close to natural language, little training was required for this task. Table 3 shows the evaluation results.

Table 3 The evaluation results of TeLQAS

Questions Parsed Incorrectly: 20
Correct Answers: 42
Incorrect Answers: 6
Invalid Questions: 1
Unknown Answers: 1
Total: 70

Of the 70 questions, the question processing unit failed to parse 20 correctly, producing an “I don’t understand your question” answer for each. There were 42 correct answers and 6 answers categorized as incorrect by more than one expert. There was also one invalid query, where the question asked by the user was meaningless with regard to the context: for Q46, “how is the scattering in Copper Wires?”, the experts stated that the concept of scattering is meaningful only for fibers and not for copper wires. When none of the experts was certain about the correctness of an answer, we marked it as an Unknown Answer. The 42 correct answers out of 70 represent an accuracy of 60%. For the correct answers we calculated the average difference between the system’s stated degree of certainty and that of the experts; the average was only 4%, which shows that TeLQAS’s belief in the correctness of its answers is very close to that of human experts.

To compute the exact accuracy of the reasoning algorithm, we removed the questions unrecognizable by the question processing unit, plus those whose answers the system had found directly in the KB without engaging in deep reasoning. As depicted in table 4, the accuracy of the reasoning algorithm is about 82%. Table 5 shows the percentage of usage of each kind of inference in this experiment.

Table 4 The evaluation of the reasoning engine

Correct Answers: 28
Incorrect Answers: 6
Total: 34

8.3 Analysis of Failure

Most of TeLQAS’s errors are rooted in the inability of the question processing unit to parse the questions correctly. Though the accuracy of the question processing unit was acceptable on the original development questions, the error rate on the test questions showed that the variety of natural language questions is greater than we had expected.


Table 5 The percentage of usage for each type of inference

Argument Similarity: 15.2
Referent Specification: 12
Derivation from Negative Dependency: 11.2
Argument Generalization: 9.6
Active/Passive Conversion: 8.8
Referent Generalization: 8
Descriptor Generalization: 8
Dependency-based Analogy: 8
Derivation from Positive Dependency: 5.6
Referent Similarity: 4
Comparison: 2.4
Argument Dissimilarity: 2.4
Descriptor Specification: 1.6
Descriptor Similarity: 1.6
Derivation from Implication: 1.6
Referent Dissimilarity: 0
Descriptor Dissimilarity: 0

For the incorrect answers, we wanted to know what conditions had led the system to draw wrong conclusions. Analysis of the system’s justifications for the wrong answers exposed the fact that most of the reasoning errors were caused by erroneous axioms in the underlying KB. Although there were about 2,600 statements in the KB, only 112 statements were used to generate answers for the 70 questions. Of these 112 statements, between 1 and 12 were marked wrong by the different judges (each of whom had initially produced a number of those statements when building the domain ontology). One explanation for this inconsistency is disagreement among the experts’ viewpoints on the domain facts. Another is that in HPR simple statements are kept without context, so the knowledge extractor may assume a particular context for a statement which turns out to be inconsistent with the question’s context at runtime. For example, for question Q11: “What is the Regenerators’ Number in Optical Systems?” the system’s answer is “high”, justified as below:

1. Jitter(Optical Systems) = {Low} : [Certainty = .99]
2. Jitter negatively affects Regenerators' Number : [Certainty = .99, Likelihood = .99]
:::Derivation from Negative Dependency:
3. Regenerators' Number(Optical Systems) = {High} : [Certainty = .97]

One judge declared that the second statement was wrong, in that jitter negatively affects the regenerators’ spacing and not their number (disagreement about the fact), while another judge asserted that the first statement is not always true and depends on the optical component (mismatched context). Another source of faults is the incompleteness of the parameters in the dependency-based inferences. For example, for question Q13: “how much is the link budget of SM Fiber?” the system’s answer is “Low”, justified as:

1. Transmission Loss(SM Fiber) = {Low} : [Certainty = .99]
2. Transmission Loss positively affects Link Budget : [Certainty = .99, Likelihood = .99]
:::Derivation from Positive Dependency:
3. Link Budget(SM Fiber) = {Low} : [Certainty = .97]

For the second statement, one judge pointed out that Transmission Loss is not the only factor affecting Link Budget; there are many other factors involved, such as the length of the link. For this reason, the domain expert should have entered a much smaller value than .99 for the likelihood parameter. Finally, excluding contexts from ISA relations caused the system to use them in wrong contexts; approximately 50% of the reasoning errors are associated with this problem.


9 Conclusion

In this paper we reported on a knowledge-based, domain-specific question answering system. By employing the cognitive theory of Human Plausible Reasoning, the system can respond to domain questions using uncertain, and sometimes contradictory, information extracted by domain experts. In addition, the reasoning engine of the system produces intelligible justifications for its answers. The system achieved 60% accuracy on 70 natural language questions in the field of Fiber Optics. Moreover, on average the system’s certainty in its answers differed by only 4% from that of the domain experts. The majority of errors were caused by inaccurate natural language query processing, wrong or contradictory facts, and the absence of context attached to domain facts. In the next step, we will be working on NLP algorithms for converting text directly to logical statements and on adding a learning component so that the system can learn from its experience.

We believe that as the expectations of QA users rise, it becomes inevitable to employ more sophisticated AI techniques in QA systems. At the far end, an ideal QA system will converse with users to fully understand their information needs and, with its huge repository of human knowledge, will bear much resemblance to the machine that would finally pass the Turing Test [5].

Acknowledgement

This study was funded in part by the Iran Telecom Research Center (ITRC). We would like to express our gratitude to Dr. Abbas Zarifkar, the dean of the Optical Technology Group of ITRC, for his support throughout the project, and to our domain experts Amirhosein Tehranchi, Mahdi Hashemi and Reza Ehsani.

References

[1] A. Collins, “Fragments of a Theory of Human Plausible Reasoning,” Theoretical Issues in Natural Language Processing II, pp. 194-201, 1978.
[2] R.S. Michalski, “Variable-Valued Logic and Its Applications to Pattern Recognition and Machine Learning,” Computer Science and Multiple Valued Logic Theory and Applications, D.C. Rine (Ed.), Amsterdam: North-Holland, 506-534, 1975.
[3] A. Collins, R. Michalski, “The Logic of Plausible Reasoning: A Core Theory,” Cognitive Science, vol. 13, pp. 1-49, 1989.
[4] R.S. Michalski, K. Dontas, D. Boehm-Davis, “Plausible Reasoning: An Outline of Theory and Experiments,” Proceedings of the Fourth International Symposium on Methodologies for Intelligent Systems, pp. 17-19, Charlotte, NC, October 1989.
[5] A.M. Turing, “Computing Machinery and Intelligence,” Mind 49: 433-460, 1950.
[6] K. Dontas, M. Zemankova, “APPLAUSE: An Implementation of the Collins-Michalski Theory of Plausible Reasoning,” Information Science, 52(2): 111-139, 1990.
[7] J.D. Kelly, PRS: A System for Plausible Reasoning, Master’s Thesis, University of Illinois, Urbana-Champaign, 1989.
[8] F. Oroumchian, R.N. Oddy, “An Application of Plausible Reasoning to Information Retrieval,” SIGIR 1996: 244-252.
[9] F. Oroumchian, B.N. Araabi, E. Ashoori, “An Application of Plausible Reasoning and Dempster-Shafer Theory of Evidence in Information Retrieval,” FSKD 2002.
[10] F. Oroumchian, B. Khandzah, “Modeling an Intelligent Tutoring System by Plausible Inferences,” FSKD 2002: 529-533.
[11] A. Jalali, F. Oroumchian, “An Evaluation of Document Clustering by Means of Plausible Inferences,” International Journal of Computational Intelligence, 2004.
[12] M. Virvou, K. Kabassi, “IFM: An Intelligent Graphical User Interface Offering Advice,” In Proceedings of the 2nd Hellenic Conference on AI, SETN 2002, Greece, Companion Volume, pp. 155-164.
[13] K. Kabassi, M. Virvou, “Combination of a Cognitive Theory with the Multi-Attribute Utility Theory,” In V. Palade, R.J. Howlett, L. Jain (Eds.): Knowledge-Based Intelligent Information and Engineering Systems – KES 2003, Lecture Notes in Artificial Intelligence, Vol. 2773, Springer, Berlin, Part I, pp. 944-950.
[14] N.F. Noy, W. Grosso, M.A. Musen, “Knowledge-Acquisition Interfaces for Domain Experts: An Empirical Evaluation of Protege-2000,” Twelfth International Conference on Software Engineering and Knowledge Engineering (SEKE 2000), Chicago, IL, 2000.
[15] J.F. Sowa, “Conceptual Graphs for a Database Interface,” IBM Journal of Research and Development, vol. 20, no. 4, pp. 336-357, 1976.


[16] E.M. Voorhees, “Overview of the TREC 2003 Question Answering Track,” In Notebook of the 12th Text REtrieval Conference (TREC 2003), 14-27, 2003.
[17] D. Moldovan, C. Clark, S. Harabagiu, S. Mariorano, “A Logic Prover for Question Answering,” NLG versus Templates, pp. 87-93, In Proceedings of the 7th European Workshop on Natural Language Generation, Leiden, The Netherlands, 2003.
[18] P. Clark, J. Thompson, B. Porter, “A Knowledge-Based Approach to Question-Answering,” In the AAAI’99 Fall Symposium on Question-Answering Systems, pp. 43-51, CA: AAAI.
[19] H. Chung et al., “A Practical QA System in Restricted Domains,” In Proceedings of the ACL 2004 Workshop on Question Answering in Restricted Domains, Spain, 2004.
[20] http://www.projecthalo.com/
[21] J. Angele, E. Monch, H. Oppermann, S. Staab, D. Wenke, “Ontology-based Query and Answering in Chemistry: Ontonova@Project Halo,” In Proceedings of the Second International Semantic Web Conference (ISWC 2003), Berlin: Springer Verlag, 2003.
[22] http://www.cia.gov/cia/publications/factbook
[23] The resultant KB is available for download at http://sourceforge.net/projects/cwfb-pl
[24] E. Darrudi, M. Rahgozar, F. Oroumchian, “Human Plausible Reasoning for Question Answering Systems,” In Proceedings of Advances in Intelligent Systems - Theory and Applications, Luxembourg, November 2004.
[25] E. Darrudi, F. Oroumchian, B.R. Ofoghi, “Knowledge-Based Question Answering with Human Plausible Reasoning,” In Proceedings of the 5th International Conference on Recent Advances in Soft Computing (RASC 2004), England, December 2004.
[26] M. Karimzadegan, F. Oroumchian, J. Habibi, “XML Information Retrieval by Means of Plausible Inferences,” In Proceedings of the 5th International Conference on Recent Advances in Soft Computing (RASC 2004), England, December 2004.
[27] A.P. Dempster, “Upper and Lower Probabilities Induced by a Multivalued Mapping,” Annals of Mathematical Statistics, 38: 325-339, 1967.
[28] G. Shafer, A Mathematical Theory of Evidence, Princeton University Press, 1976.
[29] C.D. Fellbaum (Ed.), WordNet: An Electronic Lexical Database, The MIT Press, Cambridge, London, England, 1998.
[30] The TeLQAS knowledge base, its statistics and the test questions along with TeLQAS’s answers are available for download at http://sourceforge.net/projects/telqaskb/