
Unsupervised and Open Ontology-based Semantic Analysis

Amal Zouaq 1, Michel Gagnon 2, Benoit Ozell 2

1 Simon Fraser University, School of Interactive Arts and Technology, 13450 102 Ave., Surrey, BC V3T 5X3, Canada
2 Ecole Polytechnique de Montréal, C.P. 6079, succ. Centre-ville, Montréal (QC), H3C 3A7

[email protected], {michel.gagnon, benoit.ozell}@polymtl.ca

Abstract. This paper presents an unsupervised and domain-independent semantic analysis that outputs two types of formal representations: discourse representation structures and flat scope-free logical forms. This semantic analysis is built on top of dependency relations produced by a statistical syntactic parser, and is generated by a grammar of patterns named α-Grammar. The interest of this grammar lies in building a clear, linguistically grounded syntax-semantic interface using a representation (dependencies) commonly used in the natural language processing community. The paper also explains how the semantic representations can be annotated using an upper-level ontology, thus enabling further inference capabilities. The evaluation of the α-Grammar on a hand-made gold standard and on texts from the STEP 2008 Shared Task competition shows the interest of the approach.

Keywords: semantic analysis, α-Grammar, patterns, upper-level ontology

1 Introduction

Computational semantics aims at assigning formal meaning representations to natural language expressions (words, phrases, sentences, and texts), and at using these meaning representations to draw inferences. Given the progress made in computational syntax, with the availability of robust statistical parsers, it is now possible to envisage the use of syntactic parsers for semantic analysis.

This paper introduces a semantic analysis pipeline based on dependency grammars that generates two types of semantic representations: flat scope-free logical forms and discourse representation structures (DRS) [9]. The pipeline includes a syntactic analysis, a semantic analysis and a semantic annotation step, which respectively extract dependency relations, meaning representations and ontology-based annotations of these meaning representations. The pipeline is modular by nature, which makes it easy to change and update the components involved at each step. The semantic analysis itself is performed through a grammar of patterns called α-Grammar. One main interest of such a grammar is its ability to provide a syntax-semantic interface between dependency grammars (which are gaining increasing importance in current NLP research [2]) and semantic formalisms, thus enabling future reuse from both a practical and a theoretical point of view.

These formalisms are then annotated using an ontology, which formally defines a reusable set of roles and promotes the interoperability of the extracted representations between semantic analyzers. In particular, we focus here on upper-level ontologies, which are independent from any domain and define concepts at a high level.

After explaining the motivation and theory behind our research (section 2), the paper presents the α-Grammar, which outputs our semantic representations, details some of its patterns, called α-structures, and gives examples of the obtained representations (section 3). Section 4 details the annotation of the semantic representations and briefly explains the word sense disambiguation algorithms involved at this step. Finally, section 5 evaluates the logical forms and the discourse representation structures extracted from two corpora, and section 6 analyzes the obtained results, draws some conclusions and introduces further work.

2 Motivation, Theory and Practice

The goal of this research is to create an open and unsupervised semantic analysis. Open means that the analysis can be applied to many types of texts and many domains; unsupervised means that we do not provide any training examples to the system.

Open information extraction is a recent challenge of the text mining community [15] and is also an objective of the computational semantics community. In fact, Bos [3, 4] underlines that the availability of robust statistical syntactic analyzers makes it possible to envisage a deep and robust semantic analysis. One way to perform this analysis is to build a syntax-semantic interface, that is, to create semantic representations from the syntactic representations generated by a statistical syntactic parser. Here we focus on dependency grammars. Dependencies are recognized as an optimal basis for establishing relations between text and semantics, as they abstract away from the surface realization of text and can reveal non-local dependencies within sentences [12]. Moreover, there are many semantic theories based on dependency grammars, such as DMRS [5] and the Meaning-Text theory [10]. Thus, developing a formal method to transform the dependency formalism into a semantic representation is desirable from both a practical and a theoretical point of view. The question, then, is what kind of semantic representation should be adopted. Here we focus on two types of representations: predicative logical forms and discourse representation structures. Depending on the depth of analysis required for a particular application, one can choose flat scope-free logical forms or discourse representation structures, which are powerful representations covering a wide range of linguistic phenomena in a unified framework [4].

In order to implement these ideas, we used the Stanford dependency parser [7] to obtain the syntactic representations. The Stanford dependencies have been used successfully in several areas [6, 15] and are distinguished, as stated in [6], by their rich grammatical hierarchy (with the possibility of under-specifying a relation with the label "dep") and by their fine-grained description of NP-internal dependency relations. This last characteristic enables a better handling of the meaning of noun phrases. We implemented the grammar for semantic analysis, called the α-Grammar, in Prolog.

3 The α-Grammar, a Pattern-based Grammar

Following the approach of [15], we propose an α-Grammar which transforms dependency representations into logical forms and discourse representation structures in a compositional way. This grammar is based on sound and clear linguistic principles and identifies patterns, named α-structures, in dependency representations. An α-structure is a representation whose nodes represent variables coupled with syntactic parts of speech and whose edges represent syntactic dependency relations. These dependency relations and parts of speech constitute constraints on the patterns. Discovering a pattern in a dependency graph means instantiating an α-structure with the lexical information of the current sentence. Each pattern is linked to a rewriting rule that creates its semantic representation.

A rewriting rule implements a series of transformations on the pattern, including node fusion, node destruction and predicate creation. The application of a rule creates a semantic representation which uses a knowledge model. The knowledge model defines general and universal categories such as Entity, Named Entity, Supertype, Event, Statement, Circumstance, Time, Number, Measure and Attribute. The category is determined by the part of speech (POS) and the grammatical relationships detected in the syntactic structure.

Two examples of α-structures and their transformation through rewriting rules are given below:

• An α-structure involving a predicative event: a verb node Y/v with an nsubj dependent X/n is rewritten as entity(id_X, X), event(id_e, Y, id_X).

• An α-structure involving potential anaphora resolution through the predicate resolve: a noun node Y/n with a determiner dependent X/d is rewritten as entity(id_Y, Y), resolve(id_Y).

An α-grammar can be divided into two modules: the first module transforms dependency relations into tree-like representations, and the second module implements a compositional semantic analysis.
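To make the rewriting mechanism concrete, the following is a minimal Prolog sketch of how the first α-structure above could be encoded over the tree representation introduced in section 3.1. The predicate name alpha_event/2 and the use of gensym/2 for fresh identifiers are our illustration, not the grammar's actual source code.

  :- use_module(library(lists)).
  :- use_module(library(gensym)).

  % Sketch of the predicative-event α-structure: a verb node with an
  % nsubj noun dependent rewrites to an entity and an event predicate.
  alpha_event(tree(token(Verb,_)/v, Children),
              [entity(IdX, Noun), event(IdE, Verb, IdX)]) :-
      member(nsubj/tree(token(Noun,_)/n, _), Children),
      gensym(id, IdX),   % fresh discourse referent for the entity
      gensym(e, IdE).    % fresh identifier for the event

  % ?- alpha_event(tree(token(flap,2)/v,
  %                     [nsubj/tree(token(banners,1)/n, [])]), Preds).
  % Preds = [entity(id1, banners), event(e1, flap, id1)]   (fresh counters)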

3.1. A Tree-like representation conversion

In order to ease the subsequent process of compositional analysis, the grammar runs a tree converter that creates a tree-like representation from the dependency relations and the parts of speech. This tree converter is currently composed of 14 rules that modify the structure of the dependencies. Each rule specifies a tree segment that should be modified and implements modification operations, namely removal, addition, copying and aggregation, which can be performed on nodes and/or relations. For example, one α-structure recognizes clausal complements without a subject, as in the sentence "Paul likes to eat fish". The transformation associated with this rule consists in copying the subject into the clausal complement, using the add operation on a node (Paul) and a link (eat, Paul), as sketched below:
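The figure illustrating this transformation is not reproduced here. As a substitute, the following Prolog sketch shows one plausible encoding of the subject-copy rule over the tree representation of section 3.1; copy_subject/2 is a hypothetical name, not the converter's actual code.

  :- use_module(library(lists)).

  % Sketch of the subject-copy transformation: for "Paul likes to eat
  % fish", the nsubj sub-tree of the main verb is copied into the xcomp
  % sub-tree, so that the clausal complement gets its own subject.
  copy_subject(tree(Head/v, Children0), tree(Head/v, Children)) :-
      member(nsubj/Subject, Children0),                       % find the subject
      select(xcomp/tree(XHead, XChildren), Children0, Rest),  % find the complement
      Children = [xcomp/tree(XHead, [nsubj/Subject|XChildren]) | Rest].

  % ?- copy_subject(tree(token(likes,2)/v,
  %        [nsubj/tree(token(paul,1)/n, []),
  %         xcomp/tree(token(eat,4)/v, [dobj/tree(token(fish,5)/n, [])])]), T).
  % The eat sub-tree now contains its own nsubj/tree(token(paul,1)/n, []).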

The interest of this transformation is that it facilitates the compositional analysis, since the clausal complement, augmented with its subject, can then be interpreted independently. In addition to this type of transformation, the tree converter tackles the processing of various linguistic structures, including compound noun aggregation, verb modifier aggregation, conjunctions, and negation.

Compound Noun aggregation: Compound nouns are identified through the "nn" relationship (noun compound modifier) and are aggregated to form a single entity. For example, Guatemala army in the sentence "the Guatemala army announced…" is considered a single entity after aggregation. Nouns can also be modified by adverbial modifiers, as in "genetically modified food"; in this case, an aggregation operation is also performed between the head and its children.

Verbs and Verb Modifiers: Verbs can be modified by particles ("prt"), such as "made up" or "climb up", and by auxiliaries. In general, verbs are represented as event predicates, as in "banners flap", or as statement predicates, as in "the gates of the city seem getting closer".

Conjunctions: Conjunctions must be identified to build various sub-trees from the input dependencies; these sub-trees constitute representations of the phrases linked by the conjunctions. For example, in the sentence "There are hens and pigs in the farm", the tree converter extracts two sub-trees, "there are hens in the farm" and "there are pigs in the farm" (see the sketch below). This distributive interpretation of the conjunction can be erroneous in some sentences, depending on the intended meaning, and future versions of the tree converter will consider the various possible interpretations of a given conjunction.
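The following Prolog sketch illustrates how such a distributive split might be computed; the clause structure and the cc part-of-speech tag are our assumptions, and one sub-tree per conjunct is enumerated on backtracking.

  :- use_module(library(lists)).

  % Sketch of the distributive reading of a coordination: a noun node
  % carrying conj children ("hens and pigs") yields one reading per
  % conjunct; cc/conj children are stripped from the head's reading.
  split_coordination(tree(Head/n, Children), tree(Head/n, Kept)) :-
      strip_coordination(Children, Kept).
  split_coordination(tree(_/n, Children), Conjunct) :-
      member(conj/Conjunct, Children).

  strip_coordination([], []).
  strip_coordination([conj/_|T], R) :- !, strip_coordination(T, R).
  strip_coordination([cc/_|T], R)   :- !, strip_coordination(T, R).
  strip_coordination([H|T], [H|R])  :- strip_coordination(T, R).

  % ?- split_coordination(tree(token(hens,4)/n,
  %        [cc/tree(token(and,5)/cc, []),
  %         conj/tree(token(pigs,6)/n, [])]), T).
  % T = tree(token(hens,4)/n, []) ;    % first conjunct
  % T = tree(token(pigs,6)/n, []).     % second conjunct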

Negation: In order to handle negation, the tree converter places the negation node not as a parent of the verb and removes the subject from the scope of the negation. This way of handling negation is required by the embedding of structures in the resulting DRS, as shown in the following DRS, which represents the semantics of "the cat did not eat the mouse":

---------------------------------
[id1]
---------------------------------
resolve(id1)
entity(id1,cat)
NOT:
    -------------------------------
    [id2,e1]
    -------------------------------
    resolve(id2)
    entity(id2,mouse)
    event(e1,eat,id1,id2)
    -------------------------------
---------------------------------

The resulting tree-like representation can then be processed by the compositional semantic analysis. An example of such a representation (in Prolog), for the sentence "Banners flap in the wind outside the walls of the city", is:

root/tree(token(flap,2)/v,
    [nsubj/tree(token(banners,1)/n,[]),
     prep/tree(token(in,3)/prep,
         [pobj/tree(token(wind,5)/n,
             [det/tree(token(the,4)/d,[])])]),
     prep/tree(token(outside,6)/prep,
         [pobj/tree(token(walls,8)/n,
             [det/tree(token(the,7)/d,[]),
              prep/tree(token(of,9)/prep,
                  [pobj/tree(token(city,11)/n,
                      [det/tree(token(the,10)/d,[])])])])])]).

3.2. A compositional analysis

An α-grammar uses compositional analysis to output semantic representations. Fig. 1 shows how a compositional analysis coupled with logical forms is performed on the sentence "Banners flap in the wind". The grammar starts by examining the children of the head word "flap". The pattern nsubj(Verb, Noun) is detected; it triggers the creation of an event predicate event(Id, Node, IdAgent), where Node is the verb, Id an identifier for the event, and IdAgent the identifier of the agent. The agent itself, the sub-tree nsubj/tree(token(banners,1)/n,[]), is then explored to identify its label and possibly its modifiers and determiners. Here, banners is a leaf node corresponding to a noun, leading to the predicate entity(id1, banners). This entity has not been encountered before in the sentence, which leads to the predicate new(id1).

As can be noticed, the compositional nature of an α-grammar makes it possible to use the result of a sub-analysis (namely the created predicates and variables) and to refer to these variables in higher-level analyses. This is the case for prepositional relations on events, such as in(e1, id2) in Fig. 1.
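The following Prolog sketch reconstructs this compositional step for the nsubj, determiner and prepositional patterns. For "Banners flap in the wind" it yields exactly the predicates of Fig. 1, but the clause structure is our reconstruction, not the grammar's actual code.

  :- use_module(library(lists)).
  :- use_module(library(gensym)).

  % Sketch of the compositional step: the nsubj pattern creates the
  % entity and event predicates, then each prepositional modifier is
  % interpreted with the event identifier as its first argument.
  analyze(tree(token(Verb,_)/v, Children), Preds) :-
      select(nsubj/tree(token(Noun,_)/n, NChildren), Children, Rest),
      gensym(id, Id), gensym(e, E),
      det_predicate(NChildren, Id, Det),
      findall(Ps, (member(C, Rest), prep_predicates(E, C, Ps)), Lists),
      append([[entity(Id, Noun), Det, event(E, Verb, Id)] | Lists], Preds).

  % A definite determiner triggers resolve/1, otherwise new/1.
  det_predicate(Children, Id, resolve(Id)) :-
      member(det/tree(token(the,_)/d, _), Children), !.
  det_predicate(_, Id, new(Id)).

  % A prepositional modifier of the event, plus the predicates of the
  % embedded noun phrase.
  prep_predicates(E, prep/tree(token(P,_)/prep,
                               [pobj/tree(token(Noun,_)/n, NChildren)]),
                  [Rel, entity(Id, Noun), Det]) :-
      gensym(id, Id),
      Rel =.. [P, E, Id],
      det_predicate(NChildren, Id, Det).

  % For "Banners flap in the wind" this yields (on fresh counters):
  % [entity(id1,banners), new(id1), event(e1,flap,id1),
  %  in(e1,id2), entity(id2,wind), resolve(id2)]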

Fig. 1. Compositional Semantic Analysis [14]

There are two kinds of α-structures in an α-grammar: core α-structures (Table 1) and modifiers α-structures (Table 2). Core α-structures are primary linguistic constructions organized into a hierarchy of rules where more specific rules are fired first. For instance, the pattern "nsubj-dobj-iobj" is higher in the hierarchy than "nsubj-dobj". This avoids misinterpreting a particular syntactic construction by neglecting an essential grammatical relationship: "Mary gave Bill a book" does not have the same logical interpretation as "Mary gave Bill", which is meaningless. Modifiers α-structures (Table 2) are auxiliary patterns that complement the meaning of core α-structures, such as temporal modifiers or adverbial clause modifiers.

Table 1. Core verbal α-structures

Core α-structures    Examples
Verb-iobj-dobj       Mary gave {Bill}iobj a {raise}dobj
Verb-dobj-xcomp      The peasant carries {the rabbit}dobj, {holding it by its ears}xcomp
Verb-ccomp           John saw {Mary swim}ccomp
Verb-expletive       {There}expl is a small bush.
Verb-acomp           Amal looks {tired}acomp
Verb-prep-pcomp      They heard {about {Mia missing classes}pcomp}prep
Verb-dobj            {The cat}nsubj eats {a mouse}dobj
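As an illustration, the Verb-dobj pattern of Table 1 can be encoded along the same lines as the earlier sketches (again a hypothetical encoding, not the grammar's source); note the binary event predicate, which matches event(e1,eat,id1,id2) in the negation example of section 3.1.

  :- use_module(library(lists)).
  :- use_module(library(gensym)).

  % Sketch of the Verb-dobj core α-structure: a verb with both an nsubj
  % and a dobj noun dependent rewrites to an event with two
  % participants, as in "The cat eats a mouse".
  alpha_event_dobj(tree(token(Verb,_)/v, Children),
                   [entity(IdS, Subj), entity(IdO, Obj),
                    event(IdE, Verb, IdS, IdO)]) :-
      member(nsubj/tree(token(Subj,_)/n, _), Children),
      member(dobj/tree(token(Obj,_)/n, _), Children),
      gensym(id, IdS),
      gensym(id, IdO),
      gensym(e, IdE).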

Table 2. Modifiers α-structures

Modifiers α-structures   Examples
Verb-prep-pobj           Banners flap {in {the wind}pobj}prep
Verb-tmod                Vincent arrived {last night}tmod
Verb-advcl               The accident happened {as the night was falling}advcl


Verb-purpcl              Benoît talked to Michel {in order to secure the account}purpcl
Verb-infmod              The following points are {to establish}infmod
Verb-advmod              The grass bends {abruptly}advmod

For the moment, the grammar is composed of 36 core α-structures and 17 modifiers α-structures, for a total of 53 α-structures. Some specific grammatical relations deserve further explanation:

Determiners: Determiners help identify the status of the referred object in the discourse. Some determiners, such as "the", imply that the object has already been encountered in the discourse, leading to a predicate resolve(id). Others, such as "a", describe a new entity and are represented by a predicate new(id). These two predicates help resolve anaphora and make it possible to consider sentence elements at the discourse level rather than at the sentence level.

Proper Nouns: Proper nouns, which are identified by the nnp relationship, are transformed into named entities.

Prepositions: Prepositions are generally treated as modifier patterns when they modify a verb, as in the sentence "Banners flap in the wind", or a noun, as in "the gates of the city". In both cases, a predicate representing the preposition is created, such as of(id1, id2), where id1 is the identifier of city and id2 refers to gates. There can also be a predicate linking an event identifier and an object identifier (e.g. "… flap in …"). Some particular patterns, such as the preposition "of" in the example above, lead to the creation of an attribute relationship, e.g. attribute(city, gate).

Possessive Pronouns: Possessive pronouns enable the creation of implicit possessive relationships. For example, "Benoit washes his car" implies that "Benoit has a car", which is represented in the logical form. The same kind of deduction is used for constructions such as "The peasant's eyes…". This enables the system to build world knowledge.
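A sketch of this deduction over the tree representation of section 3.1 is given below; poss is the Stanford possession-modifier relation, while the has/2 predicate name and the prp part-of-speech tag are our assumptions.

  :- use_module(library(lists)).
  :- use_module(library(gensym)).

  % Sketch of the implicit-possession deduction: a possessive dependent
  % on a noun ("his car", "the peasant's eyes") adds a has/2 predicate
  % to the logical form. OwnerId is the discourse referent already
  % created for the possessor.
  possession(tree(token(Noun,_)/n, Children), OwnerId,
             [entity(Id, Noun), has(OwnerId, Id)]) :-
      member(poss/_, Children),
      gensym(id, Id).

  % ?- possession(tree(token(car,3)/n,
  %                    [poss/tree(token(his,2)/prp, [])]), id0, Preds).
  % Preds = [entity(id1, car), has(id0, id1)]   (fresh counter)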

3.3. Examples

The α-grammar outputs either resolved or underspecified representations, in the form of flat scope-free logical expressions and of discourse representation structures. An underspecified representation means that certain ambiguities are left unresolved in the semantic output, for example the predicate resolve(id) for anaphora resolution; an independent component can then be used to deal with these ambiguities. The following table illustrates the DRS output of our α-grammar on two sentences from the STEP 2008 Shared Task competition, which is meant to compare the results of semantic analyses on a shared corpus of small texts.

A) "An object is thrown with a horizontal speed of 20 meters per second from a cliff that is 125 m high."

---------------------------------
[id2,e1,id1,id3,id4,id5,id6]
---------------------------------
entity(id2,object)
entity(id1,undefined)
event(e1,thrown,id1,id2)
entity(id3,speed)
entity(id4,meters)
entity(id5,second)
per(id4,id5)
num(id4,20)
of(id3,id4)
attribute(id3,horizontal)
with(e1,id3)
entity(id6,cliff)
from(e1,id6)
---------------------------------

B) "The object falls for the height of the cliff."

------------------------------
[id1,e1,id2,id3]
------------------------------
resolve(id1)
entity(id1,object)
event(e1,falls,id1)
resolve(id2)
entity(id2,height)
resolve(id3)
entity(id3,cliff)
of(id2,id3)
for(e1,id2)
------------------------------

As can be seen, the α-grammar correctly identifies the first part of sentence A, "An object is thrown with a horizontal speed of 20 meters per second from a cliff", and correctly discovers the entities, events (with a correct handling of the passive voice), prepositional relations, attributes and numerical relations. However, the fragment "that is 125 m high" is ignored by the grammar. We also want to emphasize that the relation from(e1,id6) correctly attaches the preposition "from" to the event "thrown", despite the long-distance dependency. In B, the entire sentence is correctly analyzed.

4 Ontology-based Semantic Analysis

Once the semantic representations are obtained, either as logical forms or as discourse representation structures, they need to be annotated using a formal and interoperable structure. In fact, one of the drawbacks of current semantic analysis is the multiplicity of the adopted formal representations, which hinders their comprehension, exchange and evaluation. Standardizing these representations through ontological indexing may help address these issues. Moreover, one of the goals of computational semantics is the ability to perform inferences on the obtained representations. Using ontologies to describe the various predicates, discourse referents and conditions enables further reasoning, and builds a bridge with the semantic web community and with other semantic analysis initiatives such as semantic role labeling and textual entailment.

Upper-level ontologies can provide this formal definition of a set of roles and enable the indexing of semantic representations in an interoperable way. One of these upper-level ontologies is the Suggested Upper Merged Ontology (SUMO) [11], which is widely used in the NLP community and has gone through various development stages and experiments, making it stable and mature enough to be taken as a "standard" ontology. Moreover, SUMO has been extended with a Mid-Level Ontology (MILO) and a number of domain ontologies, which allow coverage of various application domains while preserving the link to more abstract elements in the upper level. One interesting feature of SUMO is that its various sub-ontologies are independent and can be used alone or in combination. In our current semantic analysis, we only exploit the upper level, meaning that we take into account only the SUMO ontology itself. Another interesting point of SUMO is its mapping of concepts and relations to the WordNet lexicon [8], a standard resource in the NLP community. The SUMO-WordNet mapping associates each synset in WordNet with its SUMO sense through three types of relationships: equivalent links, instance links and subsumption links. One drawback of these mappings is that they are not always consistent: sometimes verbs are mapped to SUMO relationships and sometimes to concepts. Although these mappings cannot be considered perfect in their original form, they constitute an excellent demonstration of how a lexicon can be related to an ontology and exploited in a semantic analysis pipeline.

To annotate the semantic representations and obtain SUMO-based DRSs and/or SUMO-based logical forms, we tested multiple word sense disambiguation (WSD) algorithms, mainly inspired by the Lesk algorithm and its derivatives such as [1]. We also used the most frequent sense baseline, as is commonly done in WSD competitions. These algorithms must be applied to a given context; in this respect, we tested various contexts such as word windows, sentence windows, and graph-based contexts extracted from the semantic logical representations [13] obtained in the semantic analysis.

An example of a SUMO-based logical form annotation is:

outside(e1, id3), of(id3, id4), entity(id4, SUMO:City), resolve_e(id4),
entity(id3, SUMO:StationaryArtifact), resolve_e(id3), in(e1, id2),
entity(id2, SUMO:Wind), resolve_e(id2), event(e1, SUMO:Motion, id1),
entity(id1, SUMO:Fabric), new_e(id1).
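To make the gloss-overlap idea concrete, here is a minimal Prolog sketch of a Lesk-style scorer; the sense/2 gloss terms are toy data, and this is an illustration of the algorithm family, not our actual WSD module or the SUMO-WordNet resources.

  :- use_module(library(lists)).

  % Sketch of Lesk-style disambiguation: score each candidate sense by
  % the overlap between its gloss words and the context words, then
  % keep the best-scoring sense.
  lesk_best(ContextWords, Senses, BestSense) :-
      findall(Score-Sense,
              ( member(sense(Sense, GlossWords), Senses),
                intersection(GlossWords, ContextWords, Common),
                length(Common, Score) ),
              Scored),
      max_member(_-BestSense, Scored).

  % ?- lesk_best([air, current, flowing],
  %              [sense('SUMO:Wind',  [air, current, moving]),
  %               sense('SUMO:Twist', [turn, coil, wrap])], S).
  % S = 'SUMO:Wind'.   % overlap 2 beats overlap 0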

5 Evaluation

The evaluation of this research is not a simple task, as it involves various modules and outputs and requires a multi-dimensional evaluation: logical form evaluation, DRS evaluation and word sense disambiguation evaluation. Another issue facing the evaluation of semantic analysis is the lack of a gold standard against which to compare our representations. To tackle this issue, this evaluation relies on two corpora: 1) a first corpus of 185 sentences that we manually analyzed and annotated to build a complete gold standard; this corpus is extracted from children's stories such as Alice in Wonderland; 2) a corpus of seven texts that were used in the STEP 2008 shared task [3], whose objective is the evaluation of semantic analyses. There is no defined gold standard for the second corpus, so the task consists mainly in defining criteria for a human expert to judge the effectiveness of the extracted representations. We are aware of the limited size of this evaluation but, as can be seen in the STEP 2008 shared task, this limit is common to all semantic analysis systems.

5.1. Logical Form Evaluation

Our logical form evaluation was carried out on the first corpus, which supported both the logical form evaluation and the semantic annotation evaluation. Two metrics from information retrieval were used: precision and recall.

Precision = items the system got correct / total number of items the system generated

Recall = items the system got correct / total number of relevant items (those the system should have produced)

Here the items are entities and events (Table 3).

Table 3. Logical form analysis results in terms of entities and events

           Precision (%)   Recall (%)
Entities   94.98           80.45
Events     94.87           85.50

These experiments show that our semantic analysis is promising. Most of the time, the incorrect entities and events are due to wrong syntactic parses from the Stanford Parser. There are also some patterns that are not yet identified, which lowers the recall. These results should later be completed with an evaluation of the whole logical representation, not limited to entities and events.

5.2. DRS Evaluation

The DRS evaluation was carried out on the STEP 2008 shared task corpus, enriched with two texts taken randomly from Simplepedia, leading to a total of 58 sentences. Overall, 51 sentences were semantically analyzed by our grammar and 7 were ignored due to erroneous syntactic analyses or to the lack of appropriate patterns in the grammar. These 51 sentences were analyzed using 38 of our 53 α-structures, an effective usage rate of 72%.

In order to evaluate the obtained DRSs, we first calculated the precision of the conditions of each DRS:

Precision = number of correct conditions / overall number of generated conditions in a DRS

Second, the expert assessed the recall of each DRS by identifying the conditions that should have been generated but were missing due to a wrong analysis:

Recall = number of correct conditions / overall number of conditions the DRS should contain (correct plus missing)

Table 4 summarizes the average number per sentence of overall conditions, correct conditions and missing conditions, and presents the obtained precision and recall values.

Table 4. Mean values per sentence

# conditions   # correct conditions   # missing conditions   Precision (%)   Recall (%)
6.7            5.5                    3.6                    81              67

We can note a very reasonable precision on these real-world examples, considering that our grammar is in its first development phase. Our results also show that half of the sentences obtain more than 90% precision. However, the recall still needs to be improved.

Table 5 shows a more fine-grained analysis of the obtained DRSs, giving the precision obtained on the various condition categories, in particular entities, events and attributes. All other categories are grouped under the label "other". These results were obtained after the analysis of the 51 sentences.

Table 5. Results by DRS condition category

             # conditions   # correct conditions   Precision (%)
Entities     152            139                    91
Events       56             43                     77
Attributes   44             37                     84
Others       81             54                     64
Total        333            273                    82

We can notice that entities are generally well identified, followed by attributes and then by events. The errors made by our grammar in event recognition are mostly due to missing α-structures (7 of 13 cases), to errors in the syntactic analysis (3 cases) or to an erroneous conversion into a tree structure (2 cases). Regarding attributes, all the errors occur in the same text and are related to particular modifiers that are wrongly interpreted (e.g. "The other gas giants are Saturn and Uranus", "only…", etc.). Finally, the results for the "other" label indicate that further development is needed to enhance our grammar.

6 Conclusion and Further Work

This paper presented the α-Grammar, a pattern-based semantic analysis grammar that produces discourse representation structures and logical forms from free texts. With the increasing use of dependency grammars as a syntactic formalism, building a conversion process from dependency relations to semantic representations is justified from a practical point of view. Moreover, our approach proposes a semantic analysis pipeline where the various modules (the syntactic analysis, the semantic analysis and the ontology-based annotation) are independent, meaning that they can easily be evolved or replaced from a software engineering perspective. Another interest of our work is the ability to standardize the generated semantic representations through the use of an upper-level ontology, which also enhances the inference capabilities over the extracted representations. Finally, our computational semantics approach is domain-independent and unsupervised, which enables better reuse across multiple domains and applications.

In future work, we plan to enhance the grammar by discovering new patterns through manual analysis of texts as well as automatic pattern learning approaches. This will help us improve the precision and recall of the semantic analysis. We also plan to handle more complex discourse structures and anaphora resolution. Finally, we would like to extend the scale of the corpora used for the evaluation and to compare our DRSs with those extracted by Boxer [4] on the same corpora.

Acknowledgements. The authors would like to thank Prompt Quebec, UnimaSoft Inc. and the FQRNT for their financial support. Amal Zouaq is funded by a postdoctoral fellowship from the FQRNT.

References

1. Banerjee, S. and Pedersen, T. (2003). Extended gloss overlaps as a measure of semantic relatedness. In Proc. of the 18th Int. Joint Conf. on AI, Mexico, pp. 805-810.
2. Bonfante, G., Guillaume, B., Morey, M. and Perrier, G. (2010). Réécriture de graphes de dépendances pour l'interface syntaxe-sémantique. In Proc. of TALN 2010, Montreal.
3. Bos, J. (2008a). Introduction to the Shared Task on Comparing Semantic Representations. In STEP 2008 Conference Proceedings, pp. 257-261, College Publications.
4. Bos, J. (2008b). Wide-Coverage Semantic Analysis with Boxer. In STEP 2008 Conference Proceedings, pp. 277-286, Research in Computational Semantics, College Publications.
5. Copestake, A. (2009). Slacker semantics: Why superficiality, dependency and avoidance of commitment can be the right way to go. In Proc. of the 12th Conference of the European Chapter of the ACL (EACL 2009), pp. 1-9, Athens, Greece.
6. De Marneffe, M.-C. and Manning, C.D. (2008). The Stanford typed dependencies representation. In COLING Workshop on Cross-framework and Cross-domain Parser Evaluation.
7. De Marneffe, M.-C., MacCartney, B. and Manning, C.D. (2006). Generating Typed Dependency Parses from Phrase Structure Parses. In Proc. of LREC, pp. 449-454.
8. Fellbaum, C. (1998). WordNet: An Electronic Lexical Database. MIT Press.
9. Kamp, H. and Reyle, U. (1993). From Discourse to Logic. Introduction to Model-theoretic Semantics of Natural Language, Formal Logic and Discourse Representation Theory. Studies in Linguistics and Philosophy.
10. Mel'čuk, I. (1988). Dependency Syntax: Theory and Practice. Albany: State University of New York Press.
11. Pease, A., Niles, I. and Li, J. (2002). The Suggested Upper Merged Ontology: A Large Ontology for the Semantic Web and its Applications. In Proc. of the AAAI Workshop on Ontologies and the Semantic Web, Canada.
12. Stevenson, M. and Greenwood, M.A. (2009). Dependency Pattern Models for Information Extraction. Research on Language & Computation, pp. 13-39, Springer.
13. Zouaq, A., Gagnon, M. and Ozell, B. (2010). Can Syntactic and Logical Graphs help Word Sense Disambiguation? In Proc. of LREC 2010.
14. Zouaq, A., Gagnon, M. and Ozell, B. (2010). Semantic Analysis using Dependency-based Grammars and Upper-Level Ontologies. International Journal of Computational Linguistics and Applications, 1(1-2): 85-101, Bahri Publications.
15. Zouaq, A. (2008). An Ontological Engineering Approach for the Acquisition and Exploitation of Knowledge in Texts. PhD Thesis, University of Montreal (in French).