Authoring Semantic and Linguistic Knowledge for the Dynamic Generation of Personalized Descriptions


Stasinos Konstantopoulos, Vangelis Karkaletsis, Dimitrios Vogiatzis, and Dimitris Bilidas

Abstract We present the ELEON/NATURALOWL system, an application of Semantic Web and Natural Language Generation technologies that combines a conceptual representation of cultural heritage objects with linguistic and adaptation resources. This combined model is used to automatically generate multi-lingual and personalized textual descriptions of cultural heritage objects represented as instances of an OWL domain ontology annotated by RDF linguistic and adaptation resources. Metadata and annotations are created using an authoring environment, which considerably reduces the effort required to port the system to a new domain.

Keywords semantic web technologies, personalized natural language generation

1 Introduction

Cultural heritage organizations create and maintain repositories comprising extensive metadata about cultural objects, such as artifacts, artists, and locations. The repositories are typically used to catalogue, index, and classify the cultural content, for the purpose of providing semantic searching and browsing facilities to professional users as well as to the general public. A further opportunity, however, which we consider in this chapter, is to automatically generate textual descriptions of cultural objects from a repository, descriptions which are customized to a variety of audiences, in multiple languages, and which serve different presentation objectives.

All authors are at the Institute of Informatics & Telecommunications, NCSR ‘Demokritos’, Ag. Paraskevi GR-15310, Athens, Greece; e-mail: {konstant,vangelis,dimitrv,dbilid}@iit.demokritos.gr

In this chapter, we present the combined ELEON/NATURALOWL system and how it links a conceptual representation of cultural heritage objects (the domain) with the linguistic and adaptation resources necessary to realize elements of this representation as personalized text. ELEON is an authoring environment for creating domains in the form of OWL ontologies, OWL being the standard formalism to specify ontologies on the Semantic Web [4], as well as the linguistic and adaptation resources.¹ NATURALOWL is a natural language generation (NLG) engine that exploits the linguistic and adaptation resources authored via ELEON to dynamically produce texts from OWL ontologies.² The advantages of this approach are manifold:

• OWL ontologies constitute machine-readable and reusable models of the cultural repository’s content. Besides supporting natural language generation, such models can be used for the semantic indexing and searching of the repository. This can also be seen from the reverse perspective: the natural language descriptions can be derived from existing conceptual models originally created for the purpose of semantic indexing and searching.

• The conceptual representations are realized as texts using reusable linguistic modules with clearly separated and configurable domain-dependent linguistic and profiling resources. By clearly separating the conceptual representations from the linguistic modules and their domain-dependent resources, the same conceptual descriptions can be realized in different languages and the same linguistic modules can be used to realize conceptual representations of different repositories.

• The dynamic generation of textual descriptions is driven by user adaptation resources that personalize the descriptions for different audiences, but also adapt them to different contexts and situations. Furthermore, ELEON invokes automatic reasoning systems that assist the author by automatically inferring missing profile parameters, alleviating the burden of explicitly providing all necessary parameters for large numbers of objects and audience types.

Although the system can be used in a variety of domains and human-computer interaction applications, it is particularly pertinent to cultural heritage content, especially when the content has to be presented in different situations and interaction contexts (e.g., via portable devices or remotely over the Web) and to audiences with wide ranges of age groups, levels of expertise, cultural and educational backgrounds.

In the rest of this chapter, we first focus our attention on the representation used by the ELEON/NATURALOWL system, including the representation of the domain (Section 2) and of the linguistic and adaptivity annotations that complement it (Section 3). We then proceed to discuss how these are used by NATURALOWL to generate descriptions of the objects of the domain (Section 4) and how ELEON facilitates the creation of these resources (Section 5). The chapter closes with a comparison to previous approaches (Section 6) and some concluding remarks (Section 7).³

¹ ELEON was developed at the Institute of Informatics and Telecommunications, NCSR ‘Demokritos’ and is publicly available as open-source software; see http://www.iit.demokritos.gr/~eleon for more information.
² NATURALOWL was developed by the Natural Language Processing Group, Department of Informatics, Athens University of Economics and Business. It is publicly available as open-source software; see http://nlp.cs.aueb.gr/software.html for more information.
³ Section 4 is based on text by Ion Androutsopoulos and Gerasimos Lampouras, adapted and used with their permission. Please see also Acknowledgements.

Knowledge authoring for personalized description generation 3

Fig. 1 ELEON screenshot showing the concept hierarchy of the ontology and the instances of each concept (left), the fields of the currently selected instance (right top), and a preview of the textual description generated for the instance (right bottom). The preview language and profile can be seen on (and selected from) the bar at the bottom of the screen.

2 Authoring Domain Ontologies

ELEON assumes that its users, called authors, are persons who have domain expertise, but not expertise in knowledge representation and NLG. It helps them configure the system for a new application domain by defining the domain ontology, as well as the domain-dependent language and adaptation resources.

The language resources allow the NLG engine to turn facts of the ontology into coherent natural language texts, whereas the adaptation resources are consulted to adapt the generated texts to each visitor’s preferences and presumed background knowledge, but also to the interaction goals set by the author. ELEON also enables the authors to generate text previews using the NLG engine, in order to examine the effect of their updates to the domain ontology, the language, and the adaptation resources. In this section, we focus on the functionality that ELEON provides to create or edit the domain ontology. The linguistic and adaptation resources are discussed in the following sections.

ELEON assumes that the domain ontology encodes knowledge in the form of entity types (concepts), entities (instances of concepts), relations between entities, and attributes that connect entities to datatype values; this assumption is compatible with OWL ontologies, one of the types of ontologies supported by ELEON, where the corresponding terms are classes, individuals, object properties, and datatype properties. Figure 1 illustrates part of a domain ontology that encodes knowledge about the Ancient Agora of Athens. This ontology was used in the INDIGO project, where ELEON/NATURALOWL was embedded in a mobile robot acting as a guide in a cultural centre; in that setting, monuments of the Agora were displayed on wall-mounted screens.⁴

ELEON ontologies encode domain knowledge in the form of entity types (concepts), entities (instances), and relations between them. Figure 1 illustrates part of such an ontology that encodes knowledge about the Ancient Agora of Athens. This ontology was used in the INDIGO project to implement a use case where the system guides visitors through an exhibition on the Ancient Agora of Athens, introducing the buildings to them before they attend a virtual 3D tour of the Agora hosted at the Foundation of the Hellenic World. The examples used in this chapter are drawn from this domain.

In the example of Figure 1, stoa-of-attalus is an instance of the entity type Stoa, a sub-type of Building, which is in turn a sub-type of ArchitecturalConstruction, a sub-type of PhysicalObject. Attributes and relationships are expressed using fields. At any entity type, it is possible to introduce new fields, which then become available to all the entities that belong to that type and its subtypes. In Figure 1, the field locatedIn was introduced at the ArchitecturalConstruction entity type, and it was defined (previously, not shown in the screenshot) as expressing a relationship between ArchitecturalConstruction entities and Place entities. Similarly, the using-period field was introduced at the PhysicalObject entity type, as expressing a relationship between PhysicalObject entities and Period entities; consequently all entities of type PhysicalObject and its subtypes, including, e.g., ArchitecturalConstruction and ArtObject, inherit the using-period field.
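The inheritance of fields along the type hierarchy can be illustrated with a minimal sketch. The EntityType class and its all_fields method are hypothetical names introduced here for illustration only; they are not part of ELEON, which stores this information in the OWL ontology itself.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class EntityType:
    """Hypothetical stand-in for an ELEON entity type (an OWL class)."""
    name: str
    parent: Optional["EntityType"] = None
    # Fields introduced at this type: field name -> entity type of the filler.
    own_fields: dict = field(default_factory=dict)

    def all_fields(self) -> dict:
        """Fields introduced here plus those inherited from every supertype."""
        inherited = self.parent.all_fields() if self.parent else {}
        return {**inherited, **self.own_fields}


# The fragment of the Ancient Agora hierarchy described in the text.
physical_object = EntityType("PhysicalObject", own_fields={"using-period": "Period"})
construction = EntityType("ArchitecturalConstruction", physical_object,
                          {"locatedIn": "Place"})
building = EntityType("Building", construction)
stoa = EntityType("Stoa", building)
art_object = EntityType("ArtObject", physical_object)

print(stoa.all_fields())        # both using-period and locatedIn
print(art_object.all_fields())  # using-period only
```

Under this reading, an instance of Stoa such as stoa-of-attalus offers both fields, whereas an ArtObject offers using-period but not locatedIn.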

OWL domain ontologies can be created in ELEON from scratch, or by importing and editing existing OWL ontologies; this facilitates the use of well-established conceptual models in the cultural heritage domain. The CIDOC Conceptual Reference Model (CRM), for example, is available as an OWL ontology.⁵ Work is in progress to create linguistic resources and, in general, to extend support for CIDOC CRM beyond the current capability of defining a domain with the CIDOC CRM framework. Most other cultural heritage vocabularies, thesauri, and classification schemes that use XML or relational database data models are compatible with the Simple Knowledge Organization System (SKOS) and can be automatically converted to ontologies.⁶ For the Ancient Agora ontology, we have adopted the Generalized Upper Model as ontological foundation, a general, task- and domain-independent upper-level ontology, geared towards defining ontological classes appropriate for flexible expression in natural language.⁷

⁴ See http://www.ics.forth.gr/indigo for more information about INDIGO. A video of the robotic guide in action is available at http://www.youtube.com/watch?v=qCzBx4LzGak.
⁵ See http://cidoc.ics.forth.gr/official_release_cidoc.html.
⁶ See http://www.w3.org/2004/02/skos/ about SKOS. A variety of tools exist for converting SKOS data models to, or aligning them with, ontological models. See, for example, http://www.heppnetz.de/projects/skos2gentax/ and http://annocultor.sourceforge.net/.

3 Description Adaptation

Besides modelling the cultural heritage domain itself, ELEON supports annotating the objects, classes, and properties of the domain with adaptation and linguistic information. Such information is used by NLG engines to (a) plan the description that will be generated, adapting it to the current audience and circumstance, and (b) realize the planned description in a particular language.

Realization is based on clause plans (micro-plans) that specify how an ontological property can be expressed in each supported natural language. The author specifies the clause to be generated in abstract terms, by specifying, for example, the verb to be used, the voice and tense of the resulting clause, etc. Similar annotations for instances and classes specify how they should be realized as noun phrases that fill slots in the property-generated clauses. Micro-plan annotations also comprise several other language-specific parameters, such as whether the resulting clause can be aggregated into a longer sentence or not, its voice and tense, and so on, as described in more detail by Androutsopoulos et al. [3, Sect. 3].

Adaptive planning, on the other hand, operates at the abstract level and does not involve specifics of the target language. It is rather aimed at reflecting a synthetic personality in the description, as well as personalizing it for a particular audience. Adaptation parameters are provided in the form of profile attributes that control aspects of the text plan such as how many and which of the facts known about an object should be used to describe it, as discussed in more detail below.

3.1 Personalization and personality

The system supports authoring the adaptation profiles that control the dynamic adaptation of the generated descriptions. Profiles permit the author to specify, for example, that technical vocabulary be used when generating for experts, or that shorter and simpler sentences be generated for children, but also to gear the system towards achieving different interaction goals, i.e., aiming at the assimilation of certain (types of) facts by the visitor. Adaptivity is achieved by providing a variety of generation parameters through adaptation profiles, including a numerical interest attribute of the properties of the ontology. Isard et al. [9] describe how interest is used to impose a preference ordering of the properties of ontological entities, controlling which facts will be used when describing each entity.

In ELEON, we have extended interest models in two respects:

• by generalizing interest into arbitrary, author-defined profile attributes; and

• by permitting profile attributes to apply not only to ontological properties, but also to individuals and classes.

⁷ Please see http://www.fb10.uni-bremen.de/anglistik/langpro/webspace/jb/gum for more information.

Using these extensions, authors can define personality profiles for generating text, managing dialogue, and simulating emotional variation in a way that reflects a certain personality on behalf of the system.

In the INDIGO project we used these profiles in a human-robot interaction application, where a robotic tour guide that gives the impression of empathizing with the visitor is perceived as more natural and user-friendly. But the methodology is generally interesting in any context of generating descriptions of cultural heritage content, especially if the individual descriptions are aggregated in a tour of the collection. In such contexts, dialogue-management adaptivity can vary the exhibits included in personalized tours and emotional state variation can match the described content and make the tour more engaging and lively.

The way in which personality profiles are used in INDIGO to estimate the preference towards exhibits and their properties, and to parametrize dialogue management and simulated emotions, is discussed in more detail elsewhere [10, 11], so we shall only briefly outline it here. In INDIGO, preference is calculated based on a logic model of the robot’s personality traits and also on ground facts regarding objective attributes of the content (such as the importance of an exhibit) but also subjective attributes that reflect the robot’s perception of the content (such as how interesting an exhibit is). Importance, in this context, reflects the interaction goals set out by the author and how the system is to adapt to achieving them; interest, on the other hand, reflects adaptivity to the visitors’ interests, possibly counter to the ‘objective’, or rather curator-defined, importance.

Emotional variation is achieved by using the personality profile to estimate the emotional appraisal of dialogue acts and update the mood and emotional state of artificial agents. Dialogue management is affected both directly, by taking exhibit preference into account when deliberating over dialogue acts, and indirectly, by being influenced by the artificial agent’s current mood.

The detailed profiles required by INDIGO are, however, difficult to author and maintain. In the work described here, we alleviate the burden of manually providing all the ground parameters, exploiting the fact that these parameters are strongly inter-related and can, to a large extent, be automatically inferred. More specifically, ELEON backs the profile authoring process by reasoning over manually provided exhibit attributes in order to infer what the values of the missing attributes should be. The author can inspect the explicitly provided as well as the automatically inferred values and make corrections where necessary (Figure 2). Manual corrections trigger a re-estimation of the missing values, so that after each round of corrections the overall model is a closer approximation of the author’s intention.


Fig. 2 Screen fragment, showing the pop-up window for providing profile attribute values for an exhibit. Automatically inferred attribute values are displayed in red, to stand out from explicitly provided ones which are displayed in black.

3.2 Representation and interoperability

Linguistic and profile annotations are represented in the Resource Description Framework (RDF) [12], a knowledge representation technology built around the concept of using subject-predicate-object triples to describe abstract entities, resources. RDF triples assign to their subject resource the property of being related to the object through the predicate resource. Predicates can be data properties, in which case their objects are concrete values (numbers, strings, time periods, and so on), or object properties, in which case their objects are abstract resources.

Although OWL is not formally defined in RDF, it is defined in such a way that it can be represented within RDF. In fact, the OWL specification itself provides a serialization of OWL ontologies as RDF for transport and data interchange purposes. This motivates our usage of RDF to represent linguistic and profile annotations, since it allows us to directly represent those as RDF triples of extra-ontological properties of the OWL ontology. In this manner, standard RDF tools can directly access the link between domain instances and their respective annotations, while at the same time annotations remain ‘invisible’ to OWL inference tools.

The RDF vocabulary used defines a property that relates ontological entities (individuals, classes, and properties) with profile attribute nodes that involve:

• the profile to which they are pertinent, e.g., ‘expert’;

• the attribute, e.g., ‘interest’ or ‘importance’; and

• the numerical value of the attribute for this entity in this profile.

When applied to ontology properties, profile attribute nodes can be further elaborated to apply only to properties of instances of a particular class. For example, one can express that users find it more interesting to know the architectural style when discussing temples than when discussing stoas.
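To make the structure of profile attribute nodes concrete, the following sketch flattens each node into a record and looks up the value that applies to a given entity. The record layout, the lookup function, and all numerical values are illustrative assumptions; in particular, letting a class-restricted annotation take precedence over an unrestricted one is our reading of the text rather than documented ELEON behaviour.

```python
from typing import NamedTuple, Optional


class ProfileAttribute(NamedTuple):
    """Hypothetical flattened form of one profile attribute node."""
    entity: str                      # annotated individual, class, or property
    profile: str                     # e.g. "expert"
    attribute: str                   # e.g. "interest" or "importance"
    value: float                     # numerical value of the attribute
    for_class: Optional[str] = None  # optional class restriction (properties only)


def lookup(annotations, entity, profile, attribute, instance_class=None):
    """Return the most specific matching value, or None if nothing matches."""
    general = None
    for node in annotations:
        if (node.entity, node.profile, node.attribute) != (entity, profile, attribute):
            continue
        if node.for_class is None:
            general = node.value
        elif node.for_class == instance_class:
            return node.value        # assumed: the class-restricted value wins
    return general


# Made-up values: style is more interesting for temples than for stoas.
annotations = [
    ProfileAttribute("style", "expert", "interest", 0.5),
    ProfileAttribute("style", "expert", "interest", 0.9, for_class="Temple"),
    ProfileAttribute("style", "expert", "interest", 0.4, for_class="Stoa"),
]

print(lookup(annotations, "style", "expert", "interest", "Temple"))
print(lookup(annotations, "style", "expert", "interest", "Stoa"))
```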

4 Adaptive Natural Language Generation

NATURALOWL is based on ideas from ILEX [17] and M-PIRO [9]. It adopts a typical pipeline NLG architecture [18] to produce text in three sequential stages: document planning, micro-planning, and surface realization. In document planning, the system first selects the logical facts of the domain ontology that will be conveyed to the visitor and it specifies the document’s structure. In micro-planning, it constructs abstract forms of sentences, it aggregates them into abstract forms of longer sentences, and it produces appropriate referring expressions (e.g., pronouns, proper names, noun phrases). Finally, in surface realization the abstract forms of the sentences are transformed into a real text.

The system can also opportunistically include in the generated texts comparisons to previously encountered objects (e.g., ‘Unlike all the vessels that you saw, which were decorated with the black-figure technique, this amphora was decorated with the red-figure technique.’), as well as comparisons to similar objects of the entire collection (e.g., ‘This is the only vessel of the collection that was decorated with the black-figure technique.’). We do not discuss comparisons here, but related methods can be found elsewhere [8, 13, 15].

4.1 Document planning

To produce a natural language description of an object (an entity of the domain ontology), NATURALOWL begins by selecting from the domain ontology, assumed to be in OWL, all the logical facts that are directly relevant to that object. For example, when describing an exhibit whose identifier is exhibit24, it might select the following facts, which associate exhibit24 with the class (concept) aryballos and the entities archaeological-delos (the Archaeological Museum of Delos), iraion-delos (an archaeological site), and archaic-period.

<exhibit24, rdf:type, aryballos>
<exhibit24, current-location, archaeological-delos>
<exhibit24, location-found, iraion-delos>
<exhibit24, creation-period, archaic-period>

OWL facts can be represented as RDF triples [12], which is why facts are shown here as triples. The triples correspond to fields of ELEON; the first element of a triple is the owner of the field, the second element is the name of the field, and the third one is the field’s filler.

NATURALOWL may also be instructed to include facts that are indirectly relevant to the described object, i.e., facts (triples) connected to the directly relevant facts. In that case, the selected facts of our example might also include facts like the following:

<archaic-period, covers, archaic-period-duration>
<aryballos, rdfs:subclassOf, vessel>

The set of selected facts is subsequently finalized by first removing already assimilated facts, as indicated by a user model maintained for each particular visitor. If a directly relevant fact is removed, then the indirectly relevant facts that depend on it are also removed. Furthermore, the user adaptation resources, to be discussed in the following sections, specify an interest score for each fact of the ontology. Among the remaining facts, those with the lowest interest scores are removed, until the remaining number of facts (direct and indirect) does not exceed the maximum allowed number of facts per text. The latter parameter is also provided by the user adaptation resources.
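The selection procedure just described can be sketched as follows. This is a simplified illustration, not NATURALOWL’s implementation: the function name, the in-memory representation of facts as tuples, and the interest values are all hypothetical. The sketch additionally assumes that dropping a direct fact for low interest also drops its dependent indirect facts, which the text states explicitly only for assimilated facts.

```python
def select_facts(direct, indirect, assimilated, interest, max_facts):
    """Pick the facts to convey.

    direct      -- list of directly relevant (S, P, O) triples
    indirect    -- dict: direct triple -> indirect triples depending on it
    assimilated -- set of triples the visitor already knows (user model)
    interest    -- dict: triple -> interest score (missing means 0.0)
    max_facts   -- maximum number of facts allowed in one text
    """
    # Step 1: remove assimilated facts; an indirect fact is kept only if
    # the direct fact it depends on has been kept.
    kept_direct = [f for f in direct if f not in assimilated]
    selected = list(kept_direct)
    for fact in kept_direct:
        selected += [g for g in indirect.get(fact, []) if g not in assimilated]

    # Step 2: drop the least interesting facts until the limit is respected.
    while selected and len(selected) > max_facts:
        least = min(selected, key=lambda f: interest.get(f, 0.0))
        selected.remove(least)
        for dependant in indirect.get(least, []):  # assumption, see lead-in
            if dependant in selected:
                selected.remove(dependant)
    return selected


TYPE = ("exhibit24", "rdf:type", "aryballos")
LOCATION = ("exhibit24", "current-location", "archaeological-delos")
FOUND = ("exhibit24", "location-found", "iraion-delos")
PERIOD = ("exhibit24", "creation-period", "archaic-period")
DURATION = ("archaic-period", "covers", "archaic-period-duration")

selected = select_facts(
    direct=[TYPE, LOCATION, FOUND, PERIOD],
    indirect={PERIOD: [DURATION]},
    assimilated={FOUND},                       # the visitor already knows this
    interest={TYPE: 1.0, LOCATION: 0.6, PERIOD: 0.8, DURATION: 0.3},
    max_facts=3,
)
print(selected)
```

With these made-up scores, the location-found fact is dropped as assimilated and the duration of the Archaic Period is dropped as the least interesting of the four remaining facts.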

The directly relevant facts are then ordered by consulting a (partial) order (see below) of the ontology’s properties (fields); the order may indicate, for example, that the current location of any exhibit should be mentioned first, followed by the location where it was found, and then the period when it was created. The indirectly relevant facts are placed right after the corresponding directly relevant facts, ordered again by consulting the property order. This arrangement produces texts like ‘This is an aryballos. It was created during the Archaic Period. The Archaic Period lasted from. . . ’ In the application domains we have considered, this ordering scheme was adequate, although in other domains more elaborate text planning approaches may be needed.
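A minimal sketch of this ordering scheme is given below. The function and the dictionary of order scores are hypothetical; only the score of 1 for current-location is taken from Figure 3, and placing properties without a score last is an assumption made here to handle the partial order.

```python
def order_facts(direct, indirect, property_order):
    """Order direct facts by property score; place each fact's indirect
    facts right after it, ordered by the same scores."""
    def rank(fact):
        # fact is an (S, P, O) triple; unscored properties go last (assumed).
        return property_order.get(fact[1], float("inf"))

    ordered = []
    for fact in sorted(direct, key=rank):      # sorted() is stable
        ordered.append(fact)
        ordered.extend(sorted(indirect.get(fact, []), key=rank))
    return ordered


TYPE = ("exhibit24", "rdf:type", "aryballos")
LOCATION = ("exhibit24", "current-location", "archaeological-delos")
PERIOD = ("exhibit24", "creation-period", "archaic-period")
DURATION = ("archaic-period", "covers", "archaic-period-duration")

# Illustrative scores; smaller scores are mentioned earlier.
scores = {"rdf:type": 0, "current-location": 1, "creation-period": 3, "covers": 1}

plan = order_facts([PERIOD, LOCATION, TYPE], {PERIOD: [DURATION]}, scores)
for fact in plan:
    print(fact)
```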

4.2 Micro-planning

Each selected fact is of the form 〈S, P, O〉, where S corresponds to the subject of a sentence, P is a property of the domain ontology that is typically expressed by using a verb, and O is the value of the property, typically expressed as the verb’s object. For each property P of the domain ontology, one or more micro-plans need to be specified per language to indicate how to express as a sentence any fact that involves that property. Each micro-plan can be thought of as a sequence of slots, along with instructions specifying how to fill the slots in. Each slot can be filled in by:

• An expression referring to the S of the fact (to exhibit24 in the case of <exhibit24, current-location, archaeological-delos>).

• An expression referring to the O of the fact (to archaeological-delos in the case of <exhibit24, current-location, archaeological-delos>).

• A fixed string. If the string is a verb, it is also tagged with tense and voice; these tags are used in sentence aggregation.

<owlnl:property rdf:about="...#current-location">
  <owlnl:order>1</owlnl:order>
  <owlnl:EnglishMicroplans ...>
    <owlnl:microplan ...>
      <owlnl:aggrAllowed>true</owlnl:aggrAllowed>
      <owlnl:slots ...>
        <owlnl:owner>
          <owlnl:case>nominative</owlnl:case>
          <owlnl:retype>re_auto</owlnl:retype>
        </owlnl:owner>
        <owlnl:verb>
          <owlnl:voice>active</owlnl:voice>
          <owlnl:tense>present</owlnl:tense>
          <owlnl:val>is located</owlnl:val>
        </owlnl:verb>
        <owlnl:text>
          <owlnl:val>in</owlnl:val>
        </owlnl:text>
        <owlnl:filler>
          <owlnl:case>accusative</owlnl:case>
          <owlnl:retype>re_auto</owlnl:retype>
        </owlnl:filler>
      </owlnl:slots>
    </owlnl:microplan>
  </owlnl:EnglishMicroplans>
  <owlnl:GreekMicroplans ...>
  ...
</owlnl:property>

Fig. 3 Micro-plan example.

Micro-plans are specified as RDF annotations of the corresponding properties of the domain ontology. The RDF annotations of Figure 3, for example, provide an English micro-plan for facts involving the current-location property; they also set the property’s order score to 1, i.e., the resulting sentence should be placed before any other sentence that expresses a fact involving a property with a larger order score. The micro-plan of the example has four slots. The first one must be filled in by a referring expression for S (the owner of the corresponding field in ELEON terminology) in nominative case. The re_auto in owlnl:retype lets the system select automatically among using S’s name in natural language (if there is one in the lexicon, see below), a noun phrase (e.g., ‘this aryballos’), or a pronoun to refer to S, depending on the context. The second slot is to be filled in by the string ‘is located’, which is marked up as being a verb form in present tense and active voice. The third slot will be filled in by the string ‘in’, and the fourth slot by an automatically selected referring expression for O (the filler of the corresponding field) in accusative case. The micro-plan in Figure 3 produces sentences like ‘It is located in the Archaeological Museum of Delos.’
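The way the four slots of Figure 3 yield a sentence can be sketched as follows. The list-of-pairs representation of the micro-plan and the realize_fact function are hypothetical simplifications; the two callbacks stand in for referring-expression generation, which in NATURALOWL depends on context and on the lexicon.

```python
def realize_fact(slots, owner_expression, filler_expression):
    """Fill the slots of one micro-plan and concatenate them into a sentence.

    slots -- list of (kind, settings) pairs mirroring the owner, verb, text
             and filler elements of the micro-plan (hypothetical form)
    owner_expression, filler_expression -- callbacks that receive the
             requested grammatical case and return a referring expression
    """
    words = []
    for kind, settings in slots:
        if kind == "owner":
            words.append(owner_expression(settings["case"]))
        elif kind == "filler":
            words.append(filler_expression(settings["case"]))
        elif kind in ("verb", "text"):
            words.append(settings["val"])      # fixed string
        else:
            raise ValueError(f"unknown slot kind: {kind}")
    sentence = " ".join(words)
    return sentence[0].upper() + sentence[1:] + "."


current_location_plan = [
    ("owner", {"case": "nominative", "retype": "re_auto"}),
    ("verb", {"voice": "active", "tense": "present", "val": "is located"}),
    ("text", {"val": "in"}),
    ("filler", {"case": "accusative", "retype": "re_auto"}),
]

sentence = realize_fact(
    current_location_plan,
    owner_expression=lambda case: "it",
    filler_expression=lambda case: "the Archaeological Museum of Delos",
)
print(sentence)
```

The voice and tense tags are carried along but not used here, since in the text they matter only for sentence aggregation.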

<owlnl:owlClass rdf:about="...#aryballos">
  <owlnl:hasLexEntry rdf:resource="#aryballos-lexicon"/>
</owlnl:owlClass>

<owlnl:lexEntry rdf:ID="aryballos-lexicon">
  <owlnl:LanguagesLexEntry ...>
    <owlnl:EnglishLexEntry>
      <owlnl:gender>nonpersonal</owlnl:gender>
      <owlnl:singular ...>aryballos</owlnl:singular>
      <owlnl:plural ...>aryballoi</owlnl:plural>
    </owlnl:EnglishLexEntry>
    <owlnl:GreekLexEntry>
      <owlnl:gender>masculine</owlnl:gender>
      <owlnl:singularForms>
        <owlnl:nominative ...>...</owlnl:nominative>
        <owlnl:genitive ...>...</owlnl:genitive>
        <owlnl:accusative ...>...</owlnl:accusative>
      </owlnl:singularForms>
      ...
</owlnl:lexEntry>

Fig. 4 Lexicon entry example, listing the various forms of the noun aryballos and providing gender information.

NATURALOWL currently employs a very simple algorithm for generating referring expressions: once the object being described has been introduced by mentioning its class (e.g., ‘This is an aryballos’), it uses pronouns to refer to that object (e.g., ‘It was decorated with the red-figure technique. It was created during the Archaic Period.’) until the focus moves to another entity via an indirect fact. The new focus is first referred to by its name and then by pronouns (‘The Archaic Period lasted from 700 till 480 B.C. It was when the Greek city-states. . . ’). Then, when the focus returns to the original object, a demonstrative is used (‘This aryballos is made of. . . ’). Some property values (O) may contain long canned strings (e.g., anecdotes about exhibits), and there are special annotations to flag canned strings that change the focus, so as to avoid using a pronoun in the next sentence. More elaborate referring expression generation algorithms could in principle be added in future versions of the system.
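The focus-tracking behaviour described above can be approximated by the following sketch, which assigns one referring expression to the subject of each planned sentence. The function and its arguments are hypothetical, and the sketch deliberately ignores grammatical case, capitalization, and the canned strings that change the focus.

```python
def choose_references(subjects, described, class_noun, names):
    """Return one referring expression per sentence subject.

    subjects   -- subject entity of each sentence, in text order
    described  -- the entity the whole text is about
    class_noun -- noun of the described entity's class, e.g. "aryballos"
    names      -- dict: other entity -> its natural language name
    """
    expressions = []
    previous = None
    focus_has_moved = False
    for subject in subjects:
        if subject == previous:
            expressions.append("it")                  # focus unchanged: pronoun
        elif subject != described:
            expressions.append(names[subject])        # new focus: use its name
            focus_has_moved = True
        elif focus_has_moved:
            expressions.append("this " + class_noun)  # focus returns: demonstrative
        else:
            expressions.append("this")                # introduction: 'This is an ...'
        previous = subject
    return expressions


references = choose_references(
    subjects=["exhibit24", "exhibit24", "exhibit24",
              "archaic-period", "archaic-period", "exhibit24"],
    described="exhibit24",
    class_noun="aryballos",
    names={"archaic-period": "the Archaic Period"},
)
print(references)
```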

To generate referring noun phrases, like ‘this aryballos’, NATURALOWL requires OWL classes and entities to be associated with lexicon entries of nouns or proper names. This is again achieved by using RDF annotations. In the RDF triples of Figure 4, the class aryballos is associated with a noun lexicon entry aryballos-lexicon. The lexicon entry contains the various forms of the noun, provides information on gender etc. In practice, all the RDF annotations, including lexicon entries and micro-plans, are constructed by using ELEON, instead of directly editing RDF statements.


4.3 Surface realization

In surface realization, the system simply concatenates the slot values of the filled-in micro-plans to produce actual sentences. Each micro-plan gives rise to a single sentence (e.g., ‘This is an aryballos. It was decorated with the red-figure technique.’). These sentences are then aggregated to form longer ones (e.g., ‘This is an aryballos decorated with the red-figure technique.’) using domain-independent aggregation rules based on those of M-PIRO [14]. Space does not permit a more detailed description of the aggregation stage, which is actually part of micro-planning and operates on (filled-in) micro-plans, before surface realization. NATURALOWL’s surface realizer can also add syntactic or semantic markup. For example, each sentence may be marked up with the corresponding OWL triples, leading to texts readable by both humans and computer applications; the latter would rely on the semantic markup.

5 Intelligent Authoring Support

As already discussed above, the detailed profiles required to adapt NLG are difficult to author and maintain manually. In this section we describe two automations that support the authoring process: an inference approach, which exploits the fact that these parameters are strongly inter-related and can, to a large extent, be inferred from each other; and a data mining approach that constructs user models from interaction logs.

5.1 Profile Completion

We have previously discussed how profile attributes are represented as RDF annotations. While this is advantageous from the perspective of containing the ontology within the Description Logics complexity fragment, this choice leaves profile attributes outside the scope of reasoning tools.

In order to be able to efficiently reason over and draw inferences about profile attributes themselves, we have chosen to interpret profile attributes within many-valued description logics. Using description logics has the advantage of direct access to the domain ontology; using many-valued valuations has the advantage of providing a means to represent and reason over numerical values.

Profile attributes of individuals are captured by normalizing them to the [0,1] range and then using the normalized value as a class membership degree. So, for example, if interesting is such an attribute of individual exhibits, then an exhibit with a (normalized) interest level of 0.7 is an instance of the Interesting class at a degree of 0.7.

Resource          Property    Value       Interest
Stoa of Attalus   style       Doric       0.8
Stoa of Attalus   style       Ionic       0.7
Stoa of Attalus   style       Pergamene   0.3
Stoa of Attalus   orderedBy   Attalus     0.9

Table 1 Ontology and profile fragment, showing the interest factors of the fillers of properties of the Stoa of Attalus.

Attributes of classes are reduced to attributes of the members of the class, expressed by a class subsumption assertion at the degree of the attribute. So, if the class of stoas is interesting at a degree of 0.6, this is expressed by asserting that being a member of Stoa implies being a member of Interesting. The implication is asserted at a degree of 0.6, which, under Łukasiewicz-Tarski semantics, means that being a stoa implies being interesting at a loss of 0.4 of a degree. Thus individuals that are members of the Stoa class at a degree of 1.0 are implicitly interesting at a degree of 0.6. Although this is not identical to saying that the class itself is interesting, it clearly captures the intention behind the original RDF annotation.
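The 0.6 figure for stoas can be checked with a short derivation. The sketch below assumes the standard Łukasiewicz residuum as the many-valued implication; the chapter names the semantics but does not spell out the formula, so the notation is ours.

```latex
% Lukasiewicz residuum (implication) on degrees a, b in [0,1]
\[
  a \Rightarrow b \;=\; \min(1,\; 1 - a + b)
\]
% Asserting Stoa \sqsubseteq Interesting at degree d requires, for every individual x,
\[
  \mathit{Stoa}(x) \Rightarrow \mathit{Interesting}(x) \;\ge\; d
  \quad\Longleftrightarrow\quad
  \mathit{Interesting}(x) \;\ge\; \mathit{Stoa}(x) + d - 1 .
\]
% With Stoa(x) = 1.0 and d = 0.6:
\[
  \mathit{Interesting}(x) \;\ge\; 1.0 + 0.6 - 1 \;=\; 0.6 .
\]
```

The ‘loss of 0.4 of a degree’ is thus the quantity 1 − d that is subtracted from the membership degree in Stoa.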

Profile attributes can also characterize properties, such as style, orderedBy, or creationEra, encoding the information that it might, for example, be more interesting to describe the artistic style of an exhibit rather than provide historical data about it. This is interpreted as the strength of the connection between how interesting an exhibit is, and how interesting its properties are. In other words, if having an interesting filler for style also makes the exhibit interesting, this is taken to mean that the style relation itself is an interesting one. Formulated in logical terms, having interesting relation fillers implies being interesting, and the implication holds at a degree provided by the interest level of the relation itself.

For example, let us assume that the style property has an interest factor of 0.8 and the orderedBy property an interest factor of 0.4. We interpret this as:

∃style.Interesting ⊑ Interesting : 0.8
∃orderedBy.Interesting ⊑ Interesting : 0.4

That is to say, things that are related to at least one Interesting instance through either the style or the orderedBy property are themselves Interesting; however, the level of interest that is ‘transferred’ from the filler to the resource that has the property is higher for the style property than it is for orderedBy.

Given an ontology and profile fragment like the one in Table 1, Stoa of Attalus has an interesting style at a degree of 0.8, which is the maximum among the three architectural styles found in the stoa (Doric, Ionic, and Pergamene). Since style fillers transfer interest at a loss of 0.2, style contributes 0.6 to the stoa’s Interesting-ness. By contrast, the filler of orderedBy (which is more interesting in this profile than any of the architectural styles) only contributes 0.3 of a degree, because orderedBy is annotated as uninteresting and interest transfers across it at a heavy loss.
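The two contributions can be reproduced with a small Python sketch. It assumes that interest is transferred with the Łukasiewicz t-norm, max(0, a + d − 1), and that the existential restriction picks the most interesting filler; the function and variable names are hypothetical, and only the numbers come from Table 1 and the interest factors given above. How the contributions of different properties are combined into a single degree is not covered by this sketch.

```python
def transfer(filler_degree: float, implication_degree: float) -> float:
    """Lukasiewicz t-norm: degree guaranteed for the resource that has the property."""
    return max(0.0, filler_degree + implication_degree - 1.0)


# Interest of the fillers of each property of the Stoa of Attalus (Table 1).
FILLER_INTEREST = {
    "style": {"Doric": 0.8, "Ionic": 0.7, "Pergamene": 0.3},
    "orderedBy": {"Attalus": 0.9},
}

# Interest factors of the properties themselves (degrees of the implications).
PROPERTY_INTEREST = {"style": 0.8, "orderedBy": 0.4}


def property_contribution(prop: str) -> float:
    """Interest that one property contributes to the resource."""
    # Existential restriction: the most interesting filler determines the degree.
    best_filler = max(FILLER_INTEREST[prop].values())
    # Rounding only hides floating-point noise such as 0.6000000000000001.
    return round(transfer(best_filler, PROPERTY_INTEREST[prop]), 10)


contributions = {prop: property_contribution(prop) for prop in PROPERTY_INTEREST}
```

Under these assumptions `contributions` gives 0.6 for style and 0.3 for orderedBy, matching the figures in the text.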

We have so far discussed how to infer profile attribute values for the individuals of the domain. Classes and relations receive the value of the minimal instance of the class (or relation), that is to say, the individual (or pair of individuals) for which nothing else is known except that it is a member of the class (or relation).


As an example, consider a DoricBuilding class which is a subclass of Building that only admits instances that have a style relation with Doric. The minimal instance of this class is a supposed and unnamed member of Interesting through having an interesting property as discussed above, even though nothing else is known about it. This membership degree in Interesting is taken to be an attribute of the class itself rather than any one of its members, and is used as the attribute value for the class itself.

For relations, two minimal instances of the relation’s domain and range are created. The attribute value for the property is the degree of the implication that having this property makes the domain individual have the attribute. For example, in order to infer how interesting the property devotedTo is, we first observe that it relates Temple instances with MythicalPerson instances, and create bare instances of these two classes. The implication that having a devotedTo relation to an Interesting individual leads to being a member of Interesting holds to a degree that can be calculated, given the Interesting degrees of the Temple and MythicalPerson instances involved in the relation. The degree of the implication is then used as the value of the interesting attribute.
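One way to read this calculation is as the Łukasiewicz residuum applied to the two bare instances. The Python sketch below illustrates that reading only: the formula is the standard residuum rather than something stated in the text, and the two membership degrees are invented for the example.

```python
def implication_degree(antecedent: float, consequent: float) -> float:
    """Lukasiewicz residuum: degree to which `antecedent` implies `consequent`."""
    return min(1.0, 1.0 - antecedent + consequent)


# Hypothetical Interesting degrees inferred for the two bare instances.
mythical_person_interest = 0.7   # bare MythicalPerson instance (the filler)
temple_interest = 0.5            # bare Temple instance (the resource)

# Degree at which "has an Interesting devotedTo filler" implies "is Interesting",
# used as the interest value of the devotedTo property itself.
devoted_to_interest = implication_degree(mythical_person_interest, temple_interest)
```

With these made-up degrees the property comes out at roughly 0.8. When the resource is at least as interesting as the filler, the implication holds fully and the degree is capped at 1.0.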

5.2 Interaction log mining

The NCSR Personalization Server (PSERVER) is a domain- and application-independent system for storing and processing user profiles. User profiles comprise static properties, numeric or symbolic, such as age, gender, and level of expertise, as well as dynamic features. Features are name-value pairs, and different features can be defined as required by each application, with feature values representing an estimate of the user’s affinity to the corresponding feature.

More specifically, one can define a set of features, which ‘describe’ the application, with a default value for each feature, based on user properties. Whenever a visitor reacts to a feature (by, for example, accessing an exhibit that is described by the feature), PSERVER increases the value of that feature for the specific user. Whenever queried by the application, PSERVER predicts the interest that the user has in a given exhibit, based on the values of the user model for the features that describe the exhibit.
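The update-and-predict cycle described above can be illustrated with a toy user model in Python. This is not PSERVER's API: the class, its methods, the fixed increment, the averaging used for prediction, and the feature names are all assumptions made to keep the sketch self-contained.

```python
from typing import Dict, Iterable


class UserModel:
    """Toy user model: one numeric value per feature (hypothetical, not PSERVER)."""

    def __init__(self, defaults: Dict[str, float], step: float = 0.5) -> None:
        self.values = dict(defaults)   # default value for each feature
        self.step = step               # assumed fixed increment per interaction

    def record_interaction(self, exhibit_features: Iterable[str]) -> None:
        """Increase the value of every feature that describes the accessed exhibit."""
        for feature in exhibit_features:
            self.values[feature] = self.values.get(feature, 0.0) + self.step

    def predict_interest(self, exhibit_features: Iterable[str]) -> float:
        """Predict interest in an exhibit as the mean value of its features."""
        features = list(exhibit_features)
        if not features:
            return 0.0
        return sum(self.values.get(f, 0.0) for f in features) / len(features)


# Hypothetical feature names derived from ontological properties of exhibits.
model = UserModel({"style:Doric": 0.0, "period:Archaic": 0.0})
model.record_interaction(["style:Doric"])
model.record_interaction(["style:Doric"])
doric_interest = model.predict_interest(["style:Doric"])
archaic_interest = model.predict_interest(["period:Archaic"])
```

After two interactions with Doric exhibits, the predicted interest in another Doric exhibit is higher than in an exhibit the visitor has shown no affinity for.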

PSERVER ‘sketches’ user interests without any prior information about the user, relying only on the interaction between features of exhibits and the interest level for each feature found in the user model. PSERVER implements clustering methods for creating user communities (groups of users with common profiles) as well as feature groups (groups of features which describe common domains).

In the case of INDIGO visitors, features relate to the INDIGO domain ontology. More specifically, PSERVER features correspond to the ontological properties of exhibits: creator, place of origin, current location, historic period, artistic style, etc. Each interaction between the visitor and INDIGO (i.e. the user’s querying the robot about an exhibit) increases the value of the feature or features that the object of the interaction (the exhibit) relates to.

6 Related Work

ELEON/NATURALOWL is based on ideas from ILEX [17] and M-PIRO [9]. The ILEX project developed an NLG system that was demonstrated mostly with conceptual representations of museum exhibits; it did not, however, support OWL.⁸ In subsequent work, the M-PIRO project produced a multilingual extension of the ILEX system, which was tested in several domains, as well as a precursor to ELEON [3]. However, attempts to add support for OWL in the M-PIRO generation system ran into problems, because of incompatibilities between OWL and the ontological model used in M-PIRO [2]. By contrast, NATURALOWL was especially developed for OWL ontologies, which are also supported by ELEON. Both systems inherit from their precursors the core idea of separating the domain ontology from the linguistic and user adaptation resources, which has significant advantages, as already discussed.

Previous versions of ELEON also featured authoring facilities such as using an external description logic reasoner to catch logical errors by checking the consistency of the authored ontology [5]. In the work presented here, the intelligence behind the tool is substantially extended by using logical inference to predict values that have not been explicitly entered by the user, alleviating the need to manually provide large volumes of numerical data.

7 Conclusion

In this chapter we have presented ELEON/NATURALOWL, an integrated authoring and NLG system that can be used to create domain ontologies in OWL, annotate them with linguistic and user adaptation resources, and generate textual descriptions of the ontology’s entities. We have also discussed the use of ELEON/NATURALOWL to generate descriptions of cultural heritage objects.

The advantages of using ELEON instead of generic knowledge authoring tools, such as Protege,⁹ stem from the ability to couple ELEON with external engines that provide important conveniences to the author, such as the semantic profile completion and usage log mining facilities discussed in Section 5.

These facilities considerably reduce the effort required to create a fully functional model, as the author can start out with initial profiles which can be refined by iterating through cycles of providing information, previewing the generated text, and only elaborating the model where the text is unsatisfactory. This iterative process converges to satisfactory descriptions much faster than having to manually enter all adaptation parameters, especially for large and complex domains.

⁸ Dale et al. [6] describe a similar museum system.
⁹ See http://protege.stanford.edu

Our future plans include enhancing PSERVER with a recommender module which will suggest items of possible interest to the current user. It can provide content-based recommendations, that is, suggestions that are similar to past user preferences. In doing so, the system tries to discover commonalities between exhibits seen in the past which have been positively rated. In particular, some of the following features of exhibits can be used to detect commonalities: author, historical period, type of artifact, etc. Also, a collaborative recommender system could suggest interesting exhibits to a user based on items judged as interesting by a group of like-minded users [1]. For instance, it might be discovered that visitors interested in architecture are also interested in sculpture. Thus, a new visitor, after having received information about an architectural exhibit, might be offered advice for seeing the collection of sculptures.

To enhance the system’s acceptance by humans, it would be desirable to justify the exhibits it suggests by revealing its reasoning. Such justifications or explanations are classified as opaque or transparent. Opaque explanations [7] are neighbour-based and essentially provide statistics of similar users’ preferences. This type is more pertinent to collaborative recommendations, as exemplified by the following dialogue excerpt:

User: Why did you guide me to the Tholos?
Robot: 75% of visitors similar to you chose to visit it.

Transparent explanations [16], on the other hand, are based on reciting the features of the suggested exhibit. If the user asks a follow-up question requesting further explanation, some of the features of the current exhibit can be associated with features of a previously visited exhibit:

User: Why did you suggest to view the temple of Hephestos?
Robot: It was built in the Ancient Agora and was constructed by Phedias.
User: How did you guess that I am interested in Phedias?
Robot: Considering you have expressed interest in the Acropolis, which was also built by Phedias.

Given the nature of the descriptions generated by ELEON/NATURALOWL, collaborative filtering and opaque explanations are expected to be less relevant; it is rather transparent explanations that would mostly contribute to the persuasiveness of the system.

Finally, NATURALOWL is also being further developed to generate descriptions of both classes and entities, engaging richer language resources and more complex aggregation mechanisms. Extensions are also being made to support ontologies in OWL 2 [19], the most recent OWL specification, and the various complexities of event-based ontologies, such as CIDOC CRM. Lastly, the domain-specific language resources are being reviewed and redefined as OWL ontologies instead of RDF annotations.


Acknowledgements Much of the work reported here was carried out in the context of the Greek project XENIOS and the subsequent European project INDIGO, where human-robot interaction technology was developed.¹⁰

We are grateful to Ion Androutsopoulos¹¹ and Gerasimos Lampouras¹² for allowing us to include their text on NATURALOWL, and on Natural Language Generation in general, as well as for their corrections and contributions throughout this chapter. Naturally, all errors and omissions are our own.

Finally, we wish to acknowledge the help of the Foundation of the Hellenic World, whose staff used the ELEON/NATURALOWL system to create the initial version of the Ancient Agora of Athens ontology.

¹⁰ Please see http://www.ics.forth.gr/xenios (in Greek) and http://www.ics.forth.gr/indigo for more information.
¹¹ Department of Informatics, Athens University of Economics and Business, and Digital Curation Unit - IMIS, Research Centre ‘Athena’
¹² Department of Informatics, Athens University of Economics and Business

References

[1] G. Adomavicius and A. Tuzhilin. 2005. Toward the next generation of recommender systems: A survey of the state-of-the-art and possible extensions. IEEE Trans. on Knowledge and Data Engineering, 17(6).

[2] Ion Androutsopoulos, S. Kallonis, and Vangelis Karkaletsis. 2005. Exploiting OWL ontologies in the multilingual generation of object descriptions. In Proceedings of the 10th European Workshop on Natural Language Generation (ENLG-05), Aberdeen, August 2005.

[3] Ion Androutsopoulos, Jon Oberlander, and Vangelis Karkaletsis. 2007. Source authoring for multilingual generation of personalised object descriptions. Journal of Natural Language Engineering, 13(3):191–233.

[4] Grigoris Antoniou and Frank van Harmelen. 2008. A Semantic Web Primer. MIT Press, second edition.

[5] Dimitris Bilidas, Maria Theologou, and Vangelis Karkaletsis. 2007. Enriching OWL ontologies with linguistic and user-related annotations: the ELEON system. In Proc. 19th IEEE International Conference on Tools with Artificial Intelligence (ICTAI-2007), Patras, Greece, October 29–31, 2007, volume 2. IEEE Computer Society.

[6] Robert Dale, Stephen J. Green, Maria Milosavljevic, Cecile Paris, Cornelia Verspoor, and Sandra Williams. 1998. Dynamic document delivery: generating natural language texts on demand. In Proceedings of the 9th International Conference and Workshop on Database and Expert Systems Applications, pages 131–136. Vienna, Austria.

[7] J. L. Herlocker, J. A. Konstan, and J. Riedl. 2000. Explaining collaborative filtering recommendations. In Proceedings of the ACM conference on Computer supported cooperative work (CSCW 2000), pages 241–250. ACM, New York, USA.

[8] Amy Isard. 2007. Choosing the best comparison under the circumstances. In Proceedings of the International Workshop on Personalization Enhanced Access to Cultural Heritage (PATCH07), 11th International Conference on User Modeling (UM07), June 2007, Corfu, Greece.

[9] Amy Isard, Jon Oberlander, Ion Androutsopoulos, and Colin Matheson. 2003. Speaking the users’ languages. IEEE Intelligent Systems, 18(1):40–45.

[10] Stasinos Konstantopoulos. 2010. An embodied dialogue system with personality and emotions. In Morena Danieli, Bjorn Gamback, and Yorick Wilks, editors, Proc. of Workshop on Companionable Dialogue Systems, ACL 2010. Association for Computational Linguistics (ACL), Uppsala, Sweden.

[11] Stasinos Konstantopoulos, Vangelis Karkaletsis, and Colin Matheson. 2008. Robot personality: Representation and externalization. In Proc. of International Workshop on Computational Aspects of Affective and Emotional Interaction (CAFFEi 08), Patras, Greece, July 21st, 2008.

[12] Frank Manola and Eric Miller. 2004. RDF Primer. W3C Recommendation, 10 February 2004. URL http://www.w3.org/TR/rdf-primer/.

[13] Matthew Marge, Amy Isard, and Johanna Moore. 2008. Creation of a new domain and evaluation of comparison generation in a natural language generation system. In Proc. of the Fifth International Language Generation Conference (INLG08), June 2008, Salt Fork, Ohio, USA.

[14] Alexander Melengoglou. 2002. Multilingual aggregation in the M-PIRO system. Master’s thesis, School of Informatics, University of Edinburgh.

[15] Maria Milosavljevic. 1999. The automatic generation of comparison in descriptions of entities. Ph.D. thesis, Department of Computing, Macquarie University, Australia.

[16] R. J. Mooney and L. Roy. 2000. Content-based book recommending using learning for text categorization. In Proceedings of the 5th ACM conference on Digital libraries (DL 2000), San Antonio, Texas, USA, pages 195–204. ACM, New York, USA.

[17] Michael O’Donnell, Chris Mellish, Jon Oberlander, and A. Knott. 2001. ILEX: an architecture for a dynamic hypertext generation system. Natural Language Engineering, 7(3):225–250.

[18] Ehud Reiter and Robert Dale. 2000. Building natural language generation systems. Cambridge University Press.

[19] W3C OWL Working Group. 2009. OWL 2 web ontology language. W3C Recommendation, 27 October 2009. URL http://www.w3.org/TR/owl2-overview/.