An ontology engineering methodology for DOGMA

FACULTEIT WETENSCHAPPEN – Vakgroep Computerwetenschappen
Semantics Technology and Applications Research Lab (STAR Lab)

An ontology engineering methodology for DOGMA
STAR Lab Technical Report, 2008

Peter Spyns, Yan Tang & Robert Meersman

corresponding author: Peter Spyns
keywords: ontology engineering methodology
number: STAR-2008-x1
date: 05/12/2008
status: final
reference: Journal of Applied Ontology 1-2 (3): 13 - 39

Pleinlaan 2, Gebouw G-10, B-1050 Brussel | Tel. +32 (0)2 629 12 37 | Fax +32 (0)2 629 38 19 | http://www.starlab.vub.ac.be


Applied Ontology 00 (2008) 1–27, DOI 10.3233/AO-2008-0047, IOS Press


An ontology engineering methodology for DOGMA

Peter Spyns ∗, Yan Tang and Robert Meersman
STAR Lab., Vrije Universiteit Brussel, B-1050 Brussel, Belgium
E-mail: {first_name.second_name}@vub.ac.be

Abstract. Although ontologies occupy a central place in the Semantic Web and related research domains, there are currently not many fully fledged ontology engineering methodologies available. In this paper, we want to present an integrated methodology for ontology engineering from scratch, inspired by various scientific disciplines, in particular database semantics and natural language processing.

Keywords: Methodology, ontology engineering, semantics, knowledge modelling

1. Introduction

1.1. Preliminary remarks

The main goal of this paper is to describe an integrated ontology engineering methodology based on the research efforts performed at VUB STAR Laboratory over the past ten years. Depending on application domains, project contexts and scientific inspirations, several methods and tools for ontology engineering have been explored. We felt it had become worthwhile to consolidate them into an overarching methodological framework. In addition, we want to present the ontology engineering methodology in such a didactic way that any skilled computer scientist is able to learn and apply it him/herself. Therefore, we want the methodology to be teachable and repeatable (Meersman, 2001b).

1.2. DOGMA: What’s in a name

DOGMA1 (Developing Ontology-Grounded Methods and Applications) as an ontology engineering framework has its roots in database semantics and model theory. Some of its important underlying assumptions coincide with some of Guarino’s insights on formal ontology engineering – cf., e.g., Guarino & Giaretta (1995). The roots of DOGMA can be traced back to five initial key publications that, already at the moment of publication, contained the core vision of the research programme of the following

*Corresponding author: Peter Spyns, STAR Lab., Vrije Universiteit Brussel, Pleinlaan 2, Gebouw G-10, B-1050 Brussel, Belgium. E-mail: [email protected].

1It goes without saying that, as a researcher, one cannot be dogmatic in the religious sense of the word – even if we distinguish seven dogmas in this section. Any researcher of the Free University of Brussels has subscribed literally to the fundamental principle of free inquiry and research, namely that science should be based only on facts, thus rejecting any preconceived idea, passion, interest or dogma – although at STAR Laboratory we make an exception for DOGMA.

1570-5838/08/$17.00 © 2008 – IOS Press and the authors. All rights reserved


decade (Meersman, 1996, 1999a, b, 2001a, b). The fundamental ideas put forward in these papers still stand, although they underwent refinements and extensions as a result of research results and the succession of (senior) researchers with various interdisciplinary backgrounds in the laboratory over the past 10 years. A recent formalisation of DOGMA can be found in (De Leenheer & Meersman, 2005). A short-list of the most important DOGMA principles is given below.

1. The meaning of data is to be decoupled from the data itself, stored in a specific repository separated from the data repository, and managed by proper methods. This is the most important “dogma”, based on the transposition of the principle of data independence (as applied in modern databases) into the principle of meaning independence (to be applied for ontologies) (Meersman, 1999b). An application commits its local vocabulary to the meaning of the ontology vocabulary.
A similar three-tier structure (as used for database design) holds for ontology management: a conceptual level to model the application domain or domain of discourse, a logical level that fits the conceptual model into some modelling paradigm (e.g., TopicMaps, the triple-model, . . . ) and an implementation level that contains the actual implementation in some specific language and/or product (e.g., KIF, KL-ONE, OWL languages, F-Logic, DOGMA (meta-)lexons, Conceptual Graphs, . . . ).

2. An ontology is based on many interpretation-independent plausible fact types about a universe of discourse. They constitute the DOGMA ontology base. In DOGMA, lexons are a more formal but still linguistically determined representation of propositions about a domain to be modelled. They can result from different – even conflicting – situations and points of view on the universe of discourse. A lexon can be added to the DOGMA ontology base at any time. Hence, except for an intuitive linguistic interpretation, lexons have no formal interpretation or semantics (Meersman, 2001b). “Contexts” have been introduced in the ontology base as an organising principle, grouping related lexons. A context can be considered as an identifier of a possible world, leading to specific possible world semantics (e.g. Kripke models) and mappings between them (see also Sowa, 2000).
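The lexon format is not spelled out at this point; in the broader DOGMA literature a lexon is commonly rendered as a quintuple of a context, two terms and a role/co-role pair. A minimal sketch under that assumption (all example lexons and names are hypothetical):

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Lexon:
    """A plausible fact type: a context plus two terms and a role/co-role pair."""
    context: str   # identifier of a "possible world", grouping related lexons
    head: str
    role: str
    co_role: str
    tail: str

# The ontology base is simply a growing pool of such lexons; conflicting
# viewpoints may coexist because lexons carry no formal semantics yet,
# only an intuitive linguistic reading.
ontology_base = {
    Lexon("e-health", "Patient", "consults", "consulted by", "GP"),
    Lexon("e-health", "GP", "accesses", "accessed by", "Patient record"),
    Lexon("bakery", "Baker", "bakes", "baked by", "Bread"),
}

# The context acts as the organising principle described above:
ehealth = {lx for lx in ontology_base if lx.context == "e-health"}
```

Note that nothing prevents adding a lexon that contradicts an existing one; resolving such tensions is deferred to the commitment layer.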

3. It is possible to have multiple views on and uses of the same stored conceptualisation. The “meaning space” can be subdivided in different ways according to the needs of a specific application that will add its particular semantic restrictions according to its intended usage (= an instance of a first order interpretation) (Meersman, 1999a). A DOGMA ontology consists of an ontology base layer and a commitment layer (Jarrar et al., 2003). Fuzzy intuitive meaning is separated from precise formal meaning.
This ontology engineering principle allows for scalability in representing and reasoning about formal semantics, as a generative approach can be adopted. With a reduced set of meaning building blocks, a large space of meaning combinations can be generated. In analogy with a principle stemming from the linguistics field, this has been dubbed the double articulation principle (Spyns, 2005b; Spyns et al., 2002). This practical distinction between an ontology base layer and a commitment layer corresponds to a certain extent to Guarino’s theoretical distinction between the conceptual and ontological levels. The ontological level is the level of commitments, i.e. formal restrictions on the semantics of the primitives, while the definite cognitive interpretation of a primitive is situated on the conceptual level (Guarino, 1995; p. 633).
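The double articulation can be mimicked in a few lines: a commitment selects lexons from the base and layers formal restrictions on top of them. A sketch with hypothetical tuples and constraint strings (not DOGMA's actual commitment language):

```python
# A lexon is rendered here as a (context, head, role, co_role, tail) tuple.
ontology_base = {
    ("e-health", "Patient", "consults", "consulted by", "GP"),
    ("e-health", "Patient", "has", "is of", "Patient record"),
    ("e-health", "GP", "accesses", "accessed by", "Patient record"),
}

# First articulation: the base holds intuitively interpreted fact types.
# Second articulation: a commitment selects what one application needs and
# adds the precise formal meaning (constraints) the base deliberately omits.
commitment = {
    "selected": {lx for lx in ontology_base if lx[1] == "Patient"},
    "constraints": ["EACH Patient has EXACTLY 1 Patient record"],  # hypothetical syntax
}
```

A second application could commit to a different subset of the same base with different constraints, which is precisely the "multiple views on the same stored conceptualisation" described above.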

4. An ontology is language neutral. We concur with Guarino’s definition that:

An ontology is a logical theory accounting for the intended meaning of a formal vocabulary, i.e. its ontological commitment to a particular conceptualisation of the world. The intended models of a logical language using such a vocabulary are constrained by its ontological commitment.


An ontology indirectly reflects this commitment (and the underlying conceptualisation) by approximating these intended models (Guarino, 1998; p. 7).

However, we state that an ontology is language neutral – as put forward by Nirenburg & Raskin (2001) – rather than language dependent as stated by Guarino. The logical (and hence unambiguous) vocabulary of a formal KR language should be clearly distinguished from potentially ambiguous natural language vocabulary expressing the intended meaning (Spyns & De Bo, 2004). In the DOGMA framework, together with a context identifier (Meersman, 1999a), the language identifier plays a disambiguating role in determining which concept is linked with which word or natural language term (De Bo et al., 2003). Meta-lexons are the language-neutral and disambiguated counterparts of the language-dependent lexons.
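The disambiguating role of the context and language identifiers can be illustrated as a lookup from language-dependent terms to language-neutral concept identifiers. All identifiers below are hypothetical, and for simplicity the role label is left untranslated:

```python
# Hypothetical mapping: (context, language, term) -> concept identifier.
concept_of = {
    ("e-health", "en", "physician"): "C_GP",
    ("e-health", "nl", "huisarts"):  "C_GP",
    ("e-health", "en", "patient"):   "C_PATIENT",
    ("e-health", "nl", "patiënt"):   "C_PATIENT",
}

def meta_lexon(context: str, lang: str, head: str, role: str, tail: str):
    """Lift a language-dependent lexon to its language-neutral counterpart."""
    return (concept_of[(context, lang, head)], role,
            concept_of[(context, lang, tail)])

# An English and a Dutch lexon collapse onto the same meta-lexon:
en = meta_lexon("e-health", "en", "patient", "consults", "physician")
nl = meta_lexon("e-health", "nl", "patiënt", "consults", "huisarts")
```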

5. As the foundation of an ontology modelling methodology, we choose the Object Role Modelling (ORM) method (Halpin, 2001), complemented with Nijssen’s Information Analysis Method (NIAM – the precursor of ORM) (Wintraecken, 1990) because of the latter’s emphasis on natural language as the starting point and elicitation vehicle of the modelling exercise (Meersman, 2001b). The preference for an analysis method that starts from natural language is a consequence of the point of view that all meaning (semantics) must be described and rooted in natural language; otherwise humans cannot agree on common semantics (Meersman, 2001b). In addition, domain experts should not be bothered with how to think in a new formal language: adequately capturing and organising domain knowledge is a task sufficiently demanding as it is.

6. We focus on intensional semantics instead of extensional semantics. An ontology is not affected by a change on the instance level as long as the corresponding concept and conceptual relationships are not modified. As a corollary, value types (as used in ORM (Halpin, 2001)) or lexical object types (LOTs in NIAM parlance (Wintraecken, 1990)) are not part of an ontology. Said in another way, an ontology does not contain instances. An ontology is in principle application independent and rather represents a domain (Meersman, 1999b). An ontology can be described as a “fat” domain model compared to the parsimonious and optimised application model of a database schema (Spyns, 2005b). Hence, applications first make a selection of the ontology vocabulary they want to commit their local vocabulary to. Note that ORM and NIAM had to be adapted at some points to serve as an ontology modelling methodology (Spyns, 2005a).

7. Meaning will be negotiated between the relevant stakeholders (or imposed by an authoritative party and accepted as such by the other stakeholders) before any meaning exchange can happen. Absolute meaning does not exist. Hence, research on negotiation (de Moor, 2005a), interaction patterns (de Moor & Weigand, 2005), pragmatics (de Moor, 2005b) and the like (mostly stemming from the social sciences) has been taken into account. A community layer has been included in DOGMA. It focuses on the efficient negotiation of the meaning of concepts relevant to a community of stakeholders (de Moor et al., 2006). Experience has shown that it is easier to agree on simple concepts and conceptual relationships than on semantic constraints or other more complex structures (Meersman, 1996). Constraints are much more context and application dependent than the concepts.

The above-mentioned principles have been orthogonally combined and applied to modelling domains such as financial fraud, European privacy and VAT directives, customer complaints management, administrative and health record patient data, vocational competencies and job announcement matching, professional bakery education, innovation information look-up, and medical and biomedical data mediation, to mention the most important ones.


One can deduce from the paragraphs above that the theoretical basis (and related practical experiences) of DOGMA is too broad to be extensively described in this paper alone. The focus of the remainder of this paper consequently lies on the ontology engineering methodology and work flow (Section 2). The related work and discussion section (Section 3), for the same reason of space limitations, will be rather general.2

A conclusion (Section 4) ends the paper.

2. A collaborative ontology engineering methodology

2.1. Preliminary remarks

In the following sections, we present the outline of our ontology engineering methodology. The section is structured along the work flow an ontology engineer has to adopt when modelling in the DOGMA style. As it is impossible to describe in full detail all the ins and outs of this methodology in the limited space of this paper, a (seemingly arbitrary) selection has been made of certain parts of the methodology that are described more extensively. The methodology consists of two main steps:

1. Preparatory stages

– Section 2.2 (formulate vision statement): define the overall purpose,
– Section 2.3 (conduct feasibility study): find out if the overall effort is worthwhile,
– Section 2.4 (project management): organise and plan the effort,
– Section 2.5 (preparation and scoping): collect the required “materials”.

2. Ontology engineering stages (in accordance with dogma 3)

– Section 2.6 (domain conceptualisation): collect, define and describe (cf. dogma 4) in agreement (cf. dogma 7) the set of plausible domain fact types (cf. dogma 2),
– Section 2.7 (application specification): select and formally constrain plausible fact types (cf. dogma 5) as needed by an application committing to their agreed meaning (cf. dogma 1).

The bulk of the description will be devoted to the preparatory activities (Section 2.5), the actual domain conceptualisation methods (Section 2.6) and the application specification techniques (Section 2.7), as these constitute the core of the ontology engineering effort. The project management (Section 2.4) is more of a continuous meta-activity. The vision statement formulation (Section 2.2) and feasibility study (Section 2.3) create and investigate the possibilities, necessities and enabling conditions. The latter three activities are only succinctly mentioned to complete the picture. As the arrows in the various figures below show, the process is not a merely linear one.

2.2. Formulation of the vision statement

A vision is a compelling and inspiring view of a desired and possible future. The stakeholders develop a shared vision and common values. The vision statement is drawn up on a high “executive” level. It shows the expectations of the stakeholders and provides a picture of the long-range results of the planned process and the future accomplishments when the strategies put forward are actually implemented. Several questions can be involved when drawing up the vision statement. Table 1 shows the (empty) form that should contain the result of the ontology vision statement phase.

2A comparison of our methodology with other ontology modelling methodologies will be the subject of a follow-up paper.


Fig. 1. High level decomposition of the ontology engineering methodology.

Table 1

Empty ontology vision statement form

Ontology vision statement form
Title: <Title of this document>
Author: <Name of the author of this document>
Creation date: <Creation date and time of this document>
Last update date: <Date and time of the last modification>
Responsible person(s): <List of responsible persons involved>
Coverage time: <Estimated time needed for the whole project>
Requirements: <List of stakeholder requirements>
Domain: <Description of the domain of the ontology>
Ontologies: <Name and short description of the ontologies involved>
Applications: <Name, type and description of applications involved>
Technologies: <Technologies that have to be applied>

– Why do we want to build an ontology?
– What is the domain of the ontology? What will be its purpose, its context?
– Which are the applications involved?
– What is the necessary/basic technology (or technologies) that we shall consider?
– Who are the (responsible) people involved in the whole project?
– What is the timeframe foreseen to create the ontology?


2.3. Feasibility study

The feasibility study is a mechanism to refine the vision statement and to find out whether a project is actually worthwhile in terms of expected costs and benefits, technological feasibility and needed (and committed) resources (Schreiber et al., 1999; p. 26). Finally, a go or no-go decision is to be taken. A feasibility report not only provides recommendations on how to refine the vision statement, but also the material and rationale underpinning it.

2.4. Project management

This stage is actually the initialisation of the management activities. Project management activities will be on-going during the whole period of the ontology capture and development process. A project plan (including a Gantt chart) should be produced during this stage, detailing at least the following items:

– Time frame of every activity;
– Task resource assignment for every activity;
– Project team and responsible person(s) for every activity;
– Planning management.

2.5. Preparing and scoping

The preparation and scoping activity delineates more sharply the domain of the future ontology. It scales down the problem domain in order to reach the final goal more easily. As it has a strong impact on the quality of the final ontology, we should pay sufficient attention to this activity. The steps shown in Fig. 2 result in a completed ontology scoping form (see Table 2).

Fig. 2. Steps of the knowledge preparation and scoping stage.

Table 2

Empty ontology scoping form

Ontology scoping form
Title: <Title of this document>
ID: <ID of this document>
Theme: <Theme of the collection of resources>
Author: <The author name of this document; the contact information of the author can be attached here>
Date: <Creation date of this document>
Purpose: <Short description of the purpose>
Requirements: <Reference to the collection of user requirement documents>
Experts: <Reference to the list of domain experts and their characteristics>
Sources: <The list of source materials that will be used to create the ontology>


Fig. 3. Activities of the knowledge scoping substage.

Of the steps displayed in Fig. 2, only the knowledge resource scoping step will be presented in more detail. It consists of two subprocesses, namely the selection of relevant passages (Section 2.5.1) and the definition of scenarios (Section 2.5.2).

It is possible that, once the domain experts have been identified, “classical” brainstorming and knowledge elicitation techniques (e.g., see Schreiber et al., 1999, Chapter 6) are preferred during the domain conceptualisation phase (see Section 2.6). In this case the knowledge scoping step can be skipped.

The way this step is executed greatly depends on the type of material or resources that are available to the modellers. E.g., a collection of texts will be handled in a different way than when a database schema constitutes the working material. Especially if the material will be processed automatically (e.g., by means of natural language processing technologies), this step is meant to prevent, or at least reduce, the generation of irrelevant information. In the following sections, we will mainly describe methods suited for creating ontologies on the basis of textual material.

2.5.1. Select relevant passages

By including irrelevant material in a corpus to be processed automatically, there is a greater chance that irrelevant but formally correct output will be generated. Tests using an unsupervised text miner to generate triples (see Spyns & Hogben, 2005, for the details) have been evaluated by domain experts. They have formulated the following method to weed out superfluous material.

– Separate the document into “core text” and “explanatory text”;
– Separate out definitions;
– Separate out paragraphs that are clearly not relevant to any potential application;
– From the “core text”, try to pick out any parts which might possibly put a constraint or obligation on an application (without having any particular application in mind);
– Find exceptions to these constraints. The exceptions should be derived from the text (or possibly from the expert’s knowledge).

If different experts are involved, their selections of passages are to be merged and processed together (the overlapping parts and term duplications serve as an indication of the importance of certain terms).3

The output of this step is a collection of text passages. Getting rid of irrelevant passages is also helpful in the case of manual processing (see below).
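Merging the experts' selections and using overlap as an importance signal can be sketched as a simple vote count (the passage identifiers and the threshold of two votes are illustrative choices):

```python
from collections import Counter

# Hypothetical passage selections made by three domain experts
# (passages identified here by paragraph number for simplicity).
selections = [
    {1, 2, 5, 7},   # expert A
    {2, 5, 8},      # expert B
    {2, 7, 8, 9},   # expert C
]

# Merge all selections; how often a passage was picked serves as a
# rough indicator of its importance, as suggested above.
votes = Counter(p for sel in selections for p in sel)
merged = set(votes)                                  # union of all selections
important = [p for p, n in votes.items() if n >= 2]  # picked by >= 2 experts
```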

2.5.2. Define scenarios

The concept of a NS (which stands for Narratological Scheme, coming from the literary analysis research domain) is derived from stories (sometimes also called storytelling) that have existed for hundreds of

3Experiments on the EU Privacy Directive using selected and merged passages have been performed. A report on this issue will follow.


years as a means of exchanging (cultural) information. A story has become a favoured technique among management scientists and consultants. A NS acts as a “container” or frame representing many aspects of a story in a structured way. NSs allow tracing changes. They communicate complicated ideas and concepts in an easy-to-understand form and convey tacit knowledge that is difficult to articulate.

By drafting a NS,4 or detailed use cases, the domain experts decompose the domain of interest into a number of smaller and more manageable micro-domains. In order to limit the cognitive overhead of the domain experts and to facilitate communication, stories are specified in structured natural language. NSs should be compiled by referring to the reference knowledge resources. Of course, the tacit and implicit knowledge of a domain expert will also influence how a NS is completed. This allows ontology engineers to focus on smaller but well-scoped descriptions.

Since the ontology engineer will be concentrating on NSs alone, domain experts need to ensure that the micro-domains are described in a fashion that is thorough and complete (anything not mentioned will not be included in the ontology). The aim is to explicitly define boundaries for the conceptual scope of the micro-domain being described. In addition to the cognitive advantage of dividing the domain of interest into more manageable parts, it is believed that such micro-domains also support distributed collaboration: different (and distributed) groups can work on compiling different stories.

A scenario embodies the logic of a (complex) ontology processing application, including expert systems and software agents. A scenario concerns the main part of the NS, e.g. the episodes slot, the settings slot and the characters slot (see Table 3). The NS includes the requirements, purpose and logic of the ontology in one domain.

A narratological scheme consists of many elements, as shown in Table 3. We mention in particular:

– Setting(s): the background of the NS.
– Character(s): actors and stage-props of the NS. Every item that will be involved in the NS should be mentioned here.
– Episode(s): the scenario is broken down into a logical sequence. A sequence of episodes acts as the ‘plot’ of the NS. If the original material does not have a clear logical sequence or chronological order, the NS is considered a simple document accumulator.
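The NS form of Table 3 can be sketched as a simple record type; field names follow the form, while all example values below are hypothetical (this is an illustrative rendering, not a STAR Lab tool format):

```python
from dataclasses import dataclass, field

@dataclass
class NarratologicalScheme:
    """Sketch of the NS form of Table 3."""
    title: str
    theme: str
    author: str
    date: str
    scope: str                  # reference to the ontology scoping form
    purpose: str
    settings: list[str] = field(default_factory=list)
    characters: list[str] = field(default_factory=list)  # actors and stage-props
    episodes: list[str] = field(default_factory=list)    # the 'plot'
    notes: str = ""

ns = NarratologicalScheme(
    title="Lab test", theme="e-health", author="NN", date="2008",
    scope="scoping-form-1", purpose="model the lab-test micro-domain",
    settings=["HC portal with pseudonymous lab service"],
    characters=["Patient", "GP", "Laboratory doctor", "LIS"],
    episodes=["Patient authenticates to HCP", "Patient performs a lab test"],
)
```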

There exists a clear difference between scenarios and episodes. Episodes mostly reveal or express the internal logic of a scenario, although sometimes the only grouping factor is that they belong to the same (application) domain. Scenarios are considered as “raw” episodes, or the original material to distil structured episodes out of. Scenarios can be chapter-like. Table 5 shows an example of an episode derived from an e-health sample story source text (shown in Table 4). Table 5 is the episode part (hence the identifiers starting with an E) of the narratological scheme form as shown by Table 3.

2.6. Domain conceptualisation

The domain conceptualisation phase is the most crucial step in the ontology engineering methodology. During this phase the domain of interest is analysed and finally represented as a DOGMA-style ontology. This phase basically corresponds to building the DOGMA ontology base (= a pool of plausible domain fact types rooted in intuitive natural language meaning descriptions). Several important subactivities result in specific deliverables. There are various (complementary) ways of collecting knowledge (Section 2.6.1, see Fig. 4). The preference for one of these depends on the available knowledge sources

4The NS forms the core of a method reported on earlier as the Application Knowledge Engineering Methodology (AKEM) (Zhao et al., 2004a).


Table 3

Empty narratological scheme

Narratological scheme form
Title: <Title of this narratological scheme>
ID: <ID of this document>
Theme: <Theme of this narratological scheme>
Author: <The author name of this NS; the contact information of the author can be attached here>
Date: <Creation date of this NS>
Scope: <Reference to the ontology scoping form (= the resources to build this NS)>
Purpose: <Short description of the purpose>
Settings:
  S1 <Settings of the NS>
  S2 <Settings of the NS>
  ...
Characters:
  C1 <Character of the NS>
  C2 <Character of the NS>
  ...
Episodes:
  E1 <Episode or scenario of the NS>
  E2 <Episode or scenario of the NS>
  ...
Notes: <Extra information of the NS>

Table 4

Excerpt of an e-health sample story source text

Source document of e-health situation description (or story)
The patient can select one “primary” GP, which represents the preferred GP of the patient. Further, the patient can select one or more “secondary” GPs, which can be consulted if the primary GP for example is absent. Also, the patient must assign an “emergency” GP in cases of emergencies. The emergency GP can access the patient record in case of emergencies. When accessing patient records, the GPs authenticate themselves to the HC portal using an electronic ID.
When the GPs perform some clinical test, such as a blood test, the patient has the freedom to use the laboratory service of the LIS that is managed by the same HC portal to process the test samples. He can also choose any of the laboratories external to the system to carry out the prescribed tests. If the patient decides to use the LIS services to execute the laboratory tests, the HC portal generates a token and attaches it to the clinical test prescription along with the required information needed for carrying out the laboratory tests. This information is then sent to the LIS in a pseudonymous way, and the patient can generate a printout of the token or can visually inspect it. Next, the Laboratory doctor affiliated to the LIS receives the patient’s prescribed tests to be performed. Note here that the civil identity of the patient remains anonymous to the internal actors and systems of the LIS. After this, the patient provides the required samples needed for testing to the laboratory, and the same token represents his identity and maps to his clinical test prescription. The Laboratory doctor manages the test execution (including the samples taken from the patient’s physical presence) using analytical instruments and puts the results report back on the portal in the patient’s personal work space.
After the test results have been saved in the patient’s personal work space, it is the responsibility of the patient to present this information to his GP. If the patient for some reason does not want his primary GP to know about the results, he can choose to present the test results to his secondary GP. The patient has two ways of presenting the test results to the GP: either by taking a printout of the laboratory results, using for example a kiosk in a (pseudo-)public place (e.g. a pharmacy), and presenting them to the GP himself, or by granting permission to the GP to access his personal work space on the portal. After studying the lab results, the GP has the possibility to make changes in the patient’s health profile on the portal.

AO ios2a v.2008/05/27 Prn:3/06/2008; 14:56 F:ao047.tex; VTEX/Diana p. 10

10 P. Spyns et al. / An ontology engineering methodology for DOGMA

Table 5

Episode example derived from the e-health sample story of Table 4

Episode
E1.1   Patient authenticates to HCP
E1.2   Patient consults GP1; he can choose GP2 if GP1 is not available
E1.3   Patient performs a lab test
E1.3.1 HC Portal generates token with pseudonym not linked to civil identity
E1.3.2 Lab receives non-civil pseudonym from Patient
E1.3.3 Lab analyses samples
E1.3.4 Test results uploaded to Portal under token pseudonym
E1.4   Patient reads and prints test results
E1.5   Patient consults GP2 (not necessarily = GP1) about test results
E1.6   Patient gets result printout from a pseudo-public place and presents it to the GP himself, or grants the GP permission to access his personal workspace
E1.7   GP can change the patient's health profile on the portal

Fig. 4. Activities of the domain conceptualisation stage.

and overall context – as described in the ontology scoping form (see Table 2). E.g., if no textual sources are available (for knowledge discovery), consultation of experts (knowledge elicitation) will be an alternative. E.g., if the ontology is issued by an official standards committee, involving many stakeholders in a distributed setting (knowledge negotiation) may be less relevant than working with a few authoritative experts (knowledge breakdown). Depending on whether or not lexons (see below) have been produced, the intermediary step of transforming natural language descriptions into elementary sentences (Section 2.6.2) is needed or not. In turn, these are stepwise converted into a formal domain conceptualisation (Section 2.6.3).

2.6.1. Knowledge collection

Knowledge discovery. Linguistic engineering methods offer a variety of tools that can assist the human modeller in his/her tasks. Under this denomination, we range tools and methods that take a text as input, process it and deliver simple phrase chunks or triples as output. In essence, one can distinguish the following steps in the process of automatically learning ontologies from texts (steps that are, in some way or another, common to the majority of the methods reported):

Table 6

Automatically generated lexons for the privacy domain

ID  Subject                   Predicate  Object
1   working_party             take       decision_by_simple_majority
2   access_in_particular      where      process_transmission_of_data
3   accordance                protect    fundamental_right
4   application_of_directive  take       account_of_specific_characteristic
5   appropriate_publication   order      block_erasure
6   association               represent  person_protection_of_right
7   authority                 receive    data_in_framework
8   authority                 seek       view_of_data_subject
9   authorization             regard     process_such_checking_place_in_course

1. Collect, select and pre-process an appropriate corpus (done during the previous scoping activity),
2. Discover sets of equivalent words and expressions,
3. Validate the sets (establish concepts) with the help of a domain expert,
4. Discover sets of semantic relations and extend the sets of equivalent words and expressions,
5. Validate the relations and extended concept definitions with the help of a domain expert,
6. Create a formal representation.

Not only terms, roles, concepts and relationships are important, but equally the circumscription (gloss) and formalisation (axioms) of the meaning of a concept or relationship. To the question of how to carry out these steps, a multitude of answers can be given. Many methods require a human intervention before the actual process can start (labelling seed terms for supervised learning, compilation/adaptation of a semantic dictionary or of grammar rules for the domain, . . . ) (Gao & Zhao, 2005; Zhao et al., 2004b). Unsupervised methods do not need this preliminary step (Reinberger et al., 2004). However, the quality of their results is not that high (Spyns & Hogben, 2005) – see Table 6 for output from a privacy domain corpus. A text corpus can preclude the use of some techniques: e.g., machine learning methods require a corpus to be sufficiently large – hence, some use the Internet as an additional source. Some methods require a corpus to be pre-processed (e.g., adding POS tags, identifying sentence ends, . . . ) or are language dependent (e.g., compound detection). In short, many linguistic engineering tools can be used.

Selecting and grouping terms can be done by means of tools based on distributional analysis, statistics, machine learning techniques, neural networks, and others. An interesting (and recent) alternative are social tagging tools that provide usage frequencies of tags, URLs per tag and tags per URL.5 In the same way as natural language processing methods cluster the words of a corpus based on their distribution, social bookmarking tags can be "clustered" around the URL tagged. In both cases, it is assumed that the words or tags somehow express a similar meaning. Experience shows that this kind of information provides an interesting bottom-up complement (or maybe even alternative) to the traditional top-down or middle-out (manual) approach of engineering ontologies – see also Mika (2005). All the above-mentioned techniques, even if they greatly reduce the knowledge acquisition bottleneck, still require human quality control and post-editing.
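The distributional grouping of terms mentioned above can be sketched in a few lines: terms whose context words overlap strongly are candidates for the same equivalence set. The windowing, the whitespace tokenisation and the Jaccard threshold below are simplifying assumptions for illustration, not the behaviour of any cited tool:

```python
from collections import defaultdict
from itertools import combinations

def context_profiles(sentences, window=2):
    """Build a bag-of-context-words profile for every term."""
    profiles = defaultdict(set)
    for sent in sentences:
        words = sent.lower().split()
        for i, w in enumerate(words):
            for j in range(max(0, i - window), min(len(words), i + window + 1)):
                if i != j:
                    profiles[w].add(words[j])
    return profiles

def group_terms(sentences, threshold=0.5):
    """Pair terms whose context profiles overlap strongly (Jaccard)."""
    profiles = context_profiles(sentences)
    groups = []
    for a, b in combinations(sorted(profiles), 2):
        inter = profiles[a] & profiles[b]
        union = profiles[a] | profiles[b]
        if union and len(inter) / len(union) >= threshold:
            groups.append((a, b))
    return groups
```

On two sentences that differ only in the subject term, the subject terms share all their context words and end up grouped, mimicking how distributionally similar words are clustered. A domain expert still has to validate the resulting sets.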

Current social tagging systems in general still lack a supporting semantic infrastructure to be used as a direct source of new emergent semantics – see Spyns et al. (2006) for our position on this. These social bookmarking services closely correspond, albeit in a more global and networked environment, to the

5See Spyns et al. (2006) for a description of a DOGMA ontology-based social tagging tool.

situation in which a database schema modeller invents table and column names and expects other modellers to understand them as such because natural language words are used. In this case, however, the shared meaning only resides in the brain of each individual. Well-known issues like ambiguity, cryptic abbreviations, multilinguality etc. inhibit database interoperability and are bound to also manifest themselves in the area of social bookmarking. Instead of language-dependent words with implicitly agreed upon meaning, social bookmarking tags should be language-neutral concepts that have been explicitly defined to allow their meaning to be shared (Spyns & De Bo, 2004).

To discover semantic relationships between concepts, one can rely on specific linguistic knowledge, already established semantic networks or ontologies, co-occurrence patterns, machine readable dictionaries, association patterns or combinations of all these. In (Karanikas & Theodoulidis, 2002) a concise overview is offered of commercially available tools that are useful for these purposes. To our knowledge no comparative study has been published yet on the efficiency and effectiveness of the various techniques applied to ontology learning, even if overviews of ontology learning methods in general do exist (e.g. Buitelaar et al., 2005; Shamsfard & Barforoush, 2003).

Knowledge elicitation. This includes the well-known way of gathering domain experts or stakeholders who produce on the spot a domain conceptualisation based only on their expertise and the material available at hand. The ontology engineer uses all sorts of direct communication techniques, mainly interviews and group meetings, to "extract" knowledge from the experts. These activities can be repeated as many times as the stakeholders feel is needed to arrive at a complete and commonly agreed upon set of results. These should be sufficiently adequate to allow verbalisation (see Section 2.6.2). This stage traditionally includes brainstorm meetings, abstraction exercises and sessions to jointly compile a baseline taxonomy (see Fig. 5).

Knowledge negotiation. Recently, alternative collaborative techniques have been studied in our lab to create a bakery ontology. As the domain experts had primarily practical skills and no experience whatsoever in modelling, a very down-to-earth method had to be devised. Moreover, the fact that the domain experts belong to their respective organisations resulted in a "community of use"-inspired approach that stresses the importance of the elicitation context and the continuous process of aligning locally evolving interpretations with the common ontology.6 This elicitation method has been called DOGMA-MESS (DOGMA Meaning Evolution Support System) (de Moor et al., 2006). A specific tool to support DOGMA-MESS has been implemented (Christiaens & de Moor, 2006) using Conceptual Graphs (Sowa,

Fig. 5. Activities of the knowledge elicitation stage.

6This research has additionally resulted in a formal framework for context dependency management (De Leenheer et al., 2006).

Fig. 6. Cycles in the knowledge negotiation stage – reproduced from (de Moor et al., 2006).

1984) (CG) as intermediary representation formalism and a Prolog + CG implementation.7 Typical of DOGMA-MESS is the granular definition of the roles assumed by the stakeholders of the modelling exercise.

Knowledge engineers perform a high-level domain analysis (also taking into account the NSs available from the previous steps) and create an initial set of domain templates (upper part of Fig. 6). These consist of interconnected triples (similar to a binary fact type) that represent a high-level view on the domain. The templates use elements from a general upper concept type hierarchy as well as from an upper ontology. A template describes a common knowledge definition most relevant to the common interest of the stakeholders. Domain experts are asked to further specialise these templates (into lower common ontologies) as considered valid for their respective organisations (lower part with up-going arrows in Fig. 6). This is a highly iterative process resulting in many successive versions of templates – represented in Fig. 6 by the dotted line pointing from one version to the next. Cycle after cycle, the conceptualisation converges towards a commonly agreed upon one by retaining the most relevant triples of the templates. Additions, modifications and terminations of concepts and relationships and of their meaning are the main operations performed by the domain experts (middle part of Fig. 6). To speed up the convergence process, a (still crude) relevancy score is computed. When the process stops, the remaining triples (represented as CGs) can easily be transformed into DOGMA lexons (see below). One could consider this iteratively converging process as a variant of the Delphi method8 applied to meaning negotiation. The process is mainly asynchronous (hence the need for good supporting tools), while the knowledge elicitation process (Section 2.6.1) is mainly a synchronous one, with many of the experts and stakeholders physically sitting together.
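The relevancy score itself is left unspecified above. Purely as an illustration of the convergence idea, the sketch below assumes the crudest possible proxy: the fraction of expert specialisations that retain a given triple; this scoring rule is our assumption, not the actual DOGMA-MESS computation:

```python
from collections import Counter

def relevancy_scores(expert_templates):
    """Score each triple by the fraction of expert specialisations
    that retain it (a crude frequency-based proxy for relevancy)."""
    counts = Counter(t for template in expert_templates for t in set(template))
    n = len(expert_templates)
    return {triple: c / n for triple, c in counts.items()}

def retain(expert_templates, cutoff=0.5):
    """Keep only the triples retained by at least `cutoff` of the experts."""
    scores = relevancy_scores(expert_templates)
    return {t for t, s in scores.items() if s >= cutoff}
```

Applying `retain` cycle after cycle would keep the triples most experts agree on and drop idiosyncratic ones, which is the converging behaviour the text describes.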

Knowledge breakdown. The aim of this activity is to gradually decompose the NSs defined earlier into a hierarchical structure. The structured nature of the NSs should assist in this endeavour. Segmentation and highlighting are the major activities – see Fig. 7.

7Maintained by Ulrik Petersen.
8See http://en.wikipedia.org/wiki/Delphi_method.

– Segmentation (Zhao & Meersman, 2005) is used to trim a long sentence into several atomic sentences. The segmentation technique reduces the problem complexity. It can be applied to split a 'big' problem into smaller and thus more easily manageable subproblems.

– Highlighting (Zhao, 2004) is mainly used as a supporting technique. Although a traditional way of managing information, it is still a useful mechanism, even if old-fashioned and eating up expert time. The domain experts should highlight all and only the relevant terms and/or expressions representing key concepts in the text. Three types of phrases are often highlighted: noun phrases, verbal phrases and prepositional phrases. Highlighting should be done consistently and in compliance with the vision statement (cf. Section 2.2) and with the user requirements and purposes identified in previous steps.

During this step (see Fig. 7), the knowledge engineer can rephrase certain wordings or formulations to achieve a higher order of generalisation or abstraction. These abstractions will of course be the prime candidates to be highlighted (otherwise it is not really worthwhile to make an abstraction effort). For traceability reasons, it is advised to note down the abstractions made (and for which reasons). Depending on the expertise level of the knowledge engineer, it is possible that (s)he segments, abstracts and highlights in the same pass (see e.g. Table 7). For the sake of clarity, we present highlighting and segmentation separately. Part-of-speech taggers combined with word frequency analysis (e.g., the TF/IDF measure) could potentially be of help when the volume of text to highlight becomes large.
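The TF/IDF suggestion above can be sketched directly: score the terms of the document to be highlighted against a background of other documents, and propose the highest-scoring ones as highlighting candidates for the expert. The whitespace tokenisation and the tiny document set are illustrative assumptions:

```python
import math
from collections import Counter

def tfidf_candidates(documents, top_n=3):
    """Rank the terms of the last document by TF/IDF against all
    documents, suggesting candidates for the expert to highlight."""
    tokenised = [doc.lower().split() for doc in documents]
    n_docs = len(tokenised)
    df = Counter()                      # document frequency per term
    for tokens in tokenised:
        df.update(set(tokens))
    target = tokenised[-1]
    tf = Counter(target)                # term frequency in the target
    scores = {w: (tf[w] / len(target)) * math.log(n_docs / df[w]) for w in tf}
    return [w for w, _ in sorted(scores.items(), key=lambda kv: -kv[1])[:top_n]]
```

Terms occurring in every document (e.g. function words) score zero and drop out, while domain-specific terms bubble up; the expert still decides what is actually highlighted.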

2.6.2. Verbalising elementary sentences

The creation of basic elementary sentences or fact types is linguistic in nature.9 Based on the outcomes of the previous activities, simple sentences representing elementary plausible fact types must be produced. It is possible that the domain experts and knowledge engineers have already (partially)

Fig. 7. Activities of the knowledge breakdown stage.

Table 7

Segmentation and highlighting of a source sentence from the EU privacy directive

Appropriate technical and organisational measures must be implemented to protect personal data against . . .
. . . article 17 of Directive 95/46 EC

1. Appropriate technical measures must be implemented.

2. Appropriate organisational measures must be implemented.

3. Appropriate technical measures are to protect personal data.

4. Appropriate organisational measures are to protect personal data.
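The first segmentation step of Table 7 (distributing the coordinated modifiers "technical and organisational" over the noun they share) can be illustrated with a deliberately naive pattern. Real segmentation needs a syntactic parser; the regular expression below only handles this one coordination shape and is a toy assumption:

```python
import re

def segment(sentence):
    """Naively distribute one coordinated pair of premodifiers
    ('... A and B NOUN ...') into two atomic sentences."""
    m = re.match(r"(.*?)(\w+) and (\w+) (\w+)(.*)", sentence)
    if not m:
        return [sentence]          # nothing to split: already atomic
    head, a, b, noun, tail = m.groups()
    return [f"{head}{a} {noun}{tail}", f"{head}{b} {noun}{tail}"]
```

Applied to the source sentence of Table 7, it reproduces segments 1 and 2; repeating the idea on the relative clause would yield segments 3 and 4.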

9In earlier publications, most of the following steps have been dubbed DOGMA Ontology Modelling Method (DOM2).

produced sufficiently fine-grained results (ORM-style binary fact types or RDF-like triples), so that it is no longer necessary to explicitly perform this activity. On the other hand, depending on the natural language processing techniques and tools used, it is possible that the sentences or triples produced automatically should be further split. The main characteristic of an "elementary fact type" is that it cannot be split into smaller constituent parts. These sentences should correspond, as far as possible, to a simple subject–verb–object structure.

2.6.3. Lexon engineering

In the following steps, the informal conceptualisation, in the form of verbalised fact types in natural language, is progressively transformed into language-independent formal statements rooted in informal meaning descriptions, according to the STAR Lab DOGMA ontology engineering formal framework. Figure 8 shows the steps involved. Even if one wants to implement an ontology directly in RDF(S) or OWL-DL, these steps are useful as they impose a methodological rigour (resulting, e.g., in the inclusion of meaning descriptions in the comment field of a resource identified by a URI – in many cases now lacking).

Create lexons. This activity transforms the verbalised fact types into lexons. The results are formally described as <(λ, ζ): term1 role1 co-role2 term2>, where ζ denotes the context, used to group lexons that are intuitively related to each other in the conceptualisation of the domain (Meersman, 2001b). Currently, the context ζ refers to the particular section or input document from which the lexon has been extracted. Informally, we say that a lexon is a fact type that may hold for some domain, expressing that within the context ζ and for the natural language λ, term1 may plausibly have term2 occur in the role with it (and inversely term2 maintains the co-role relation with term1) (Spyns, 2005b). Lexons operate on the linguistic level. They are independent of specific applications and should cover relatively broad domains (Spyns et al., 2002). Together they form a lexon base, which is constituted by lexons grouped by context and language (Spyns, 2005a). Table 8 illustrates how verbalised plausible fact types from sentences (e.g., S4.1) are turned into lexons.10
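A lexon <(λ, ζ): term1 role1 co-role2 term2> maps naturally onto a small record type. The sketch below (the field names are ours, not a DOGMA API) encodes the refined S4.1 lexon of Table 8:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Lexon:
    """A DOGMA lexon: within context `ctx` and language `lang`,
    `term1` may plausibly have `term2` occur in `role` with it,
    while `term2` maintains `co_role` with `term1`."""
    lang: str      # the language identifier lambda, e.g. "en"
    ctx: str       # the context zeta, e.g. the source-document identifier
    term1: str
    role: str
    co_role: str
    term2: str

# The refined lexon of segment S4.1 (Table 8):
s4_1 = Lexon("en", "Setting 4.1 NSID:1",
             "DataController", "collect", "beCollected", "PersonalData")
```

A lexon base is then simply a collection of such records, grouped (e.g. in a dictionary) by their `(lang, ctx)` pair.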

Refine lexons. The newly created lexons should undergo a quality check. We say that a lexon is a 'good' one when it

– is highly re-usable,
– is as simple as possible,
– represents the correct information,
– is binary.

Fig. 8. Activities of the lexon engineering stage.

10We omit the language identifier λ as the examples in this paper only concern English.

Table 8

Refined resulting lexons from the source paragraph shown in the upper part

. . .
S4    A data controller collecting data about a data subject. Many citizens desire not to disclose their complete personal health information in an uncontrolled way. Accurate personal health data are crucial for high quality and personalised health care services but can also be misused to deny people services.

Segmentation of S4
S4.1  A data controller collecting data
. . .

Lexon of S4.1      context      ID      term1           role1    role2        term2
Original           Setting 4.1  NSID:1  DataController  collect  beCollected  Data
After refinement   Setting 4.1  NSID:1  DataController  collect  beCollected  PersonalData
                   Setting 4.1  NSID:1  DataController  isAbout  appliedTo    Data

Table 8 shows lexons resulting from the refinement step. Text describing the setting (cf. Table 3) has been segmented. Segment S4.1 has been transformed into a lexon, which has resulted in two lexons after refinement. We are aware that, in the current state, it is not so simple to define precise and verifiable metrics. This is an area where (DOGMA) ontology modelling to a large extent still is an art. Practical experience has to lead to quality requirements: e.g., avoid building lexons without a co-role. Automated procedures in particular suffer from this; manual intervention and completion are needed. Current work on ontology evaluation could be a good starting point for defining objective measures. As lexons can be considered as paths in a graph (or forest), the metrics proposed in (Gangemi et al., 2005) to measure the structural dimension of an ontology are excellent candidates to start with.
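Two of the quality requirements above are mechanically checkable. The sketch below (which assumes lexons as plain `(term1, role, co-role, term2)` tuples, an encoding of our choosing) flags lexons without a co-role and computes term degrees when lexons are read as graph edges, in the spirit of the structural metrics mentioned:

```python
from collections import Counter

def missing_coroles(lexons):
    """Flag lexons without a co-role - a quality requirement that
    automatically extracted lexons frequently violate."""
    return [lx for lx in lexons if not lx[2]]

def term_degrees(lexons):
    """Treat lexons as edges of a graph over terms and count each
    term's degree, a simple structural indicator."""
    deg = Counter()
    for term1, _role, _co_role, term2 in lexons:
        deg[term1] += 1
        deg[term2] += 1
    return deg
```

Such checks do not replace the expert's judgement, but they point the manual completion effort at the lexons most likely to need it.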

Note that the creation of elementary sentences (see Section 2.6.2) should almost automatically lead to good lexons. Nevertheless, it can happen that some elementary sentences are to be represented by more than one lexon, and vice versa. This mostly concerns knowledge implied by an elementary sentence.

Ground lexons. Natural language terms are associated, via the language and context combination, with a unique word sense represented by a concept label (e.g. the WordNet identifier person#2). With each word sense, a gloss or explanatory information is associated that describes that particular notion. To cope with synonymy, homonymy and translation equivalents, we take an m : n relationship between natural language terms and word senses into account. Lexon grounding is a conceptual exercise that links the terms and roles that constitute a lexon to existing entries of dictionaries, lexica or standards, which contain commonly accepted meanings. WordNet (Fellbaum, 1998; Miller, 1995) is commonly used to this end, as this resource contains many senses and associated unique definitions. An equivalent for the medical domain is the Unified Medical Language System (UMLS) (Humphreys & Lindberg, 1992).

If no adequate definitions exist,11 new definitions should be drafted by hand by the domain experts and other stakeholders, following terminological principles. Table 9 shows some newly made explanations with their term – e.g., the term PersonalData (see Table 8) has now received an explanation. In this way the logical vocabulary of the ontology (in the form of terms and roles) receives semantics. As a result, synonyms are easily detectable (i.e. they point to the same definition). New labels have to be chosen for a set of synonyms or expressions having the same meaning. These labels are preferably (slightly) different from natural language words to indicate that they operate on the conceptual level rather than on the language level (Spyns & De Bo, 2004). In the DOGMA ontology engineering framework, the definitions, the

11WordNet mostly contains general non-technical terms.

Table 9

Examples of grounded terms

ID  Label              Term                   Explanation
1.  datacontroller#1   DataController         Someone who maintains and audits privacy data
2.  measure#1          OrganisationalMeasure  Any manoeuvre that fits the organisational strategy, made as part of progress towards a goal
3.  personal#3datum#1  PersonalData           That data relating to a living individual which, if in the possession of a data controller, could by itself or with other data already in the possession of the data controller easily identify the living individual

labels and the synonyms are entered in the DOGMA concept definition server. This is indeed a laborious exercise, but the only way to achieve common understanding and explicitly defined semantics. Hence the importance of re-using existing lexicographic or terminological resources, as they provide descriptions of meaning commonly agreed upon by large communities of users. Tools to exploit these resources and seamlessly enrich them (e.g., Benassi et al., 2004) will become more and more valuable.
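Grounding can be pictured as a lookup from lexon terms to sense identifiers with glosses. The mini-lexicon below is hand-rolled for illustration only: its sense labels mimic Table 9 and it stands in for an actual WordNet or UMLS query:

```python
# A hand-rolled mini-lexicon standing in for WordNet/UMLS lookups;
# the sense identifiers and glosses mimic Table 9 and are assumptions.
SENSES = {
    "DataController": ("datacontroller#1",
                       "Someone who maintains and audits privacy data"),
    "PersonalData": ("personal#3datum#1",
                     "Data that could identify a living individual"),
}

def ground(term):
    """Link a lexon term to a (concept label, gloss) pair; terms without
    an accepted definition must be drafted by hand by the experts."""
    if term not in SENSES:
        raise KeyError(f"no accepted sense for {term!r}: draft a definition")
    return SENSES[term]
```

The failure branch mirrors the methodology: when no adequate definition exists, the grounding step stops and a new definition has to be written following terminological principles.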

Create meta-lexons. Meta-lexons are language-neutral and context-independent (conceptual level). A term or role (or sometimes an expression) used in a specific context and language in principle points to a non-ambiguous meaning (or sense). The result of the preceding lexon grounding activity (descriptions for unique senses linked to terms) most naturally provides the basis to create meta-lexons. The English <DataController, collect, beCollected, PersonalData> lexon in the context identified by "setting 4.1 NSID:1" (see Table 8) is transformed, using WordNet and the grounded terms of Table 9, into the following meta-lexon: <datacontroller#1, collect#4, personal#3datum#1>.

It can happen that equivalent or synonymous lexons or triples have been produced in the course of the domain conceptualisation and lexon engineering steps. As it is senseless to keep such lexons (also possibly across language borders), meta-lexons are created. Meta-lexons are an abstraction of equivalent lexons and synonymous lexons (see the examples below). Two lexons or triples are equivalent when the respective terms and roles point to the same concepts. This also holds when the sequence of terms and roles is inverted. In the case of an inverted sequence, words can be antonyms. Let us look at three naive examples (supposedly used in the same context and language):

– <bike, follow, be_followed_by, car> & <bicycle, go_after, be_followed_by, automobile>. This example illustrates that two lexons are synonymous when their composing parts have the same sense. Here, a bike is a bicycle, to follow is to go after, and a car is an automobile.

– <dog, eat, be_eaten, meat> & <meat, be_eaten, eat, dog>. This example illustrates that two lexons are equal when their terms and roles are the same (possibly in inverted sequence).

– <bike, follow, be_followed_by, car> & <automobile, precede, is_preceded_by, bicycle>. This example (equivalence) is the combination of the two previous examples. The terms are synonyms and inverted, while the role names are antonyms.12
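The first two cases can be checked mechanically once every word has been grounded to a sense. The synonym table below is toy data for illustration (the sense identifiers are invented, not WordNet output), and lexons are again plain `(term1, role, co-role, term2)` tuples:

```python
# Toy table mapping words to sense identifiers (assumed data).
SENSE = {"bike": "bicycle#1", "bicycle": "bicycle#1",
         "car": "car#1", "automobile": "car#1",
         "follow": "follow#1", "go_after": "follow#1",
         "be_followed_by": "be_followed#1"}

def synonymous(lexon1, lexon2):
    """Two lexons are synonymous when their composing parts
    pairwise share the same sense."""
    return all(SENSE.get(a, a) == SENSE.get(b, b)
               for a, b in zip(lexon1, lexon2))

def inverted(lexon):
    """Invert the sequence of terms and roles of a lexon."""
    t1, role, co_role, t2 = lexon
    return (t2, co_role, role, t1)

def equal(lexon1, lexon2):
    """Two lexons are equal when terms and roles are the same,
    possibly in inverted sequence."""
    return lexon1 == lexon2 or lexon1 == inverted(lexon2)
```

The third case (synonymous and inverted, with antonymous roles) would additionally need an antonym table relating e.g. follow#1 to precede#1, which we leave out for brevity.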

12Generalisation and specialisation relationships can also hold between lexons – DOGMA-MESS iteratively refines what could be called template lexons. This resembles ontology patterns (cf. Gangemi, 2005), but has not yet been explored in the DOGMA context.

Fig. 9. Activities of the application specification stage.

2.7. Application specification

This stage of the modelling process corresponds to creating the first-order instantiation of the interpretation of (parts of) the ontology base by an application (Spyns et al., 2002). Stated otherwise, an application maps (or commits) its local vocabulary to a domain conceptualisation and meaning commonly agreed upon, as represented by a selection of meta-lexons in the ontology base. The application specification activities are shown in Fig. 9. They include structuring the application domain (Section 2.7.1), adding application-specific semantic restrictions on the domain conceptualisation (Section 2.7.3), and preparing for (Section 2.7.2) and performing a validation step (Section 2.7.4). Quite some novel ontology validation and task-based evaluation methods have been proposed in the recent literature (e.g. Porzel & Malaka, 2005) that can be interesting for the DOGMA methodology as well.

2.7.1. Structuring

The knowledge structuring activity is applied to every section of the NS in natural language. Based on several NSs, a knowledge constituent analysis is done. This takes the form of structured natural language (due to the hierarchical nature of the NS documents) in which details are provided for every knowledge constituent. Logical connections inside the episodes, more precisely on the meta-lexons modelled on the basis of the segmented episodes, are to be expressed in terms of the more traditional semantic constraints. A pseudo high-level language is used to support the structuring mechanism (Zhao, 2004). These constructs have no formal definition but can help to prepare the material at hand to be more easily transformed into specific (business) rule languages (Tang & Meersman, 2007b), expert system formalisms or plain programming language constructs.

2.7.2. Definition of competency questions

A set of application-specific competency questions (Uschold & Gruninger, 1996) is defined. Such a set of competency questions functions as a baseline reference for the domain knowledge that an ontology is expected to contain from the perspective of an application. Competency questions should be structured

in a hierarchical fashion, proceeding from general to more specific. Complex questions should be built from simpler ones. It is recommended that domain experts, ontology engineers as well as end users be involved in this activity.
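A minimal, admittedly naive way to use competency questions as a baseline reference is to check which of a question's grounded terms are covered by the concepts occurring in the selected meta-lexons. Real validation would query the committed ontology; the coverage check below is a sketch under that simplifying assumption, with meta-lexons as `(concept, role, concept)` triples as in the example above:

```python
def coverage(question_terms, meta_lexons):
    """Split the grounded terms of a competency question into those
    covered by the concepts of the selected meta-lexons and those not.
    A naive baseline check, not a substitute for real validation."""
    concepts = {c for t1, _role, t2 in meta_lexons for c in (t1, t2)}
    covered = {t for t in question_terms if t in concepts}
    return covered, set(question_terms) - covered
```

Uncovered terms indicate either a gap in the ontology base or a competency question that falls outside the agreed scope, and should be fed back to the experts.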

2.7.3. Definition of semantic constraints

Additional restrictions on the meta-lexons representing a specific application conceptualisation are situated in the commitment layer (see also Section 1.2). This layer, with its formal constraints, is meant for interoperability between information systems, software agents and web services. These kinds of constraints are mathematically founded and concern rather typical DB schema constraints (cardinality, optionality, etc.) as defined in the Object Role Modelling methodology (Halpin, 2001). They also correspond largely to OWL restrictions (Bach et al., 2007). As the ORM methodology has been designed to model database schemas, it cannot be ported as such to the field of ontology engineering (Spyns, 2005a, b). Nevertheless, we have borrowed the following semantic constraints from ORM.

1. Internal uniqueness constraints: this indicates that in a particular instance of a binary fact type, there can be only one instance of the concept involved in the relationship. For instance, if we consider the meta-lexon <x, has, y>, then an internal uniqueness constraint on the role has is verbalised as: each x has at most one y.

2. External uniqueness constraints: this specifies that in each instantiation of a combination (or "join") of two or more meta-lexons, there can be only one instance of the concept with the constrained roles involved. Consider the meta-lexons <x, has, y> and <x, has, z>. If an external uniqueness constraint is imposed on the role has in both meta-lexons, this may be verbalised as: each y and z combination has at most one x.

3. Simple mandatory constraints: a simple mandatory constraint indicates that every instance of a concept must necessarily participate in the relationship on which the constraint is defined. Taking the meta-lexon <x, has, y>, a simple mandatory constraint defined on the relationship has is verbalised as every x has at least one y.

4. Disjunctive mandatory constraints: a disjunctive mandatory constraint defined on a concept occurring in one or more meta-lexons implies that the concept must necessarily be involved in at least one (but possibly more) of the relationships specified. Consider the meta-lexons <x, has, y> and <x, has, z> and suppose that a disjunctive mandatory constraint has been defined on x. Verbalising this yields x has either y or z (or both).

5. Subsets: this constraint specifies that the existence of one meta-lexon implies the existence of another. That is, if an instance of the “implying” meta-lexon exists, then an instance of the “implied” meta-lexon also necessarily exists. For instance, if the meta-lexon <x, has, y> has a subset constraint on the meta-lexon <x, has, z>, this may be verbalised as if x has y, then x has z.

6. Equality: the equality constraint indicates that two instance populations are equal.
7. Exclusion: this specifies that if a concept occurs in more than one meta-lexon, it cannot be involved in more than one of the roles. Suppose that the meta-lexons <x, has, y> and <x, has, z> have an exclusion constraint defined on the role has. This is verbalised as no x has both y and z.

8. Subtypes: subtyping specifies that one concept type is a subtype of another. In many cases, subtyping is combined with disjunctive mandatoriness (defining partitions).

9. Occurrence frequency: an occurrence frequency constraint is a generalised version of an internal uniqueness constraint, namely when n > 1. That is, an occurrence frequency constraint specifies

Fig. 10. ORM graphical representation of commitments for a DOGMA-style ontology.

that n instances of a particular concept must be involved in a particular relationship. It is possible to specify the cardinality as exactly n, greater than or equal to n, or greater than n.

10. Final checks imply checking for consistency in the commitments that have been specified. This includes ensuring that different constraints do not contradict one another. Some patterns have been examined in (Jarrar & Heymans, 2006).
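As a minimal sketch of how some of the verbalisation patterns above could be generated mechanically, the following Python fragment models meta-lexons as triples and produces the verbalisations of patterns 1, 3, 4, 5 and 7. The MetaLexon representation and the function names are our own illustrative assumptions, not part of the DOGMA tooling:

```python
# Hypothetical sketch: ORM-style verbalisation patterns for DOGMA
# meta-lexons represented as <head, role, tail> triples.
from collections import namedtuple

MetaLexon = namedtuple("MetaLexon", "head role tail")

def internal_uniqueness(lexon):
    # Pattern 1: each x has at most one y.
    return f"each {lexon.head} {lexon.role} at most one {lexon.tail}"

def simple_mandatory(lexon):
    # Pattern 3: every x has at least one y.
    return f"every {lexon.head} {lexon.role} at least one {lexon.tail}"

def disjunctive_mandatory(l1, l2):
    # Pattern 4: x has either y or z (or both).
    return f"{l1.head} {l1.role} either {l1.tail} or {l2.tail} (or both)"

def subset(implying, implied):
    # Pattern 5: if x has y, then x has z.
    return (f"if {implying.head} {implying.role} {implying.tail}, "
            f"then {implied.head} {implied.role} {implied.tail}")

def exclusion(l1, l2):
    # Pattern 7: no x has both y and z.
    return f"no {l1.head} {l1.role} both {l1.tail} and {l2.tail}"

xy = MetaLexon("x", "has", "y")
xz = MetaLexon("x", "has", "z")
print(internal_uniqueness(xy))   # each x has at most one y
print(simple_mandatory(xy))      # every x has at least one y
print(exclusion(xy, xz))         # no x has both y and z
```

The remaining patterns (external uniqueness, subtypes, occurrence frequencies) need richer structures such as joins, subtype links and counts, and are omitted here for brevity.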

An alternative commitment language, called Ω-RIDL (Verheyden et al., 2005), is being designed. It is inspired by the RIDL language developed earlier (Meersman, 1982). The current version, which supports a subset of the ORM constraints, is being extended (Trog et al., 2007) to support a better conversion to and from OWL-DL (Bach et al., 2007).

An example of the result of adding semantic constraints is illustrated by Fig. 10. It displays, in the ORM graphical notation, some of the commitments as captured from the source text of Table 4 and the episode represented by Table 5. A HC provider can manage several different HPs (health profiles), while one HP can be managed by several different HC providers (the combination of an HP and an HC provider must be unique – indicated by a line with arrows above the two boxes). A HCP can access several different HPs, while one HP can grant access to several different HCPs (the combination of an HP and an HCP must be unique – indicated by a line with arrows above the two boxes). General practitioners (GP) can grant access to HPs (m : n relationship, no specific graphical indication). A HCP (health care portal) accesses at least one HP and at least one Personal Workspace (two simple mandatory constraints, indicated by a black dot in Fig. 10). A health status is applied to at most one HP (internal uniqueness constraint on one role, indicated by the line with arrows above “applied to”).

2.7.4. Validation by means of competency questions

The application-specific requirements are met if the competency questions defined earlier can be adequately answered. An advantage of ORM is that constraints can be quite easily and naturally verbalised. Consequently, a check should be done to determine whether all competency questions may be answered through some verbalisation based on the commitments defined. If this is not possible, it may be necessary to update and redefine the commitment rules of the previous step.
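The check described above could, in a very crude approximation, be automated as term matching between questions and verbalised commitments. This is a hypothetical sketch; real validation would involve humans and proper linguistic normalisation (e.g. matching "manages" against "managed"):

```python
# Hypothetical sketch: flag competency questions whose vocabulary is not
# covered by any verbalised commitment, signalling that the commitment
# rules may need to be updated or redefined.
def uncovered_questions(questions, verbalisations):
    """Return questions mentioning a term that no verbalisation contains."""
    covered_terms = set()
    for v in verbalisations:
        covered_terms.update(v.lower().split())
    stop_words = {"which", "does", "a", "the", "what", "who", "can"}
    missing = []
    for q in questions:
        terms = set(q.lower().rstrip("?").split()) - stop_words
        if not terms <= covered_terms:
            missing.append(q)
    return missing

verbalisations = ["every provider manages at least one health profile"]
questions = ["Which provider manages a health profile?",
             "Which workspace does a provider own?"]
print(uncovered_questions(questions, verbalisations))
# → ['Which workspace does a provider own?']
```

A flagged question does not prove the commitments are wrong, only that no verbalisation mentions its vocabulary, so it deserves a manual look.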

2.8. Final remarks

The methodology described above offers a general framework for ontology engineering encompassing different ways of tackling the problem. To that end, several steps and stages have been identified, each with its own preferred technique or method, all having pros and cons. It is up to the ontology engineer to select the most appropriate track and related methods depending on his/her particular assignment. Nevertheless, after the domain conceptualisation step (cf. Section 2.6), the various “tracks” come together, and the intermediary data and material should ideally result in an equivalent (not identical) ontology, no matter which way has been preferred and which specific ontology language is chosen for an actual implementation.

Lexons generated by machine learning will most probably differ from the ones obtained from brainstorming experts, not only in the terms or labels used but also in the model itself. With respect to the former issue, the subsequent lexon engineering phase constitutes an abstraction and selection exercise resulting in meta-lexons and grounded concepts. What counts is the explicit description of the agreed-upon meaning, and not so much the labels and terms representing the concepts. Regarding the latter issue, we refer to database schema transformations that solve a similar problem – even if this avenue of research is still largely unexplored for ontologies.

What remains (and is not discussed in this paper) is the concrete implementation of the conceptualisation into an ontology (in the sense of Guarino & Giaretta, 1995). In our case, the lexons and meta-lexons are stored in the OntologyServer while the concepts and relationships are entered in the ConceptDefinitionServer.

3. Related work and discussion

A general survey on ontology modelling methodologies is provided by Gómez-Pérez et al. (2004, pp. 113–153), including short descriptions of the methods. A very recent overview is given by Simperl & Tempich (2006), who do not provide many details of the various methods but rather succinctly situate them. As such, it is an excellent starting point for further bibliographic research. An older, but still relevant, overview can be found in (Jones et al., 1998). Some novel methodology developments in ontology engineering that concern collaborative and community aspects (Holsapple & Joshi, 2002; Karapiperis & Apostolou, 2006; Kotis & Vouros, 2005) are shortly discussed in (Tempich, 2006, pp. 212–215). This new trend should not come as a surprise given the overall rising interest in combinations of Web 2.0 with the Semantic Web – also for ontology modelling – e.g., see Braun et al. (2007). An overview of overviews can be found on the following OntoWorld wiki page: http://ontoworld.org/wiki/Ontology_Engineering. Some of the surveys mentioned there (e.g., Fernández-López et al., 2002) have to a large extent become obsolete with the publication of Gómez-Pérez et al. (2004).

Almost all the overviews contain to a very large extent descriptions of some “usual suspects”: Methontology (Fernández et al., 1997), OnToKnowledge (Sure, 2003), TOVE (Grueninger & Fox, 1995), Enterprise Ontology (Uschold & King, 1995), as well as the “unified methodology” (Uschold & Gruninger, 1996) of the latter two. In addition, the development of the influential knowledge engineering methodology CommonKADS (Schreiber et al., 1999) has been an inspiration for other ontology engineering methods. The DOGMA modelling methodology partly draws on best practices from some of these methods:

– TOVE: the inclusion of competency questions as a way to scope the domain of interest and to evaluate its conceptualization.
– Enterprise Ontology: the notions of brainstorming, the middle-out approach, and the grouping of terms. The thorough documentation of the entire development process is also included.
– the “unified methodology”: the specification of target users as a way to scope the domain of interest. Also, the generalization of competency questions that are not rigorously formal is included in the suggested methodology. A number of additional suggestions with regard to brainstorming have been taken into account.
– Methontology: the inclusion of management activities.
– OnToKnowledge: the inclusion of a feasibility study before any ontology development commences.
– CommonKADS: the focus on specific documented deliverables for every activity in the ontology development process.

Nevertheless, we state that general, comprehensive, didactic and scientifically based methodology ‘cookbooks’ (e.g. Benjamin et al., 1994; Mizoguchi, 2003) that cover how to actually create from scratch and deploy a multilingual ontology hardly exist. In (Garcia Castro et al., 2006), a similar consideration led to designing a novel modelling methodology for bio-medical ontologies. There are the ONIONS and ONIONS-II methodologies for ontology creation and merging that have been successfully applied to several domains (bio-medical, legal, fishery) (Gangemi et al., 2002). Even though they contain guidelines on how to engineer an ontology, they do not yet seem to be completely engineered themselves. Some didactic papers are tuned (or biased?) towards specific ontology engineering tools (e.g. Noy & McGuinness, 2001). The well-documented OnToKnowledge methodology (Sure, 2003) is a centralised ontology development method that risks becoming too much geared toward a single application. This could also be a potential problem for the UPON methodology, which claims as its distinctive feature to be primarily use case driven (Missikoff & Navigli, 2005). In our view, an ontology should cover a domain rather than a single application. A recent methodology that puts the focus on collaborative and distributed aspects is Diligent (Tempich, 2006). It applies practices of “meeting techniques” (argumentation theory) to steer the process of collaboratively modelling an ontology supported by internet tools (e.g., a wiki environment) (Sure et al., 2004). A variant of Diligent has been adapted to the legal domain (Casanovas et al., 2007). The DOGMA modelling methodology, and in particular its community-based extension DOGMA-MESS, also explicitly addresses distributed meaning negotiation and consensus building, but without neglecting a centralised approach either.
Unlike the Methontology and UPON methods, which primarily build on software engineering project management methods, the DOGMA project aims at refining and applying an integrated methodology based on solid DB experience, insights from linguistics and collaborative aspects from the social sciences. Just as UPON builds on the Unified Software Development Process, UML and related tools, DOGMA largely builds on the ORM methodology and related technologies.13 Others have also used UML for ontology construction – e.g. Kogout et al. (2002). However, we consider that using UML as such, without a methodology, is not particularly helpful for making ontology modelling a science rather than an art. As far as we know, DOGMA is still unique in its methodological double articulation (Spyns et al., 2002).

The main endeavour for DOGMA is to continue to grow the methodology, apply it to many more new fields and modelling situations, and maintain its consistency and internal structure. Other topics consist of coping with heterogeneity (ontology merging and alignment), managing changes (ontology evolution)

13An experiment using the UML use case and activity diagrams within the DOGMA framework has been tried out as well (Tang et al., 2006).

and supporting quality assurance (ontology evaluation). These items are needed for a complete ontology life-cycle engineering methodology. They are being researched in our lab, but the corresponding methods are not yet integrated in the overall methodology.

A particular challenge is to link up ontologies graphically represented by ORM with popular ontology implementation languages, in particular RDFS and OWL-DL. Evidently ORM, being a conceptual modelling language, does not compete with RDFS and OWL, as the latter are W3C standard ontology implementation languages. In fact, a graphical conceptual model representation language should be part of any good general ontology modelling methodology. Converters are to be defined that produce RDF(S) or OWL statements when processing the conceptual model (in ORM), in analogy with the way UML models are eventually transformed into Java or C++ software modules. Note that ORM has already been in use as a DB schema modelling language for a long time.14

An RDFS or OWL-Lite implementation of an ontology (without instances) basically corresponds to (part of) a DOGMA ontology base. Ontology restrictions encoded with OWL-DL statements correspond to the constraints of the DOGMA commitment layer. Converting DOGMA-style ontologies into OWL-DL (and vice versa) has recently been taken up at the lab. Intermediary results show that DOGMA-style ontologies can to a large extent be transformed into RDF and OWL implementations (Bach et al., 2007).
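To make the flavour of such a transformation concrete, the following hypothetical sketch emits RDFS-style triples for lexon term pairs; the actual DOGMA-to-OWL mapping (Bach et al., 2007) is considerably richer:

```python
# Illustrative sketch only: render <head, role, tail> lexons as
# N-Triples-style statements, declaring each role as an rdf:Property
# with an rdfs:domain and rdfs:range. The example namespace is made up.
def lexons_to_rdfs(lexons, ns="http://example.org/onto#"):
    """Return one N-Triples-style statement per line for the given lexons."""
    rdfs = "http://www.w3.org/2000/01/rdf-schema#"
    rdf = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
    lines = []
    for head, role, tail in lexons:
        prop = ns + role.replace(" ", "_")
        lines.append(f"<{prop}> <{rdf}type> <{rdf}Property> .")
        lines.append(f"<{prop}> <{rdfs}domain> <{ns}{head.replace(' ', '_')}> .")
        lines.append(f"<{prop}> <{rdfs}range> <{ns}{tail.replace(' ', '_')}> .")
    return "\n".join(lines)

print(lexons_to_rdfs([("HC_Provider", "manages", "Health_Profile")]))
```

Commitment-layer constraints (uniqueness, mandatoriness, etc.) have no RDFS counterpart and would instead map to OWL-DL restrictions such as cardinality axioms.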

4. Conclusions

In this paper we have described an ontology engineering methodology that integrates best practices of many worlds (database modelling, social sciences, linguistics, knowledge representation, . . . ). This has been the result of various modelling experiences and exercises in multiple application domains and different settings. We have presented parts of a practical and didactic methodology in the hope that other researchers take it up and try it out on their own examples and material. Inspired by the CommonKADS book (Schreiber et al., 1999), and to be practical, we have introduced for each important modelling step the expected output (up to the point of designing typical forms). This methodological rigour allowed us to introduce various paths in the engineering process, characterised by the use of particular techniques or material, all eventually contributing to the same goal.

Finally, we want to point out that other aspects of ontology engineering research at our lab have not been mentioned in this paper due to a lack of space, such as ontology evolution (De Leenheer & Mens, 2008), the study of context (De Leenheer et al., 2006), ontology visualisation (Pretorius, 2005), ontology mining (Reinberger & Spyns, 2005; Reinberger et al., 2004) and automatic evaluation (Spyns & Reinberger, 2005; Spyns & Hogben, 2005), mediation (Deray & Verheyden, 2003), emergent semantics (Cudré-Mauroux & Aberer, 2006), scalability (Jarrar & Meersman, 2002; Zhao & Meersman, 2005), ontology-based business intelligence (O’Riain & Spyns, 2006) and business rules modelling (Demey et al., 2002; Tang et al., 2007; Tang & Meersman, 2007a). For many of these research topics, tools are being implemented (Christiaens & de Moor, 2006; Oberle & Spyns, 2004; Pretorius, 2005; Spyns et al., 2006; Tang & Meersman, 2007b; Trog et al., 2006). Most of the tools are (or will be) integrated with DogmaStudio,15 the lab’s central ontology workbench. So, next to presenting our ontology engineering methodology, we have also provided a bird’s eye view on past work at VUB STAR Lab on the occasion of its 10th anniversary.

14ORM models are being mapped into SQL DDL statements – see Halpin (2001) and Demey et al. (2002) on the advantages of using ORM for database modelling and ontology modelling, respectively.

15http://www.starlab.vub.ac.be/website/dogmastudio.

Acknowledgements

We would like to thank in particular Dr. Aldo de Moor and Dr. Gang Zhao for their inspiring ideas, as well as our other past and present colleagues at VUB STAR Lab, who have all contributed their part to the research described in this paper, and the different organisations that funded our projects. We gratefully acknowledge the support of the IST (FF POIROT, PRIME, KnowledgeWeb, PROLIX) and Leonardo da Vinci (Co-Drive) programmes of the European Union, the IWT-Vlaanderen (OntoBasis, SCOP, PoCehrMOM), the Vrije Universiteit Brussel (OMTFI, DOGMA), and the FWO-Vlaanderen (BonaTema).

References

Bach, D.B., Meersman, R., Spyns, P. & Trog, D. (2007). Mapping OWL-DL into ORM/RIDL. In R. Meersman, Z. Tari & P. Herrero (eds), On the Move to Meaningful Internet Systems 2007: OTM Workshops, LNCS (Vol. 4805, pp. 742–751). Springer-Verlag.

Benassi, R., Bergamaschi, S., Fergnani, A. & Miselli, D. (2004). Extending a lexicon ontology for intelligent information integration. In R.L. de Mántaras & L. Saitta (eds), 16th European Conference on Artificial Intelligence Proceedings, Frontiers in Artificial Intelligence and Applications (Vol. 110, pp. 278–282). IOS Press.

Benjamin, P., Menzel, C., Mayer, R., Fillion, F., Futrell, M., de Witte, P. & Lingineni, M. (1994). IDEF5 method report. Technical Report F33615-C-90-0012, Knowledge Based Systems Inc., Texas.

Braun, S., Schmidt, A. & Walter, A. (2007). Ontology maturing – a collaborative Web 2.0 approach to ontology engineering. In Proceedings of WWW 2007.

Buitelaar, P., Cimiano, P. & Magnini, B., eds (2005). Ontology Learning from Text: Methods, Applications and Evaluation. Amsterdam: IOS Press.

Casanovas, P., Casellas, N., Tempich, C., Vrandecic, D. & Benjamins, R. (2007). OPJK modeling methodology. In J. Lehmann, M.A. Biasiotti, E. Francesconi & M.T. Sagri (eds), Legal Ontologies and Artificial Intelligence Techniques, International Conference on Artificial Intelligence and Law (ICAIL) Workshop Series (Vol. 4, pp. 121–134). Wolf Legal Publishers.

Christiaens, S. & de Moor, A. (2006). Tool interoperability from the trenches: the case of DOGMA-MESS. In Conceptual Structures Tool Interoperability Workshop (CS-TIW 2006) at the 14th International Conference on Conceptual Structures. Aalborg University Press.

Cudré-Mauroux, P. & Aberer, K. (2006). Viewpoints on emergent semantics. Journal on Data Semantics, VI, 1–27 (LNCS 2880, Springer-Verlag).

De Bo, J., Spyns, P. & Meersman, R. (2003). Creating a “DOGMAtic” multilingual ontology infrastructure to support a semantic portal. In R. Meersman et al. (eds), On the Move to Meaningful Internet Systems 2003: OTM 2003 Workshops, LNCS (Vol. 2889, pp. 253–266). Springer-Verlag.

De Leenheer, P., de Moor, A. & Meersman, R. (2006). Context dependency management in ontology engineering: a formal approach. Journal on Data Semantics, VIII, 26–56 (LNCS 4380, Springer-Verlag).

De Leenheer, P. & Meersman, R. (2005). Towards a formal foundation of DOGMA ontology. Part I: Lexon base and concept definition server. Technical Report STAR-2005-06, STAR Lab, Brussel.

De Leenheer, P. & Mens, T. (2008). Ontology evolution: State of the art and future directions. In M. Hepp, P. De Leenheer, A. de Moor & Y. Sure (eds), Ontology Management for the Semantic Web, Semantic Web Services, and Business Applications (Chapter 5, pp. 131–176). Heidelberg: Springer-Verlag.

de Moor, A. (2005a). Ontology-guided meaning negotiation in communities of practice. In Proceedings of the Workshop on the Design for Large-Scale Digital Communities at the 2nd International Conference on Communities and Technologies.

de Moor, A. (2005b). Patterns for the pragmatic web. In Proc. of the 13th International Conference on Conceptual Structures (ICCS 2005), LNAI (Vol. 3596, pp. 1–18). Springer-Verlag.

de Moor, A., De Leenheer, P. & Meersman, R. (2006). DOGMA-MESS: A meaning evolution support system for interorganizational ontology engineering. In Proc. of the 14th International Conference on Conceptual Structures (ICCS 2006), LNCS (Vol. 4068, pp. 189–203). Springer-Verlag.

de Moor, A. & Weigand, H. (2005). Communication pattern analysis in communities of practice. In Proc. of the 10th International Working Conference on the Language-Action Perspective on Communication Modelling (pp. 23–29).

Demey, J., Jarrar, M. & Meersman, R. (2002). A markup language for ORM business rules. In M. Schroeder & G. Wagner (eds), Proceedings of the ISWC02 International Workshop on Rule Markup Languages for Business Rules on the Semantic Web (pp. 107–128).

Deray, T. & Verheyden, P. (2003). Towards a semantic integration of medical relational databases by using ontologies: a case study. In R. Meersman (ed.), On the Move to Meaningful Internet Systems 2003: OTM 2003 Workshops, LNCS (Vol. 2889, pp. 137–150). Springer-Verlag.

Fellbaum, C. (1998). WordNet: An Electronic Lexical Database. MIT Press.

Fernández, M., Gómez-Pérez, A. & Juristo, N. (1997). Methontology: from ontological art towards ontological engineering. In Proceedings of the AAAI97 Spring Symposium Series on Ontological Engineering (pp. 33–40), Stanford, USA.

Fernández-López, M., Gómez-Pérez, A., Euzenat, J., Gangemi, A., Kalfoglou, Y., Pisanelli, D., Schorlemmer, M., Steve, G., Stojanovic, L., Stumme, G. & Sure, Y. (2002). A survey on methodologies for developing, maintaining, integrating, evaluating and reengineering ontologies. OntoWeb Deliverable #1.4, Universidad Politécnica de Madrid.

Gangemi, A. (2005). Ontology design patterns for semantic web content. In Y. Gil, E. Motta, R. Benjamins & M. Musen (eds), Proceedings of the 4th International Semantic Web Conference (ISWC 2005), LNCS (Vol. 3729, pp. 262–276). Springer-Verlag.

Gangemi, A., Catenacci, C., Ciaramita, M. & Lehmann, J. (2005). A theoretical framework for ontology evaluation and validation. In Proceedings of SWAP 2005, the 2nd Italian Semantic Web Workshop, CEUR Workshop Proceedings. http://sunsite.informatik.rwth-aachen.de/Publications/CEUR-WS/Vol-166/9.pdf.

Gangemi, A., Guarino, N., Masolo, C., Oltramari, A. & Schneider, L. (2002). Sweetening ontologies with DOLCE. In Ontologies and the Semantic Web, Proceedings of the 13th International Conference on Knowledge Engineering and Knowledge Management (EKAW 2002), LNCS (Vol. 2473, pp. 166–181). Springer-Verlag.

Gao, Y. & Zhao, G. (2005). Knowledge-based information extraction: A case study of recognizing emails of Nigerian frauds. In A. Montoyo, R. Munoz & E. Métais (eds), Natural Language Processing and Information Systems, Proceedings of the 10th International Conference on Applications of Natural Language to Information Systems (NLDB 2005), LNCS (Vol. 3512, pp. 161–172). Springer-Verlag.

Garcia Castro, A., Rocca-Serra, P., Stevens, R., Taylor, C., Nashar, K., Ragan, M. & Sansone, S.-A. (2006). The use of concept maps during knowledge elicitation in ontology development processes – the nutrigenomics use case. BMC Bioinformatics, 7, 267–281.

Gómez-Pérez, A., Fernández-López, M. & Corcho, O. (2004). Ontological Engineering, Advanced Information and Knowledge Processing Series. Springer-Verlag.

Grueninger, M. & Fox, M. (1995). Methodology for the design and evaluation of ontologies. In D. Skuce (ed.), IJCAI-95 Workshop on Basic Ontological Issues in Knowledge Sharing.

Guarino, N. (1995). Formal ontology, conceptual analysis and knowledge representation. International Journal of Human-Computer Studies, 43, 625–640.

Guarino, N. (1998). Formal ontologies and information systems. In N. Guarino (ed.), Proceedings of FOIS’98 (pp. 3–15). IOS Press.

Guarino, N. & Giaretta, P. (1995). Ontologies and knowledge bases: Towards a terminological clarification. In N. Mars (ed.), Towards Very Large Knowledge Bases: Knowledge Building and Knowledge Sharing (pp. 25–32). IOS Press.

Halpin, T. (2001). Information Modeling and Relational Databases. Academic Press.

Holsapple, C. & Joshi, K. (2002). A collaborative approach to ontology design. Communications of the ACM, 45(2), 42–47.

Humphreys, B. & Lindberg, D. (1992). The unified medical language system project: a distributed experiment in improving access to biomedical information. In K. Lun (ed.), Proceedings of the 7th World Congress on Medical Informatics (MEDINFO92) (pp. 1496–1500).

Jarrar, M., Demey, J. & Meersman, R. (2003). On reusing conceptual data modeling for ontology engineering. Journal on Data Semantics, I, 185–207 (LNCS, Springer-Verlag).

Jarrar, M. & Heymans, S. (2006). Unsatisfiability reasoning in ORM conceptual schemes. In Proceedings of the International Conference on Semantics of a Networked World (ICSNW 2006), LNCS (Vol. 4254, pp. 517–534). Springer.

Jarrar, M. & Meersman, R. (2002). Scalability and reusability in ontology modeling. In Proceedings of the International Conference on Advances in Infrastructure for e-Business, e-Education, e-Science, and e-Medicine on the Internet (SSGRR2002s) (only available on CD-ROM).

Jones, D., Bench-Capon, T. & Visser, P. (1998). Methodologies for ontology development. In J. Cuena (ed.), Proceedings of the IT and KNOWS Conference, XV IFIP World Computer Congress (pp. 62–75). Chapman-Hall.

Karanikas, H. & Theodoulidis, B. (2002). Knowledge discovery in text and text mining software. Technical report, UMIST–CRIM, Manchester.

Karapiperis, S. & Apostolou, D. (2006). Consensus building in collaborative ontology engineering processes. Journal of Universal Knowledge Management, 1(3), 199–216.

Kogout, P. et al. (2002). UML for ontology methodology. The Knowledge Engineering Review, 17(1), 61–64.

Kotis, K. & Vouros, G. (2005). Human-centered ontology engineering: The HCOME methodology. International Journal of Knowledge and Information Systems, 10(1), 109–131.

Meersman, R. (1982). The RIDL conceptual language. Research report, Int. Centre for Information Analysis Services, Control Data, Brussels.

Meersman, R. (1996). An essay on the role and evolution of (data)base semantics. In R. Meersman & L. Mark (eds), Database Application Semantics, Proceedings of the Sixth IFIP TC-2 Working Conference on Data Semantics (DS-6), IFIP Conference Proceedings (Vol. 74, pp. 1–7). Chapman and Hall.

Meersman, R. (1999a). Semantics ontology tools in information system design. In Z. Ras & M. Zemankova (eds), The Proceedings of the International Symposium on Methodologies for Intelligent Systems (ISMIS99), Vol. 1609. Springer.

Meersman, R. (1999b). The use of lexicons and other computer-linguistic tools in semantics, design and cooperation of database systems. In Y. Zhang, M. Rusinkiewicz & Y. Kambayashi (eds), The Proceedings of the Second International Symposium on Cooperative Database Systems for Advanced Applications (CODAS99) (pp. 1–14). Springer.

Meersman, R. (2001a). New frontiers in modeling technology: The promise of ontologies. In E. Kerckhoffs & M. Snorek (eds), Proceedings of the 15th European Multi-Simulation Conference (pp. 3–6).

Meersman, R. (2001b). Ontologies and databases: More than a fleeting resemblance. In A. d’Atri & M. Missikoff (eds), OES/SEO 2001 Rome Workshop. Luiss Publications.

Mika, P. (2005). Ontologies are us: A unified model of social networks and semantics. In Proc. of the 4th International Semantic Web Conference (ISWC05), LNCS (pp. 522–536). Springer.

Miller, G. (1995). WordNet: a lexical database for English. Communications of the ACM, 38(11), 39–41.

Missikoff, M. & Navigli, R. (2005). Applying the unified process to large-scale ontology building. In Proceedings of the 16th IFAC World Congress (IFAC).

Mizoguchi, R. (2003). Tutorial on ontological engineering. New Generation Computing, 21(2).

Nirenburg, S. & Raskin, V. (2001). Ontological semantics, formal ontology, and ambiguity. In Proceedings of the Second International Conference on Formal Ontology in Information Systems (pp. 151–161). ACM Press.

Noy, N. & McGuinness, D. (2001). Ontology development 101: A guide to creating your first ontology. Technical Report KSL-01-05/SMI-2001-0880, Stanford Knowledge Systems Laboratory/Stanford Medical Informatics.

Oberle, D. & Spyns, P. (2004). Handbook on Ontologies, International Handbooks on Information Systems (pp. 499–516). Springer.

O’Riain, S. & Spyns, P. (2006). Enhancing the business analysis function with semantics. In R. Meersman et al. (eds), On the Move to Meaningful Internet Systems 2006: CooPIS, DOA and ODBASE, LNCS (Vol. 4275, pp. 818–835). Springer-Verlag.

Porzel, R. & Malaka, R. (2005). Ontology Learning from Text: Methods, Evaluation and Applications (pp. 107–121). IOS Press.

Pretorius, A. (2005). Visual analysis for ontology engineering. Journal of Visual Languages and Computing, 16(4), 359–381.

Reinberger, M.-L. & Spyns, P. (2005). Unsupervised text mining for the learning of DOGMA-inspired ontologies. In P. Buitelaar, P. Cimiano & B. Magnini (eds), Ontology Learning from Text: Methods, Applications and Evaluation (pp. 29–43). Amsterdam: IOS Press.

Reinberger, M.-L., Spyns, P., Pretorius, A. & Daelemans, W. (2004). Automatic initiation of an ontology. In R. Meersman et al. (eds), On the Move to Meaningful Internet Systems 2004: CooPIS, DOA and ODBASE (part I), LNCS (Vol. 3290, pp. 600–617). Springer-Verlag.

Schreiber, G., Akkermans, H., Anjewierden, A., de Hoog, R., Shadbolt, N., Van de Velde, W. & Wielinga, B. (1999). Knowledge Engineering and Management – The CommonKADS Methodology. MIT Press.

Shamsfard, M. & Barforoush, A. (2003). The state of the art in ontology learning: a framework for comparison. Knowledge Engineering Review, 18(4), 293–316.

Simperl, E.P.B. & Tempich, C. (2006). Ontology engineering: a reality check. In R. Meersman et al. (eds), On the Move to Meaningful Internet Systems 2006: CooPIS, DOA and ODBASE, LNCS (Vol. 4275, pp. 836–854). Springer-Verlag.

Sowa, J. (1984). Conceptual Structures: Information Processing in Mind and Machine. Brooks Cole Publishing Co.

Sowa, J. (2000). Knowledge Representation: Logical, Philosophical, and Computational Foundations. Addison-Wesley.

Spyns, P. (2005a). Adapting the object role modelling method for ontology modelling. In M.-S. Hacid, N. Murray, Z. Ras & S. Tsumoto (eds), Foundations of Intelligent Systems, Proceedings of the 15th International Symposium on Methodologies for Information Systems (ISMIS05), LNAI (Vol. 3488, pp. 276–284). Springer-Verlag.

Spyns, P. (2005b). Object role modelling for ontology engineering in the DOGMA framework. In R. Meersman et al. (eds), On the Move to Meaningful Internet Systems 2005: OTM 2005 Workshops, LNCS (Vol. 3762, pp. 710–779). Springer-Verlag.

Spyns, P. & De Bo, J. (2004). Ontologies: a revamped cross-disciplinary buzzword or a truly promising interdisciplinary research topic? Linguistica Antverpiensia NS, 3, 279–292.

Spyns, P., de Moor, A., Vandenbussche, J. & Meersman, R. (2006). From folksologies to ontologies: how the twain meet. In R. Meersman et al. (eds), On the Move to Meaningful Internet Systems 2006: CooPIS, DOA and ODBASE, LNCS (Vol. 4275, pp. 738–755). Springer-Verlag.

Spyns, P. & Hogben, G. (2005). Validating an automated evaluation procedure for ontology triples in the privacy domain. In M.-F. Moens & P. Spyns (eds), Proceedings of the 18th Annual Conference on Legal Knowledge and Information Systems (JURIX 2005) (pp. 127–136). Amsterdam: IOS Press.

Spyns, P., Meersman, R. & Jarrar, M. (2002). Data modelling versus ontology engineering. SIGMOD Record Special Issue31(4), 12–17.

Spyns, P. & Reinberger, M.-L. (2005). Lexically evaluating ontology triples automatically generated from text. In A. Gómez-Pérez & J. Euzenat (eds), Proceedings of the Second European Semantic Web Conference, LNCS (Vol. 3532, pp. 563–577). Springer-Verlag.

Sure, Y. (2003). Methodology, tools & case studies for ontology based knowledge management. PhD thesis, AIFB Karlsruhe.

Sure, Y., Tempich, C. & Vrandecic, D. (2004). SEKT methodology: Survey and initial framework. SEKT Deliverable #D7.1.1, AIFB Karlsruhe, Karlsruhe.

Tang, Y. & Meersman, R. (2007a). On constructing semantic decision tables. In R. Wagner, N. Revell & G. Pernul (eds), Proceedings of the 8th International Conference on Database and Expert Systems Applications (DEXA'2007), LNCS (Vol. 4653, pp. 34–44). Springer-Verlag.

Tang, Y. & Meersman, R. (2007b). Towards building semantic decision table with domain ontologies. In C. Man-chung, J. Liu, R. Cheung & J. Zhou (eds), International Conference on Information Technology and Management (pp. 14–22). ISM Press.

Tang, Y., Spyns, P. & Meersman, R. (2007). Towards semantically grounded decision rules using ORM+. In Proceedings of the International Symposium on Rule Interchange and Applications (RuleML 2007), LNCS (Vol. 4824). Springer-Verlag.

Tang, Y., Spyns, P., Zhao, G. & Dominguez-Burgos, A. (2006). Ontology capture methodology v3. PRIME Deliverable #D7.3, STAR Lab, Brussel.

Tempich, C. (2006). Ontology engineering and routing in distributed knowledge management applications. PhD thesis, AIFB Karlsruhe.

Trog, D., Tang, Y. & Meersman, R. (2007). Towards ontological commitments with the Ω-RIDL mark-up language. In Proceedings of the International Symposium on Rule Interchange and Applications (RuleML 2007), LNCS (Vol. 4824, pp. 92–106). Springer-Verlag.

Trog, D., Vereecken, J., Christiaens, S., De Leenheer, P. & Meersman, R. (2006). T-Lex: A role-based ontology engineering tool. In R. Meersman et al. (eds), On the Move to Meaningful Internet Systems 2006: OTM 2006 Workshops, LNCS (Vol. 4278). Springer-Verlag.

Uschold, M. & Gruninger, M. (1996). Ontologies: Principles, methods and applications. The Knowledge Engineering Review, 11(2), 93–155.

Uschold, M. & King, M. (1995). Towards a methodology for building ontologies. In D. Skuce (ed.), IJCAI-95 Workshop on Basic Ontological Issues in Knowledge Sharing.

Verheyden, P., De Bo, J. & Meersman, R. (2005). Semantically unlocking database content through ontology-based mediation. In C. Bussler, V. Tannen & I. Fundulaki (eds), Proceedings of the Second Workshop on the Semantic Web and DataBases (SWDB04), LNCS (Vol. 3372, pp. 109–126). Springer-Verlag.

Wintraecken, J. (1990). The NIAM Information Analysis Method, Theory and Practice. Kluwer Academic Publishers.

Zhao, G. (2004). Application semiotics engineering process. In Proceedings of the 14th International Conference of Software Engineering and Knowledge Engineering (SEKE04) (pp. 354–359).

Zhao, G., Gao, Y. & Meersman, R. (2004a). An ontology-based approach to business modelling. In Proceedings of the International Conference of Knowledge Engineering and Decision Support (ICKEDS2004) (pp. 213–221).

Zhao, G., Kingston, J., Kerremans, K., Coppens, F., Verlinden, R., Temmerman, R. & Meersman, R. (2004b). Engineering an ontology of financial securities fraud. In R. Meersman et al. (eds), On the Move to Meaningful Internet Systems 2004: OTM 2004 Workshops, LNCS (Vol. 3292, pp. 605–620). Springer-Verlag.

Zhao, G. & Meersman, R. (2005). Architecting ontology for scalability and versatility. In R. Meersman et al. (eds), On the Move to Meaningful Internet Systems 2005: DOA, ODBASE and CoopIS, Vol. 2, LNCS (Vol. 3761, pp. 1605–1614). Springer-Verlag.