Enabling Cross-Disciplinary E-Science by Integrating Geoscience Ontologies with Dolce



IEEE Intelligent Systems. Published by the IEEE Computer Society. 1541-1672/09/$25.00 © 2009 IEEE

Semantic Scientific Knowledge Integration

Boyan Brodaric, Geological Survey of Canada

Florian Probst, SAP Research

Cross-disciplinary e-Science can be enabled by using foundational ontologies such as Dolce to integrate knowledge representations from different geoscience domains.

Geoscientists are increasingly concerned with big problems related to climate change, natural hazards, and environmental health. In solving these problems, they’re regularly encountering data and knowledge that are complex, diverse, distributed, and massive, causing them to turn to e-Science for operational aids. Useful e-Science resources such as high-performance computing grids, sensor networks, and large-scale data integration and modeling capabilities enable greater volumes of data to be collected in situ and then processed by distributed systems aimed at stimulating new scientific knowledge. Although the new knowledge sometimes includes new concepts and theories, it more frequently involves new predictive models of reality that exhibit dramatically increased geospatial resolution and thematic complexity.

E-Science is thus becoming more knowledge-driven via its reliance on knowledge representations to achieve scientific goals. For many of the big problems, this requires geoscientists to represent and integrate knowledge from different science domains, which contrasts with recent trends in which integration is concentrated within single scientific domains. For example, groundwater pollution estimation requires representation and integration of data and knowledge from at least geology, hydrogeology, soils, and topography. These in turn often require representation of data and knowledge at finer scales such as those described by chemistry and physics. However, existing ontologies are generally designed for data integration within single geoscience domains at specific scales (for example, for earthquake simulation, severe-storm prediction, or virtual solar observation) and not for cross-disciplinary use.

January/February 2009, www.computer.org/intelligent

Science knowledge in general, including geoscience, is represented implicitly and explicitly. It is represented implicitly in scientists’ heads and in the undeclared concepts behind the structural components of formal schemas, such as those used to structure data repositories, data transfer formats, Web service signatures, and automated workflow specifications. Explicit representations include informal expressions of geoscience classifications, models, and theories, such as those found in various scientific publications (for example, notebooks, reports, papers, books, and maps), as well as those found in formally structured representations such as in scientific software. Science ontologies are a special form of computable representation used to denote both implicit and explicit scientific knowledge: science ontologies make explicit the implicit intentions behind scientific schema components, and they are used to directly represent explicit scientific knowledge. For example, ontologies are providing formal definitions for relations in relational-database schemas, to facilitate data integration between databases, and they’re annotating scientific publications to help scientists find relevant articles in e-Science infrastructures. They’re also beginning to provide formal definitions for the categories in scientific classification systems as well as contributing to formal descriptions of geoscientific theories and models, to potentially help scientists find, annotate, and create knowledge in these infrastructures.

In recognition of such uses, ontologies are proliferating within specific scientific domains, but in doing so they’re being developed mainly independently, often with minimal overlap and different premises. The geosciences are no exception: ontologies are being developed in domains concerning water, rock, soil, atmosphere, and the environment, and these efforts are largely fragmented and isolated, resulting in minimal connection between ontologies and frequent conflicts in design. This inhibits their integration and hinders the cross-disciplinary activities required by big e-Science.

One approach to integrating domain-science ontologies involves superimposing a general bridge ontology from which existing domain ontologies are specialized and from which bridging components are inherited. For example, we can bridge two domain classes by specializing a common class from the bridge ontology or by specializing different classes from the bridge ontology and inheriting a relation from the bridge ontology that connects the domain classes. Foundational ontologies, which are being developed coincidently with domain-science ontologies, are a good candidate for playing the role of bridge ontology not only because of their formality, rigor, and commitment to internal coherence but also because of their generality: the contents of foundational ontologies are intended to be reused across science domains and thus should in principle be able to connect domain ontologies. However, this is a somewhat untested hypothesis, as the bridging capacity of foundational ontologies is minimally tested against science ontologies in general, and is largely untested against geoscience ontologies in particular.

In this article, we test the notion that we can use a foundational ontology to bridge two geoscience knowledge representations in support of cross-disciplinary e-Science. This involves evaluating whether the foundational ontology provides sufficient content to connect the domain ontologies. It also involves evaluating the semantic compatibility of classes and therefore, secondarily, assessing how well the ontologies can be integrated without change. These two aspects, connectivity and semantic compatibility, are related but not necessarily correlated. In other words, connectivity

• can be attained without changes if the ontologies are semantically compatible,
• can be attained with changes if the ontologies are originally incompatible,
• might not be attained if the incompatibility is too severe, or
• might not be attained if the ontologies are compatible but the foundational ontology lacks sufficient breadth or depth to link the domain ontologies.

Specifically, we investigate the potential for the Dolce foundational ontology1,2 to integrate the GeoSciML schema3 and the Sweet ontology,4 in support of a cross-disciplinary use case focused on groundwater pollution. We integrate only those portions of the domain representations that are essential to our use case, but do so in a manner sufficient to guide complete integration. The sidebar “Related Work on Integrating Geoscience Ontologies” provides additional background material on these representations and why they are integrated.

A Groundwater Pollution Use Case

An example of a cross-disciplinary geoscience problem is the calculation of the Drastic Groundwater Pollution index.5 In Drastic, a numeric groundwater pollution potential value is assigned to each geospatial location in a geographic area. The pollution value at each location is calculated via a weighted sum of seven parameters, where parameter values at each location occur in the range 0−10 based on data from various science domains, as shown in Table 1. The Drastic name is derived from these seven parameters, which include depth to water, recharge, aquifer media, soil media, topography, impact of the vadose zone, and hydraulic conductivity.

Table 1. Drastic parameters, domains, main entities, data sources, and knowledge sources.

| Parameter | Domain | Entity | Data source | Ontology/schema |
|---|---|---|---|---|
| Depth to water | Hydrogeology | Aquifer | Hydrogeology map/model | Sweet |
| | | Water well | Water well database | GeoSciML |
| Recharge | Hydrogeology | Aquifer | Hydrogeology map/model | Sweet |
| Aquifer media | Hydrogeology | Aquifer | Hydrogeology map/model | Sweet |
| | Geology | Rock | Geology map/model | GeoSciML |
| Soil media | Soil science | Soil material | Soils database | Sweet |
| Topography | Topography | Terrain slope | Digital elevation model | Sweet |
| Impact of vadose zone | Geology | Geology unit | Geology map/model | GeoSciML |
| | Geology | Rock | Geology map/model | GeoSciML |
| Hydraulic conductivity | Hydrogeology | Water body | Water well database | GeoSciML |
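The weighted sum described above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: the article does not list the weights, so the values in `ASSUMED_WEIGHTS` are an assumption taken from the commonly published Drastic weighting and should be verified before any real use. The function and key names are likewise hypothetical.

```python
from typing import Mapping

# ASSUMPTION: weights are not given in the article; these follow the commonly
# published Drastic weighting and are included only to make the sketch runnable.
ASSUMED_WEIGHTS = {
    "depth_to_water": 5,
    "recharge": 4,
    "aquifer_media": 3,
    "soil_media": 2,
    "topography": 1,
    "impact_of_vadose_zone": 5,
    "hydraulic_conductivity": 3,
}


def drastic_index(ratings: Mapping[str, float],
                  weights: Mapping[str, float] = ASSUMED_WEIGHTS) -> float:
    """Return the weighted sum of the seven parameter ratings for one location."""
    missing = set(weights) - set(ratings)
    if missing:
        raise ValueError(f"missing ratings: {sorted(missing)}")
    for name in weights:
        # The article states that parameter values fall in the range 0-10.
        if not 0 <= ratings[name] <= 10:
            raise ValueError(f"{name} rating must be in the range 0-10")
    return sum(weights[name] * ratings[name] for name in weights)


# Hypothetical ratings for a single location.
example_ratings = {
    "depth_to_water": 5,
    "recharge": 6,
    "aquifer_media": 8,
    "soil_media": 4,
    "topography": 9,
    "impact_of_vadose_zone": 6,
    "hydraulic_conductivity": 2,
}
location_index = drastic_index(example_ratings)
```

Passing the weights as a parameter keeps the assumed values out of the calculation itself, so a different weighting can be substituted without touching the function.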

Sidebar: Related Work on Integrating Geoscience Ontologies

Related work includes efforts to integrate ontologies, specialize domain ontologies from foundational ontologies, and represent geoscience knowledge.

Geoscience Knowledge Integration

Although ontology integration has been extensively studied, including the use of a bridge ontology to integrate source ontologies,1 less work has been carried out on the integration of existing science ontologies via a foundational ontology, and even less involving geosciences. The existing work focuses on using nonfoundational bridge ontologies to integrate heterogeneous geoscience data or specializing a foundational ontology with a single geoscience domain ontology.2,3 Our work differs from these prior efforts in its use of the Dolce foundational ontology to integrate two existing and widely recognized geoscience knowledge representations, in support of cross-disciplinary e-Science.

Geoscience Ontologies

Most geoscience knowledge is represented informally in scientific artifacts such as papers, reports, maps, notebooks, and textbooks. More formal representations include schemas, ontologies, computational workflows, and linguistic structures such as glossaries, thesauri, and taxonomies. However, few formal representations exist for rocks and aquifers: rock details are mainly represented in several database and data transfer schemas, of which GeoSciML is an international standard; aquifers are represented in the Sweet ontology in its original form as well as in a groundwater extension.3 None of the representations independently describe the rock material qualities of aquifers, such as those required by the use case, and only the GeoSciML schema and original Sweet ontology are widely available and used operationally.

We selected GeoSciML and Sweet on this basis for this work. GeoSciML and Sweet differ not only in content but also in the generality, formality, and expressiveness of their representation. The GeoSciML schema is represented in UML and in XML schema, and it denotes the entities and relations typically found on geologic maps, such as geologic bodies and rock materials; it’s a relatively narrow but deep representation of a fragment of geoscience knowledge. In contrast, the Sweet ontology contains broad geoscience knowledge about the Earth’s physical environment, making it a suitable candidate for helping integrate across geoscience domains, but it doesn’t contain entities for rock material qualities, and its list of materials is sparse.

Sweet is expressed in OWL and has the potential to contain GeoSciML. Overlap between Sweet and GeoSciML is likely to occur at the lowest levels of the Sweet hierarchy and at the upper levels of the GeoSciML hierarchy. Sweet has been extended to other geoscience domains, but neither it nor GeoSciML has been integrated into a foundational ontology.

Foundational Ontologies

Foundational ontologies are intended to apply to all domains rather than to some aspect of one domain, and they’re normally expressed rigorously as a formal logic theory. They differ in terms of their philosophical underpinnings, leading to nuanced variations in content, and in their expressivity, such as the nature of their logical languages.

Dolce in particular shows promise for extension to geospatial and geoscience domains, in that spatiality is a key criterion used to distinguish its most general categories, and it also has the potential to represent aspects of scientific classification systems.

Specifically, Dolce includes four core categories.4 An endurant is an object-like entity and wholly present at any point in time during its lifespan (for example, physical bodies, amounts of matter, features). A perdurant is a process-like entity and not wholly present at a particular time (for example, events, processes, and states). A quality is a dependent characteristic seen as an individual entity itself that inheres in an endurant, perdurant, or abstract, such that physical endurants have physical qualities (for example, size), perdurants have temporal qualities (for example, age), nonphysical endurants have abstract qualities (for example, the value of the dollar), and an abstract has qualities disjoint from spatial and temporal location (for example, the number 2, quality spaces such as the Munsell color space and its regions, such as red).

References

1. N. Noy, “Semantic Integration: A Survey of Ontology-Based Approaches,” SIGMOD Record, vol. 33, no. 4, Dec. 2004, pp. 65–70.

2. T. Bittner, “From Top-Level to Domain Ontologies: Ecosystem Classifications as a Case Study,” Spatial Information Theory: Cognitive and Computational Foundations of Geographic Information Science (COSIT 07), LNCS 4736, M. Duckham et al., eds., Springer, 2007, pp. 61–77.

3. A. Tripathi and H.A. Babaie, “Developing a Modular Hydrogeology Ontology by Extending the Sweet Upper-Level Ontologies,” Computers & Geosciences, vol. 34, no. 9, 2008, pp. 1022–1033.

4. C. Masolo et al., WonderWeb Deliverable D18, Ontology Library (final), ISTC-CNR Laboratory for Applied Ontology, 2003; www.loa-cnr.it/Papers/D18.pdf.

In this work we focus on a single parameter, aquifer media, which measures the pollution potential of the materials that constitute an aquifer and requires knowledge from hydrogeology and geology. An aquifer is a water-bearing geologic body that can yield water to wells and other ground openings. Aquifers are constituted by materials that can have various degrees of consolidation, such as rocks (consolidated) or sediments (unconsolidated). The specific nature of the materials determines an aquifer’s porosity, which refers to its water storage capacity and which is a key determinant of pollution potential: high porosity results in higher aquifer media values (for example, 8–10), while low porosity results in lower values (for example, 1–3). Drastic distinguishes between primary and secondary porosity in determining these values:

Primary porosity refers to the water storage capacity of the material’s pores and is mainly influenced by

• grain size (size of the particles that constitute the material),
• particle sorting (distribution of the particles in the material),
• consolidation degree (degree of cementation of the material),
• fabric (presence of internal layering in the material), and
• aquifer thickness (thickness of the geologic body).

Secondary porosity refers to the water storage capacity of the spaces between materials. It’s mainly influenced by

• aquifer fracture density (degree of open spaces or channels that intersect the geologic body and are created after its genesis, for example, fractures, joints, caves),
• aquifer thickness (thickness of the geologic body), and
• processes operating on the aquifer.

Drastic uses these attributes to define default aquifer media values for several common aquifer scenarios. The following axioms informally state the three representative scenarios we considered in this work:

A1. Massive shale: If an aquifer is thick, exhibits low fracturing, and is constituted by shale rock material that hosts a massive fabric, then aquifer media = 2.

A2. Weathered metamorphic/igneous rock: If an aquifer consists of unconsolidated materials that are produced by a weathering process acting on metamorphic or igneous rock, then aquifer media = 4.

A3. Sand or gravel: If an aquifer is constituted by unconsolidated sand or gravel material with fine-grained or phaneritic (coarse-grained) particles, then aquifer media = 8.
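The three informal axioms can be read as simple rules over an aquifer description. The sketch below encodes them directly in Python to make the conditions explicit; it is an illustration under stated assumptions, not the ontology-based representation the article develops. The `AquiferDescription` fields and their string values are hypothetical names, and the order in which the rules are checked is an arbitrary choice because the axioms do not say how overlapping scenarios should be resolved.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class AquiferDescription:
    """Hypothetical flat description of an aquifer (field names are assumptions)."""
    material: str                         # e.g. "shale", "sand", "gravel"
    consolidated: bool                    # rocks are consolidated, sediments are not
    thick: bool = False
    low_fracturing: bool = False
    fabric: Optional[str] = None          # e.g. "massive"
    grain_size: Optional[str] = None      # "fine-grained" or "phaneritic"
    weathered_from: Optional[str] = None  # parent rock type, if produced by weathering


def default_aquifer_media(aquifer: AquiferDescription) -> Optional[int]:
    """Return the default aquifer media value for scenarios A1-A3, else None."""
    # A1: thick, low-fracturing aquifer of shale hosting a massive fabric.
    if (aquifer.material == "shale" and aquifer.thick
            and aquifer.low_fracturing and aquifer.fabric == "massive"):
        return 2
    # A2: unconsolidated material produced by weathering of metamorphic or igneous rock.
    if (not aquifer.consolidated
            and aquifer.weathered_from in {"metamorphic", "igneous"}):
        return 4
    # A3: unconsolidated sand or gravel with fine-grained or phaneritic particles.
    if (not aquifer.consolidated and aquifer.material in {"sand", "gravel"}
            and aquifer.grain_size in {"fine-grained", "phaneritic"}):
        return 8
    # Scenario not covered by the three axioms considered here.
    return None


massive_shale = AquiferDescription(material="shale", consolidated=True, thick=True,
                                   low_fracturing=True, fabric="massive")
shale_value = default_aquifer_media(massive_shale)
```

Note that rule A2 depends on knowing that a rock type is metamorphic or igneous, which is exactly the kind of taxonomic knowledge the integrated ontology is meant to supply.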

The data used to calculate aquifer media values come mainly from water well databases typically managed by public water agencies and from detailed 3D models of the geologic bodies and water flows. They also come from 2D maps of the geologic bodies typically developed by national or state geological agencies. The knowledge is packaged in distinct representations: the Sweet ontology contains environmental and hydrogeologic classes such as aquifer, and the GeoSciML schema provides constructs for geologic bodies and their constituent materials. Figure 1 illustrates simplified fragments of Sweet and GeoSciML that are most relevant to this use case. Note that Sweet doesn’t contain rock material attributes, which exist in GeoSciML, although it does provide properties (hasRock, hasSubstance) to connect to them, but with unspecified changes. Also, Sweet doesn’t treat an aquifer as a body of rock with parts.

Because the use case requires us to describe the parts and rock material attributes of an aquifer, we must integrate Sweet and GeoSciML, but this leads to problem P1:

Problem P1: As Sweet and GeoSciML are not integrated, it’s impossible to precisely state an aquifer’s material attributes or the geologic bodies that are part of an aquifer, as required by this use case.

GeoSciML is relevant to the use case not only because it distinguishes a geologic body (GeologicUnit) from its constituent material (RockMaterial) but also because it further delineates the materials, for example, into consolidated material (Rock) and sediments. It also possesses most of the primary and secondary porosity attributes required by the use case, lacking only fracturing density. Here, an attribute is a slot used to describe an entity, such as lithology in Figure 1, and an attribute value is a filler for the slot, such as Gneiss. Then, because GeoSciML attribute values are represented as instances, not types, common subsumption reasoning cannot be performed over attribute values in that the values are not part of the first-order representation. This is shown in Figure 1 where, for example, attribute values for lithology are represented as instances of ControlledConcept, preventing first-order expression of taxonomic and other relations such as “Gneiss is a subclass of MetamorphicRock.” Although GeoSciML enables some attribute values to be related as instances in a second-order representation via VocabRelation (which relates source and target ControlledConcept instances), this design is impractical because it requires customized tools for processing and leads to the following problem:

[Figure 1 diagram not reproduced. Elements shown: Aquifer (hasRock, hasSubstance), UndergroundWater, BodyOfWater, GeologicUnit (thickness), RockMaterial (grainSize, sorting, consolidationDegree, fabric, lithology), GeologicEvent (process), Rock, ControlledConcept, VocabRelation (source, target), Gneiss, MetamorphicRock; relations labeled composition and geologicHistory.]

Figure 1. Simplified Sweet (left, in blue) and GeoSciML (right, in gold) fragments. Solid arrows denote subsumption, dashed arrows denote instantiation, and solid lines denote relations.

Problem P2: Taxonomic and other relations can’t be expressed between GeoSciML attribute values. For example, Gneiss can’t be expressed as a subclass of MetamorphicRock directly in the ontology, as the second axiom would ideally require.
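Problem P2 can be illustrated with a loose analogy in Python, where class inheritance stands in for ontology subsumption. This is only an analogy under that assumption: Python's type system is not OWL, and the `ValueAsInstance` and `ValueAsClass` namespaces are hypothetical groupings used to contrast the two designs. When attribute values are instances of one class, a type-based check sees no relation between them; when they are classes, subsumption is directly available.

```python
class ControlledConcept:
    """Stand-in for a vocabulary term whose values are plain instances."""

    def __init__(self, name: str) -> None:
        self.name = name


class ValueAsInstance:
    """Attribute values as instances: no taxonomic relation is expressed."""
    gneiss = ControlledConcept("Gneiss")
    metamorphic_rock = ControlledConcept("MetamorphicRock")


class ValueAsClass:
    """Attribute values as classes: subsumption is part of the representation."""

    class RockMaterial:
        pass

    class Rock(RockMaterial):
        pass

    class MetamorphicRock(Rock):
        pass

    class Gneiss(MetamorphicRock):
        pass


# Both values share one type, so nothing says Gneiss is a kind of MetamorphicRock.
same_flat_type = (type(ValueAsInstance.gneiss)
                  is type(ValueAsInstance.metamorphic_rock))

# With classes, the taxonomic relation can be queried directly and transitively.
gneiss_is_metamorphic = issubclass(ValueAsClass.Gneiss, ValueAsClass.MetamorphicRock)
gneiss_is_rock_material = issubclass(ValueAsClass.Gneiss, ValueAsClass.RockMaterial)
```

In the instance-based design, relating the two values would require a separate relation object and custom code to traverse it, which mirrors the customized tooling mentioned above.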

Although P2 doesn’t inhibit GeoSciML’s primary purpose of data transmission, it does entail reasoning limitations such as the determination of subsumed classes. Similar issues are found in Sweet to a limited degree, inasmuch as quality values and their relations aren’t represented in the ontology. So, we decided to evaluate whether the Dolce foundational ontology could help integrate Sweet and GeoSciML to adequately satisfy the use case, by remedying problems P1 and P2 and enabling representation of all three axioms.

Our Approach

We used the Sweet 1.1 (http://sweet.jpl.nasa.gov/ontology), GeoSciML 2.0 rc3 (http://geosciml.org), and Dolce 2.1 Lite-Plus (OWL 397; www.loa-cnr.it/ontologies) versions in this work. The final integrated ontology, called Dolce Rocks, is represented in OWL-DL (ifgi.uni-muenster.de/~probsfl/ontologies/drocks.owl). Following OWL terminology conventions, Dolce Rocks consists of

• classes, which refer to categories that can have individuals (“instances”) as members;
• properties, which refer to directed relations between instances of a class, called the domain, and instances of another class or datatype, called the range;
• restrictions, which are constraints on properties that apply locally within a class; and
• instances, which are single entities that can instantiate a class or property.

We followed these general principles in constructing Dolce Rocks:

• Reusability. The source ontologies should remain unaltered.
• Completeness. The integrated ontology should contain contents to satisfy the use case.
• Connectivity. The domain ontologies should be related to satisfy the use case.
• Precision. Ontologic precision of the relations should be increased; for example, the properties within and between source ontologies should make finer ontological distinctions.
• Consistency. Semantic incompatibilities should be minimized in the integrated ontology.
• Coherency. Coherency should be improved by reducing complexity and adding explanations of ontological choices.

In terms of notation, we show class and property names in a monospace font and preceded by an ontology namespace designator. For example, dol#endurant denotes the Dolce endurant class, swe#Rock denotes the Sweet Rock class, gsml#Rock denotes the GeoSciML Rock class, and drok#FractureDensityQuality refers to Dolce Rocks’ FractureDensityQuality class. (See the sidebar “Related Work on Integrating Geoscience Ontologies” for definitions of endurant and other categories.)

The integration fundamentals considered in developing Dolce Rocks include mapping, alignment, and merger. Mapping refers to the similarity relations established between classes or properties, and alignment and merger refer to how the source ontologies are realized in the integrated ontology: Are they referenced (alignment) or embedded and possibly changed (merger) within the final integrated ontology? Mapping typically includes binary relations for equivalence (≡), overlap (~), disjoint (≠), and subsumption (⊂), with specific roles for participants in the subsumption relation: a subsumed class is called a subclass, and the subsuming class is called a superclass.

These relations can be interpreted extensionally or intensionally, where an extension is the collection of individuals that instantiate a class or property, and an intension is the meaning reflected by the ontology structure of a class or property. Extensionally, equivalence implies that extensions are identical, overlap implies that extensions share some but not all individuals, disjoint implies that extensions don’t share any individuals, and subsumption implies that an extension is fully contained within another.
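The extensional reading maps naturally onto set operations. The following sketch, with hypothetical function and label names, classifies the relation between two extensions represented as Python sets. One assumption is made explicit in the code: proper containment is reported as subsumption before overlap is considered, since a contained extension also shares some individuals with its container.

```python
from typing import AbstractSet, Hashable


def extensional_relation(a: AbstractSet[Hashable], b: AbstractSet[Hashable]) -> str:
    """Classify how extension a relates to extension b."""
    if a == b:
        return "equivalent"      # identical extensions
    if a < b:
        return "subsumed-by"     # a is fully contained within b
    if a > b:
        return "subsumes"        # b is fully contained within a
    if a & b:
        return "overlap"         # some, but not all, individuals are shared
    return "disjoint"            # no individuals are shared


# Hypothetical individuals used only to exercise each case.
rocks = {"r1", "r2", "r3"}
metamorphic_rocks = {"r1", "r2"}
wet_things = {"r3", "w1"}
wells = {"w2"}

relation_examples = {
    "metamorphic vs rocks": extensional_relation(metamorphic_rocks, rocks),
    "rocks vs wet things": extensional_relation(rocks, wet_things),
    "rocks vs wells": extensional_relation(rocks, wells),
}
```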

A problem with the extensional approach is its inability to adequately handle situations in which extensions are identical but intensions differ. For example, LivingPerson and BreathingPerson have the same extension and would thus be mapped as equivalent (≡ (LivingPerson, BreathingPerson)), despite differences in intension. In contrast, intensional approaches define the similarity relations in terms of ontology structure. Equivalence then implies the same ontology structure (for example, classes have the same superclasses and property restrictions, whereas properties have the same domains and ranges). Overlap implies partially shared ontology structure, disjoint implies different ontology structures, and subsumption implies a narrower ontology structure (for example, a subsumed class adds properties or restrictions to its subsuming class, a subsumed property specializes the domain or range of its subsuming property).

However, what constitutes the same or a shared ontology structure can vary widely from one method to another, such as graph-based or logic-based. Under most intensional approaches, LivingPerson and BreathingPerson are typically not equivalent, because their intensions differ. One doesn’t subsume the other because one isn’t a specialization of the other (they both are subsumed by Person). They’re not structurally disjoint because they share a superclass (Person) and thus have some shared meaning and would overlap (~ (LivingPerson, BreathingPerson)). In this work, we developed intensional mappings because no instances were available, and many classes were not fully defined and thus their extensions could not be fully scoped.
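One very simple way to make the intensional comparison concrete is to compare declared structure rather than members. The sketch below is a deliberately crude structural heuristic, not the method used in this work: it looks only at direct superclasses and named restrictions, and the restriction names (`isAlive`, `isBreathing`) are hypothetical, introduced purely so the two example classes differ in intension.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ClassDef:
    """Hypothetical class definition: direct superclasses plus named restrictions."""
    name: str
    superclasses: frozenset = frozenset()
    restrictions: frozenset = frozenset()


def intensional_relation(a: ClassDef, b: ClassDef) -> str:
    """Compare two classes by declared structure only (direct superclasses)."""
    if a.superclasses == b.superclasses and a.restrictions == b.restrictions:
        return "equivalent"      # same ontology structure
    if b.name in a.superclasses:
        return "subsumed-by"     # a specializes b
    if a.name in b.superclasses:
        return "subsumes"        # b specializes a
    shared = (a.superclasses & b.superclasses) | (a.restrictions & b.restrictions)
    if shared:
        return "overlap"         # partially shared structure
    return "disjoint"            # no shared structure


person = ClassDef("Person")
living_person = ClassDef("LivingPerson", frozenset({"Person"}),
                         frozenset({"isAlive"}))        # hypothetical restriction
breathing_person = ClassDef("BreathingPerson", frozenset({"Person"}),
                            frozenset({"isBreathing"}))  # hypothetical restriction

living_vs_breathing = intensional_relation(living_person, breathing_person)
```

Under this heuristic the two classes overlap because they share the Person superclass, even though an extensional comparison would treat them as equivalent.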


We first mapped Sweet and GeoSciML to Dolce and then to each other, following the intuition that mapping to the foundational ontology would clarify mappings between domain ontologies. We directly expressed the ≡, ⊂, and ≠ relations between classes using analogous class operators in OWL, and only the ⊂ relation was available for OWL properties. Although the ~ relation doesn’t have a direct operator in description logics (on which OWL is based), we expressed it with ⊂ such that classes or properties that overlapped became subsumed under a more general foundational class or property.

The alignment or merger options determine how the integrated ontology deploys the mappings and how it is related to the source ontologies. Alignment assumes the source ontologies are semantically compatible and thus highly reusable, so they can be integrated without change. Under alignment, the integrated ontology consists of the mappings plus any additional classes and properties, and the original ontologies are referenced in their unaltered state. However, this is problematic when the original ontologies aren’t semantically compatible, because the incompatibilities would be exposed and continue to persist in the integrated ontology.

For example, a domain subclass might be mapped to (subsumed by) disjoint superclasses (its original superclass and a more fitting superclass in the foundational ontology), leading to semantic incompatibility in the integrated ontology because both subsumption relations are retained under alignment. Removing the original subsumption relation would fix the problem, but this entails a change in the original ontology, which violates the reusability goal. In essence, alignment strives for maximum reusability of ontologies but doesn’t resolve semantic incompatibilities resulting from integration. In contrast, in a merger, a new ontology is produced consisting of possibly modified versions of the relevant pieces of the source ontologies as well as the mappings; the new ontology is thus an internally consistent variant of its sources. Merger essentially sacrifices reusability for semantic compatibility. In this work, we opted to align ontologies to investigate, as a secondary goal, the semantic compatibility and hence reusability of the original domain ontologies.
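The incompatibility described above, a class subsumed by two superclasses that are declared disjoint, can be detected mechanically once subsumption and disjointness statements are collected. The following sketch shows one simple way to do so with plain dictionaries; it is an assumption-laden illustration rather than a description logic reasoner, and the class names (`dom#SubClass`, `dom#OriginalSuper`, `fnd#BetterSuper`) are placeholders, not classes from Sweet, GeoSciML, or Dolce.

```python
from typing import Dict, Iterable, List, Set, Tuple


def ancestors(cls: str, parents: Dict[str, Iterable[str]]) -> Set[str]:
    """Collect every direct and indirect superclass of cls."""
    seen: Set[str] = set()
    stack = list(parents.get(cls, ()))
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(parents.get(current, ()))
    return seen


def incompatible_classes(
    parents: Dict[str, Iterable[str]],
    disjoint_pairs: Iterable[Tuple[str, str]],
) -> Dict[str, List[Tuple[str, str]]]:
    """Map each class to the disjoint pairs it falls under simultaneously."""
    pairs = list(disjoint_pairs)
    problems: Dict[str, List[Tuple[str, str]]] = {}
    for cls in parents:
        lineage = ancestors(cls, parents) | {cls}
        clashes = [pair for pair in pairs
                   if pair[0] in lineage and pair[1] in lineage]
        if clashes:
            problems[cls] = clashes
    return problems


# Placeholder names: the original subsumption is kept (alignment) and a mapping
# to a foundational superclass is added alongside it.
subclass_of = {
    "dom#SubClass": ["dom#OriginalSuper", "fnd#BetterSuper"],
    "dom#OriginalSuper": ["fnd#Top"],
    "fnd#BetterSuper": ["fnd#Top"],
}
declared_disjoint = [("dom#OriginalSuper", "fnd#BetterSuper")]

clashes_found = incompatible_classes(subclass_of, declared_disjoint)
```

Under a merger, the original subsumption edge would be removed from `subclass_of` and the clash would disappear, at the cost of altering the source ontology.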

The Dolce Rocks Ontology

Dolce Rocks contains the Dolce foundational ontology, which subsumes the Sweet and GeoSciML fragments. This results in new and finer connections within and between the domain fragments, but also in an integrated ontology that contains semantic incompatibilities due to alignment problems.

Mapping Dolce + GeoSciML

Figure 2 shows the mapping between Dolce and the GeoSciML fragment relevant to the use case. As an upgrade to the original GeoSciML representation, the attributes that map to Dolce qualities (for example, grain size) have their values represented as Dolce abstract region classes instead of instances, and these regions are connected to their respective qualities via the dol#q-location relation: ⊂ (gsml#Phaneritic, gsml#GrainSizeRegion) and dol#q-location (gsml#GrainSizeQuality, gsml#GrainSizeRegion). This is advantageous not only because it enables subsumption reasoning on former GeoSciML attribute values, but also because such values denote regions in a space, allowing them to be further defined in terms of the dimensions constituting the space. For example, gsml#PhaneriticRegion is sometimes defined as a grain size > 1 mm.

Figure 2. A fragment of the Dolce Rocks ontology. The Dolce classes are white, Sweet classes are light blue, GeoSciML classes are medium blue, and Dolce Rocks classes are dark blue.

The GeoSciML attributes that aren't qualities are handled in one of two ways: either they are mapped to an appropriate Dolce class and connected with the relevant property, or their values are mapped to subclasses. An example of the first case is a rock material that hosts a fabric feature. For example, gsml#Foliated and gsml#Massive refer to the presence and absence, respectively, of a layering pattern hosted by the rock material: dol#host-of (gsml#RockMaterial, gsml#Fabric), ⊂ (gsml#Fabric, dol#feature), and ⊂ (gsml#Massive, gsml#Fabric). An example of the second case occurs when the values for the lithology attribute of rock material become subclasses of rock material: ⊂ (gsml#Rock, gsml#RockMaterial), ⊂ (gsml#MetamorphicRock, gsml#Rock), and ⊂ (gsml#Gneiss, gsml#MetamorphicRock). Figure 3 shows some of the mappings between Dolce and the GeoSciML rock material class, and how as a result we use properties inherited from Dolce to connect GeoSciML classes.
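The benefit of representing attribute values as region classes can be illustrated with a minimal Python sketch. It assumes a one-dimensional grain-size space measured in millimetres and models each region as an interval, so that subsumption between regions reduces to interval containment. Only the 1 mm lower bound for gsml#PhaneriticRegion comes from the text; the Region class, the ex#CoarsePhaneriticRegion name, and its 5 mm bound are hypothetical and serve only to show a subsumption chain.

```python
from dataclasses import dataclass
import math


@dataclass(frozen=True)
class Region:
    """An open interval (low, high) in a one-dimensional quality space, in mm."""
    name: str
    low: float
    high: float

    def subsumes(self, other: "Region") -> bool:
        # A region subsumes another when it contains the other's whole interval.
        return self.low <= other.low and other.high <= self.high

    def contains(self, value: float) -> bool:
        # A quality value is located in the region when it lies inside the interval.
        return self.low < value < self.high


# The full grain-size space; every grain-size region is a sub-interval of it.
grain_size_region = Region("gsml#GrainSizeRegion", 0.0, math.inf)
# From the text: phaneritic is sometimes defined as a grain size > 1 mm.
phaneritic = Region("gsml#PhaneriticRegion", 1.0, math.inf)
# Hypothetical finer region (assumption, not part of GeoSciML).
coarse = Region("ex#CoarsePhaneriticRegion", 5.0, math.inf)

print(grain_size_region.subsumes(phaneritic))  # True
print(phaneritic.subsumes(coarse))             # True
print(phaneritic.contains(0.5))                # False
```

With instance-valued attributes, none of these containment questions could be asked, because two attribute values would simply be different individuals.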

Mapping Dolce + Sweet

The mapping from Sweet to Dolce is also complete for a large Sweet fragment, as each Sweet class in this fragment can be mapped to a Dolce class (see Table 2). This results in an enriched integrated ontology, because Dolce makes finer semantic distinctions for both classes and properties. For example, Sweet doesn't originally include classes such as those in Dolce for State, Role, and Feature, and many Sweet classes are orphaned because they do not have superclasses. Although the resultant integrated ontology is more precise, it's also less consistent, as semantic incompatibilities are introduced via alignment.

Orphans. Orphaned Sweet classes are mapped to Dolce superclasses. For example, swe#State denotes a physical state of matter such as being liquid, and it maps to dol#state, a perdurant (⊂ (swe#State, dol#state)). Note that here a physical state refers to an aggregate state or phase: it does not denote matter, such as a chemical, mineral, or rock material, but rather a physical mode of being manifest for some time by some matter (for example, an amount of water manifest as solid, liquid, or gas). The key point is that all orphaned classes are subsumed by some Dolce class, which increases the potential for connectivity between the domain ontologies. It also increases semantic precision because the orphans inherit a rigorous structure.

⊂ (gsml#RockMaterial, dol#amount-of-matter)
⊂ (gsml#Fabric, dol#feature)
dol#participant-in (gsml#RockMaterial, gsml#GeologicProcess)
dol#participant-in (gsml#RockMaterial, gsml#GeologicEvent)
dol#host-of (gsml#RockMaterial, gsml#Fabric)
dol#generic-constituent (gsml#RockMaterial, gsml#ConstituentPart)
dol#plays (gsml#RockMaterial, gsml#CompositionPart)
dol#has-quality (gsml#RockMaterial, gsml#GrainSizeQuality)
dol#q-location (gsml#GrainSizeQuality, gsml#GrainSizeRegion)
dol#has-quality (gsml#RockMaterial, gsml#SortingQuality)
dol#q-location (gsml#SortingQuality, gsml#SortingRegion)
dol#has-quality (gsml#RockMaterial, gsml#ConsolidationDegreeQuality)
dol#q-location (gsml#ConsolidationDegreeQuality, gsml#ConsolidationDegreeRegion)

⊂ (gsml#Rock, gsml#RockMaterial)
dol#has-quality (gsml#Rock, gsml#ConsolidationDegreeQuality)
dol#q-location (gsml#ConsolidationDegreeQuality, gsml#ConsolidatedRegion)

⊂ (gsml#MetamorphicRock, gsml#Rock)
dol#participant-in (gsml#MetamorphicRock, gsml#MetamorphicProcess)
dol#generic-constituent (gsml#MetamorphicRock, gsml#Crystal)

⊂ (gsml#Gneiss, gsml#MetamorphicRock)
dol#host-of (gsml#Gneiss, gsml#Foliated)
dol#has-quality (gsml#Gneiss, gsml#GrainSizeQuality)
dol#q-location (gsml#GrainSizeQuality, gsml#PhaneriticRegion)

Figure 3. List of mappings and relations involving some GeoSciML rock material classes and Dolce. Note how we use Dolce properties to connect GeoSciML classes.

States. Representing physical states as perdurants also potentially helps simplify some complex Sweet structures, such as a class having multiple superclasses. For example, swe#Rock is subsumed by both swe#MixedSubstance and swe#Solid (which in turn is subsumed by swe#State), indicating rocks exist in a solid state and have various constituents. If swe#State were mapped to a perdurant, as described earlier, then swe#Rock could be related to swe#Solid via the dol#participant-in property instead of via the original subsumption relation, which would ideally be removed. Multiple superclasses could thus be eliminated if the ontologies are to be merged. However, in an alignment situation where the original ontologies are preserved in pursuit of reusability, the original subsumption relation in Sweet cannot be removed, because this would entail a change to Sweet. As a consequence of such alignment, swe#Rock is subsumed both by an endurant (swe#MixedSubstance) and a perdurant (swe#State), which are disjoint, resulting in a semantic incompatibility.

Roles. As with states, some structural complexity might be reduced by replacing subsumption relations with more specific properties, in this case with properties that link a role to the entity playing the role. However, as we discussed earlier and for the same reasons, this reduction in structural complexity is only realized in merger. For example, swe#Dust is originally subsumed by both swe#Contaminant and swe#Particulate. But if we consider contaminants to be roles played by entities such as dust (that is, ⊂ (swe#Contaminant, dol#role)), then the original ⊂ (swe#Dust, swe#Contaminant) can be replaced by dol#plays (swe#Dust, swe#Contaminant) in a merger situation, or exist alongside it in an alignment situation. The coexistence of the original and new relations under alignment leads to the semantic incompatibility of dust subsumed by both a physical endurant (swe#Particulate) and nonphysical endurant (dol#role), which are disjoint.
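The rock and dust incompatibilities described above can be detected mechanically. The following minimal Python sketch is an illustration under stated assumptions, not the procedure used to build Dolce Rocks: it stores subsumption as a plain dictionary, lists two Dolce disjointness constraints, and reports any class whose ancestors include both members of a disjoint pair. The direct edges from swe#MixedSubstance to dol#endurant and from dol#role to dol#non-physical-endurant are simplifying assumptions that omit intermediate Dolce classes, and the function names are hypothetical.

```python
# Subsumption edges (class -> direct superclasses) as retained under alignment.
ALIGNED = {
    "swe#Rock": {"swe#MixedSubstance", "swe#Solid"},
    "swe#Solid": {"swe#State"},
    "swe#State": {"dol#state"},
    "dol#state": {"dol#perdurant"},
    "swe#MixedSubstance": {"dol#endurant"},        # assumption: intermediate classes omitted
    "swe#Dust": {"swe#Contaminant", "swe#Particulate"},
    "swe#Contaminant": {"dol#role"},
    "dol#role": {"dol#non-physical-endurant"},     # assumption: intermediate classes omitted
    "swe#Particulate": {"dol#physical-endurant"},
    "dol#physical-endurant": {"dol#endurant"},
    "dol#non-physical-endurant": {"dol#endurant"},
}

# Pairs of classes that cannot share a subclass.
DISJOINT = [
    ("dol#endurant", "dol#perdurant"),
    ("dol#physical-endurant", "dol#non-physical-endurant"),
]


def ancestors(cls, edges):
    """Return every class that transitively subsumes cls."""
    seen = set()
    stack = [cls]
    while stack:
        for parent in edges.get(stack.pop(), ()):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def incompatibilities(cls, edges):
    """List the disjoint pairs whose members both subsume cls."""
    up = ancestors(cls, edges)
    return [pair for pair in DISJOINT if pair[0] in up and pair[1] in up]


def drop_subsumption(edges, subclass, superclass):
    """Return a copy of edges with one subsumption removed, as a merger would allow."""
    merged = {cls: set(parents) for cls, parents in edges.items()}
    merged[subclass].discard(superclass)
    return merged


print(incompatibilities("swe#Rock", ALIGNED))
print(incompatibilities("swe#Dust", ALIGNED))
merged = drop_subsumption(ALIGNED, "swe#Dust", "swe#Contaminant")
print(incompatibilities("swe#Dust", merged))
```

Under alignment both classes are flagged; once the original subsumption is dropped, as in a merger where it is replaced by dol#plays, the dust incompatibility disappears.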

Features. Dolce features are understood as entities hosted by a physical endurant. For example, a cliff has a surface that is a feature of the cliff. The surface is an individual entity that is dependent on the cliff and is not a part that can be removed: even if some rocks are chipped from the surface, the surface remains. However, Sweet doesn't distinguish between the host physical endurant (cliff, ground, mountain, sea) and the feature (surface, hole, mountainside, sea floor). For example, swe#EarthRealm refers to the physical layers that are parts of the Earth, and it subsumes both features (swe#SeaFloor) and the physical bodies that host them (swe#BodyOfGround, swe#BodyOfWater), thus overlapping Dolce's classes for hosts and features. This results in some subclasses of swe#EarthRealm being mapped to hosts and others to features: ⊂ (swe#SeaFloor, dol#feature), ⊂ (swe#BodyOfGround, dol#physical-body). Furthermore, swe#EarthRealm itself is mapped to dol#physical-endurant, ⊂ (swe#EarthRealm, dol#physical-endurant), which is a Dolce class that subsumes both hosts and features. Thus, while these mappings increase semantic precision through finer ontologic distinctions, they also increase the structural complexity of the integrated ontology in the case of alignment owing to a greater number of superclasses. This is exemplified by classes such as swe#SeaFloor, which end up having multiple superclasses because the original subsumption relation to the Sweet superclass is retained alongside the new subsumption relation to the Dolce superclass. Under merger, the originals could be removed and the structural complexity reduced.

Table 2. Some mappings between Sweet and Dolce.

Dolce                  Sweet (≡ or ⊃)
Physical-Body          BodyOfGround, BodyOfWater
Feature                SeaFloor, CoronaHole
Material-Artifact      Infrastructure, Dam, Product
Physical-Object        LivingThing, MarineAnimal
Amount-of-Matter       Substance
Activity               HumanActivity
Physical-Phenomenon    Phenomena
Process                Process
Role                   Contaminant
State                  StateOfMatter, Solid
Quality                Quantity, Moisture

Properties. A property critical to the use case is swe#hasSubstance, which has a material substance as its range and an unspecified domain. This enables, for example, an aquifer to be related to a rock material (swe#hasSubstance (swe#Aquifer, gsml#RockMaterial)). Another property that can potentially be used for this purpose is swe#hasRock, which has an unspecified domain and range and could thus presumably also relate an aquifer to a rock material or perhaps even to another geologic body. However, Dolce provides alternate properties with greater semantic precision. These include a property for the relation

• between an endurant and its constituent material, such as an aquifer and a rock material (dol#generic-constituent, dol#generic-constituent-of);
• between a material and some process or event, such as a rock material and the process that generated it (dol#participant, dol#participant-in);
• between a quality and the material in which it inheres, such as a grain size and a rock material (dol#inherent-in, dol#has-quality); and
• between a feature and its host, such as a fabric and a rock material (dol#host, dol#host-of).

Because of this increased precision, we use the more granular Dolce properties to relate Sweet classes to GeoSciML rock materials.
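One way to see what the added precision buys is to treat each property as carrying a domain and range signature and to check assertions against it. The Python sketch below is a simplified illustration: the category labels, the SIGNATURES table, and the admissible function are assumptions made for this example and are coarser than the actual Dolce axiomatization.

```python
# Coarse categories for a few classes, following the mappings described in the text.
CATEGORIES = {
    "swe#Aquifer": {"endurant"},
    "gsml#RockMaterial": {"endurant", "amount-of-matter"},
    "gsml#Fabric": {"endurant", "feature"},
    "gsml#GrainSizeQuality": {"quality"},
    "gsml#GeologicProcess": {"perdurant"},
}

# Assumed (domain, range) signatures; None means the side is unspecified.
SIGNATURES = {
    "swe#hasSubstance": (None, "amount-of-matter"),
    "dol#generic-constituent": ("endurant", "amount-of-matter"),
    "dol#participant-in": ("endurant", "perdurant"),
    "dol#has-quality": ("endurant", "quality"),
    "dol#host-of": ("endurant", "feature"),
}


def admissible(prop, subject, obj):
    """Check whether prop(subject, obj) respects the assumed signature of prop."""
    domain, range_ = SIGNATURES[prop]
    subject_ok = domain is None or domain in CATEGORIES[subject]
    object_ok = range_ is None or range_ in CATEGORIES[obj]
    return subject_ok and object_ok


# Both properties accept an aquifer with a rock material as its substance.
print(admissible("swe#hasSubstance", "swe#Aquifer", "gsml#RockMaterial"))
print(admissible("dol#generic-constituent", "swe#Aquifer", "gsml#RockMaterial"))
# Only the property with a specified domain rejects a process as the subject.
print(admissible("swe#hasSubstance", "gsml#GeologicProcess", "gsml#RockMaterial"))
print(admissible("dol#generic-constituent", "gsml#GeologicProcess", "gsml#RockMaterial"))
```

The last two calls differ: a property with an unspecified domain admits a process as the bearer of a substance, whereas the more constrained property does not.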

Mapping Dolce + Sweet + GeoSciML

Mapping Sweet and GeoSciML, after both are subsumed by Dolce, requires the addition of some ⊂ and ≡ relations to the integrated ontology. This includes ⊂ (swe#Aquifer, gsml#GeologicUnit) and ≡ (swe#Rock, gsml#Rock), as shown in Figure 2. Qualities and associated regions for fracture density and aquifer media are also added to satisfy the use case, as is a property that denotes the historical succession of rock materials.

Discussion

The main achievement relates to our primary objective of connecting cross-disciplinary representations in support of the use case. The results show that a foundational ontology can indeed play a supportive role in bridging domain ontologies for e-Science purposes. This role involves provision of a rigorous, consistent, well-defined superstructure that can add structure and precision to the subsumed domain ontologies, and increased connectivity through the inheritance of foundational properties. For example, many Sweet endurants, such as swe#Aquifer, swe#Crust, or swe#Mantle, can now have their rock constituents fully described via GeoSciML rock material classes. This is particularly evident for the use case, which is satisfied by the integrated ontology via resolution of problems P1 and P2, and via representation of the three axioms A1 to A3.

Problem P1 is resolved by adding ⊂ (swe#Aquifer, gsml#GeologicUnit) to the integrated ontology, following the hydrogeological notion that an aquifer is a body of rock material. This enables an aquifer to be fully described geologically through the inheritance of ontological structure from gsml#GeologicUnit. swe#Aquifer is thus upgraded via the inheritance of geological qualities required by the use case (for example, thickness) and via the inheritance of more semantically precise properties from Dolce that connect an aquifer with rock materials and processes, and that also connect rock materials with processes, qualities, and features.

Problem P2 is resolved by representing attribute values as region classes instead of instances, thus enabling subsumption and other relations between attribute values, and connecting qualities to value regions via the Dolce dol#q-location property. The integrated ontology can then uniquely enable axioms A1, A2, and A3 to be expressed as A1*, A2*, and A3*, respectively, using Dolce properties and a mixture of classes from Sweet and GeoSciML and supplemented with qualities and regions for fracture density and aquifer media. These axioms aren't expressible with any single domain ontology or any integrated pair. They require the fully integrated Dolce Rocks ontology, in which the Sweet and GeoSciML classes are subsumed by Dolce classes and inherit Dolce properties, enabling classes from the environmental-hydrogeologic and geologic domains to be related in new ways (for example, via dol#q-location) or with greater semantic precision (for example, via dol#generic-constituent). Figures 4 through 6 present the axioms using first-order-logic notation.

∀ Aq, Ma, Sh, ThQ, ThR, FDQ, LoR •
  [swe#Aquifer (Aq) ∧ gsml#Shale (Sh) ∧ gsml#Massive (Ma) ∧
   gsml#ThicknessQuality (ThQ) ∧ gsml#ThickRegion (ThR) ∧
   drok#FractureDensityQuality (FDQ) ∧ gsml#LowRegion (LoR) ∧
   dol#generic-constituent (Aq, Sh) ∧ dol#host-of (Sh, Ma) ∧
   dol#has-quality (Aq, ThQ) ∧ dol#q-location (ThQ, ThR) ∧
   dol#has-quality (Aq, FDQ) ∧ dol#q-location (FDQ, LoR)
   → ∃ AqMQ, AqMR2 •
     [drok#AquiferMediaQuality (AqMQ) ∧ dol#has-quality (Aq, AqMQ) ∧
      drok#AquiferMediaRegion2 (AqMR2) ∧ dol#q-location (AqMQ, AqMR2)]]

Figure 4. Axiom A1*. Here, Axiom 1 is represented using Dolce properties to connect Sweet and GeoSciML classes.

∀ Aq, RoM, CoDQ, UnCR, Wth, MeR, IgR •
  [swe#Aquifer (Aq) ∧ gsml#RockMaterial (RoM) ∧
   gsml#ConsolidationDegreeQuality (CoDQ) ∧ gsml#UnConsolidatedRegion (UnCR) ∧
   swe#WeatheringProcess (Wth) ∧ gsml#MetamorphicRock (MeR) ∧ gsml#IgneousRock (IgR) ∧
   dol#generic-constituent (Aq, RoM) ∧ dol#has-quality (RoM, CoDQ) ∧
   dol#q-location (CoDQ, UnCR) ∧ dol#participant-in (Aq, Wth) ∧
   dol#participant-in (MeR, Wth) ∧ dol#participant-in (IgR, Wth) ∧
   (drok#rockMaterialSuccessor (MeR, RoM) ∨ drok#rockMaterialSuccessor (IgR, RoM))
   → ∃ AqMQ, AqMR4 •
     [drok#AquiferMediaQuality (AqMQ) ∧ dol#has-quality (Aq, AqMQ) ∧
      drok#AquiferMediaRegion4 (AqMR4) ∧ dol#q-location (AqMQ, AqMR4)]]

Figure 5. Axiom A2*. Axiom 2 is represented using Dolce properties to connect Sweet and GeoSciML classes.
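Axiom A1* (Figure 4) can be read operationally as a rule over class and property assertions. The Python sketch below is one possible reading, assuming a small in-memory fact base rather than an OWL reasoner: class membership is a dictionary from individuals to class names, property assertions are triples, and the function returns the aquifer media region implied for each aquifer that satisfies the antecedent. The individual names, the function names, and the data are hypothetical.

```python
from itertools import product


def instances(types, cls):
    """Individuals asserted to belong to cls."""
    return [ind for ind, classes in types.items() if cls in classes]


def rate_aquifers_a1(types, triples):
    """Map each aquifer satisfying the antecedent of A1* to its aquifer media region."""
    rated = {}
    candidates = product(
        instances(types, "swe#Aquifer"),
        instances(types, "gsml#Shale"),
        instances(types, "gsml#Massive"),
        instances(types, "gsml#ThicknessQuality"),
        instances(types, "gsml#ThickRegion"),
        instances(types, "drok#FractureDensityQuality"),
        instances(types, "gsml#LowRegion"),
    )
    for aq, sh, ma, thq, thr, fdq, lor in candidates:
        # The six property atoms of the antecedent for this variable binding.
        antecedent = {
            ("dol#generic-constituent", aq, sh),
            ("dol#host-of", sh, ma),
            ("dol#has-quality", aq, thq),
            ("dol#q-location", thq, thr),
            ("dol#has-quality", aq, fdq),
            ("dol#q-location", fdq, lor),
        }
        if antecedent <= triples:
            # Consequent: the aquifer has a media quality located in region 2.
            rated[aq] = "drok#AquiferMediaRegion2"
    return rated


# Hypothetical individuals: aq1 hosts massive shale, aq2 hosts foliated shale.
types = {
    "aq1": {"swe#Aquifer"}, "aq2": {"swe#Aquifer"},
    "sh1": {"gsml#Shale"}, "sh2": {"gsml#Shale"},
    "ma1": {"gsml#Massive"}, "fo2": {"gsml#Foliated"},
    "thq1": {"gsml#ThicknessQuality"}, "thq2": {"gsml#ThicknessQuality"},
    "thick": {"gsml#ThickRegion"},
    "fdq1": {"drok#FractureDensityQuality"}, "fdq2": {"drok#FractureDensityQuality"},
    "low": {"gsml#LowRegion"},
}
triples = {
    ("dol#generic-constituent", "aq1", "sh1"), ("dol#host-of", "sh1", "ma1"),
    ("dol#has-quality", "aq1", "thq1"), ("dol#q-location", "thq1", "thick"),
    ("dol#has-quality", "aq1", "fdq1"), ("dol#q-location", "fdq1", "low"),
    ("dol#generic-constituent", "aq2", "sh2"), ("dol#host-of", "sh2", "fo2"),
    ("dol#has-quality", "aq2", "thq2"), ("dol#q-location", "thq2", "thick"),
    ("dol#has-quality", "aq2", "fdq2"), ("dol#q-location", "fdq2", "low"),
}

print(rate_aquifers_a1(types, triples))  # {'aq1': 'drok#AquiferMediaRegion2'}
```

The rule mixes swe#, gsml#, drok#, and dol# terms in a single antecedent, which is the sense in which the axioms require the fully integrated ontology.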

The second achievement relates to our secondary objective of evaluating the reusability of the original domain ontologies. The results clearly demonstrate that the foundational ontology and a domain ontology (Sweet) are semantically incompatible in various places, and that these incompatibilities are propagated to the integrated ontology under an alignment approach. This results in semantic inconsistency in the integrated ontology as well as structural complexity owing to the introduction of multiple superclasses. It's also evident that a merger integration strategy would have resulted in a superior integrated ontology, because the incompatibilities could have been removed, but this would essentially have required significant shifts in the meaning of components of the original domain ontology.

Evaluation

We evaluated the integrated ontology against the guiding principles outlined in the “Our Approach” section. The domain ontologies can't be reused from the perspective of integration, because parts of Sweet must be changed to eliminate semantic incompatibilities in the integrated ontology. They can be reused from the use-case perspective, because the ontologies as a group are now connected to satisfy the use case.

We can also demonstrate completeness and increased connectivity by satisfying the use case: we map all domain classes relevant to the use case to Dolce and connect all classes relevant to the use case by introducing foundational properties (such as dol#plays, dol#host, and dol#q-location) and classes (such as dol#state, dol#role, dol#feature, and dol#region), whose finer ontologic distinctions increase semantic precision. However, integrating the ontologies through alignment reduces semantic consistency as semantic incompatibilities are introduced and retained in the integrated ontology. For example, in the aligned ontology an aquifer is both a body of water and a body of ground, a rock is both an endurant and a perdurant, and dust is both a physical and a nonphysical endurant. Note that a merger approach would have eliminated such incompatibilities by removing or changing parts of the original ontologies.

Under alignment, structural complexity also increases as more classes are subsumed by multiple superclasses, leading to a mixed evaluation of the coherency of the integrated ontology. On the one hand, semantic inconsistencies and greater structural complexity clearly reduce coherency for some parts of the integrated ontology. On the other hand, this is balanced by increased explanatory power owing to the presence of richer properties and classes whose distinctions are now well articulated and grounded in foundational principles. So, the domain science classes are also more coherent for some parts of the integrated ontology.

Issues in Representing Geoscientific Knowledge

Although the domain ontologies were successfully connected and the use case was satisfied, we encountered several issues during integration that require further attention. These are primarily related to the enhanced representation of scientific knowledge, including qualities and the representation of scientific classifications.

∀ Aq, Sd, Gr, CoDQ, GrSQ, UnCR, FgR, PhR •
  [swe#Aquifer (Aq) ∧ gsml#Sand (Sd) ∧ gsml#Gravel (Gr) ∧
   gsml#ConsolidationDegreeQuality (CoDQ) ∧ gsml#GrainSizeQuality (GrSQ) ∧
   gsml#UnConsolidatedRegion (UnCR) ∧ gsml#Fine-grainedRegion (FgR) ∧
   gsml#PhaneriticRegion (PhR) ∧
   ((dol#generic-constituent (Aq, Sd) ∧ dol#has-quality (Sd, CoDQ) ∧ dol#has-quality (Sd, GrSQ)) ∨
    (dol#generic-constituent (Aq, Gr) ∧ dol#has-quality (Gr, CoDQ) ∧ dol#has-quality (Gr, GrSQ))) ∧
   dol#q-location (CoDQ, UnCR) ∧
   (dol#q-location (GrSQ, FgR) ∨ dol#q-location (GrSQ, PhR))
   → ∃ AqMQ, AqMR8 •
     [drok#AquiferMediaQuality (AqMQ) ∧ dol#has-quality (Aq, AqMQ) ∧
      drok#AquiferMediaRegion8 (AqMR8) ∧ dol#q-location (AqMQ, AqMR8)]]

Figure 6. Axiom A3*. Axiom 3 is represented using Dolce properties to connect Sweet and GeoSciML classes.

Dolce qualities. Dolce endurants, perdurants, and qualities have unary qualities only. In other words, they involve only the quality-bearing entity, such as a geologic body having qualities for size, color, and mass. However, it's not fully evident how to specify the qualities of the relation between multiple entities (for example, proportion, distance, and direction), particularly for those relations that are represented as properties in the ontology, such as dol#part-of (gsml#GeologicUnit x is 10% dol#part-of swe#Aquifer y) or dol#generic-constituent-of (gsml#Shale z is 10% dol#generic-constituent-of swe#Aquifer y). Our solution is inspired by an alternative notion of role. In Dolce, roles are contextual, sociocognitive artifacts that are temporarily played by endurants in a situation, but roles can alternatively be considered functions performed by participants in a relation [6]. We introduce a Dolce role for this alternative notion that includes abstract qualities (for example, proportion). Then, in unidirectional binary relations such as OWL properties, the range class plays the role. For example, to represent the notion that a gsml#GeologicUnit is proportionally dol#part-of an swe#Aquifer, the abstract gsml#proportion quality is added to a gsml#GeologicUnitPart role, which is played by gsml#GeologicUnit (see Figure 7).
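The role-based pattern for qualities of relations can be sketched as a small data structure. The Python example below is an illustration under assumptions: the dataclasses stand in for gsml#GeologicUnit and gsml#GeologicUnitPart, the proportion is stored as a whole-number percentage for simplicity, and the individuals and the total_proportion helper are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GeologicUnit:
    """Stands in for gsml#GeologicUnit; swe#Aquifer is one of its subclasses."""
    name: str


@dataclass(frozen=True)
class GeologicUnitPart:
    """Role that carries the quality of the part relation (gsml#GeologicUnitPart)."""
    player: GeologicUnit      # the unit that plays the role (dol#plays)
    whole: GeologicUnit       # the unit that the player is part of
    proportion_percent: int   # gsml#proportion, stored here as a percentage


def total_proportion(whole, roles):
    """Sum the proportions of all parts recorded for one whole."""
    return sum(r.proportion_percent for r in roles if r.whole == whole)


aquifer_y = GeologicUnit("aquifer y")
unit_x = GeologicUnit("geologic unit x")
unit_w = GeologicUnit("geologic unit w")  # hypothetical second part

roles = [
    # "Geologic unit x is 10% part of aquifer y", as in the text.
    GeologicUnitPart(player=unit_x, whole=aquifer_y, proportion_percent=10),
    GeologicUnitPart(player=unit_w, whole=aquifer_y, proportion_percent=60),
]

print(total_proportion(aquifer_y, roles))  # 70
```

Because the proportion is attached to the role rather than to the unit itself, the same unit could in principle play the role with a different proportion in a different whole.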

A related issue considers how qualities can be related to perdurants. For example, how can we specify that a certain quality value is a product of its host endurant's participation in a process or event? It's scientifically plausible that processes or events might alter the value of some of an endurant's qualities, such as compression changing the shape of a rock body but not its color or materials. So, to ensure accurate capture of the effects of scientific processes, it's valuable to represent the relation between a quality and a perdurant. However, physical qualities and perdurants aren't directly related in Dolce.

Dolce quality spaces and regions. A quality's value is located in a quality space. Quality spaces are constituted by quality dimensions and their relations, which together structure the quality space. In turn, a quality space can be partitioned into specific regions, and these regions express value ranges. For example, the Munsell color space consists of hue, saturation, and brightness dimensions, which are related to form a solid that's partitioned into regions denoting colors such as red. However, the structure of quality spaces is also not directly represented in Dolce: classes are provided for qualities and their spaces and regions, but not for dimensions or their relations. Related to this is the inability to represent aspects of measurement for such spaces, for example units of measure. Semantic reference spaces are introduced elsewhere to address this issue, but we have not yet implemented them [7].

Dolce concepts and scientific classifications. When is it appropriate to represent the elements of a scientific classification, such as a rock material, as a physical entity, and when is it appropriate to represent it as part of a theory? In terms of ontology design, this amounts to a choice between subsumption, for example ⊂ (gsml#Gneiss, gsml#RockMaterial), and classification, for example dol#classifies (gsml#Gneiss, gsml#RockMaterial). The latter is clearly preferable when the ontology must represent multiple theories for rock classification, because it would support the coexistence of multiple definitions of rock materials such as gsml#Gneiss. However, in doing so, it would increase complexity in the ontology because of a dual representation: Dolce's Descriptions and Situations extension [8] distinguishes between the attributes of a rock material and those of a theory for a rock material. The latter are social artifacts (descriptions and concepts), and the former aren't. For example, rock materials are created by a certain geologic process and have a grain-size quality, while a specific theory that classifies rock materials uses concepts for geologic processes and grain size in the defining description of the rock materials. This scenario isn't limited to rock materials; it extends to any entity that's subject to classification, including physical objects, processes, and qualities. Further work is required to test the adequacy of this dual representation and to determine its usability, particularly in integration situations where the differences could prove critical. This might be the case, for example, in mapping a subclass of a physical entity in one ontology to a concept that classifies the entity in another ontology, thus significantly shifting the meaning of the original subclass.

⊂ (gsml#GeologicUnit, dol#physical-body)
dol#part (gsml#GeologicUnit, gsml#GeologicUnit)
dol#plays (gsml#GeologicUnit, gsml#GeologicUnitPart)

⊂ (swe#Aquifer, gsml#GeologicUnit)

⊂ (gsml#GeologicUnitPart, dol#role)
gsml#has-quality (gsml#GeologicUnitPart, gsml#proportion)
gsml#part-of (gsml#GeologicUnitPart, gsml#GeologicUnit)

Figure 7. Representing qualities of properties. In Dolce Rocks, these are represented as qualities of a role played by the range of the property.

Scientific prototypes and classifications. Another issue related to scientific classification involves prototypes. Scientific classifications often denote prototypical conditions rather than necessary and/or sufficient conditions, implying instances can vary from prototypical conditions. But how can prototypicality be incorporated into foundational ontologies? Various approaches to prototype representation include a definition containing a modality, probability, or spatiality, or alternatively a relation to a representative exemplar. The spatial approach is explicitly available to Dolce, at least for prototypes defined by qualities, in that a prototype can be denoted by a narrow quality region within a larger region that encompasses the full space of quality values. However, not all scientific classifications are defined by qualities; for example, rock material classifications such as Gneiss are partially defined by causal processes such as metamorphism. Dolce also makes available a property pointing to a representative exemplar (dol#prototype), but this does not prescriptively guide how an instance can vary from the prototype.
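The spatial approach to prototypes can be sketched as a narrow region nested inside the region that bounds a class. The Python example below assumes a one-dimensional quality space with closed intervals; the numeric bounds, the region names, and the typicality function are hypothetical and chosen only to illustrate the idea.

```python
def typicality(value, class_region, prototype_region):
    """Classify a quality value against a class region and its narrower prototype region."""
    lo, hi = class_region
    p_lo, p_hi = prototype_region
    if not (lo <= p_lo <= p_hi <= hi):
        raise ValueError("prototype region must lie inside the class region")
    if p_lo <= value <= p_hi:
        return "prototypical"
    if lo <= value <= hi:
        return "atypical member"
    return "non-member"


# Hypothetical grain-size bounds in mm, for illustration only.
CLASS_REGION = (1.0, 30.0)
PROTOTYPE_REGION = (2.0, 5.0)

print(typicality(3.0, CLASS_REGION, PROTOTYPE_REGION))   # prototypical
print(typicality(20.0, CLASS_REGION, PROTOTYPE_REGION))  # atypical member
print(typicality(0.5, CLASS_REGION, PROTOTYPE_REGION))   # non-member
```

This covers only prototypes defined by qualities; classifications that depend on causal processes would need a different representation, as noted above.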

Our project suggests that foundational ontologies can be useful not only to geoscience ontology integration but also to cross-disciplinary e-Science. However, by introducing semantic incompatibilities, the alignment approach to ontology integration can lead to serious problems, decreasing coherency and increasing structural complexity. This might be acceptable when the integration is carried out for a narrow purpose aimed primarily at connectivity, or when the integrated ontology is temporary and does not necessarily need to persist beyond a specific operation in an e-Science environment. For longer-term purposes, merger is likely a better strategy for cross-disciplinary integration because it ensures semantic consistency, but potentially at the cost of reusability. Apart from illuminating various alignment issues specific to the domain ontologies, this work also identifies some general issues related to the representation of scientific knowledge with Dolce, and perhaps with other foundational ontologies.

Key questions remain regarding the representation of theoretic entities such as scientific classifications as well as the representation of scientific prototypes. Additional challenges that follow from these issues, and which are related to foundational ontologies and e-Science, include the representation of scientific discovery, change, and conflict, to enable evaluation and replication of scientific results. For those results to be understood and trusted, e-Science environments must represent not only the artifacts discovered by science (ontology) but also the process by which they are discovered (epistemology). This challenge is fundamental and most likely treated optimally in foundational ontologies for science, which are just beginning to emerge [9].

Acknowledgments

This work has been supported in part by the Geological Survey of Canada and the University of Münster. We thank the three anonymous reviewers who provided useful, insightful comments that led to significant improvements in the manuscript.

References

1. A. Gangemi et al., “Sweetening WordNet with Dolce,” AI Magazine, vol. 24, no. 3, 2003, pp. 13–24.

2. C. Masolo et al., WonderWeb Deliverable D18, Ontology Library (final), ISTC-CNR Laboratory for Applied Ontology, 2003; www.loa-cnr.it/Papers/D18.pdf.

3. M. Sen and T. Duffy, “GeoSciML: Development of a Generic Geoscience Markup Language,” Computers and Geosciences, vol. 31, no. 9, 2005, pp. 1095–1103.

4. R.G. Raskin and M.J. Pan, “Knowledge Representation in the Semantic Web for Earth and Environmental Terminology (Sweet),” Computers and Geosciences, vol. 31, no. 9, 2005, pp. 1119–1125.

5. L. Aller et al., Drastic: A Standardized System for Evaluating Ground Water Pollution Potential Using Hydrogeologic Settings, tech. report EPA/600/2-87/035, US Environmental Protection Agency, 1987, p. 641.

6. C. Masolo et al., “Social Roles and Their Descriptions,” Proc. 9th Int’l Conf. Principles of Knowledge Representation and Reasoning (KR 04), AAAI Press, 2004, pp. 267–277.

7. F. Probst, Semantic Reference Systems for Observations and Measurements, doctoral dissertation, Univ. of Münster, 2007.

8. A. Gangemi, C. Catenacci, and M. Battaglia, “Inflammation Ontology Design Pattern: An Exercise in Building a Core Biomedical Ontology with Descriptions and Situations,” Ontologies in Medicine, D.M. Pisanelli, ed., IOS Press, 2004, pp. 64–80.

9. B. Brodaric, F. Reitsma, and Y. Qiang, “SKIing with Dolce: Toward E-Science Knowledge Infrastructure,” Proc. 5th Int’l Conf. Formal Ontology in Information Systems (FOIS 08), IOS Press, 2008.


The Authors

Boyan Brodaric is a research scientist at the Geological Survey of Canada, where he works on geoscience information interoperability, geoscience ontologies, e-Science, and environmental risk applications. Brodaric received his PhD in geographical information science, with a focus on geospatial semantics, from Pennsylvania State University. He's deputy editor of the international journal Computers and Geosciences.

Florian Probst is a research scientist at SAP Research, where he's developing methods for improved Web-based communication. His research interests include formal ontology, geospatial ontology engineering, and semantic Web technologies. Probst received his PhD in geoinformatics, with a focus on geospatial ontology engineering, from the University of Münster.
