Comparing categories among geographic ontologies
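0098-3004/$ - see front matter © 2004 Elsevier Ltd. All rights reserved.
doi:10.1016/j.cageo.2004.07.010
*Corresponding author. Fax: +30210 7722634.
E-mail address: [email protected] (M. Kavouras).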
Computers & Geosciences 31 (2005) 145–154
www.elsevier.com/locate/cageo
Marinos Kavouras*, Margarita Kokla, Eleni Tomai
Cartography Laboratory, School of Rural and Surveying Engineering, National Technical University of Athens,
Zografos Campus, Athens 15780, Greece
Received 16 June 2004; received in revised form 27 July 2004; accepted 27 July 2004
Abstract
Numerous attempts have been made to generate semantic ‘‘mappings’’ between different ontologies, or create
aligned/integrated ones. An essential step towards their success is the ability to compare the categories involved. This
paper introduces a systematic methodology for comparing categories met in geographic ontologies. The methodology
explores/extracts semantic information provided by categories’ definitions. The first step towards this goal is the
recognition of syntactic and lexical patterns in definitions, which help to identify (a) semantic properties such as
purpose, location, cover, and (b) semantic relations such as hypernym, part of, has-parts, etc. At the second step, a
similarity measure among categories is applied, in order to explore how (the) extracted properties and relations
interrelate. This framework enables us to (a) better understand the impact of context in cross-ontology ‘‘mappings’’, (b)
evaluate the ‘‘quality’’ of definitions as to whether they respect mere ontological aspects (such as unambiguous
taxonomies), and (c) deal more effectively with the problem of semantic translation among geographic ontologies.
© 2004 Elsevier Ltd. All rights reserved.
Keywords: Geographic ontologies; Semantic properties; Semantic relations; Similarity
1. Introduction
A close inspection of existent geographic categoriza-
tions or geographic data exchange standards shows that
although they often refer to apparently similar cate-
gories, they use different semantics due to different
contexts. This ‘‘Babel Tower’’ makes the association
process and the establishment of an aligned or
integrated ontology (Sowa, 2000) very problematic.
There have been numerous attempts to deal with the
problem of ontology integration and semantic inter-
operability (Wache et al., 2001; Vckovski et al., 1999;
Uitermark, 2001; Kokla and Kavouras, 2001). In this
endeavor, it is essential to understand, reveal and resolve
existing heterogeneities. In earlier attempts, similarities
and heterogeneities between geographic categories were
identified on the basis of their common attributes
(Kavouras and Kokla, 2002). Such attributes were
defined based on experts’ knowledge on the features
involved. This approach however involved a great
degree of subjectivity, which may contradict the inten-
tions of the original designer. Since definitions have been
recognized as an important source of semantics, it was
decided to exploit their potential. Besides their semantic
value (Jensen and Binot, 1987; Klavans et al., 1993;
Swartz, 1997), definitions are rich in different kinds of
knowledge such as lexical, world, encyclopedic and
semantic (see SIGLEX workshop, in Barriere, 1997). In
addition, they are often the only objective available
source we can rely on, especially in existing geographic
data collections. Furthermore, any findings concerning
the semantic completeness of definitions in describing
categories would greatly help to form better definitions
in new classifications as described by Tomai and
Kavouras (2004). The purpose of the present research is
to identify semantic information from definitions and to
enrich the representation of categories with semantic
properties and relations, such as those reported by Kokla
and Kavouras (2002), in order to disambiguate geo-
graphic categories. The ability to represent and visualize
the degree of semantic similarity with concept mapping
tools (Skupin, 2002; Tomai and Kavouras, 2002) can
greatly facilitate the entire process. For tackling these
semantic heterogeneities, we explore similarities/dissim-
ilarities of two well-known geographic ontologies, i.e.,
CORINE LC, MEGRIN, and one lexical ontology—
WordNet, which includes geographic categories.
Another aspect of the research is the representation of
semantic similarity, in order to identify semantic
heterogeneities, and therefore facilitate interoperability.
In the domain of geographic information science, few
approaches have attempted to model similarity. As far
as similarity of geographic entity classes is concerned,
Rodríguez et al. (1999) and Rodríguez and Egenhofer
(2003) have proposed a computational method for
assessing similarity between two ontologies by a
similarity function that compares distinguishing features
of the entities involved, such as parts, functions and
attributes. Our approach to establish semantic similarity
among categories from different geographic ontologies
exclusively uses semantic information, which can be
derived from categories’ definitions.
In summary, this paper compares category definitions,
determines heterogeneities, portrays semantic similarity,
and overall prepares the ground for the integration
process. In the rest of the paper, Section 2 presents the
characteristics of the ontologies employed. Semantic
relations and properties are described in Section 3.
Section 4 is concerned with the computation and
visualization of semantic similarity. The results of this
work are analyzed in Section 5, and finally some
conclusions are drawn in Section 6.
2. Case-study ontologies
As mentioned above, several problems of association
and integration are encountered when trying to compare
categories from distinct repositories of geographic
information. In this research, we identified semantic
relations–properties from the following categorizations:

• CORINE LC¹ is a categorization intended to provide consistent localized
  geographical information on land cover of the member states of the European
  Community, by using satellite data. CORINE Land Cover has a three-level
  hierarchy of categories. The upper level consists of 5 categories, the middle
  level of 15, and the lowest one of 44 categories.
• GDDD, the Geographical Data Description Directory (MEGRIN’s GDDD)², contains
  information on available digital geographic information from Europe’s
  National Mapping Agencies (NMAs). Layer names, feature type names, and
  feature attribute type names correspond to the nomenclature used in the
  DIGEST Feature and Attribute Coding Catalogue (FACC).
• WordNet³ is a lexical database for the English language, whose design was
  based on current psycholinguistic theories.

¹ European Environmental Agency: CORINE Land Cover Methodology and
  Nomenclature. http://reports.eea.eu.int/COR0-part1/en,
  http://reports.eea.eu.int/COR0-part2/en.
² MEGRIN’s PETIT project.
  http://www.eurogeographics.org/megrin/PROJECTS/PETIT/Prototyp_desc.html.
³ WORDNET 1.7.1: a Lexical Database for the English Language, Cognitive
  Science Laboratory, Princeton University. http://www.cogsci.princeton.edu/~wn/.

At the context level, the first two categorizations are
considered as domain ontologies, while WordNet is a
general lexical ontology. At the level of formality, all
three categorizations can be considered as terminological
ontologies (Sowa, 2000), since they contain categories
specified by definitions expressed in natural language,
associated by subtype/supertype relations. Furthermore,
the first two can be characterized as light ontologies or
taxonomies since they establish classifications.
Categories at the lowest (and more detailed) hier-
archical level were examined for CORINE LC. In
addition, for the sake of simplicity and clarity, the study
was restricted only to a small, yet representative set of
categories from the three ontologies, properly selected to
account for a range of heterogeneities encountered
between geographic categorizations. Therefore, the
categories selected for the experiment were:

• CORINE LC’s categories 4 (wetlands) and 5 (waterbodies),
• MEGRIN’s category hydrography,
• WordNet’s definitions for the related category terms.

We thus ended up with definitions of 29 “category types” (Table 1). The term
category type refers to categories that are found in different ontologies
under the same term (name of the category) but exhibit differences in their
definitions or the contexts under which they are used.

Table 1
Category types used in our approach

Ontology             Category type
CORINE land cover    Peat bog
                     Water course
                     Water body
                     Salt marsh
                     Saline
                     Intertidal flat
                     Coastal lagoon
                     Estuary
                     Sea and ocean
                     Inland marsh
MEGRIN               Bog
                     Canal
                     Lake/pond
                     Salt marsh
                     Salt pan
                     Watercourse
WordNet              Body of water
                     Bog
                     Canal
                     Lake
                     Pond
                     Salt pan
                     Watercourse
                     Watercourse
                     Marsh
                     Estuary
                     Sea
                     Ocean
                     Lagoon

3. Determination of semantic relations and properties
The rationale behind this research is to determine
semantic information from definitions and to enrich the
representation of categories with semantic properties
and relations in order to reveal similarities and hetero-
geneities. The field of natural language processing
develops methodologies for automatic extraction of
semantic information from definitions. According to
Jensen and Binot (1987), definitions include a wealth of
knowledge expressed in natural language, which can be
analyzed by natural language processing systems.
Definitions are a kind of text with special structure
and content. They are rich sources of scientific knowl-
edge of a domain. In geographic ontologies, definitions
are the primary and usually the only descriptions of
category terms, since other elements that could con-
tribute to the semantic definition of geographic cate-
gories (e.g., properties, functions, axioms) are either
missing or superficially described. Research on defini-
tions is seeking ways to exploit the wealth of informa-
tion latent in this special kind of text.
Definitions of geographic categorizations are usually
comprised of two parts: the genus and the differentiae.
The genus or hypernym is the superordinate term of the
defined category term. For example, in the definition:
‘‘hotel: a building where travelers can pay for lodging
and meals and other services’’, ‘‘building’’ is the genus of
category ‘‘hotel’’.
The differentiae are other elements of the definition
apart from the genus, which differentiate words with the
same genus. Thus, in the definition: ‘‘skyscraper: a very
tall building with many storeys’’, ‘‘skyscraper’’ has the
same genus (i.e., ‘‘building’’) with ‘‘hotel’’, but they are
distinguished by the differentiae (e.g., ‘‘where travelers
can pay for lodging and meals and other services’’ and
‘‘tall’’, ‘‘with many storeys’’).
The methodology adopted here, for analyzing defini-
tions and extracting semantic information, was intro-
duced by Jensen and Binot (1987) and further pursued
by Ravin (1993) and Vanderwende (1995). This
approach consists in the syntactic analysis of definitions
and the application of rules, which examine the existence
of certain syntactic and lexical patterns. Patterns take
advantage of specific elements of definitions, in order to
identify a set of semantic properties–relations and their
values based on syntactic analysis. Patterns applied in
the genus part of the definition extract the hypernym or
‘‘is-a’’ relation. Patterns applied in the differentiae part
extract other semantic relations such as ‘‘is-part-of’’,
‘‘has-parts’’, ‘‘adjacent-to’’, etc., as well as semantic
properties such as purpose, location, time, size, etc.
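
The genus/differentiae split described above can be illustrated with a minimal sketch. It is not the syntactic analysis used by Jensen and Binot (1987); it is a crude token heuristic, and the stop list `DIFFERENTIA_MARKERS`, the article list and the function name `split_definition` are assumptions made only for this illustration.

```python
import re

# Hypothetical stop list: words that typically open the differentiae.
DIFFERENTIA_MARKERS = {"where", "with", "of", "that", "which", "used", "surrounded"}
# Hypothetical list of leading determiners to discard.
LEADING_ARTICLES = {"a", "an", "the", "any"}


def split_definition(definition):
    """Return (genus, differentiae) for a short noun definition.

    The genus is approximated as the last word before the first marker;
    everything else (pre-modifiers and the rest) is kept as differentiae.
    """
    tokens = re.findall(r"[a-z]+(?:-[a-z]+)*", definition.lower())
    if tokens and tokens[0] in LEADING_ARTICLES:
        tokens = tokens[1:]
    # Position of the first differentia marker, or end of definition.
    cut = next(
        (i for i, tok in enumerate(tokens) if tok in DIFFERENTIA_MARKERS),
        len(tokens),
    )
    head, rest = tokens[:cut], tokens[cut:]
    genus = head[-1] if head else ""
    differentiae = " ".join(head[:-1] + rest)
    return genus, differentiae


hotel = "a building where travelers can pay for lodging and meals and other services"
skyscraper = "a very tall building with many storeys"
print(split_definition(hotel)[0])       # genus of "hotel"
print(split_definition(skyscraper))     # genus and differentiae of "skyscraper"
```

A real system would replace the stop-list heuristic with a parser, since the head noun of the first noun phrase cannot be found reliably from word lists alone.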
Therefore, it was necessary to specify semantic
relations and properties used in geographic definitions.
For that reason, different geographic ontologies, stan-
dards and categorizations (e.g., CYC Upper Level
Ontology, WordNet, CORINE Land Cover, DIGEST,
SDTS, etc.) were analyzed in order to identify patterns,
which are systematically used to express specific
semantic relations and properties. The most commonly
used are shown in Table 2. Besides general semantic
elements (e.g., PURPOSE, CAUSE, TIME, etc.), other
context-specific semantic elements were also identified.
For example, categories relative to hydrography are
described by semantic elements, such as nature (natural
or artificial) and flow (flowing or stagnant).
The pattern for the extraction of the semantic relation
PURPOSE (Vanderwende, 1995) is:
If the verb used (created, intended, prepared,
provided, etc.) is post modified by a prepositional
phrase with the preposition for, then create a
PURPOSE relation with the head(s) of that preposi-
tional phrase as the value.
For example, a PURPOSE property is extracted from
the definition: ‘‘canal: a manmade or improved natural
waterway used for transportation’’ (MEGRIN), with
value ‘‘transportation’’.
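
As a rough illustration of the PURPOSE pattern, the sketch below approximates it with a regular expression instead of a syntactic parse. The verb list is an assumption: the first four verbs are those quoted in the pattern, while "used" is added here only so that the canal example matches. Taking the first word after "for" as the head of the prepositional phrase is a simplification.

```python
import re

# Verbs that may be post-modified by a "for" phrase; "used" is an assumed addition.
PURPOSE_VERBS = ("created", "intended", "prepared", "provided", "used")

_PURPOSE_RE = re.compile(
    r"\b(?:%s)\s+for\s+([a-z-]+)" % "|".join(PURPOSE_VERBS),
    re.IGNORECASE,
)


def extract_purpose(definition):
    """Return the (approximate) head of the 'for' phrase, or None."""
    match = _PURPOSE_RE.search(definition)
    return match.group(1).lower() if match else None


canal = "a manmade or improved natural waterway used for transportation"
print(extract_purpose(canal))
```

Coordinated heads (for example "for irrigation or drainage") would need the full phrase, which this one-word capture deliberately does not attempt.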
The methodology for extracting semantic information
is used to decompose definitions of geographic cate-
gories into a set of semantic properties—relations and
their corresponding values. This formalized semantic
information is further used to disambiguate similar
categories by explicitly and objectively identifying
similarities and heterogeneities between them.
More specifically, if the methodology for extracting
semantic information is used for analyzing category
‘‘lake’’ as defined by MEGRIN: ‘‘lake/pond: a body of
water surrounded by land’’, the following semantic
properties and relations are determined: HYPERNYM
with value ‘‘body’’, MATERIAL with value ‘‘water’’
and SURROUNDED-BY with value ‘‘land’’. Respec-
tively, from the analysis of same category type as defined
by WordNet: ‘‘lake: a body of (usually fresh) water
surrounded by land’’, the same semantic properties—
relations and values—are determined. Therefore, it is
evident that the two ontologies equivalently define the
category ‘‘lake’’ (Table 3).

Table 2
Examples of semantic properties and relations

Semantic properties: Purpose, Cause, Location, Time, Material-cover, Size
Semantic relations:  Is-a, Is-part-of, Has-part, Adjacent-to, Surrounded-by, Associated-with

Table 3
Determination of semantic information for category “lake”

                  HYPERNYM    MATERIAL                 SURROUNDED BY
Lake (MEGRIN)     Body        Water                    Land
Lake (WordNet)    Body        Water (usually fresh)    Land

If, however, the above methodology is used for the analysis of category
“ditch” as defined by the same ontologies (MEGRIN and WordNet), the resulting
semantic properties–relations and values reveal heterogeneities between the
definitions of the homonymous categories (Table 4).

Table 4
Determination of semantic information for category “ditch”

                   HYPERNYM    PURPOSE                   SIZE     NATURE
Ditch (MEGRIN)     Channel     Irrigation or drainage
Ditch (WordNet)    Waterway                              Small    Natural
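
The comparison of extracted properties for "lake" and "ditch" can be sketched as a simple dictionary diff. The dictionaries below transcribe Tables 3 and 4; the key names, the status labels and the function `compare_properties` are hypothetical, and "water (usually fresh)" is recorded as "water" because the text treats the two lake definitions as yielding the same values.

```python
# Property/value sets transcribed from Tables 3 and 4 (key names are assumed).
LAKE_MEGRIN = {"hypernym": "body", "material": "water", "surrounded_by": "land"}
LAKE_WORDNET = {"hypernym": "body", "material": "water", "surrounded_by": "land"}
DITCH_MEGRIN = {"hypernym": "channel", "purpose": "irrigation or drainage"}
DITCH_WORDNET = {"hypernym": "waterway", "size": "small", "nature": "natural"}


def compare_properties(first, second):
    """Label every property as same, different, or present in one side only."""
    report = {}
    for prop in sorted(set(first) | set(second)):
        if prop in first and prop in second:
            report[prop] = "same" if first[prop] == second[prop] else "different"
        elif prop in first:
            report[prop] = "only in first"
        else:
            report[prop] = "only in second"
    return report


print(compare_properties(LAKE_MEGRIN, LAKE_WORDNET))
print(compare_properties(DITCH_MEGRIN, DITCH_WORDNET))
```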
Table 5 shows the complete set of semantic information (properties and
values) that can be identified in the definitions of the 29 categories from
the three different ontologies.

Table 5
Properties and values of categories as identified in their definitions in three ontologies
Properties: Hypernym, Nature, Use/purpose, Material-cover, Is part of,
Form/morphology, Size, Location, Surrounded by, Condition-state (attribution).
Only the properties for which a definition supplies a value are listed.

CORINE LC
  Inland marsh: Hypernym: Land; Location: Low-lying; Condition-state: Flooded (TIME: in winter), Saturated (MATERIAL-CAUSE: water, TIME: all year round)
  Peat bog: Hypernym: (Peat) land; Material-cover: Decomposed moss and vegetable matter
  Salt marsh: Hypernym: Area; Material-cover: Vegetation; Location: Low-lying, above the high-tide line; Condition-state: Susceptible to flooding (MATERIAL-CAUSE: sea water)
  Saline: Hypernym: Salt-pan; Material-cover: (Salt); Condition-state: Active or in process of abandonment
  Intertidal flat: Hypernym: Expanse; Material-cover: Mud, sand or rock; Location: Between high and low water marks; Condition-state: Generally unvegetated
  Water course: Hypernym: Water course; Nature: Natural or artificial; Use/purpose: Water drainage channel; Material-cover: (Water)
  Water body: Hypernym: Stretch; Nature: Natural or artificial; Material-cover: Water
  Coastal lagoon: Hypernym: Stretch; Material-cover: Salt or brackish water; Location: Coastal areas
  Estuary: Hypernym: Mouth; Is part of: River
  Sea and ocean: Hypernym: Zones; Location: Seaward of the lowest tide limit

MEGRIN
  Bog: Hypernym: Area; Material-cover: Soil rich in plant residue; Condition-state: Poorly drained periodically flooded
  Canal: Hypernym: Waterway; Nature: Manmade or improved natural; Use/purpose: Transportation; Material-cover: (Water)
  Lake/pond: Hypernym: Body; Material-cover: Water; Surrounded by: Land
  Salt marsh: Hypernym: Depression; Nature: Natural; Material-cover: Salt encrusted clayey soil; Location: In arid/semi-arid regions
  Salt pan: Hypernym: Area; Material-cover: Natural surface salt deposits; Form/morphology: Flat
  Watercourse: Hypernym: Course; Nature: Natural; Material-cover: (Water); Condition-state: Flowing

WordNet
  Body of water: Hypernym: Part; Material-cover: Water; Is part of: Earth’s surface
  Bog: Hypernym: Ground; Material-cover: Decomposing vegetation; Condition-state: Wet spongy
  Canal: Hypernym: Strip; Use/purpose: Boats or irrigation; Material-cover: Water; Form/morphology: Long and narrow
  Lake: Hypernym: Body; Material-cover: (Usually fresh) water; Surrounded by: Land
  Pond: Hypernym: Lake; Size: Small
  Salt pan: Hypernym: Basin; Material-cover: Salt and gypsum; Form/morphology: Shallow; Location: In a desert region
  Watercourse: Hypernym: Channel; Nature: Natural or artificial
  Watercourse: Hypernym: Body; Nature: Natural; Material-cover: Running water; Location: On or under the earth
  Marsh: Hypernym: Land; Material-cover: Grassy vegetation; Location: Low-lying
  Lagoon: Hypernym: Body; Material-cover: Water; Condition-state: Cut-off from land (ACTOR: a reef of sand or coral)
  Estuary: Hypernym: Part; Material-cover: Fresh or salt water; Is part of: River; Form/morphology: Wide; Location: Near the sea
  Sea: Hypernym: Division; Material-cover: Salt water; Is part of: Ocean; Size: Large; Condition-state: (Partially) enclosed (ACTOR: land)
  Ocean: Hypernym: Body; Material-cover: Water; Is part of: Hydrosphere; Size: Large

3.1. Findings

• The presence of hypernyms in definitions may express the “is-a” relation,
  but the values of hypernyms in definitions differ significantly. A sample of
  29 categories from two geospatial ontologies and one lexical database, which
  all refer to waterbodies and watercourses and coincide in naming (11 naming
  terms for the 29 categories), presents 19 distinct hypernyms. Furthermore, as
  far as the hypernymic relation is concerned, we can state the following:
  o CORINE’s 10 categories, which belong to 4 categories of the intermediate
    level, which further up belong to 2 categories of the superordinate level,
    have 9 distinct hypernyms in their definitions. Definitions do not properly
    address the taxonomic structure of the hierarchy, i.e., genera of category
    terms do not necessarily coincide with their superordinate category terms.
    Suggestively, we pinpoint two cases of inconsistency.
    (a) Definitions are circular (water courses are water courses…).
    (b) The use of distinct terms, which could refer to the same hypernyms,
        for instance, the terms area, stretch, zone, expanse, etc.
  o MEGRIN’s 6 category definitions also have 5 distinct hypernyms; one
    definition is circular (water course is a course…).
  o In WordNet this kind of inconsistency is absent (the hypernymic relation
    is correctly addressed in the definitions).
  o CORINE’s hypernyms do not match those of WordNet at all. All water bodies
    (such as lagoon, estuary, sea and ocean, water body, watercourse of
    category 5) in CORINE LC are defined using terms that refer to
    two-dimensional hypernyms, while in WordNet they are defined using as
    hypernym the term body, which refers to three-dimensional physical
    objects. This distinction indicates that CORINE LC is taking a map view
    because it classifies land cover, not geographic entities.
• (a) CORINE LC is a land cover ontology; consequently, semantic property
  “material-cover” is present in most definitions of its categories. This
  suggests that definitions in existing geospatial ontologies (esp. task or
  domain ontologies) are context driven. (b) The same semantic property,
  however, is also present in the remaining ontologies (only two of the
  category definitions do not contain lexical information for that semantic
  property).
• Semantic property “nature” (natural or artificial/manmade) is addressed in
  only 7 definitions out of the total 29.
• Semantic property “purpose” is present in only 3 out of 29 definitions. This
  is because natural entities do not have purposes, in contrast to artificial
  ones.
• Semantic properties such as “size” and “form/morphology” are not adequately
  included in definitions either: 3/29 and 4/29, respectively. This is very
  low, considering that geospatial categories are expected to possess
  significant properties of size and morphology.
• The meronymic semantic relation “has-part” or “is-part-of” is not widely
  addressed in definitions; only 5 of the total 29 definitions present such
  information, 4 of which belong to WordNet (out of a total of 13). Semantic
  properties “location” and “surrounded by” (both of them denoting topology)
  are met in 12 definitions only, which again seems low. Both observations are
  contrary to what is generally expected about the presence of mereotopologic
  relations in geographic ontologies (Casati et al., 1998).
• Semantic property “time” is also largely absent from definitions.

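
Counts such as "3 out of 29" are tallies of how many definitions supply a value for a given property. A minimal sketch of that tally is given below, using only the four lake and ditch definitions from Tables 3 and 4 as sample data; the key names and the function `property_coverage` are assumptions, and the printed counts describe this small sample, not the full set of 29 definitions.

```python
from collections import Counter

# Sample of extracted definitions (Tables 3 and 4 only; key names are assumed).
SAMPLE = {
    "lake (MEGRIN)": {"hypernym": "body", "material": "water", "surrounded_by": "land"},
    "lake (WordNet)": {"hypernym": "body", "material": "water", "surrounded_by": "land"},
    "ditch (MEGRIN)": {"hypernym": "channel", "purpose": "irrigation or drainage"},
    "ditch (WordNet)": {"hypernym": "waterway", "size": "small", "nature": "natural"},
}


def property_coverage(definitions):
    """Count, per property, how many definitions supply a value for it."""
    counts = Counter()
    for properties in definitions.values():
        counts.update(properties.keys())
    return counts


coverage = property_coverage(SAMPLE)
for prop, count in sorted(coverage.items()):
    print(f"{prop}: {count}/{len(SAMPLE)}")
```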
4. Determination and visualization of semantic similarity
In order to determine the similarity between two categories, we take into
account the values of the properties/relations they possess. If the values of
a given semantic property or relation coincide, then the two category types
are similar in terms of that property/relation. If the values of a
property/relation are distinct, then the similarity between the two categories
with respect to that property/relation is equal to zero.
The similarity measure S between two categories a, b is set by the ratio model
(based on Tversky’s similarity measure):

$$S(a, b) = \frac{C}{A + B + C},$$

where C is the number of properties/relations which categories a and b share
and for which they also exhibit common values, A is the number of
properties/relations of category a but not of b, and B is the number of
properties/relations of category b but not of a (examples can be found in
Table 6). The ratio is bounded between 0 and 1, the former denoting complete
dissimilarity, and the latter, coincidence of entities.

Table 6
Similarity for categories: lake and (peat) bog based on Table 5

Categories                                       Similarity S
1: Lake (MEGRIN), 2: Lake (WordNet)              S_{1,2} = 1.000
1: Peatbog (CORINE LC), 2: Bog (WordNet)         S_{1,2} = 0.333
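
The ratio model can be sketched as follows. This is a literal reading of the definitions of A, B and C, with categories represented as property/value dictionaries (an assumed representation). Under this reading, a property that both categories possess but with different values is counted in none of A, B or C; the text does not state how such properties enter the denominator, so the sketch may not reproduce every value the authors report. Returning 0.0 for an empty denominator is also an assumption.

```python
def ratio_similarity(a, b):
    """Ratio-model similarity C / (A + B + C) between two property dictionaries."""
    shared = set(a) & set(b)
    c = sum(1 for prop in shared if a[prop] == b[prop])  # shared, same value
    only_a = len(set(a) - set(b))                        # properties of a, not of b
    only_b = len(set(b) - set(a))                        # properties of b, not of a
    denominator = only_a + only_b + c
    # Assumed convention: no countable properties means no measurable similarity.
    return c / denominator if denominator else 0.0


# Values transcribed from Tables 3 and 4 (key names are assumed).
lake_megrin = {"hypernym": "body", "material": "water", "surrounded_by": "land"}
lake_wordnet = {"hypernym": "body", "material": "water", "surrounded_by": "land"}
ditch_megrin = {"hypernym": "channel", "purpose": "irrigation or drainage"}
ditch_wordnet = {"hypernym": "waterway", "size": "small", "nature": "natural"}

print(ratio_similarity(lake_megrin, lake_wordnet))    # identical property sets
print(ratio_similarity(ditch_megrin, ditch_wordnet))  # no common values
```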
In special cases, to assess the similarity appropriately:

• Compound nouns, such as “peatland” (which does not exist in WordNet), the
  hypernym of peatbog (Table 6), were identified and used as adjective + noun
  (peat land) and not as a compound word, so the hypernym was taken to be
  “land” instead.
• Similar terms were grouped to diminish the range of values of certain
  properties. Consider, for instance, the values “water, fresh water, brackish
  water, salt water” of property “material-cover” for canal, lake, and coastal
  lagoon, respectively. When establishing similarity, the value for these
  categories was taken as “water”.

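
The two special cases can be sketched as a normalization step applied to values before they are compared. The lookup tables below contain only the examples given in the text; their names, and the idea of implementing the grouping as fixed lookup tables, are assumptions made for illustration.

```python
# Grouping of similar "material-cover" values (examples from the text only).
WATER_TERMS = {"water", "fresh water", "brackish water", "salt water"}

# Compound nouns split into modifier + head noun (example from the text only).
COMPOUND_SPLITS = {"peatland": ("peat", "land")}


def normalize_material(value):
    """Map all water variants to the single value 'water'."""
    cleaned = value.strip().lower()
    return "water" if cleaned in WATER_TERMS else cleaned


def normalize_hypernym(value):
    """Replace a known compound noun by its head noun."""
    cleaned = value.strip().lower()
    if cleaned in COMPOUND_SPLITS:
        return COMPOUND_SPLITS[cleaned][1]
    return cleaned


print(normalize_material("Brackish water"))
print(normalize_hypernym("peatland"))
```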
In order to visualize the different ontologies, we use multi-dimensional
scaling (MDS) (Kruskal and Wish, 1978). The method uses a
similarity/dissimilarity matrix to project the data into the projection space,
which in our case is a two-dimensional space. MDS is a dimensionality
reduction method that represents multi-dimensional data sets by minimizing a
stress function, so that distances among data reflect the corresponding
(dis)similarities. The value of the stress function is an indicator of the
goodness-of-fit of the result: the higher its value, the greater the
distortion imposed on the visualization of the entities, i.e., the more the
plotted distances deviate from the corresponding dissimilarities. The output
is a scatter plot of the data where similar entities are close in the
representation space while dissimilar ones are far away. The visualization
result is shown in Fig. 1.

Fig. 1. Visualization output for three ontologies.
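
To make the role of the stress value concrete, the sketch below computes Kruskal's stress-1 for a given two-dimensional configuration. It is a simplified, metric form in which the target disparities are taken to be the input dissimilarities themselves; the text does not say which MDS variant or software was used, so this is an assumption. The coordinates and dissimilarities in the example are invented for illustration.

```python
import math


def kruskal_stress(coords, dissimilarity):
    """Stress-1: sqrt(sum (d_ij - delta_ij)^2 / sum d_ij^2) over all pairs i < j."""
    n = len(coords)
    numerator = 0.0
    denominator = 0.0
    for i in range(n):
        for j in range(i + 1, n):
            d = math.dist(coords[i], coords[j])  # distance in the plot
            numerator += (d - dissimilarity[i][j]) ** 2
            denominator += d ** 2
    return math.sqrt(numerator / denominator) if denominator else 0.0


# Invented example: three entities whose dissimilarities are exactly realizable.
root2 = math.sqrt(2)
delta = [[0.0, 1.0, 1.0], [1.0, 0.0, root2], [1.0, root2, 0.0]]
faithful = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)]   # distances match delta
distorted = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0)]  # forced onto a line

print(kruskal_stress(faithful, delta))
print(kruskal_stress(distorted, delta))
```

A dissimilarity matrix could, for instance, be derived from the similarity measure as 1 − S; that conversion is likewise an assumption.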
5. Interpretation of results
As mentioned before, the output of the MDS is the set of coordinates for the
examined category types. Subsequently, a clustering method is used to form
groups of categories that share common properties–relations and values. We are
then able to explore whether different names denote the same category, and
whether identical names with differing definitions denote distinct categories.
In the current approach, we used a hierarchical clustering method to examine
in which way the three distinct ontologies contribute to the formation of
common upper categories in a unified schema (Fig. 2).
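
The clustering step can be sketched with a small agglomerative procedure over the MDS coordinates. The text does not specify the linkage criterion, so single linkage is used here as an assumption, and the coordinates below are hypothetical values, not the ones behind Fig. 1 or Fig. 2.

```python
import math


def single_linkage(points, n_clusters):
    """Merge the two closest clusters until n_clusters remain.

    points maps a category label to its (x, y) coordinates.
    """
    n_clusters = max(1, n_clusters)
    clusters = [[label] for label in points]

    def gap(first, second):
        # Single linkage: distance between the closest pair of members.
        return min(math.dist(points[p], points[q]) for p in first for q in second)

    while len(clusters) > n_clusters:
        i, j = min(
            ((i, j) for i in range(len(clusters)) for j in range(i + 1, len(clusters))),
            key=lambda pair: gap(clusters[pair[0]], clusters[pair[1]]),
        )
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return clusters


# Hypothetical coordinates for four category types.
coordinates = {
    "lake (MEGRIN)": (0.0, 0.0),
    "lake (WordNet)": (0.1, 0.0),
    "canal (MEGRIN)": (5.0, 5.0),
    "canal (WordNet)": (5.0, 5.2),
}
print(single_linkage(coordinates, 2))
```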
The analysis of properties/relations and their values
(in the findings of Section 3) indicated whether and
which ontological assertions (such as unambiguous
taxonomic structure) can be derived from definitions.
As a result, the following guiding principles for the
definitions of categories in geospatial ontologies could
be useful:

• Basic ontological semantic relations (meronymy, hypernymy, hyponymy) should
  be present in definitions due to their expressiveness and rich semantics.
• Category definitions in ontologies should address the taxonomic structure of
  the categorization correctly. Any inconsistency between the definition’s
  hypernym and the superordinate category term itself (when the categorization
  is hierarchical) presents a misconstruction in representing the hierarchy.
• Definitions should account for the so-called special features of geospatial
  categories such as morphology and location/topology, which, according to the
  previous analysis, does not seem to be the case.
• Domain and task ontologies have context-driven definitions, which is not a
  drawback. These definitions, however, should not contradict general
  knowledge of the given categories and should reflect to some extent the way
  they are construed by humans; otherwise they are superficial and not widely
  accepted.

6. Conclusions and further work
The research presented focuses on the determination
of semantic information from definitions of geographic
categories in order to identify and formalize similarities
and heterogeneities. Visualization of semantic similarity
proves to be a very useful tool for the association of
similar categories. Portraying similarities/dissimilarities
in a projection space gives us a concrete measure of the
heterogeneity of distinct ontologies. We can then draw inferences

• as to what extent different ontologies can be integrated,
• about the associations between category types,
• concerning the comparison of categories’ definitions in a cross-ontological
  examination.

Fig. 2. Resulting clusters showing heterogeneities among same category types (terms) in different ontologies.

The purpose of the present work was to demonstrate
the difficulty in dealing with category semantics. It also
presents an alternative to customary approaches, which
manually determine similarities and heterogeneities
between category types, mainly based on similarity
between category terms. However, similarity in terms
does not necessarily imply equivalent category types.
Besides superficially dealing with categories, such
approaches usually result in misapprehending the
intentions of the original designers. On the contrary,
the present work formalizes semantic information
immanent in definitions of category types. Definitions
are usually the basic available and semantically rich
feature of geographic data collections and they reflect
the intentions of original designers. The result of
semantic similarity determination and visualization can
be used as a pre-processing step to semantic integration
of geographic categorizations.
Finally, it should be realized that the approach was
not intended to produce ‘‘perfect’’ results. Emphasis was
put on objectivity and automation (avoiding ad hoc
manual procedures and subjective experts’ knowledge).
Furthermore, any consequential ‘‘imperfect’’ results
have a value of their own, for they reveal (and provide
an opportunity to fix) imperfections of the original
taxonomy definitions, as well as help engineer better
ontologies in the future.
Acknowledgments
This work has been partially supported by the
Heraclitus Research Programme 2.2.3.b of the Hellenic
Ministry of National Education. The authors are also
indebted to the anonymous reviewers for their very
constructive comments.
References
Barriere, C., 1997. From a children’s first dictionary to a lexical
knowledge base of conceptual graphs. Ph.D. Dissertation.
Simon Fraser University, Vancouver, BC, Canada, 339 pp.
Casati, R., Smith, B., Varzi, A., 1998. Ontological tools for
geographic representation. In: Guarino, N. (Ed.), Formal
Ontology in Information Systems. IOS Press, Amsterdam,
pp. 77–85.
Jensen, K., Binot, J.L., 1987. Disambiguating prepositional
phrase attachments by using on-line dictionary definitions.
Computational Linguistics 13 (3/4), 251–260.
Kavouras, M., Kokla, M., 2002. A method for the formaliza-
tion and integration of geographical categorizations. Inter-
national Journal of Geographical Information Science 16
(5), 439–453.
Klavans, J., Chodorow, M., Wacholder, N., 1993. Building a
knowledge base from parsed definitions. In: Jensen, K.,
Heidorn, G., Richardson, S. (Eds.), Natural Language
Processing: The PLNLP Approach. Kluwer Academic
Publishers, Dordrecht, The Netherlands.
Kokla, M., Kavouras, M., 2001. Fusion of top-level and
geographical domain ontologies based on context formation
and complementarity. International Journal of Geographi-
cal Information Science 15 (7), 679–687.
Kokla, M., Kavouras, M., 2002. Extracting latent semantic
relations from definitions to disambiguate geographic
ontologies. In: GIScience 2002 Abstracts, Second Interna-
tional Conference on Geographic Information Science.
Boulder, CO, pp. 87–90.
Kruskal, J.B., Wish, M., 1978. Multidimensional scaling. Sage
University Paper Series on Quantitative Applications in the
Social Sciences, Number 07-011. Sage Publications, New-
bury Park, CA, 96 pp.
Ravin, Y., 1993. Disambiguating and interpreting verb defini-
tions. In: Jensen, K., Heidorn, G.E., Richardson, S.D.
(Eds.), Natural Language Processing: The PLNLP Ap-
proach. Kluwer Academic Publishers, Dordrecht, The
Netherlands, pp. 175–189.
Rodríguez, A., Egenhofer, M., 2003. Determining semantic
similarity among entity classes from different ontologies.
IEEE Transactions on Knowledge and Data Engineering 15
(2), 442–456.
Rodríguez, A., Egenhofer, M., Rugg, R., 1999. Assessing
semantic similarities among geospatial feature class defini-
tions. In: Vckovski, A., Brassel, K., Schek, H.-J. (Eds.),
Interoperating Geographic Information Systems, Second
International Conference, INTEROP’99, Zurich, Switzer-
land. Lecture Notes in Computer Science, vol. 1580.
Springer, Berlin, pp. 189–202.
Skupin, A., 2002. A cartographic approach to visualizing
conference abstracts. IEEE Computer Graphics and Appli-
cations 22 (1), 50–58.
Sowa, J.F., 2000. Knowledge Representation: Logical, Philo-
sophical, and Computational Foundations. Brooks Cole
Publishing Co., Pacific Grove, CA 594 pp.
Swartz, N., 1997. Definitions, dictionaries, and meanings,
http://www.sfu.ca/philosophy/swartz/definitions.htm.
Tomai, E., Kavouras, M., 2002. ‘‘Sharpening’’ vagueness:
identifying, measuring, and portraying its impact on
geographic categories. In: GIScience 2002 Abstracts,
Second International Conference on Geographic Informa-
tion Science. Boulder, CO, pp. 189–192.
Tomai, E., Kavouras, M., 2004. From ‘‘onto-geonoesis’’ to
‘‘onto-genesis’’: the design of geographic ontologies.
GeoInformatica 8 (3), 285–302.
Uitermark, H.T., 2001. Ontology-based geographic data set
integration. Ph.D. Dissertation. Deventer, The Netherlands,
139 pp.
Vanderwende, L., 1995. The analysis of noun sequences using
semantic information extracted from on-line dictionaries. Ph.D.
Dissertation. Faculty of the Graduate School of Arts and
Sciences, Georgetown University, Washington, DC, 312 pp.
Vckovski, A., Brassel, K., Schek, H.-J. (Eds.), 1999. Interoperating
geographic information systems. Second International Con-
ference, INTEROP’99, Zurich, Switzerland. Lecture Notes in
Computer Science, vol. 1580. Springer, Berlin, 327 pp.
Wache, H., Vögele, T., Visser, U., Stuckenschmidt, H., Schuster,
G., Neumann, H., Hübner, S., 2001. Ontology-based
integration of information—a survey of existing approaches.
In: Proceedings of IJCAI-01 Workshop: Ontologies and
Information Sharing, Seattle, WA, pp. 108–117.