Using pattern detection techniques and refactoring to improve the performance of ASMOV


Bahareh Behkamal
Department of Computer Engineering, M.S. Student, Payam Noor University, Tehran, Iran
[email protected]

Mahmoud Naghibzadeh
Department of Computer Engineering, Ferdowsi University, Mashhad, Iran
[email protected]

Reza Askari Moghadam
Department of Computer Engineering, Payam Noor University, Tehran, Iran
[email protected]

Abstract— One of the most important challenges in the Semantic Web is ontology matching. Ontology matching is a technology that enables semantic interoperability between structurally and semantically heterogeneous resources on the Web. Despite serious research efforts on ontology matching, matchers still suffer from severe problems with respect to the quality of matching results. Furthermore, most of them take a long time to find the correspondences. The aim of this paper is to improve ontology matching results by adding a preprocessing phase that analyzes the input ontologies. This phase is added in order to solve problems caused by ontology diversity. We select one of the best matchers of the Ontology Alignment Evaluation Initiative (OAEI), namely Automated Semantic Matching of Ontologies with Verification (ASMOV). In the preprocessing phase, some new patterns in the ontologies are detected, and refactoring operations are then used to obtain assimilated ontologies. Afterward, we apply ASMOV to both the original ontologies and their refactored counterparts to test our approach. Experimental results show that ASMOV produces better alignments for the refactored ontologies than for the original unrepaired ones with respect to the standard evaluation measures, i.e., Precision, Recall, and F-Measure.

Keywords-Ontology matching; Pattern detection; Refactoring operation; ASMOV

I. INTRODUCTION

The Semantic Web is a Web technology that enables semantic interoperability between Web sources and Web users. One approach to modeling knowledge in the Semantic Web is the ontology. Ontologies have become a key to semantic interoperability and the main vehicle for the development of the Semantic Web. Ontologies are developed by different developers using different methods; therefore, we encounter heterogeneity problems among ontologies. For this reason, we need an approach that overcomes this problem. One approach is to apply an ontology matching technique. The purpose of ontology matching is to find identical and synonymous concepts and relationships in ontological content. There are many different methods for doing this. The increasing number of methods available for ontology matching suggests the need to establish a consensus for the evaluation of these methods. The Ontology Alignment Evaluation Initiative (OAEI) is a coordinated international initiative to forge this consensus [1]. In this paper, we select one of the best matchers of OAEI 2009, Automated Semantic Matching of Ontologies with Verification (ASMOV). The results in [2] show that ASMOV is one of the most effective matchers among all systems participating in OAEI 2009. For this reason, we use this matcher to evaluate the proposed approach. We analyze the input ontologies of the matcher in order to detect bad naming patterns and anomalies in the content of the ontologies, and then apply refactoring operations to assimilate the entities of the ontologies.

A. Ontology Matching

In this section we define the preliminary concepts of the ontology matching field that are used throughout the paper. We follow the terminology proposed in [3], [4].

• Matching Process. A matching process can be seen as a function f which takes two ontologies o and o', a set of parameters p and a set of oracles and resources r, and returns an alignment A between o and o'.

2010 5th International Symposium on Telecommunications (IST'2010)

978-1-4244-8185-9/10/$26.00 ©2010 IEEE

• Correspondence. A correspondence between an entity e belonging to ontology o and an entity e' belonging to ontology o' is a 5-tuple < id, e, e', R, conf > where:

  • id is a unique identifier of the correspondence;

  • e and e' are the entities (e.g. properties, classes, individuals) of o and o' respectively;

  • R is a relation such as “equivalence”, “more general”, “less general”, “disjointness”, “overlapping”, holding between the entities e and e';

  • conf is a confidence measure (typically in the [0, 1] range) for the relation R holding between the entities e and e'.

• Alignment. An alignment of ontologies o and o' is a set of correspondences between entities of o and o'.

In our experiments, because we chose ASMOV as the matcher, we considered classes as entities and “equivalence”, “more general”, and “less general” as relations.
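The definitions above can be made concrete with a small data structure. The following is a minimal sketch, not part of ASMOV: the names `Correspondence`, `RELATIONS`, and `make_alignment` are hypothetical, and the confidence value 0.9 is an illustrative number rather than a measured result. It represents a correspondence as the 5-tuple < id, e, e', R, conf > and an alignment as a set of such tuples, restricted to the three relations used in our experiments.

```python
from dataclasses import dataclass

# Relations considered in the experiments (classes only).
RELATIONS = {"equivalence", "more general", "less general"}


@dataclass(frozen=True)
class Correspondence:
    """The 5-tuple <id, e, e', R, conf> (hypothetical illustration)."""
    id: str
    e: str          # entity of ontology o
    e_prime: str    # entity of ontology o'
    relation: str   # R
    conf: float     # confidence, expected in [0, 1]

    def __post_init__(self):
        if self.relation not in RELATIONS:
            raise ValueError(f"unsupported relation: {self.relation}")
        if not 0.0 <= self.conf <= 1.0:
            raise ValueError("conf must lie in [0, 1]")


def make_alignment(correspondences):
    """An alignment is a set of correspondences between o and o'."""
    return set(correspondences)


# Illustrative values only; 0.9 is not a result reported in the paper.
alignment = make_alignment([
    Correspondence("c1", "Conference#rejected-contribution",
                   "Ekaw#rejected-paper", "equivalence", 0.9),
])
```

Making the tuple immutable (`frozen=True`) lets correspondences be stored in a set, which mirrors the set-based definition of an alignment.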

II. PROBLEM DEFINITION

Ontology matching is one of the most active topics in many fields of Semantic Web research. Matching of ontologies (or schemas) is a critical operation in many application domains, such as the Semantic Web, ontology integration, data warehouses, e-commerce, and query mediation [5]. All these applications can benefit from an ontology alignment, that is, a set of correspondences between the entities of two ontologies. Each correspondence represents a kind of relationship between entities or, more generally, between whole semantic structures.

Ontology matchers use different methods to find the correspondences. There are many systems for doing so [6], [7], [8], [9], [10], [11], [12], but the quality of the delivered alignments is not very high. Higher quality could be achieved by designing alignments manually, which is very demanding. Instead, matcher quality can be improved by using refactored ontologies in place of the original unrepaired ones. To this end, we surveyed in depth the naming of entities and the taxonomies used in ontologies. A taxonomy is a classification of entities and is usually hierarchical. It does two things: it gives exact names to everything in the domain, and it shows which things are parts of other things (relationships sometimes called parent-child and sometimes broader-narrower).

Different naming conventions and different taxonomic structures in defining the entities of ontologies lead to several problems for matchers. Some problems can be overcome by using ontology design patterns (ODPs) [13]. ODPs identify ontological design structures, terms, larger expressions, and semantic contexts [14]. An ontology design pattern is a successful reusable solution to a recurrent modeling problem. ODPs contain ontology patterns (OPs) that help in the design of ontologies. OPs are of different types and can help in many different ways. In this paper we concentrate on naming OPs and structural OPs. Naming OPs are conventions on how to create names for namespaces, files, and ontology elements in general (classes, properties, etc.). They are good practices that improve ontology readability and understanding by humans and matchers by supporting homogeneity in naming procedures. Assimilating the names of the entities of ontologies according to naming conventions, and thereby establishing uniform ontologies, makes the ontologies easier to understand for users and matchers. Structural ODPs include logical ODPs and architectural ODPs, which are related to the hierarchical relations in ontologies [15]. These patterns are proposed to help design ontologies according to common rules of hierarchy and granularity for defining entities. Therefore, we use these pattern techniques and the ontology refactoring process in the preprocessing phase of ASMOV to solve problems caused by ontology diversity. This work makes ontologies easier for ASMOV to process and helps avoid some common mistakes in alignment results. Furthermore, we can reduce the execution time by applying the ontology refactoring process to these patterns.

III. RELATED WORK

This section discusses state-of-the-art ontology matching techniques and describes previous work on pattern detection and refactoring in the field of ontology matching.

Euzenat and Shvaiko [4] present a comprehensive review of current approaches and classify them along three main dimensions: input interpretation, kind of input, and granularity. The input interpretation dimension is divided into syntactic, external, and semantic. The kind of input dimension categorizes techniques according to whether they work on textual strings, structural, extensional, or semantic input. The granularity dimension distinguishes between element-level and structure-level techniques [7].

Here, we introduce some ontology matching techniques that have ranked well in OAEI:

• Automated Semantic Matching of Ontologies with Verification (ASMOV) [7] is an algorithm that uses lexical and structural characteristics of two ontologies to iteratively calculate a similarity measure between them, derives an alignment, and then verifies the alignment in a semantic verification phase to ensure that it does not contain semantic inconsistencies. The semantic verification phase enhances the performance of this matcher in comparison with other matchers.

• RiMOM [16] is a dynamic multistrategy ontology alignment framework that combines multiple strategies to improve matching effectiveness. The key insight in this framework is that similarity characteristics between ontologies may vary widely. This approach considers both textual and structural characteristics of ontologies. RiMOM is based on risk minimization of Bayesian decision. It employs multiple ontology alignment strategies and sets the combination weights.

• Falcon-AO [6] is a practical ontology matching system that has good performance and a number of remarkable features. It is an automatic ontology matching system that uses multiple elementary matchers (V-Doc, GMO and PBM), coordination rules, and a similarity combination strategy.

Furthermore, we studied some previous work on pattern detection in ontologies [17], [18], [19], [20]. These papers analyze collections of OWL ontologies in order to determine the frequency of several combined name and graph patterns that potentially indicate underlying semantic structures, and they analyze concept naming in OWL ontologies in order to detect modeling errors and assess ontology quality. In [21] the authors study the impact of ontology refactoring on the results of three matching systems.

IV. PREPROCESSING PHASE

The preprocessing phase is added to ASMOV to analyze the input ontologies in order to find different lexical and structural patterns in them. In this phase two kinds of patterns are detected, which are explained in Section IV.A, and refactoring operations are applied to those patterns, which are explained in Section IV.B. In this paper, we concentrate on element-level and structure-level techniques and work on three kinds of input from the classification described in Section III: textual strings, semantic, and structural.

A. Lexical Patterns and Structural Patterns

The patterns in this study were detected based on our preliminary analysis of some ontologies from the Conference Track. The patterns considered are lexical and structural ones. To detect lexical patterns, we analyzed the names of entities, especially classes, in OWL ontologies. Furthermore, we found that ASMOV uses the Lin method [22] to calculate lexical similarity. The lexical features consist of all the human-readable information provided in an ontology. ASMOV considers three lexical features of OWL ontologies: id, label, and comment. One problem is that different ontologies use different methods for naming homogeneous concepts, especially for compound words. These different concept naming styles cause many problems for calculating lexical similarities in ASMOV. For example, in two ontologies of the Conference Track, namely 'Conference' and 'Ekaw', we found different class names for similar concepts: <Conference#conference-www, Ekaw#website> and <Conference#rejected-contribution, Ekaw#rejected-paper>. ASMOV does not find either of these two pairs. To solve this problem, some lexical patterns are detected based on the naming OPs and ontology design patterns [13] explained in Section II, in order to establish equal names from these different naming styles. We then use one of the refactoring operations, called RN and described in Section IV.B, to assimilate the lexical features of the OWL ontologies and thereby obtain better results from the Lin method that ASMOV uses to calculate lexical similarity.
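To illustrate why differing naming styles matter, the sketch below splits class names written with hyphens, underscores, or camel case into lowercase tokens and compares the token sets. It is only an illustration under stated assumptions: `name_tokens` and `token_overlap` are hypothetical helpers, and the Jaccard overlap used here is a simple stand-in, not the Lin measure that ASMOV uses.

```python
import re


def name_tokens(local_name):
    """Split a class name into lowercase tokens, whatever its naming style."""
    tokens = []
    # First split on explicit separators: hyphen, underscore, whitespace.
    for part in re.split(r"[-_\s]+", local_name):
        # Then split camel case, keeping acronyms and digit runs together.
        tokens.extend(
            re.findall(r"[A-Z]+(?![a-z])|[A-Z]?[a-z]+|\d+", part))
    return [t.lower() for t in tokens]


def token_overlap(name_a, name_b):
    """Jaccard overlap of the token sets (a stand-in, not the Lin measure)."""
    a, b = set(name_tokens(name_a)), set(name_tokens(name_b))
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


print(name_tokens("rejected-contribution"))
print(name_tokens("RejectedPaper"))
print(token_overlap("rejected-contribution", "rejected-paper"))
```

Even after tokenization, `rejected-contribution` and `rejected-paper` share only one of three distinct tokens, which suggests why a purely lexical comparison can miss such pairs and why renaming (RN) toward a common convention may help.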

Another pattern is based on the fact that the taxonomic structure of ontologies is often varied and confusing. One reason is that different developers have dissimilar viewpoints when defining ontologies in the same domain. Therefore, they use different hierarchies and granularities for defining the entities of ontologies. For example, in two ontologies of the Conference Track, namely 'Conference' and 'Ekaw', we found different granularities in concept naming for a similar concept, "author". In 'Conference' we found three levels of granularity for "author": contribution_regular-author, contribution_co-author, and Conference _1th-author, while in 'Ekaw' we found only one level of granularity for "author", namely Paper_ author. Furthermore, ontologies typically use subsumption relations to accomplish a variety of tasks. In this respect, we encountered some problems in the calculation of relational similarity by ASMOV. The relational, or hierarchical, similarity is computed by combining the similarities between the parents and children of the entities being compared. To detect structural patterns, we analyzed the structure and subsumption relations of the ontologies. Considering the problems mentioned above and the way ASMOV works, we concluded that different taxonomic structures and different granularities in peer ontologies lead to matching problems. To solve this problem, we used another refactoring operation, called RS and described in Section IV.B, to assimilate the structural features of the OWL ontologies and thereby obtain better results from the relational similarity method used in ASMOV.
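The effect of a granularity mismatch on a parent/child-based similarity can be sketched as follows. This is a deliberately simplified illustration, not ASMOV's actual formula: the functions `best_match_average` and `relational_similarity`, the equal weighting of parents and children, the exact-match base similarity, and the parent class `Person` are all assumptions made for the example, as is the arrangement of the three author classes as children of one concept.

```python
def best_match_average(items_a, items_b, sim):
    """Average, over items_a, of the best similarity found in items_b."""
    if not items_a and not items_b:
        return 1.0   # nothing on either side, so nothing disagrees
    if not items_a or not items_b:
        return 0.0   # one side has neighbours, the other has none
    return sum(max(sim(a, b) for b in items_b) for a in items_a) / len(items_a)


def relational_similarity(parents_a, children_a, parents_b, children_b, sim):
    """Combine parent and child similarity with equal weights (an assumption)."""
    parent_sim = best_match_average(parents_a, parents_b, sim)
    child_sim = best_match_average(children_a, children_b, sim)
    return 0.5 * (parent_sim + child_sim)


def exact(a, b):
    """Toy base similarity: 1.0 for identical names, else 0.0."""
    return 1.0 if a == b else 0.0


# Hypothetical neighbourhoods of an "author" concept in two ontologies.
fine_grained = ["regular-author", "co-author", "1th-author"]
before = relational_similarity(["Person"], fine_grained, ["Person"], [], exact)
after = relational_similarity(["Person"], [], ["Person"], [], exact)
print(before, after)
```

In this toy setting the score rises once both sides use the same granularity, which is the intuition behind applying RS before matching.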

The results show that most ontologies contain a significant number of occurrences of the aforementioned patterns.

For example, Figure 1 illustrates different styles of class naming and different taxonomic structures for defining the same concepts in parts of two ontologies, namely ConfOf and Sigkdd.

Figure 1. Different styles in class naming and different taxonomic structure in two ontologies (ConfOf.owl, Sigkdd.owl)

Figure 2 shows the correspondences between the classes of ConfOf and Sigkdd after applying the RN and RS refactoring operations. The lines in this figure show correspondences that are found by ASMOV after the refactoring operations. The correspondences found exist in the reference alignment.

Figure 2. Correspondences found by ASMOV after refactoring

B. Refactoring Operations

All cases of modeling errors detected via the patterns mentioned above can be repaired by refactoring operations. The detection of the aforementioned patterns is the starting point for a refactoring. There are three general refactoring operations: the adding operation (ADD), the restructuring operation (RS), and the renaming operation (RN). These operations consist of different steps depending on the detected situation [21]. In this paper, we use RN for lexical patterns and RS for structural patterns.

Applying rename operations (RN) to the names of classes in one ontology, taking into account the names of the classes in the peer ontology that have the same taxonomic structure, leads to better lexical similarity results in ASMOV, because ASMOV uses the Lin method to calculate lexical similarity, as explained in Section IV.A.

Furthermore, restructuring operations (RS) are applied to assimilate the structural features of OWL ontologies by considering the parent-child relations and the various granularities used in peer ontologies. Thus, we can get better results from the structural similarity phase of ASMOV by transforming part of one ontology into the form used in the other ontology. The experimental results in Section V confirm this. We carried out experiments on six pairs of ontologies from the Conference Track. The reason for choosing these six pairs among the other ontologies is described in Section V.A.
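The two operations can be sketched on a toy class hierarchy. The following is a minimal illustration under stated assumptions, not the procedure of [21] or of ASMOV: the hierarchy is modeled as a child-to-parent dictionary, `rename_class` and `remove_intermediate_class` are hypothetical helpers, the class `Written_contribution` is an invented intermediate level, and removing an intermediate class is only one possible form of restructuring.

```python
def rename_class(parent_of, old, new):
    """RN: return a copy of the hierarchy with class `old` renamed to `new`."""
    def swap(name):
        return new if name == old else name
    return {swap(child): (None if parent is None else swap(parent))
            for child, parent in parent_of.items()}


def remove_intermediate_class(parent_of, cls):
    """RS (one possible form): drop `cls`, reattach its children to its parent."""
    grandparent = parent_of[cls]
    return {child: (grandparent if parent == cls else parent)
            for child, parent in parent_of.items() if child != cls}


# Hypothetical fragment; maps each class to its parent (None marks a root).
hierarchy = {
    "Contribution": None,
    "Written_contribution": "Contribution",
    "rejected-contribution": "Written_contribution",
}

renamed = rename_class(hierarchy, "rejected-contribution", "rejected-paper")
restructured = remove_intermediate_class(renamed, "Written_contribution")
print(restructured)
```

Both helpers return new dictionaries, so the original ontology fragment stays available for the comparison between original and refactored inputs.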

The number of RN and RS operations applied to these six pairs of ontologies is shown in Table I.

TABLE I. FREQUENCY OF REFACTORING OPERATIONS

Ontology pair          RN   RS
ConfOf - Sigkdd        2    2
Cmt - Sigkdd           4    3
Edas - Ekaw            7    -
Cmt - ConfOf           6    2
Conference - ConfOf    2    1
Conference - Ekaw      8    2

V. EXPERIMENTAL RESULTS

A. Data sets

The ontologies used in our experiments are part of OAEI. The OAEI offers several tracks and subtracks concerned with different types of matching problems. We tested our approach on the Conference Track. The conference dataset can be seen as a much harder test case than other OAEI datasets, such as the benchmark dataset, because it is more heterogeneous, and it has been extensively studied over the past years.

The Conference tests consist of 15 ontologies, where each pair of ontologies constitutes a matching problem. These ontologies have been developed as part of the OntoFarm project and deal with conference organization [23]. Six of the fifteen ontologies of the Conference Track were used in our experiment. Reference mappings (also referred to as the gold standard) are available for all possible combinations of these selected ontologies. These ontologies are Cmt, ConfOf, Ekaw, Conference, Edas and Sigkdd. They are described in OWL-DL and serialized in the RDF/XML format.

B. Evaluation Metrics

As indicators of how good an alignment is, we used precision, recall, and F-measure adapted for ontology alignment evaluation [4]. Precision is defined as the number of correctly found correspondences divided by the total number of found correspondences, and recall is defined as the number of correctly found correspondences divided by the number of correspondences in the reference alignment. A perfect precision score of 1.0 means that every correspondence computed by the algorithm was correct (correctness), whereas a perfect recall score of 1.0 means that all correct correspondences were found (completeness).

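Precision and recall are defined in (1) and (2).

$$\text{precision} = \frac{\#\,\text{correctly found correspondences}}{\#\,\text{found correspondences}} \tag{1}$$

$$\text{recall} = \frac{\#\,\text{correctly found correspondences}}{\#\,\text{correspondences in the reference alignment}} \tag{2}$$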

F-measure represents a tradeoff between precision and recall. It is calculated as in (3), in which the parameter β controls the relative influence of recall and precision. It is usually set to 1, which implies that precision and recall are of the same importance.

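$$f\text{-measure}_{\beta} = \frac{(1+\beta^{2})\cdot\text{precision}\cdot\text{recall}}{\beta^{2}\cdot\text{precision}+\text{recall}} \tag{3}$$

Setting β = 1 in (3) leads to (4).

$$f\text{-measure} = \frac{2\cdot\text{precision}\cdot\text{recall}}{\text{precision}+\text{recall}} \tag{4}$$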
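As a worked illustration of these measures, the sketch below computes precision, recall, and F-measure for a found alignment against a reference alignment, using the standard F-measure with parameter β. The function name `evaluate` and the toy correspondences are hypothetical and are not taken from the experiments; correspondences are represented simply as pairs of entity names.

```python
def evaluate(found, reference, beta=1.0):
    """Return (precision, recall, f_measure) for two sets of correspondences."""
    found, reference = set(found), set(reference)
    correct = len(found & reference)          # correctly found correspondences
    precision = correct / len(found) if found else 0.0
    recall = correct / len(reference) if reference else 0.0
    denominator = beta ** 2 * precision + recall
    if denominator == 0.0:
        return precision, recall, 0.0
    f_measure = (1 + beta ** 2) * precision * recall / denominator
    return precision, recall, f_measure


# Toy data: 4 correspondences found, 3 in the reference, 2 in common.
found = {("a", "a'"), ("b", "b'"), ("c", "x'"), ("d", "y'")}
reference = {("a", "a'"), ("b", "b'"), ("e", "e'")}
precision, recall, f_measure = evaluate(found, reference)
print(precision, recall, f_measure)
```

Here precision is 2/4, recall is 2/3, and with β = 1 the F-measure is their harmonic mean, 4/7.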

C. Comparison results using original ontologies and refactored ontologies

We carried out experiments on six pairs of ontologies from the Conference Track. We automatically generated alignments with ASMOV for these pairs of ontologies, namely ConfOf-Sigkdd, Cmt-Sigkdd, Edas-Ekaw, Cmt-Ekaw, Cmt-ConfOf, Conference-ConfOf, and Conference-Ekaw. We then applied the refactoring operations (RN, RS) based on the patterns detected in these ontology pairs, and compared the alignments generated by ASMOV for the original ontologies and their refactored counterparts.

The results of our experiments show that refactored ontologies improve the matching results with respect to the standard evaluation measures.

Figure 3. The results using original ontologies [chart: values from 0 to 1 for Precision, Recall, and F-measure]

Figure 4. The results using original ontologies [chart: values from 0 to 1 for Precision, Recall, and F-measure]

VI. CONCLUSIONS AND FUTURE WORK

In this paper a preprocessing phase is added to ASMOV in order to analyze the input ontologies and detect the various lexical and structural patterns, so as to solve problems caused by ontology diversity. Afterward, renaming operations (RN) and restructuring operations (RS) are applied to establish uniform ontologies and improve the matching results.

We hypothesized that ASMOV would reach better results for refactored ontologies than for the original unrepaired ontologies. This hypothesis was confirmed by the experimental results. The experiments were carried out on six pairs of ontologies from the Conference Track, namely ConfOf-Sigkdd, Cmt-Sigkdd, Edas-Ekaw, Cmt-Ekaw, Cmt-ConfOf, Conference-ConfOf, and Conference-Ekaw. The results of the experiments show that refactored ontologies improve the results of ASMOV with respect to the standard evaluation measures, i.e., Precision, Recall, and F-Measure. For future work, we suggest testing our approach with other matching tools, especially those participating in the OAEI contest. Furthermore, the set of detectable patterns and refactoring operations can be extended.

REFERENCES

[1] "Ontology Alignment Evaluation Initiative", available in: http://oaei.ontologymatching.org/.

[2] J. Euzenat, A. Ferrara, L. Hollink, A. Isaac, C. Joslyn, V. Malaisé, C. Meilicke, A. Nikolov, J. Pane, and M. Sabou, "Results of the Ontology Alignment Evaluation Initiative 2009," presented at Proc. of OM 2009, 2009.

[3] V. Mascardi, A. Locoro, and P. Rosso, "Automatic ontology matching via upper ontologies: a systematic evaluation," 2009.

[4] J. Euzenat and P. Shvaiko, "Ontology matching," Springer-Verlag, 2007.

[5] K. Kotis and M. Lanzenberger, "Ontology Matching: current status, dilemmas and future challenges," presented at Conference on Complex, Intelligent and Software Intensive Systems (CISIS), 2008.

[6] W. Hu and Y. Qu, "Falcon-AO: A practical ontology matching system," vol. 6, pp. 237-239, September 2008.

[7] Y. R. Jean-Mary, E. P. Shironoshita, and M. R. Kabuka, "Ontology matching with semantic verification," Web Semantics: Science, Services and Agents on the World Wide Web, vol. 7, pp. 235-251, September 2009.

[8] P. Wang and B. Xu, "Lily: Ontology Alignment Results for OAEI 2009," presented at The Fourth International Workshop on Ontology Matching collocated with the 8th International Semantic Web Conference, USA, 2009.

[9] M. H. Seddiqui and M. Aono, "Anchor-Flood," presented at Ontology Alignment Evaluation Initiative 2008.

[10] M. Nagy, M. Vargas-Vera, and E. Motta, "DSSim- ontology mapping with uncertainty," presented at Ontology Alignment Evaluation Initiative 2008.

[11] J. Tang, Li, Y., et al, "RiMOM: A Dynamic Multistrategy Ontology Alignment Framework," IEEE Transactions on Knowledge and Data Engineering, pp. 1218-1232, 2009.

[12] F. Giunchiglia, P. Shvaiko, and M. Yatskevich, "S-Match: an algorithm and implementation of semantic matching," In: Proc. of the First European Semantic Web Symposium - ESWS pp. 61–75, 2004.

[13] N. Noy and D. L. McGuinness, "Ontology development 101: A guide to creating your first ontology," March 2001.

[14] J. R. Reich, "Ontological design patterns for the integration of molecular biological information," 1999.

[15] A. Gangemi and V. Presutti, "Ontology design patterns," Handbook on Ontologies, pp. 221-243, 2009.

[16] J. Li, J. Tang, Y. Li, and Q. Luo, "RiMOM: A Dynamic Multistrategy Ontology Alignment Framework," IEEE Transactions on Knowledge and Data Engineering, vol. 21, pp. 1218-1232, August 2009.

[17] O. Svab-Zamazal and V. Svatek, "Analysing Ontological Structures through Name Pattern Tracking," presented at EKAW 2008, Acitrezza, Italy, Springer LNCS, 2008.

[18] O. Svab-Zamazal and V. Svatek, "Towards Ontology Matching via Pattern-Based Detection of Semantic Structures in OWL Ontologies," presented at Proceedings of the Znalosti Czecho-Slovak Knowledge Technology conference, 2009.

[19] V. Svátek and O. Sváb, "Tracking Name Patterns in OWL Ontologies," presented at EON-2007 at ISWC-2007, Busan, Korea, 2007.

[20] V. Svátek, "Design Patterns for Semantic Web Ontologies: Motivation and Discussion," presented at 7th Conf. on Business Information Systems (BIS-04), Poznan, April 2004.

[21] O. Sváb-Zamazal, V. Svátek, C. Meilicke, and H. Stuckenschmidt, "Testing the impact of pattern-based ontology refactoring on ontology matching results," presented at Proceedings of the ISWC 2008 Workshop on Ontology Matching (Poster Paper), Karlsruhe, Germany, 2008.

[22] D. Lin, "An information-theoretic definition of similarity," presented at Proceedings of the 15th International Conference on Machine Learning (ICML), Morgan Kaufmann, San Francisco, 1998.

[23] O. Svab, V. Svatek, and H. Stuckenschmidt, "A study in empirical and `casuistic' analysis of ontology mapping results," presented at Proceedings of the 4th European conference on The Semantic Web (ESWC-07), Berlin, Heidelberg, 2007.