M2BenchMatch: An Assisting Tool for MetaModel Matching

Lamine LAFI
ISSAT, University of Sousse, Miracl Laboratory, Tunisia
[email protected]

Jamel FEKI
University of Sfax, Miracl Laboratory, Tunisia
[email protected]

Slimane HAMMOUDI
ESEO, University of Angers, France
[email protected]

Abstract— Benchmarking remains an important area for several computer science domains. In this paper, we present a benchmarking software tool called M2BenchMatch (MetaModel Benchmark Matching); it is intended to assist the expert in selecting a metamodel matching technique for a given pair of metamodels to be matched. It produces values for quality metrics. These values are very useful for evaluating and comparing the quality of the results produced by metamodel matching techniques, and thus for assisting the user during a new matching task by recommending an appropriate technique.

Keywords—Metamodel Matching; Comparison Criteria; Metamodel Benchmarking

I. INTRODUCTION AND MOTIVATIONS

Many approaches for metamodel matching [1, 2, 3, 4, 5] have been proposed in recent years; their benefit has been demonstrated in several scenarios (e.g., an available pair of metamodels with a given metamodel matching technique). Several matching software prototypes supporting these approaches have been developed and published in research papers. Most of these works provide an experimental section for a particular scenario, i.e., using real-world metamodels. However, a matching tool may provide acceptable matching quality for a specific scenario but be unreliable in other cases. It is therefore difficult to evaluate two metamodel matching tools in order to identify which one produces the best quality. Note that quality is usually estimated through well-known metrics called quality measures.

As far as we know, there is no complete and comprehensive benchmark for metamodel matching tools, contrary to schema matching, where many evaluations have been presented, as in [6, 7]. These evaluations of schema matching tools are carried out according to a set of matching criteria, and a summary of each matching tool’s capabilities is then provided. In the metamodel matching context, evaluating matching tools is quite difficult for several reasons. First, these tools are not always available for performing tests on specific sets of metamodels. Secondly, some tools require specific resources to run efficiently (i.e., to guarantee good matching quality), such as a thesaurus and the pseudo-code of each approach; these resources are often unavailable. Thirdly, running some matching tools requires specific additional data files.

We note that the majority of metamodel matching tool evaluations suffer from two drawbacks. First, by evaluating the matching tools on limited scenarios such as those provided in a case study, one cannot objectively judge the capabilities of each matching tool; in addition, its limits may remain hidden. Secondly, some matching tools generate an integrated metamodel instead of a set of mappings; unfortunately, the measures found in the literature evaluate a set of mappings but cannot evaluate an integrated metamodel. The work presented in this paper has a twofold objective. First, it aims to define a set of evaluation criteria for metamodel matching; this set is an extension of the schema evaluation criteria provided in [5]. Secondly, we develop a benchmark (M2BenchMatch) for metamodel matching. Based on the defined criteria, this benchmark serves to evaluate existing metamodel matching techniques. This assists the expert user in selecting which technique is more appropriate for a given pair of metamodels.

In order to obtain a meaningful evaluation, M2BenchMatch evaluates four matching tools against the same set of ten pairs of metamodels; these metamodels have different sizes, which we have classified into three classes, namely Small, Medium and Large.

This paper is organized as follows. Section II presents the properties of the designed benchmark for metamodel matching. Section III describes the quality measures used as evaluation criteria. Section IV gives the architecture of the developed prototype M2BenchMatch. Section V illustrates a demonstration, and finally Section VI concludes this work.

II. PROPERTIES OF M2BENCHMATCH

A benchmark needs to be:

• Simple, as both end-users and metamodel matching experts are targeted by the benchmark.

• Portable, that is, OS-independent. This requirement is fulfilled by using Java and Eclipse.

• Scalable: matching metamodels at large scale should be possible.

• Generic, as it should work with most of the available matchers. It should allow one to restrict the matching criteria to include various types of matchers, based on their shared capabilities, i.e., their least common denominator. It should be able to compare metamodel matching tools producing both an integrated metamodel and a set of mappings, as well as those that only produce one of them.

• Complete, as it should provide the complete set of measures which can be derived in each case.

• Extensible, as the benchmark should be able to evolve with research progress. This extensibility gathers three points: (i) future metamodel matching tools could be benchmarked; (ii) new evaluation metrics could be added to measure the matching quality or time performance; and (iii) users could easily add new metamodels (see the sketch below).
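To illustrate how such an extension point could look, the following Java sketch shows a hypothetical matcher interface and registry. The names MetamodelMatcher and MatcherRegistry are assumptions made for illustration; the paper does not describe M2BenchMatch's internal API, and the EMF dependency (EPackage) is assumed from the tool's Eclipse/Ecore setting.

```java
import java.util.LinkedHashMap;
import java.util.Map;

import org.eclipse.emf.ecore.EPackage;

// Hypothetical extension point: a new matching tool implements this interface
// so that the benchmark can run it in the same way as the built-in techniques.
interface MetamodelMatcher {
    String name();

    // Returns, for each source element name, the similarity values computed
    // against target element names (all values assumed to lie in [0, 1]).
    Map<String, Map<String, Double>> match(EPackage source, EPackage target);
}

// Minimal registry allowing new matchers to be added
// without modifying the benchmark core.
final class MatcherRegistry {
    private static final Map<String, MetamodelMatcher> MATCHERS = new LinkedHashMap<>();

    static void register(MetamodelMatcher matcher) {
        MATCHERS.put(matcher.name(), matcher);
    }

    static Iterable<MetamodelMatcher> all() {
        return MATCHERS.values();
    }
}
```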

III. QUALITY MEASURES

In order to compare the metamodel matching approaches, we need to define a set of evaluation criteria. We build this set by inheriting criteria from the literature of this domain, enriching some of them with additional details, and adding new specific criteria. The identification of existing criteria relies on those defined in [5], [10] and [11]. We envisage a set of evaluation criteria and classify them into two categories.

Criteria adopted from schema matching, enriched and re-used for metamodel matching evaluation:

• Input metamodels: What kind of input has been used (metamodel information, instances, size, dictionaries, etc.)? The simpler the test problems and the more auxiliary information is used, the more likely the systems are to achieve better effectiveness. However, the dependence on auxiliary information may also increase the pre-matching effort.

• Output: Given two input metamodels M1 and M2, a matching algorithm produces a set of mappings. A simple mapping relates an element of M1 to one element of M2 and has a similarity value which indicates its plausibility [12]. A similarity value can be discrete (i.e., 1 or 0, true or false) or continuous (i.e., a real value between 0 and 1). Mappings can be endogenous (i.e., the output is in the same technical space as the input) or exogenous (i.e., the input and output spaces are different). In a mapping result, one or more elements of the first model may be related to more than one element of the second model and inversely, resulting in different matching cardinalities 1:1, 1:n, and m:n [12].

• Evaluation: concerns assessing the quality of matching results. To evaluate the quality of the metamodel matching tools, we reuse measures stemming from the field of information retrieval [11]. These measures compare the manually determined matches (also called relevant matches) to the automatically found matches. We detail this criterion as follows:

The interrelationships between metamodels can be organized in sets created manually or automatically. A manually created set contains all correct matches, while an automatically created set may contain valid and invalid matches. The first set is called real matches, and the latter derived matches (cf. Fig. 1). Naturally, the two sets of real matches and derived matches intersect. Therefore, other subsets are defined as follows [10]:

• A (false negatives): denotes matches needed but not automatically identified.

• B (true positives): depicts matches which are needed and have also been correctly derived by the automatic match operation.

• C (false positives): represents matches falsely proposed by the automatic match operation.

• D (true negatives): signifies false matches which have also been correctly discarded by the automatic match operation.

Figure 1. Comparing real matches and derived matches.

Based on the cardinalities of these sets, the four following match quality measures are provided as parameters for benchmarks:

Precision = |B| / (|B| + |C|)    (1)

The Precision reflects the share of real correspondences among all found ones.

Recall = |B| / (|A| + |B|)    (2)

The Recall specifies the share of real correspondences that are found.

F-Measure = (2 × Precision × Recall) / (Precision + Recall)    (3)

Overall = Recall × (2 − 1/Precision)    (4)

Note that all these measures were developed specifically in the schema matching context. We can notice that F-Measure represents the harmonic mean of Precision and Recall. The main underlying idea of the Overall quality measure is to quantify the post-matching effort needed for adding missed matches and removing false ones.
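These measures can be computed directly from the cardinalities of the sets A, B and C introduced above. The following Java sketch, in which matches are identified by plain strings, implements formulas (1)–(4); it is a minimal illustration under that assumption, not the actual M2BenchMatch code.

```java
import java.util.HashSet;
import java.util.Set;

final class MatchQuality {
    // realMatches: matches determined manually by the expert.
    // derivedMatches: matches proposed automatically by the matching tool.
    // Returns {Precision, Recall, F-Measure, Overall}.
    static double[] evaluate(Set<String> realMatches, Set<String> derivedMatches) {
        Set<String> truePositives = new HashSet<>(realMatches);   // B
        truePositives.retainAll(derivedMatches);

        int b = truePositives.size();
        int a = realMatches.size() - b;      // A: false negatives
        int c = derivedMatches.size() - b;   // C: false positives

        double precision = (b + c) == 0 ? 0.0 : (double) b / (b + c);           // (1)
        double recall    = (a + b) == 0 ? 0.0 : (double) b / (a + b);           // (2)
        double fMeasure  = (precision + recall) == 0 ? 0.0
                         : 2 * precision * recall / (precision + recall);        // (3)
        double overall   = precision == 0 ? 0.0
                         : recall * (2 - 1 / precision);                         // (4)

        return new double[] { precision, recall, fMeasure, overall };
    }
}
```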

New specific criteria for metamodel matching evaluation:

• Auxiliary resources: The need to use auxiliary resources at matching time is very frequent. These resources are expensive and can influence not only the quality of the results but also the performance.

• Human effort: it is necessary to specify the human effort and how it is quantified, for example the pre-matching effort (metamodel dataset, weights, thresholds, etc.) and the post-matching effort (correction and improvement of the results).

• Matching process: What information has been included in the match result (mappings between attributes or whole tables, nodes or paths, etc.)? What is the correct result? The less information the system provides as output, the lower the probability of making errors but the higher the post-processing effort may be.

IV. M2BENCHMATCH ARCHITECTURE

M2BenchMatch (MetaModel Benchmark Matching tool) is implemented using Eclipse. It calculates the measures described in the previous section and should help one choose among the available metamodel matching tools the one that best satisfies the user’s needs.

As depicted in Fig.2, M2BenchMatch requires the following:

A pair of metamodels to be aligned: a Source metamodel (SM) and a Target metamodel (TM) (cf. Fig. 2). If their format is not Ecore, they have to be processed through the Metamodel encoding step (cf. Fig. 2, step 1); otherwise, step 1 is bypassed and step 2 follows directly.
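For metamodels already in Ecore, loading the source and target can be done with the standard EMF API, as in the sketch below. The file names SM.ecore and TM.ecore are placeholders, and this is only one plausible way to perform the loading; the paper does not show M2BenchMatch's own code.

```java
import org.eclipse.emf.common.util.URI;
import org.eclipse.emf.ecore.EPackage;
import org.eclipse.emf.ecore.resource.Resource;
import org.eclipse.emf.ecore.resource.ResourceSet;
import org.eclipse.emf.ecore.resource.impl.ResourceSetImpl;
import org.eclipse.emf.ecore.xmi.impl.EcoreResourceFactoryImpl;

final class MetamodelLoader {
    static EPackage load(String path) {
        ResourceSet resourceSet = new ResourceSetImpl();
        // Register the factory used to parse .ecore files.
        resourceSet.getResourceFactoryRegistry().getExtensionToFactoryMap()
                   .put("ecore", new EcoreResourceFactoryImpl());
        Resource resource = resourceSet.getResource(URI.createFileURI(path), true);
        return (EPackage) resource.getContents().get(0);
    }

    public static void main(String[] args) {
        EPackage source = load("SM.ecore"); // source metamodel (placeholder file name)
        EPackage target = load("TM.ecore"); // target metamodel (placeholder file name)
        System.out.println(source.getName() + " -> " + target.getName());
    }
}
```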

A Base of Matching Approaches: currently, M2BenchMatch contains four recent approaches for metamodel matching, namely Similarity Flooding (SF) [1], used initially in the fields of schema matching and ontology alignment; the Semi-Automatic Matching Tool for Model Driven Engineering (SAMT4MDE for short) [2], used in the MDA technological space (UML/MOF: Meta Object Facility); ModelCVS [3]; and finally the AtlanMod Matching Language (AML) [4].

The External resources component includes domain-specific ontologies, dictionaries, and expert correspondences that are useful for matching tools.

The Additional specific information denotes metamodel datasets, weights or thresholds to be manually tuned…

Fig.2 shows the architecture of M2BenchMatch.

Figure 2. Architecture of M2BenchMatch for metamodel matching.

After choosing the input metamodels and converting them into an internal structure required by the matching algorithm, the next step selects a metamodel matching technique among those available in the benchmark and then runs it. This execution produces similarity values calculated between elements of the first metamodel and those of the second one. The matching technique then uses these similarity values to generate a set of candidate mappings. This generation procedure is specific to each matching tool. The user can manually intervene during the matching process to modify, validate or complete the candidate mappings. In addition, two heuristics, commonly called Aggregation and Selection, can be adopted in the matching process.

The aggregation integrates the similarity values already obtained in order to form a unified score. This aggregation can use one of the following aggregate functions: Max, Min or Weighted Sum. Each of these functions is applied to the similarity values returned by the different similarity metrics. For instance, the weighted sum multiplies each similarity value by a weight and then sums the results into a single value.
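As an illustration of the aggregation step, a weighted-sum and a Max aggregation over the values returned by several similarity metrics could be sketched as follows; the arrays of values and weights are assumptions made for the example, not part of M2BenchMatch.

```java
final class Aggregation {
    // Combines the similarity values produced by several metrics into one score.
    // values[i] is the similarity returned by metric i, weights[i] its weight.
    static double weightedSum(double[] values, double[] weights) {
        if (values.length != weights.length) {
            throw new IllegalArgumentException("one weight per similarity value is required");
        }
        double sum = 0.0;
        for (int i = 0; i < values.length; i++) {
            sum += weights[i] * values[i];
        }
        return sum;
    }

    // Keeps the highest similarity value among all metrics.
    static double max(double[] values) {
        double best = 0.0;
        for (double v : values) {
            best = Math.max(best, v);
        }
        return best;
    }
}
```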

The second heuristic (i.e., Selection) chooses, among the obtained mappings, those whose similarity values satisfy a criterion. This selection can be threshold-based, which selects mappings whose similarity values are higher than a given threshold. It can also be strengthening-and-weakening based, i.e., a function takes the similarity values and returns a value between 0 and 1, and a filter then discards the mappings whose similarity values evaluate to 0.
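A threshold-based selection could then be sketched as follows, where candidate mappings are identified by simple strings carrying their aggregated similarity value (an illustrative simplification, not M2BenchMatch's data model).

```java
import java.util.LinkedHashMap;
import java.util.Map;

final class Selection {
    // Keeps only the candidate mappings whose (aggregated) similarity value
    // is strictly higher than the given threshold.
    static Map<String, Double> thresholdBased(Map<String, Double> candidates, double threshold) {
        Map<String, Double> selected = new LinkedHashMap<>();
        for (Map.Entry<String, Double> entry : candidates.entrySet()) {
            if (entry.getValue() > threshold) {
                selected.put(entry.getKey(), entry.getValue());
            }
        }
        return selected;
    }
}
```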

The alignment model produced during step 2 is not necessarily in the Ecore format which is required by step 4a; therefore, it will be encoded into an Ecore alignment format through step 3.

The evaluation step computes, for the Ecore alignment, the values of the three measures Precision, Recall and F-measure in order to evaluate the quality of the results of the metamodel matching approach used. Two sets of mappings are involved in these measures: the correct mappings provided by an expert and the candidate mappings produced by the metamodel matching tool.

We use the additional metric called Overall in order to evaluate the human expert post-matching effort.

The Evaluation (4a) enables the expert to compare the results of several matching approaches applied to the same pair of input metamodels. It is based on the above quality metrics in order to identify the matching technique that yields the best quality measures. The Comparison (4b) is useful when the expert assesses a new metamodel matching technique in order to incorporate it into the benchmark. This allows an incremental construction of the benchmark.
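As a sketch of this comparison step, once each technique has been evaluated on the same pair of metamodels, the technique with the highest F-measure (or any other chosen measure) can be recommended. The technique names and measure values in the example are placeholders, not experimental results.

```java
import java.util.LinkedHashMap;
import java.util.Map;

final class Comparison {
    // measures maps a technique name to its {Precision, Recall, F-measure} values
    // obtained on the same pair of input metamodels.
    static String bestByFMeasure(Map<String, double[]> measures) {
        String best = null;
        double bestF = -1.0;
        for (Map.Entry<String, double[]> entry : measures.entrySet()) {
            double f = entry.getValue()[2];
            if (f > bestF) {
                bestF = f;
                best = entry.getKey();
            }
        }
        return best;
    }

    public static void main(String[] args) {
        Map<String, double[]> measures = new LinkedHashMap<>();
        measures.put("Technique A", new double[] { 0.60, 0.80, 0.69 }); // placeholder values
        measures.put("Technique B", new double[] { 0.75, 0.70, 0.72 }); // placeholder values
        System.out.println("Recommended technique: " + bestByFMeasure(measures));
    }
}
```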

V. DEMONSTRATION

M2BenchMatch demonstrates four metamodel matching tools. It is useful for comparing the strengths and weaknesses of different metamodel matching techniques, so that an informed decision can be made concerning which approach should be adopted for a given pair of metamodels to be matched. Experimental results comparing the effectiveness of the implementations of four recent techniques (Similarity Flooding, ModelCVS, SAMT4MDE and AML) on real-world metamodels are given.

In order to evaluate these techniques, we have assessed them on ten well-known pairs of metamodels described in [13], [14], [15], [16], [17], [18]. Fig. 4 shows the Precision obtained for each of the four above techniques applied to each of the ten pairs of metamodels. Fig. 5 and Fig. 6 show the Recall and F-Measure, respectively, for the same metamodels and techniques. For these pairs we have the results of their manual alignments produced by experts. These manual alignments are used to evaluate the alignments obtained with each of the metamodel matching techniques. Because alignment quality depends on metamodel size (i.e., the number of elements in the metamodel: classes, attributes, enumerations…) [9], we have classified the metamodels according to their size into three main categories: Small, Medium and Large (cf. Table I). Based on this classification, the tested pairs cover the three following combinations: Large-Large (i.e., Ecore2UML2.0, Ecore2UML), Large-Medium (i.e., Ecore2Minjava, Ecore2Minjava2) and Small-Small (i.e., ER_ODM2Webml, BibTeXA2BibTeXB).

Fig. 3 shows the main interface of M2BenchMatch. First, the user can choose the Default Run option, which applies the four matching techniques enumerated above to the ten pairs of metamodels available in the benchmark.

TABLE I. THREE METAMODEL CATEGORIES

              Small        Medium             Large
Size (*)      Size ≤ 80    80 ≤ Size ≤ 150    150 ≤ Size ≤ 300

(*) The size of a metamodel is the number of its elements.
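The classification of Table I can be expressed as a small helper; since the boundary values 80 and 150 appear in two classes in the table, the sketch below arbitrarily assigns them to the smaller class.

```java
final class MetamodelSize {
    enum Category { SMALL, MEDIUM, LARGE }

    // size = number of elements of the metamodel (classes, attributes, enumerations, ...).
    static Category classify(int size) {
        if (size <= 80) {
            return Category.SMALL;    // Size <= 80
        } else if (size <= 150) {
            return Category.MEDIUM;   // 80 < Size <= 150
        } else {
            return Category.LARGE;    // Size > 150 (Table I caps Large at 300)
        }
    }
}
```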

Figure 3. M2BenchMatch main menu.

One tool has to be chosen via a file dialog box and then executed after selecting a pair of input metamodels, their characteristics and the additional required information. This process is then repeated for the three other techniques. M2BenchMatch compares the matching quality of these techniques: it calculates and returns values for the three measures Precision, Recall and F-measure.

M2BenchMatch is a flexible software tool; to run it with a specific technique, one uses the Run Specific Scenario option. The process is identical to the Default Run, except that the user chooses both the source and target metamodels to be matched. For some techniques (SF, SAMT4MDE, and AML), additional information and auxiliary resources are sometimes required, such as a threshold and an initial mapping. Then, measure values estimating the matching quality of a pair of metamodels through the different metamodel matching techniques are displayed.

Finally, M2BenchMatch enables one to compare the quality of several matching tools, on one or more scenarios, by using the Compare Matching Tools option. The user enters a metamodel matching technique name and chooses the pair of metamodels to be matched. Next, the user decides which matchers (i.e., matching techniques) to compare. It is possible to add as many matchers as desired to the Base of Matching Approaches (cf. Figure 2) while entering their characteristics (name, size, type). When all matchers have been chosen, the user can add another scenario. This process of adding metamodels and techniques ends when the user has selected all the matchers and scenarios they want.

M2BenchMatch automatically generates three curves to better reflect the quality measures returned after the execution of each of the four metamodel matching techniques. These curves help the expert to choose the best technique. The first curve is based on the Precision measure, the second on the Recall, and the third on the F-measure.

Fig. 4, Fig. 5 and Fig. 6 graphically represent the experimental results for the techniques SF (including its three configurations, namely Standard, Flattened and Saturated), ModelCVS, SAMT4MDE and AML. All of them are run on the ten pairs of metamodels. Fig. 4 depicts the values of Precision, Fig. 5 the values of Recall and Fig. 6 the values of F-measure.

Figure 4. Values of Precision produced with the SF, SAMT4MDE, ModelCVS and AML techniques for ten pairs of metamodels.

Figure 5. Values of Recall produced with the SF, SAMT4MDE, ModelCVS and AML techniques for ten pairs of metamodels.

Figure 6. Values of F-measure produced with the SF, SAMT4MDE, ModelCVS and AML techniques for ten pairs of metamodels.

Finally, we discuss the results of these experiments. Regarding mapping discovery, we mainly notice that AML provides a better Precision (fewer irrelevant mappings discovered) while SF achieves a higher Recall (more relevant mappings discovered). AML and SAMT4MDE are appropriate for all pairs of metamodels. It is difficult to conclude anything for the large scale, since both matching tools perform well on one of the pairs of metamodels (Ecore2UML2.0, Ecore2UML) taken from the Large-Large class. However, since ModelCVS and SF do not degrade the quality measures, especially with metamodels from the Large-Medium and Small-Small combinations, they might be a better choice there. On the two small pairs of metamodels ER_ODM2Webml and BibTeXA2BibTeXB, it is more suitable to use SF. ModelCVS also obtains a high Precision in these cases, since most labels share similar tokens. The metamodel structure does not help to decide on the choice of a matching tool; however, we could assume that SF, which is based on neighbor-affinity rules, performs better with a non-flat structure. For generating an integrated schema or when quality measures are crucial, SF is a better choice. Although the classification of the metamodels into three classes enables us to draw some conclusions about the four tools examined in this study, it is still difficult to interpret the results because of the complexity of metamodel matching techniques in some cases. Yet, it appears that a tool may be suitable for a given pair of metamodels but totally useless for another one.

VI. CONCLUSION

In this paper, we have designed and implemented the M2BenchMatch benchmark for metamodel matching techniques. M2BenchMatch provides a more objective comparison, since the matching techniques are evaluated against the same input pairs of metamodels. Thus, the benchmark highlights the strengths and weaknesses of the different metamodel matching approaches, so that we can provide a well-founded answer to the question: which metamodel matching approach should be adopted for a given pair of input metamodels to be matched?

Currently, we have compared four recent metamodel matching techniques to build the M2BenchMatch prototype. This small number is not problematic since the architecture of M2BenchMatch is extensible (cf. Fig. 2): its Base of Matching Approaches can be incremented by adding new approaches to the benchmark. We continue to enrich the benchmark with other matching techniques and metamodel datasets in order to build a complete repository of matching techniques on metamodels with several scenarios.

As a further and immediate extension of this work, we are looking to assist the expert in selecting an appropriate metamodel matching technique. This assistance will be based on the characteristics of the pair of metamodels to be matched and on conclusions learned from previous experiments; these conclusions should be stored for reuse. We are also studying the use of decision trees to facilitate this assistance to the expert.

REFERENCES

[1] J. R. Falleri, M. Huchard, M. Lafourcade, and C. Nebut. Metamodel matching for automatic model transformation generation. In Proceedings of MoDELS '08, pages 326–340, 2008.

[2] J. de Sousa Jr., D. Lopes, D. Barreiro Claro, and Z. Abdelouahab. A Step Forward in Semi-automatic Metamodel Matching: Algorithms and Tool. In J. Filipe and J. Cordeiro (eds.), Enterprise Information Systems, LNBIP, vol. 24, pp. 137–148. Springer, Heidelberg, 2009.

[3] G. Kappel, H. Kargl, G. Kramler, A. Schauerhuber, M. Seidel, M. Strommer, and M. Wimmer. Matching Metamodels with Semantic Systems – An Experience Report. In BTW 2007, Datenbanksysteme in Business, Technologie und Web, pp. 38–52, 2007.

[4] K. Garces, F. Jouault, P. Cointe, and J. Bezivin. Managing Model Adaptation by Precise Detection of Metamodel Changes. In Proceedings of ECMDA 2009, Enschede, The Netherlands. Springer, June 2009.

[5] D. Lopes, S. Hammoudi, and J. De Souza. Metamodel Matching: Experiments and Comparison. In International Conference on Software Engineering Advances (ICSEA'06), pp. 225–236, 2006.

[6] E. Rahm and P. A. Bernstein. A survey of approaches to automatic schema matching. VLDB Journal 10(4): 334–350, 2001.

[7] P. Shvaiko and J. Euzenat. A Survey of Schema-Based Matching Approaches. Journal on Data Semantics IV: 146–171, 2005.

[8] L. Lafi, I. Saidi, S. Hammoudi, and J. Feki. Comparison of two metamodel matching techniques. In 4th Workshop on Model-Driven Tool & Process Integration (MDTPI 2011) at the 7th European Conference on Modelling Foundations and Applications (ECMFA), Birmingham, UK, June 6, 2011.

[9] L. Lafi, S. Hammoudi, and J. Feki. Metamodel matching techniques in MDA: Challenge, issues and comparison. In 1st International Conference on Model & Data Engineering (MEDI 2011), LNCS vol. 6918, pp. 278–286, Obidos, Portugal, September 2011.

[10] H. H. Do, S. Melnik, and E. Rahm. Comparison of Schema Matching Evaluations. In Revised Papers from the NODe 2002 Web and Database-Related Workshops on Web, Web-Services, and Database Systems, pages 221–237, 2003.

[11] C. J. van Rijsbergen. Information Retrieval. http://www.dcs.gla.ac.uk/Keith/Preface.html, 1979.

[12] H. H. Do. Schema Matching and Mapping-based Data Integration. PhD thesis, University of Leipzig, 2005.

[13] S. Walderhaug, U. Johansen, E. Stav, and J. Aagedal. Towards a generic solution for traceability in MDD. In ECMDA Traceability Workshop, 2006.

[14] OMG. UML 2.0 Superstructure Final Adopted Specification. OMG Document ptc/03-08-02, 2003.

[15] G. Kappel, E. Kapsammer, H. Kargl, G. Kramler, T. Reiter, W. Retschitzegger, W. Schwinger, and M. Wimmer. Lifting metamodels to ontologies: A step to the semantic integration of modeling languages. In Nierstrasz et al. (eds.), Proceedings of MoDELS 2006, pages 528–542, 2006.

[16] J. R. Falleri. Minjava, 2008.

[17] F. Fleurey, Z. Drey, D. Vojtisek, C. Faucher, and V. Mahé. Kermeta Language Reference Manual. http://www.kermeta.org/, 2009.

[18] F. Budinsky, S. A. Brodsky, and E. Merks. Eclipse Modeling Framework. Pearson Education, 2003.