Model-driven rule-based mediation in XML data exchange

Post on 03-Mar-2023


Model-driven Rule-based Mediation in XML Data Exchange

Yongxin Liao (1), Dumitru Roman (2), and Arne J. Berre (2)

SINTEF ICT, Forskningsveien 1, Oslo, Norway

(1) yongxinliao@gmail.com, (2) {firstname.lastname}@sintef.no

ABSTRACT
XML data exchange has become ubiquitous in Business to Business (B2B) collaborations. Automating as much as possible the exchange of XML data between enterprise systems is a key requirement for ensuring agile interoperability and scalability in B2B collaborations. The lack of standardized XML canonical models or schemas in B2B data exchange, as well as semantic differences and inconsistencies between the conceptual models of the parties that want to exchange XML data, imply that XML data cannot be directly and fully automatically exchanged between B2B systems. We are left with the option of providing techniques and tools to support humans in reconciling the differences and inconsistencies between the data models of the parties involved in a data exchange. In this paper we introduce such a technique and tool for XML data exchange. Our approach is based on a mechanism that lifts XML schemas and instances to an object-oriented model, and on the design and execution of data mediation at the object-oriented level. We use F-logic, an object-oriented rule language, together with its Flora2 engine as the underlying mechanism for providing an abstract, object-oriented model of XML schemas and instances, as well as for the specification and execution of the mappings at the model level. This provides us with a fully-fledged tool for design- and run-time data mediation, focusing on the actual semantic models behind the XML schemas rather than on the technicalities of XML in the data mediation process. Finally, we present the architecture of the current data exchange system and report on a preliminary evaluation of our system.

Categories and Subject Descriptors D.2.12 [Software Engineering]: Interoperability, D.2.2 [Design Tools and Techniques], H.2.5 [Heterogeneous Database]

General Terms Algorithms, Design, Experimentation, Languages

Keywords XML Data Exchange, Data mediation, Semantic mapping

1. INTRODUCTION
Providing techniques and tools to improve the level of automation of XML data exchange in B2B collaborations is widely regarded as a key enabler for agile interoperability and scalability in such collaborations [1]. In this paper we introduce a technique and tool for design- and run-time support of XML data exchange. Before we give a brief overview of the approach, let us define in more detail the problem of XML data exchange in the context of B2B collaborations.

Since we assume the data sent and received by parties in a B2B collaboration to be in XML, we face the problem of XML data transformation. Figure 1 provides an overview of the elements involved in XML data transformation and the process by which an XML document is transformed into another document. Company X (depicted on the left side of the figure) wants to send the Source XML document (e.g. an invoice) to Company Y. The Source XML document is compliant with an XSD schema (Source XSD) made available by Company X so that the receivers of its XML documents can understand the structure and meaning of such documents. Company Y (on the right side of the figure) processes XML documents (in our case Target XML) according to its own schema, Target XSD. If Target XSD differs from Source XSD, then Company Y is faced with the problem of having to process a Source XML document that it does not understand.

Figure 1. Generic design-time and run-time XML data transformation.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. MDI2010, October 5, 2010, Oslo, Norway. Copyright 2010 ACM 978-1-4503-0292-0/10/10…$10.00.

[Figure 1 diagram labels: Company X (Source XML, Source XSD); Transformation Layer (Schema Transformation, Instances Transformation); Company Y (Target XML, Target XSD); Design-Time, Run-Time.]

Therefore, the core challenge is to generate the Target XML document from the Source XML document, given the Source XSD schema and the Target XSD schema. A Transformation Layer is usually designed to address this challenge by providing means to map the Source XSD to the Target XSD at design time, and by providing an engine that implements the schema mappings at run time when the Target XML needs to be generated from the Source XML. Since the transformation cannot be fully automated, the core question is how to design the transformation layer in such a way that human intervention in the specification and execution of mappings is kept to a minimum.

XSD is well known to be a complex language, and designing mappings between XSD schemas is a challenge in itself. It is our belief that the mapping designer should focus on the mappings at the semantic level, between the conceptual models behind the XSD schemas that need to be mapped, rather than having to deal with the technicalities of XSD. Therefore, in this paper we rely on the lifting of XSD schemas to more abstract, object-oriented models, and on the specification of the mappings at this more abstract layer. This not only eases the specification of the mappings by the mappings creator, but also enables other kinds of schemas, not only XSDs, to be mapped to or from XSD schemas.

In this paper we chose F-logic, a rule-based object-oriented logical language, as the language to represent the semantic models behind the XSD schemas. We use F-logic not only for specifying the semantic models, but also for specifying the mappings between them. Furthermore, the use of the Flora2 engine (footnote 1), a reasoning engine for F-logic [3], allows us to perform run-time mediation. In this way, we use F-logic/Flora2 as a platform-independent model according to the OMG MDA architecture. We argue for two benefits of our approach to XML data exchange:

1. It allows the mappings creator to focus on the semantic, object-oriented model behind the XSD schemas and specify the mappings at a more abstract, semantic level, rather than having to deal with technicalities of XSD schemas.

2. It allows both specification and execution of data mappings (i.e. design- and run-time mapping) in a single, unifying framework.

The remainder of this paper is organized as follows. Section 2 provides a brief introduction to F-logic/Flora2. Section 3 presents our mapping approach: the lifting of XSD schemas to object-oriented models, mapping specification, and run-time execution. Section 4 provides an overview of the architecture of our data exchange system together with some preliminary performance results. Section 5 concludes this paper, together with some relevant related work and potential extensions.

2. BRIEF OVERVIEW OF FLORA2
In order to realize data mediation at a more abstract, semantic level, we need a higher level of abstraction for the representation of XML schemas and instances. Our approach is based on using object-oriented representations to abstract XML schemas and instances and then to perform the mapping between a source and a target at the object-oriented level. It is easier to focus on the semantics of data if it is represented in an object-oriented form rather than in a tree-like structure as in XSD. With our solution, mapping rules, schemas and instances are all in object-oriented form. In this paper, F-logic, together with its Flora2 implementation, is used as the object-oriented language for formalizing schemas and instances, as well as the mappings.

Flora2 is a sophisticated object-oriented knowledge base language and application development platform [3]. Flora2 is implemented as a set of run-time libraries and a compiler that translates a unified language of F-logic, HiLog and Transaction Logic into tabled Prolog code. Figure 2 presents an example of Flora2 schema and object descriptions, rules and queries, as well as the loading of files into modules. In the specification of schemas, '=>' is used to specify the types of the attributes of a class and '*' is used for inheritable attributes. In the specification of objects, '->' is used to specify the values of the object's attributes. '>>Mod' means load a program into a module Mod, and '@Mod' means query the value in module Mod. The reader is referred to [3] for further details of the syntax and semantics of F-logic/Flora2.

Footnote 1: http://flora.sourceforge.net/

Schema description (annotation labels in the figure: Class, Attribute, Type):
    person[name*=>string, children*=>person].

Object description (annotation labels in the figure: Object, Instance, Relation):
    John:person[name -> 'John Doe', children -> {Bob, Mary}].
    Mary:person[name -> 'Mary Doe', children -> {Alice}].

Rules:
    ?X:human :- ?X:person.

Queries (whose child is Bob in module Mod):
    ?X:person@Mod, ?X[name -> ?Y, children -> Bob]@Mod.
    Output result: ?X='John', ?Y='John Doe'

Loading programs in modules:
    ?- ['path/filename.flr'>>Mod]
    #include "path/filename.flr"
    ….

Figure 2. Flora2 examples: objects, rules, queries.

The core motivation for choosing Flora2 is that it is a rule-based object-oriented logical language which provides support for the flexible specification of schemas, instances, and mapping rules, and which at the same time can be used to execute mapping rules on instance data. Flora2 comes with an XML package which supports loading and parsing XSD/XML documents and converting them to sets of Flora2 objects stored in user-specified Flora2 modules. It also provides equivalent entities for XSD and XML, features that are used in our framework for data mediation.

3. MAPPING APPROACH
Our proposed solution, called FloraMap, is based on logical rules for specifying mappings at the schema level and executing those mappings at the instance level. The choice of logical rules is motivated by their declarative and procedural semantics, which make them a powerful tool for declaratively specifying and at the same time executing mappings. Logical rules cannot work directly with XSDs, and therefore proper abstraction mechanisms need to be developed for abstracting XSD schemas, on top of which mappings can be designed and executed. Our choice for such abstractions is the use of object-oriented techniques for representing XSD and XML, on top of which mapping rules can be more easily specified. Figure 3 below gives an overview of the mapping approach. We can separate the mapping into two parts: design-time and run-time.

Figure 3. Mapping Approach – Overview.

Design-time:

1. The Source XSD and Target XSD are represented as source and target Flora2 object-oriented schemas.

2. Logical rules are used to specify the mappings between the source Flora2 schemas and target Flora2 schemas.

Run-time:

3. The Source XML is represented as Flora2 objects of the source Flora2 schema.

4. Logical rules from step 2 are executed for the source Flora2 objects, and target Flora2 objects are generated.

5. The target Flora2 objects are serialized in target XML instances.

The rest of this section gives an overview of how abstraction is achieved (mapping XML schemas and instances to Flora2 representations), how mappings are specified and executed (i.e. mapping Flora2 source objects to Flora2 target objects), and how the resulting Flora2 objects are serialized in XML (i.e. mapping Flora2 objects to XML instances).

To exemplify these steps we will use the exchange of an XML invoice between a company X (source) and a company Y (target). The schemas of the invoices of companies X and Y are presented in Figure 4, together with the following mappings:

1. Bizszam in source is the same as InvoiceNumber in target

2. Bizkelt in source is the same as InvoiceDate in target

3. city in source is the same as DeliveryAddress.city in target

4. zip in source is the same as DeliveryAddress.zip in target

5. street in source is the same as DeliveryAddress.street in target

6. AccDate in target is a concatenation of Ev in the source, a delimiter, Kanyvho in the source, a delimiter, and the string '01', i.e. AccDate = (Ev+'_'+Kanyvho+'_'+'01')

Figure 4. XML Schemas and mappings example.

(a) Source XSD: Company X

<xs:element name="InvoiceCompanyX">
  <xs:complexType>
    <xs:sequence>
      <xs:element name="Bizszam" type="xs:string"/>
      <xs:element name="Ev" type="xs:string"/>
      <xs:element name="Kanyvho" type="xs:string"/>
      <xs:element name="Bizkelt" type="xs:string"/>
      <xs:element name="city" type="xs:string" minOccurs="0"/>
      <xs:element name="zip" type="xs:int" minOccurs="0"/>
      <xs:element name="street" type="xs:string" minOccurs="0"/>
    </xs:sequence>
  </xs:complexType>
</xs:element>

(b) Target XSD: Company Y

<xs:element name="InvoiceCompanyY">
  <xs:complexType>
    <xs:sequence>
      <xs:element name="InvoiceNumber" type="xs:string"/>
      <xs:element name="AccDate" type="xs:string"/>
      <xs:element name="InvoiceDate" type="xs:string"/>
      <xs:element name="DeliveryAddress" minOccurs="0">
        <xs:complexType>
          <xs:sequence>
            <xs:element name="city" type="xs:string" minOccurs="0"/>
            <xs:element name="zip" type="xs:string" minOccurs="0"/>
            <xs:element name="DoorNo" type="xs:string" minOccurs="0"/>
            <xs:element name="street" type="xs:string" minOccurs="0"/>
          </xs:sequence>
        </xs:complexType>
      </xs:element>
    </xs:sequence>
  </xs:complexType>
</xs:element>

[In the figure, arrows numbered 1 to 6 connect the source and target elements according to the mappings listed above.]

[Figure 3 diagram labels: Design-time: Source XSD, Flora2 Schema, Semantic Mapping (Specification and Execution), Flora2 Schema, Target XSD. Run-time: Source XML, Flora2 Objects, Transform Engine, Flora2 Objects, Target XML.]

3.1 XSD2OO
The technique we designed for abstracting XML schemas to object-oriented models generates two Flora2 models for each XSD: one Flora2 model (Abstract) contains the "clean" conceptual model of the schema (without any technicalities of XSD, focusing on the semantics of the elements), and the other one (Special) contains XSD-specific information (sequence, choice, etc.) which is used for generating the structure of target XML instances. In most cases, XSD elements can find a natural representation in Flora2. For example, if a job element in XSD is specified as <element name="job" type="string" minOccurs="0" maxOccurs="5"/>, it can be transformed in Flora2 to [job {0:5}*=>string]. The {0:5} cardinality is equivalent to minOccurs="0" and maxOccurs="5" in XSD.
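The cardinality translation in the job example can be sketched in a few lines of Python. This is only an illustration, not part of FloraMap: the function names are hypothetical, the XSD defaults (minOccurs and maxOccurs both 1 when omitted) come from the XML Schema specification, and rendering "unbounded" as "*" is an assumption about the Flora2 side.

```python
import xml.etree.ElementTree as ET


def occurs_to_cardinality(element):
    """Return a Flora2-style cardinality such as '{0:5}' for an XSD element."""
    # XSD defaults both minOccurs and maxOccurs to 1 when they are omitted.
    low = element.get("minOccurs", "1")
    high = element.get("maxOccurs", "1")
    # Assumption: an unbounded upper limit is written as '*'.
    if high == "unbounded":
        high = "*"
    return "{" + low + ":" + high + "}"


def element_to_signature(element):
    """Render a simple-typed XSD element as an attribute signature fragment."""
    return "[{} {}*=>{}]".format(
        element.get("name"), occurs_to_cardinality(element), element.get("type")
    )


job = ET.fromstring(
    '<element name="job" type="string" minOccurs="0" maxOccurs="5"/>'
)
print(element_to_signature(job))  # [job {0:5}*=>string]
```

The sketch handles only elements with a named simple type; complex and restricted types need the additional Abstract and Special facts discussed next.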

Due to length restrictions, we do not provide the reader with a complete mapping of XSD to Flora2 schemas. Nevertheless, Table 1 provides three examples of how top-level elements in XSD are mapped to Flora2 representations.

Table 1. Example of XSD elements to Flora2 schema mapping

Situation 1: Top-level Element with BaseType

    XSD:
        <element name="name" type="string" maxOccurs="2"/>
    Abstract:
        name[name {1:2} *=>string].
    Special:
        none

Situation 2: Top-level Element with ComplexType

    XSD:
        <element name="name">
          <complexType>
            <sequence>
              <element name="firstname" type="string"/>
              <element name="lastname" type="string"/>
            </sequence>
          </complexType>
        </element>
    Abstract:
        name[firstname {1:1} *=>'string'].
        name[lastname {1:1} *=>'string'].
    Special:
        Elements[name->firstname].
        Elements[name->lastname].
        Sequences[name->[firstname,lastname]].

Situation 3: Top-level Element with SimpleType

    XSD:
        <element name="age">
          <simpleType>
            <restriction base="int">
              <maxInclusive value="200"/>
            </restriction>
          </simpleType>
        </element>
    Abstract:
        age[base *=>'int'].
        age[maxInclusive->200].
    Special:
        none

Attribute and Element are different things in XSD, but we abstract them as the same in Abstract and identify the difference in Special.
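To make the split between Abstract and Special concrete, the following Python sketch reproduces Situation 2 of Table 1 for a top-level element whose complex type is a plain sequence. It is a simplified illustration under stated assumptions: the function name `lift_complex_element` is hypothetical, namespaces are omitted, and only `sequence` content with named simple types is handled.

```python
import xml.etree.ElementTree as ET

XSD = """
<element name="name">
  <complexType>
    <sequence>
      <element name="firstname" type="string"/>
      <element name="lastname" type="string"/>
    </sequence>
  </complexType>
</element>
"""


def lift_complex_element(xsd_text):
    """Return (abstract_facts, special_facts) for one top-level element."""
    top = ET.fromstring(xsd_text)
    owner = top.get("name")
    abstract, special, order = [], [], []
    for child in top.findall("./complexType/sequence/element"):
        low = child.get("minOccurs", "1")   # XSD default
        high = child.get("maxOccurs", "1")  # XSD default
        if high == "unbounded":
            high = "*"                      # assumption: '*' means unbounded
        # Abstract: the conceptual signature of the attribute.
        abstract.append(
            "{}[{} {{{}:{}}} *=>'{}'].".format(
                owner, child.get("name"), low, high, child.get("type")
            )
        )
        # Special: XSD-specific bookkeeping (element membership and order).
        special.append("Elements[{}->{}].".format(owner, child.get("name")))
        order.append(child.get("name"))
    special.append("Sequences[{}->[{}]].".format(owner, ",".join(order)))
    return abstract, special


abstract_facts, special_facts = lift_complex_element(XSD)
for fact in abstract_facts + special_facts:
    print(fact)
```

Keeping the element order only in the Special facts is what allows the Abstract model to stay free of XSD technicalities while the order remains available for serialization.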

XSD import and include have natural equivalents in Flora2 modules. For example, when "filename.xsd" is included in an XSD file via <include schemaLocation="filename.xsd"/>, this can be transformed to #include "filename_Abstract.flr" in the Flora2 Abstract file and #include "filename_Special.flr" in the Flora2 Special file. For XSD import, the following steps can be used for the mapping:

1. ['filename_Abstract.flr'>>namespace] in the Flora2 Abstract file

2. ['filename_Special.flr'>>namespace] in the Flora2 Special file

3. Keep the element name and replace the ":" with "_" in the type

Table 2 below exemplifies the way XSD import and include are handled in Flora2 schemas.

Table 2. XSD contains import and include to Flora2 mapping.

Situation 1: XSD Import

    XSD:
        <schema xmlns:ccts="abcd">
          <import namespace="abcd" schemaLocation="../Information.xsd"/>
          <element name="person">
            <complexType>
              <sequence>
                <element name="name" type="ccts:Type"/>
                <element ref="ccts:age"/>
                <element name="work">
                  <complexType>
                    <simpleContent>
                      <extension base="ccts:workType"/>
                    </simpleContent>
                  </complexType>
                </element>
              </sequence>
            </complexType>
          </element>
        </schema>
    Abstract:
        ?- ['path/Information_Abstract.flr'>>ccts]
        person[name {1:1} *=> ccts_nameType].
        person['ccts:age' {1:1} *=> ccts_age].
        person[work {1:1} *=> personwork].
        personwork['ccts:workType' {1:1} *=> ccts_workType].
    Special:
        ?- ['path/Information_Special.flr'>>ccts]
        Elements[person -> name].
        Elements[person -> 'ccts:age'].
        Elements[person -> work].

Situation 2: XSD Include

    XSD:
        <include schemaLocation="person.xsd"/>
    Abstract:
        #include "path/person_Abstract.flr"
    Special:
        #include "path/person_Special.flr"

The result of applying the XSD to Flora2 transformation to the XSD schema of Company X (Figure 4.a) is depicted in Figure 5, and the result of applying the transformation to the XSD schema of Company Y (Figure 4.b) is depicted in Figure 6.

Figure 5. Flora2 schema representation of Company X XSD schema (Figure 4.a)

Figure 6. Flora2 schema representation of Company Y XSD schema (Figure 4.b)

These Flora2 Abstract and Special parts represent the source and target XSD and will be used as input in the design-time mapping and run-time target XML instance generation.

3.2 XML2OO
The technique we designed for abstracting XML instances to object-oriented models generates one Flora2 model. Flora2 provides natural equivalences between object entities and XML instances.

For example, if an instance of a job element is represented in XML as <job>Programmer</job>, then it can be transformed to obj_1:person[job->'Programmer'] in Flora2. Here obj_1 is a unique object name, and obj_1:person means that obj_1 is an instance of person. To transform an XML instance to Flora2 objects, the following high-level steps are devised:

1. Parse the XML instance files in Flora2, resulting in a Flora2 tree.

2. Load the Flora2 Abstract source files in Flora2.

3. Generate the Flora2 object structure according to the Flora2 Abstract and query the values from the Flora2 tree. Object names are constructed by concatenating "obj_" with a unique number (e.g. 1_1_2) generated from the unique location in the tree.

Step 1 is performed by the Flora2 engine itself and is not part of our implementation (the Flora2 XML package provides XML parsing support). The package stores XML instances in a Flora2 tree automatically when XML files are parsed. FloraMap uses this package to load the XML file and queries the Flora2 tree for values. Steps 2 and 3 are performed by FloraMap, which generates the Flora2 object structure according to the Flora2 Abstract and fills in the values queried from the Flora2 tree. Figure 7 shows the generation of a Flora2 object from an XML instance example of Company X. The upper part shows X's XML instance and Flora2 Abstract. The output is the Flora2 object obj.
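The object-generation step can be approximated outside Flora2 with a short Python sketch. It is an illustration only: it walks the XML tree directly instead of consulting the Flora2 Abstract, it assumes the class name of each object equals the element tag, and the function name `xml_to_facts` is hypothetical. Object identifiers are derived from the position in the tree, in the spirit of step 3.

```python
import xml.etree.ElementTree as ET

SOURCE_XML = """<InvoiceCompanyX>
  <Bizszam>I_001</Bizszam>
  <Ev>2010</Ev>
  <Kanyvho>05</Kanyvho>
  <Bizkelt>2010-05-18</Bizkelt>
  <city>Oslo</city>
  <zip>1234</zip>
  <street>First Street</street>
</InvoiceCompanyX>"""


def xml_to_facts(xml_text):
    """Turn an XML instance into Flora2-style object facts (as strings)."""
    root = ET.fromstring(xml_text)
    facts = []

    def visit(node, position):
        oid = "obj_" + position
        for index, child in enumerate(node, start=1):
            if len(child) == 0:
                # Leaf element: its text becomes the attribute value.
                value = (child.text or "").strip()
                facts.append(
                    "{}:{}['{}'->'{}'].".format(oid, node.tag, child.tag, value)
                )
            else:
                # Nested element: link to a new object named by its location.
                child_position = position + "_" + str(index)
                facts.append(
                    "{}:{}['{}'->obj_{}].".format(
                        oid, node.tag, child.tag, child_position
                    )
                )
                visit(child, child_position)

    visit(root, "1")
    return facts


facts = xml_to_facts(SOURCE_XML)
for fact in facts:
    print(fact)
```

For the Company X invoice this yields one fact per child element, all attached to the single object obj_1, which corresponds to the object shown in Figure 7 up to the choice of object name.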

Figure 7. XML to Flora2: Company X

3.3 OO2OO
The core part of data mediation is the specification and execution of the mappings in Flora2, a process that takes as input the Flora2 Abstract schemas of the source and target, the mappings between them, and the Flora2 source objects, and generates Flora2 target objects according to the specification of the mappings. This phase can be separated into three steps:

1. Specification of the design-time mappings between the source and target Flora2 Abstract schemas.

2. Generation of the executable (run-time) mappings from the design-time specification of the mappings.

3. Execution of the mappings on source Flora2 objects for the generation of Flora2 target objects.

Listing for Figure 5 (Flora2 schema representation of Company X):

    Flora2 Abstract (Company X)
        Namespace[value->'xs:'].
        InvoiceCompanyX[Bizszam{1:1}*=>'xs:string'].
        InvoiceCompanyX[Ev{1:1}*=>'xs:string'].
        InvoiceCompanyX[Kanyvho{1:1}*=>'xs:string'].
        InvoiceCompanyX[Bizkelt{1:1}*=>'xs:string'].
        InvoiceCompanyX[city{0:*}*=>'xs:string'].
        InvoiceCompanyX[zip{0:*}*=>'xs:int'].
        InvoiceCompanyX[street{0:*}*=>'xs:string'].

    Flora2 Special (Company X)
        Sequences[InvoiceCompanyX->['Bizszam','Ev','Kanyvho','Bizkelt','city','zip','street']].
        Elements[InvoiceCompanyX->Bizszam].
        Elements[InvoiceCompanyX->Ev].
        Elements[InvoiceCompanyX->Kanyvho].
        Elements[InvoiceCompanyX->Bizkelt].
        Elements[InvoiceCompanyX->city].
        Elements[InvoiceCompanyX->zip].

Listing for Figure 6 (Flora2 schema representation of Company Y):

    Flora2 Abstract (Company Y)
        Namespace[value->'xs:'].
        InvoiceCompanyY[InvoiceNumber{1:1}*=>'xs:string'].
        InvoiceCompanyY[AccDate{1:1}*=>'xs:string'].
        InvoiceCompanyY[InvoiceDate{1:1}*=>'xs:string'].
        InvoiceCompanyY[DeliveryAddress{0:*}*=>CompanyYDeliveryAddress].
        CompanyYDeliveryAddress[city{0:*}*=>'xs:string'].
        CompanyYDeliveryAddress[zip{0:*}*=>'xs:string'].
        CompanyYDeliveryAddress[DoorNo{0:*}*=>'xs:string'].
        CompanyYDeliveryAddress[street{0:*}*=>'xs:string'].

    Flora2 Special (Company Y)
        Sequences[InvoiceCompanyY->['InvoiceNumber','AccDate','InvoiceDate','DeliveryAddress',TheOrderEnd]].
        Elements[InvoiceCompanyY->InvoiceNumber].
        Elements[InvoiceCompanyY->AccDate].
        Elements[InvoiceCompanyY->InvoiceDate].
        Elements[InvoiceCompanyY->DeliveryAddress].
        Sequences[CompanyYDeliveryAddress->['city','zip','DoorNo','street']].
        Elements[CompanyYDeliveryAddress->city].
        Elements[CompanyYDeliveryAddress->zip].
        Elements[CompanyYDeliveryAddress->DoorNo].
        Elements[CompanyYDeliveryAddress->street].

Listing for Figure 7 (XML to Flora2: Company X):

    Source XML (Company X)
        <InvoiceCompanyX>
          <Bizszam>I_001</Bizszam>
          <Ev>2010</Ev>
          <Kanyvho>05</Kanyvho>
          <Bizkelt>2010-05-18</Bizkelt>
          <city>Oslo</city>
          <zip>1234</zip>
          <street>First Street</street>
        </InvoiceCompanyX>

    Flora2 Abstract (Company X): as in Figure 5

    Flora2 object (Company X)
        obj:InvoiceCompanyX['Bizszam'->'I_001'].
        obj:InvoiceCompanyX['Ev'->'2010'].
        obj:InvoiceCompanyX['Kanyvho'->'05'].
        obj:InvoiceCompanyX['Bizkelt'->'2010-05-18'].
        obj:InvoiceCompanyX['city'->'Oslo'].
        obj:InvoiceCompanyX['zip'->'1234'].
        obj:InvoiceCompanyX['street'->'First Street'].

For step 1 we provide a simple mechanism to capture the correspondences between the Flora2 Abstract source and target schemas. This is achieved by the following Flora2 predicates:

    OneToOne([source],[target]).
    OneToMany([source],[[target1],[target2],…],[n1,m1,n2,m2,..]).
    ManyToOne([[source1],[source2],[source3],…],[target]).

OneToOne means that a class or attribute in the source schema corresponds to a class or attribute in the target schema. OneToMany means that a class or attribute in the source schema corresponds to more than one class or attribute in the target schema. ManyToOne means that more than one class or attribute in the source schema corresponds to one class or attribute in the target schema. [source] is the path of the source class or attribute, and [target] is the path of the target class or attribute. [n1,m1,n2,m2…] are values that identify substrings: the first substring runs from n1 to m1, the second from n2 to m2, and so on. Figure 8 shows the Flora2 specification of the correspondences/mappings between the Flora2 Abstract source and target schemas from Figures 5 and 6, respectively. The mapping information is taken from our running example in Figure 4.
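The intended meaning of the three correspondence kinds can be sketched in Python over nested dictionaries. This is not the FloraMap implementation; the helper names are hypothetical, paths omit the leading class name, and the substring bounds of OneToMany are assumed to be 1-based and inclusive, which the text does not state. The Year and Month targets in the last call are invented purely to exercise OneToMany.

```python
def get_path(obj, path):
    """Follow an attribute path such as ['DeliveryAddress', 'city']."""
    for key in path:
        obj = obj[key]
    return obj


def set_path(obj, path, value):
    """Set a value at an attribute path, creating nested dicts as needed."""
    for key in path[:-1]:
        obj = obj.setdefault(key, {})
    obj[path[-1]] = value


def one_to_one(source, target, src_path, tgt_path):
    set_path(target, tgt_path, get_path(source, src_path))


def many_to_one(source, target, parts, tgt_path):
    # Each part is either an attribute path (a list) or a literal string.
    pieces = [get_path(source, p) if isinstance(p, list) else p for p in parts]
    set_path(target, tgt_path, "".join(pieces))


def one_to_many(source, target, src_path, tgt_paths, bounds):
    # Assumption: bounds are 1-based and inclusive: [n1, m1, n2, m2, ...].
    value = get_path(source, src_path)
    for i, tgt_path in enumerate(tgt_paths):
        start, end = bounds[2 * i], bounds[2 * i + 1]
        set_path(target, tgt_path, value[start - 1:end])


source = {"Bizszam": "I_001", "Ev": "2010", "Kanyvho": "05",
          "Bizkelt": "2010-05-18", "city": "Oslo", "zip": "1234",
          "street": "First Street"}
target = {}
one_to_one(source, target, ["Bizszam"], ["InvoiceNumber"])
one_to_one(source, target, ["Bizkelt"], ["InvoiceDate"])
one_to_one(source, target, ["city"], ["DeliveryAddress", "city"])
one_to_one(source, target, ["zip"], ["DeliveryAddress", "zip"])
one_to_one(source, target, ["street"], ["DeliveryAddress", "street"])
many_to_one(source, target, [["Ev"], "_", ["Kanyvho"], "_", "01"], ["AccDate"])

# Hypothetical OneToMany use: split the invoice date into year and month.
parts = {}
one_to_many(source, parts, ["Bizkelt"], [["Year"], ["Month"]], [1, 4, 6, 7])
print(target["AccDate"], parts)
```

Treating correspondences as plain data interpreted by a small set of functions mirrors the separation between design-time specification and run-time execution described in this section.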

Figure 8. Design-time correspondences between the Flora2 schemas of company X and Y

For step 2 we have devised a mechanism that takes as input the Flora2 source and target schemas and the design-time correspondences between them, and generates a Flora2 program that represents the executable mappings. This can be achieved in Flora2 in a rather intuitive and straightforward way: for each object instance in the source, generate new objects (using the newoid primitive defined in Flora2), assign values to the new objects according to the design-time correspondence rules, and store the new objects in a target knowledge base (using the transactional insert feature of Flora2). Figure 9 shows the generated executable mapping program for our running example.

Figure 9. Flora2 executable program (run-time mappings)

In step 3, the Flora2 system is used as the underlying reasoning engine to execute the Flora2 program on source instances. Figure 10 shows the result of applying the executable mapping program to an instance of the Company X invoice (obj), together with the resulting instance of the Company Y invoice (obj1).

Figure 10. Run-time mapping of Flora2 objects

Listing for Figure 8 (design-time mappings, Company X to Y, between Flora2 Abstract Company X and Flora2 Abstract Company Y):

    OneToOne([InvoiceCompanyX],[InvoiceCompanyY]).
    OneToOne([InvoiceCompanyX,Bizszam],[InvoiceCompanyY,InvoiceNumber]).
    OneToOne([InvoiceCompanyX,Bizkelt],[InvoiceCompanyY,InvoiceDate]).
    OneToOne([InvoiceCompanyX,city],[InvoiceCompanyY,DeliveryAddress,city]).
    OneToOne([InvoiceCompanyX,zip],[InvoiceCompanyY,DeliveryAddress,zip]).
    OneToOne([InvoiceCompanyX,street],[InvoiceCompanyY,DeliveryAddress,street]).
    ManyToOne([[InvoiceCompanyX,Ev],'_',[InvoiceCompanyX,Kanyvho],'_','01'],
              [InvoiceCompanyY,AccDate]).

Listing for Figure 9 (executable mapping program):

    ?- ['InvoiceCompanyX.flr'>>SourceInstances].
    ?- ?h:InvoiceCompanyX@SourceInstances, newoid{?t}, newoid{?t_4},
       insert{
         ?t:InvoiceCompanyY[InvoiceNumber->?t_1],
         ?t:InvoiceCompanyY[AccDate->?t_2],
         ?t:InvoiceCompanyY[InvoiceDate->?t_3],
         ?t:InvoiceCompanyY[DeliveryAddress->?t_4],
         ?t_4:CompanyYDeliveryAddress[city->?t_4_1],
         ?t_4:CompanyYDeliveryAddress[zip->?t_4_2],
         ?t_4:CompanyYDeliveryAddress[street->?t_4_4]
       |
         ?t_1=?h.Bizszam@SourceInstances,
         flora_concat_items([?h.Ev@SourceInstances,'_',
                             ?h.Kanyvho@SourceInstances,'_01'],?t_2)@_plg(flrporting),
         ?t_3=?h.Bizkelt@SourceInstances,
         ?t_4_1=?h.city@SourceInstances,
         ?t_4_2=?h.zip@SourceInstances,
         ?t_4_4=?h.street@SourceInstances
       }.

Listing for Figure 10 (run-time mapping of Flora2 objects using the executable mapping program of Figure 9):

    Flora2 source object (Company X)
        obj:InvoiceCompanyX['Bizszam'->'I_001'].
        obj:InvoiceCompanyX['Ev'->'2010'].
        obj:InvoiceCompanyX['Kanyvho'->'05'].
        obj:InvoiceCompanyX['Bizkelt'->'2010-05-18'].
        obj:InvoiceCompanyX['city'->'Oslo'].
        obj:InvoiceCompanyX['zip'->'1234'].
        obj:InvoiceCompanyX['street'->'First Street'].

    Flora2 target object (Company Y)
        obj1:InvoiceCompanyY[InvoiceNumber->'I_001'].
        obj1:InvoiceCompanyY[AccDate->'2010_05_01'].
        obj1:InvoiceCompanyY[InvoiceDate->'2010-05-18'].
        obj1:InvoiceCompanyY[DeliveryAddress->{obj_4}].
        obj_4:CompanyYDeliveryAddress[city->'Oslo'].
        obj_4:CompanyYDeliveryAddress[zip->'1234'].
        obj_4:CompanyYDeliveryAddress[street->'First Street'].

3.4 OO2XML
Flora2 to XML mapping is the last process in FloraMap execution and is concerned with the serialization of the generated Flora2 objects into XML instances. This process takes as input the target schema (both the Flora2 Abstract and Special target schemas) and the Flora2 target objects, and generates target XML instances. In the XSD to Flora2 lifting process, FloraMap generated two Flora2 models: Flora2 Abstract (containing the conceptual model of the schema) and Flora2 Special (containing XSD-specific information). These two Flora2 files are used for generating the structure of target XML instances. Note that the Flora2 Special target schema plays a key role in the serialization of the objects, because it indicates the technical details of the XML instance that should be generated. In the Flora2 to Flora2 mapping process, FloraMap generated Flora2 objects, which are queried for the values of each class and attribute. Figure 11 depicts the Flora2 to XML process in our running example.
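The role of the Special schema in serialization can be illustrated with a small Python sketch in which a dictionary stands in for the Sequences facts. Several simplifications are assumed: the table is keyed by element name rather than by Flora2 class name, target objects are nested dictionaries, attributes without a value are written as empty elements, and the names `SEQUENCES` and `serialize` are hypothetical.

```python
import xml.etree.ElementTree as ET

# Stand-in for the Sequences facts of the Flora2 Special target schema.
SEQUENCES = {
    "InvoiceCompanyY": ["InvoiceNumber", "AccDate", "InvoiceDate",
                        "DeliveryAddress"],
    "DeliveryAddress": ["city", "zip", "DoorNo", "street"],
}


def serialize(tag, obj):
    """Build an XML element whose children follow the recorded sequence."""
    element = ET.Element(tag)
    for name in SEQUENCES[tag]:
        value = obj.get(name)
        if isinstance(value, dict):
            # Nested object: serialize it according to its own sequence.
            element.append(serialize(name, value))
        else:
            child = ET.SubElement(element, name)
            child.text = value if value is not None else ""
    return element


target_object = {
    "InvoiceNumber": "I_001",
    "AccDate": "2010_05_01",
    "InvoiceDate": "2010-05-18",
    "DeliveryAddress": {"city": "Oslo", "zip": "1234",
                        "street": "First Street"},
}
root = serialize("InvoiceCompanyY", target_object)
print(ET.tostring(root, encoding="unicode"))
```

Because the child order is taken from the sequence table rather than from the object, the output respects the element order required by the target XSD even though the object itself is unordered.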

Figure 11. Serialization of Flora2 objects to XML instances

4. System Architecture, Implementation, and Experimental Results
The techniques outlined in the previous section have been implemented in FloraMap, a set of modules implemented in Flora2 which can be used to parse and transform XML schemas and instances into Flora2 schemas and objects, and to execute the mediation rules specified at the Flora2 level.

At design-time FloraMap takes as input the source and target XML schemas and generates the object-oriented models of the schemas. Then, the mappings creator specifies the correspondences/mappings between the schemas (similar to the example given in Figure 8), and generates the executable mapping program (similar to the example given in Figure 9) that will be used to execute mediation on source instances.

At run-time FloraMap takes as input the XML source instances, the Flora2 source and target schemas, and the executable mapping rules produced at the design-time. Based on these inputs, FloraMap transforms XML source instances to Flora2 objects, executes the mappings on these source objects and generates target objects, and finally serializes the target objects into XML target instances.

Figure 12 presents a high-level overview of the FloraMap modules and the interactions between them. The following are the core modules of FloraMap:

• XSD to Flora2: Transforms the input XSDs to Flora2 schema models

• XML to Flora2: Transforms the input XML instances to Flora2 objects

• Flora2 to Flora2: Specifies the mappings between the source and target Flora2 models (OO level)

• Flora2 to XML: serializes the Flora2 objects to XML instances

Figure 12. FloraMap: Core modules and interactions

Several experiments have been performed on the current implementation to test the scalability of FloraMap. The experiments have been carried out on a commodity computer (Intel(R) Core(TM) 2 Duo CPU P8600 @ 2.4GHz, 4GB RAM, Windows Vista 32-bit OS). Two types of experiments have been performed:

1. Transformation of XSDs of various sizes and complexities to Flora2 Schema.

2. End-to-end data exchange of an increasing number of instances for the running example presented in the previous section.

For the first type of experiments we have used XSDs of various sizes and complexities to test the scalability of generating Flora2 object-oriented models from XML schemas. The XSDs used ranged from simple schemas such as those presented in this paper (in Figure 4) to very complex schemas such as the Northern European Subset of UBL (NES) (footnote 2). The times needed to generate object-oriented models from XSDs are reported in Figure 13.

[Figure 12 diagram labels: modules XSD to Flora2, XML to Flora2, Flora2 to Flora2, Flora2 to XML; artifacts Source XSD, Source XML, Source Flora2 Schema, Source Flora2 Objects, Target Flora2 Schema, Target Flora2 Objects, Target XSD, Target XML.]

Listing for Figure 11 (serialization of Flora2 objects to XML instances; inputs are the Flora2 Abstract (Company Y), the Flora2 Special (Company Y), and the Flora2 object (Company Y)):

    Flora2 object (Company Y)
        obj1:InvoiceCompanyY[InvoiceNumber->'I_001'].
        obj1:InvoiceCompanyY[AccDate->'2010_05_01'].
        obj1:InvoiceCompanyY[InvoiceDate->'2010-05-18'].
        obj1:InvoiceCompanyY[DeliveryAddress->{obj_4}].
        obj_4:CompanyYDeliveryAddress[city->'Oslo'].
        obj_4:CompanyYDeliveryAddress[zip->'1234'].
        obj_4:CompanyYDeliveryAddress[street->'First Street'].

    Target XML (Company Y)
        <?xml version="1.0"?>
        <InvoiceCompanyY>
          <InvoiceNumber>I_001</InvoiceNumber>
          <AccDate>2010_05_01</AccDate>
          <InvoiceDate>2010-05-18</InvoiceDate>
          <DeliveryAddress>
            <city>Oslo</city>
            <zip>1234</zip>
            <DoorNo> </DoorNo>
            <street>First Street</street>
          </DeliveryAddress>
        </InvoiceCompanyY>

Figure 13. Performance results: Generation of Flora2 models from XML schemas

The results show that mapping large and complex schemas such as NES is a time-consuming task (it took about 7 minutes). However, this is not an issue, since the generation needs to be done only once, at design time. Once the Flora2 representations of the XSDs have been produced, they can be loaded and processed rather quickly by FloraMap for run-time mediation.

For the second type of experiments, where we tested the end-to-end data exchange, we used increasing numbers of synthetically generated instances of the source schema presented in Figure 4 to generate instances of the target schema (also presented in Figure 4). This type of experiment included the complete mapping of source instances to target instances through an intermediary schema (not presented here), meaning that we had three schemas and two sets of mappings. The time needed for a complete transformation of increasing numbers (1 to 4000) of invoice instances of the Company X XSD to instances of the Company Y XSD is reported in Figure 14.

Figure 14. Performance results: End-to-end data mediation

These results show that the end-to-end processing time grows with the number of instances, at a rate somewhere between linear and exponential. Whereas this can be acceptable in some applications (e.g. processing 4000 instances in about 15 minutes, as our results showed), it might not be reasonable in others.
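One way to make a statement such as "between linear and exponential" more precise is to estimate the empirical growth order from measured timings, by fitting a straight line to log(time) against log(number of instances). The sketch below shows this under stated assumptions: the function name `loglog_slope` is hypothetical, and the timings are synthetic values generated from known formulas. They are not the measurements behind Figure 14.

```python
import math


def loglog_slope(sizes, times):
    """Least-squares slope of log(time) versus log(size).

    A slope close to 1 suggests linear growth and a slope close to 2
    suggests quadratic growth. A slope that keeps rising as larger sizes
    are added suggests that no fixed polynomial fits well.
    """
    xs = [math.log(n) for n in sizes]
    ys = [math.log(t) for t in times]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    num = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    den = sum((x - mean_x) ** 2 for x in xs)
    return num / den


# Synthetic timings only, generated from formulas to check the estimator.
sizes = [500, 1000, 2000, 4000]
linear = [0.05 * n for n in sizes]          # t = c * n
quadratic = [1e-4 * n ** 2 for n in sizes]  # t = c * n^2

print(round(loglog_slope(sizes, linear), 3))
print(round(loglog_slope(sizes, quadratic), 3))
```

Applied to real measurements, the same function could be evaluated over successive ranges of instance counts to see whether the estimated order stays stable.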

2 http://www.nesubl.eu/

5. Related Work, Conclusions, and Outlook

The problem of mapping between data structures has been extensively studied for decades, and schema mapping is well established as a research field [6,2]. Nevertheless, the use of rule-based logical systems for data mapping/exchange has not yet been widely investigated in the community. In this paper we provided a solution to the end-to-end data exchange problem based on the use of F-logic/Flora2 as a logical framework, which we used for high-level, abstract specification of schemas and mappings between them, as well as for run-time execution of mappings. Our approach allows the creator of the mappings to focus on the semantic, object-oriented model behind the XSD schemas and to specify the mappings at a more abstract, semantic level, rather than having to deal with the technicalities of XSD schemas. The proposed approach allows both specification and execution of data mappings (i.e. design- and run-time mapping) in a single, unifying framework, providing an end-to-end solution to the problem of XML data exchange.

Several works can be related to our approach. For example, [4] presents algorithms to represent XML and XSD in a mainstream object-oriented programming language. It develops two mappings: one uses a set of rules that map an XSD schema into its object-oriented schema, and the other maps XML instances that conform to an XSD schema to their representation as objects. This is directly related to our generation of Flora2 object-oriented models from XML schemas and instances. However, the representation in [4] does not seem to be complete (e.g. it is unclear how XSD import/include statements are handled). Furthermore, our approach targets specification of mediation as well as run-time execution, whereas [4] focuses just on an object-oriented representation of XML schemas. Another relevant work is [5], which focuses on the generation of XML from object-oriented models. This can be related to our serialization of Flora2 objects into XML, but the scope of our work is much broader than that of [5].
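The serialization of Flora2 objects into XML mentioned above can be illustrated with a small Python sketch. It is an illustration under assumptions, not the FloraMap serializer: the in-memory `OBJECTS` structure, the function name `serialize`, and the rule that a value equal to an object identifier denotes a nested object are all hypothetical. Child elements are emitted in dictionary order, whereas a real serializer would follow the order prescribed by the target XSD.

```python
import xml.etree.ElementTree as ET

# Hypothetical in-memory form of the objects: id -> (class name, attributes).
# Assumption: a value that is itself a key of OBJECTS refers to a nested object.
OBJECTS = {
    "obj_1": ("InvoiceCompanyY", {
        "InvoiceNumber": "I_001",
        "AccDate": "2010_05_01",
        "InvoiceDate": "2010-05-18",
        "DeliveryAddress": "obj_2",
    }),
    "obj_2": ("CompanyYDeliveryAddress", {
        "city": "Oslo",
        "zip": "1234",
        "street": "First Street",
    }),
}


def serialize(objects, root_id, root_tag):
    """Serialize the object graph reachable from root_id into an XML string."""

    def build(obj_id, tag):
        _class_name, attributes = objects[obj_id]
        element = ET.Element(tag)
        for name, value in attributes.items():
            if value in objects:   # reference: recurse into the nested object
                element.append(build(value, name))
            else:                  # plain value: becomes element text
                ET.SubElement(element, name).text = value
        return element

    # encoding="unicode" makes tostring return a str instead of bytes.
    return ET.tostring(build(root_id, root_tag), encoding="unicode")


print(serialize(OBJECTS, "obj_1", "InvoiceCompanyY"))
```

The sketch assumes an acyclic object graph. Cyclic references would require tracking visited objects.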

In a wider context, the work presented in this paper is related to MDE model transformation techniques and languages [7,8] such as the ATLAS Transformation Language (ATL).3 Whereas model transformation languages can be applied to the XML data exchange problem addressed in this paper, it is unclear how suitable and easy it is to apply such general-purpose languages to the specific case of XSD/XML. A thorough analysis of model transformation techniques developed in the MDE community is needed in order to judge their suitability for XML data exchange. Furthermore, a systematic comparison of model transformation techniques and logical rule-based approaches for data exchange is needed in order to understand their similarities and differences, and to have a clear understanding of their advantages and disadvantages for data exchange.

The FloraMap mapping technique proposed in this paper is promising, and its implementation and experiments showed that run-time mediation is possible and feasible with a logic-based rule approach. However, several directions can still be considered to further enhance FloraMap:

3 http://www.eclipse.org/atl/


1. Extensions for handling end-to-end n-m mappings, where multiple sources and multiple targets can exchange data.

2. Inconsistent mappings may lead to errors during run-time data exchange; therefore, the design and implementation of a consistency check technique at design time would significantly improve the mapping process. It is expected that the underlying reasoning mechanism provided by F-logic will significantly contribute to the automated detection of inconsistencies between mapping rules, thereby making logical rule-based approaches even more attractive for data exchange.

3. Design and implementation of a graphical interface for design-time mapping. In its current implementation, FloraMap does not come with a graphical editor for Flora2 models and mappings. Reuse of open-source tools such as those emerging in the context of the OpenII project4 could be relevant in this context.

4. FloraMap has been designed for XML data mapping; however, since the approach works at an expressive model level, it should be fairly simple to extend it to handle other types of schemas, such as relational schemas. This would enable the exchange of data that conform to different schematic representations, e.g. relational schemas, XML schemas, etc.

5. (Semi-)Automated generation of executable mapping rules. Approaches for automated generation of rules in the area of ontology and MDE model transformation techniques such as [9,10], as well as ideas from semantic Web services matchmaking such as [11], can be employed here to provide sophisticated support for (semi-)automated generation of mapping rules.

6. More comprehensive validation. Whereas we provided some initial experimental results for the scalability of FloraMap, other aspects of our approach need to be analyzed in a more systematic way. For example, comparing the complexity of specifying mapping rules in our approach with the complexity of specifying them using model transformation techniques would be another potential direction for future work.

ACKNOWLEDGMENTS

This work is partly funded by the EU projects “A Semantic Service-oriented Private Adaptation Layer Enabling the Next Generation, Interoperable and Easy-to-Integrate Software Products of European Software SMEs (EMPOWER)”5 and “Environmental Services Infrastructure with Ontologies (ENVISION)”6.

4 http://www.openintegration.org/

5 http://empower-project.eu/

6 http://www.envision-project.eu/

6. REFERENCES

[1] Christoph Bussler. B2B Integration. 2003, Springer, ISBN 3540434879.

[2] Ken Smith, Peter Mork, Len Seligman, et al. The Role of Schema Matching in Large Enterprises, CIDR Perspectives 2009.

[3] Guizhen Yang, Michael Kifer. FLORA-2: User’s Manual, 2008.

[4] Suad Alagic, Philip A. Bernstein, Mapping XSD to OO Schemas, Microsoft Research, 2008.

[5] R. Xiao, Tharam S. Dillon, E. Chang, Ling Feng. Modeling and Transformation of Object-Oriented Conceptual Models into XML Schema, Database and Expert Systems Applications, 795-804.

[6] Bernstein, P. A. and Melnik, S. Model management 2.0: manipulating richer mappings. In Proceedings of the 2007 ACM SIGMOD international Conference on Management of Data (Beijing, China, June 11 - 14, 2007).

[7] Mens, T, and Van Gorp, P. A Taxonomy of Model Transformation, Electronic Notes in Theoretical Computer Science, Volume 152, 27 March 2006, Pages 125-142.

[8] Czarnecki, K, and Helsen, S. Classification of Model Transformation Approaches. In: Proceedings of the OOPSLA'03 Workshop on the Generative Techniques in the Context Of Model-Driven Architecture, Anaheim, California, USA.

[9] Stephan Roser, Bernhard Bauer. Automatic Generation and Evolution of Model Transformations Using Ontology Engineering Space. J. Data Semantics 11: 32-64 (2008).

[10] Gerti Kappel, Elisabeth Kapsammer, Horst Kargl, Gerhard Kramler, Thomas Reiter, Werner Retschitzegger, Wieland Schwinger, Manuel Wimmer: Lifting Metamodels to Ontologies: A Step to the Semantic Integration of Modeling Languages. MoDELS 2006: 528-542.

[11] Klusch, M. and Kaufer, F. WSMO-MX: A hybrid Semantic Web service matchmaker. Web Intelli. and Agent Sys. 7, 1 (Jan. 2009), 23-42.
