Improving self-interpretation of XML-based business documents by introducing derived elements

19
Improving self-interpretation of XML-based business documents by introducing derived elements Oscar Dı ´az * , Felipe I. Anfurrutia Dpto. de Lenguajes y Sistemas Informa ´ ticos, University of the Basque Country, Apdo. 649 - 20080 San Sebastia ´ n, Spain Received 4 September 2003; received in revised form 12 July 2004; accepted 8 December 2004 Available online 23 May 2005 Abstract XML is becoming the standard for document description in a B2B setting, and XML Schema is gaining wide accep- tance as the schema language to define the structure of these documents. Business documents such as those currently found in B2B applications, frequently comprise derived elements, i.e. elements whose content can be calculated through a deriving function that examines the content of other elements. Based on this observation, this work advocates incor- porating the notion of derived elements in XML Schema. Despite its wide presence, the notion of derived elements is not yet supported in XML Schema. To this end two issues are addressed: (1) the implicitness of current approaches to deriving function support for XML documents, and (2) the externality of some deriving data which participate in the deriving function. These aspects are respectively tackled both (1) by moving the derivation semantics to the document schema, and (2) by proposing an attachment where the deriving elements are recorded. These ideas have been realized by extending a JAXP-compliant XML parser. Now, this parser can be configured to become ‘‘derivation aware’’. That is, the schemata is now enriched with deriving functions. At parsing time, these functions are interpreted, and the returned values become the content of the associated derived elements. Ó 2005 Elsevier B.V. All rights reserved. Keywords: XML schema; Derived data; B2B applications; Business documents 1. Introduction Problem statement. A business document is ‘‘a unit of business information exchanged in a business transaction’’ [1]. Order, billing or delivery forms are all examples of business documents that collect data which give support to a certain business func- tion, regardless of how this data is finally stored. Many projects or standards (e.g. UBL, EDIFAT, X12, IDA) have developed electronic order and in- voice messages. For instance, IDA is a European program which aims at specifying business 1567-4223/$ - see front matter Ó 2005 Elsevier B.V. All rights reserved. doi:10.1016/j.elerap.2004.12.003 * Corresponding author. E-mail addresses: [email protected] (O. ´az), felipe@ si.ehu.es (F.I. Anfurrutia). Electronic Commerce Research and Applications 4 (2005) 264–282 www.elsevier.com/locate/ecra

Transcript of Improving self-interpretation of XML-based business documents by introducing derived elements

Electronic Commerce Research and Applications 4 (2005) 264–282

www.elsevier.com/locate/ecra

Improving self-interpretation of XML-basedbusiness documents by introducing derived elements

Oscar Dıaz *, Felipe I. Anfurrutia

Dpto. de Lenguajes y Sistemas Informaticos, University of the Basque Country, Apdo. 649 - 20080 San Sebastian, Spain

Received 4 September 2003; received in revised form 12 July 2004; accepted 8 December 2004

Available online 23 May 2005

Abstract

XML is becoming the standard for document description in a B2B setting, and XML Schema is gaining wide accep-

tance as the schema language to define the structure of these documents. Business documents such as those currently

found in B2B applications, frequently comprise derived elements, i.e. elements whose content can be calculated through

a deriving function that examines the content of other elements. Based on this observation, this work advocates incor-

porating the notion of derived elements in XML Schema. Despite its wide presence, the notion of derived elements is

not yet supported in XML Schema. To this end two issues are addressed: (1) the implicitness of current approaches to

deriving function support for XML documents, and (2) the externality of some deriving data which participate in the

deriving function. These aspects are respectively tackled both (1) by moving the derivation semantics to the document

schema, and (2) by proposing an attachment where the deriving elements are recorded. These ideas have been realized

by extending a JAXP-compliant XML parser. Now, this parser can be configured to become ‘‘derivation aware’’. That

is, the schemata is now enriched with deriving functions. At parsing time, these functions are interpreted, and the

returned values become the content of the associated derived elements.

� 2005 Elsevier B.V. All rights reserved.

Keywords: XML schema; Derived data; B2B applications; Business documents

1. Introduction

Problem statement. A business document is ‘‘a

unit of business information exchanged in a business

1567-4223/$ - see front matter � 2005 Elsevier B.V. All rights reserv

doi:10.1016/j.elerap.2004.12.003

* Corresponding author.

E-mail addresses: [email protected] (O. Dıaz), felipe@

si.ehu.es (F.I. Anfurrutia).

transaction’’ [1]. Order, billing or delivery forms

are all examples of business documents that collect

data which give support to a certain business func-

tion, regardless of how this data is finally stored.

Many projects or standards (e.g. UBL, EDIFAT,X12, IDA) have developed electronic order and in-

voice messages. For instance, IDA is a European

program which aims at specifying business

ed.

O. Dıaz, F.I. Anfurrutia / Electronic Commerce Research and Applications 4 (2005) 264–282 265

documents exchanged between a public sector

buyer and an external supplier during the ordering

and invoicing phases [1].

Business documents frequently hold derived

data, i.e. data which is calculated from other data.For example, the totalAmount of an order can be

obtained from the cost of each item included in

the order minus some applicable discount. Such

deriving functions support important business pol-

icies, including terms and conditions (e.g. policies

for price discounting), service provisions (e.g. pol-

icies for refunds), or surrounding services (e.g. pol-

icies for lead time to place an order). Such businesspolicies are the subjects of contracts among part-

ners, and can normally be found in catalogs, store-

fronts and marketplaces, as well as in bids and

requests communicated during negotiations, pro-

curements and auctions.

XML is becoming the standard for document

description in B2B settings. And XML Schema is

gaining wide acceptance as the schema languageto define the structure of these documents. Indeed,

the IDA programme uses XML Schema. However,

there is not yet a mechanism to describe derived

elements. This leads to deriving functions being

hard-coded into the applications which process

the document. For example, deriving functions

can be hidden within XSLT templates, or as scripts

within the application which process the XMLdocument. But hard-coding these functions not

only hinders development, it also jeopardizes

maintenance. Hard-coding, distributing and

repeating these functions along the applications

that process XML documents would certainly

complicate the modification of deriving functions,

and thus, of the business strategies these functions

support.An additional difficulty is that these deriving

functions frequently depend on both the facts –

explicitly stated on the document –, but also on

other data, external to the document itself. That

is, deriving data is not always recorded in the doc-

ument . For instance, applicableDiscount can be

obtained from distinct business policies, such as

‘‘5 percent discount if buyer has a history of loyal

spending at the store’’ or ‘‘20 percent discount if

the product is on offer’’. In this example, applica-

bleDiscount has two deriving facts (i.e. the con-

sumer�s loyalty history, and the product state),

that are kept outside the order document realm.

This makes business partners other than the is-

suer, have a partial picture of how the applicable-

Discount has been obtained, since the informationabout the state of the product (i.e. whether it is

on offer or not) is not recorded in the order. Such

a situation hampers trust construction between

partners. Besides, in case of a conflict, it will be

difficult to resolve about which discount was ap-

plied, as the business document misses important

facts.

These drawbacks could be tackled by incorpo-rating the deriving data (e.g. whether the product

is on offer or not) into the business document it-

self. However, this would increase the size of the

document, and even worse, the document schema

would be made dependent on the business policies.

That is, a change in the policies (better said, on the

deriving data consulted by the policies) would af-

fect the document schema, as the new derivingdata would need to be integrated into the

document.

Another alternative could be to let partners

consult the database of the document issuer to

check the deriving data (e.g: the consumer�s loyaltyhistory). Besides privacy matters, this only makes

sense if the query is posed at the time the order is

issued. Otherwise, the deriving data could havebeen updated and thus, no longer reflects the con-

text in which the order was processed. Even if a

temporal database is used that supports data ver-

sions, this still does not solve the problem, should

a conflict arise. If the case is brought to trial, the

data kept on the database cannot be used as evi-

dence as it could have been easily manipulated

by the document issuer. Neither the business part-ners nor the court will trust the data. This situation

is normally overcome by the existence of a

‘‘trusted third party’’ whose main task is to moni-

tor the business transaction.

This implies that the business document as well

as the business context at the time the document

was processed (i.e. the deriving elements) should

be made explicit and available.Our contribution. Previous observations high-

light two issues: (1) the implicitness of current ap-

proaches to deriving function support for XML

266 O. Dıaz, F.I. Anfurrutia / Electronic Commerce Research and Applications 4 (2005) 264–282

documents, and (2) the externality of some deriv-

ing data.

The first issue is tackled by moving the deriving

function to the document schema. This is aligned

with the strategy of externalization promoted byXML Schema. That is, while XML documents

let you externalize data representations in a plat-

form independent manner, XML Schema let you

externalize the types and structural constraints

that govern XML documents. Instead of imple-

menting these constraints programmatically in

each application which processes a given sort of

document, the constraints are moved to theXML Schema. In this way, the constraints that dic-

tate the structure and validity of the document are

externalized. This saves the trouble of implement-

ing these constraints in each application.

Likewise, moving the deriving functions to the

XML Schema (i.e. externalizing derivation) pro-

motes code reuse. Furthermore, it enhances consis-

tency as all applications share common derivationsemantics, and eases maintenance since changes to

the derivation semantics are localized in a single

place.

However, it is not only a technical issue. As

XML Schema enables integration across the board

for business-to-business connectivity, externaliza-

tion of the derivation semantics makes explicit

important business rules that govern the interac-tion between business partners, thus enhancing

partner trustworthiness and hindering repudiation.

This work presents a model for extending XML

Schema with derived elements. In this way, the

schema is not only responsible for validating the

instance document, but it is also in charge of gener-

ating the derived values by applying the corre-

sponding business policies. In so doing, this workeases the realization of the distinct contract vocab-

ularies currently emerging [2], that focus on captur-

ing the discovery-negotiation-execution life-cycle

that models a business transaction. These agree-

ments are frequently reflected in terms of business

policies, of which deriving functions are a special

realization.

The second issue refers to the externality ofderiving data (e.g: the consumer�s loyalty history).

Such externality can result in the loss of the ‘‘busi-

ness context’’ where the document was generated

(e.g. an order). The solution envisaged here is to

generate an attachment at the time the transaction

(i.e. the order) occurs. Such an attachment holds a

snapshot of the external deriving data which has

been used to obtain the derived values. Such snap-shots of the ‘‘business context’’ allow business

partners other than the issuer, to be aware of the

state of the business at the time the transaction

took place so as to validate the appropriate appli-

cation of the business policies. Such an attachment

is generated as a byproduct of the derivation pro-

cess, and attached to the business document

through an XML processing instruction.Interesting enough, this approach is aligned

with one of the main principles of the IDA pro-

gramme, namely, ‘‘for clarity purposes, all business

documents should as far as possible contain all infor-

mation (reiteration of items ordered, parties in-

volved, etc.). We thereby ensure that each business

document is self-sufficient in interpreting it’’ ([1],

page 15th). Attaching the deriving data to thebusiness document certainly serves this purpose.

The rest of the paper is structured as follows.

Section 2 introduces a running example that illus-

trates some of the issues raised by this work. Sec-

tion 3 outlines XML Schema. Section 4 addresses

how XML Schema can be extended to support de-

rived elements. Section 5 tackles the externality of

deriving elements. This approach has been realizedby making a JAXP-compliant parser ‘‘derivation-

aware’’. The architecture of this prototype is intro-

duced in Section 6. Finally, conclusions are given.

2. A running example

An order document is taken as an example. Anorder would usually contain information about the

customer who placed the order, the order number

and the individual line items on the order includ-

ing their quantities and product references. Fig. 1

gives an example using XML.

Business documents such as the previous one,

commonly comprise derived elements (shown in

bold in Fig. 1) whose values are computed accord-ing to business policies. Next some examples are

shown which have been taken or inspired by poli-

cies stated on Amazon�s site. As we go on, we will

Fig. 1. An order document. Derived elements in bold.

O. Dıaz, F.I. Anfurrutia / Electronic Commerce Research and Applications 4 (2005) 264–282 267

raise several issues such as the need for keeping

track of the derived elements being used, and the

need for prioritized conflict handling.

Example 1. (Derived element: subTotal)

� (Rule) The subTotal is the sum of all the partial-

Costs of the bought items

This is an example where the deriving elements

(i.e. partialCost) are local to the document, and

only one rule is needed to work out the derived

value. Notice also that derived elements can be

based on other derived elements (e.g. partialCost is

in turn a derived element).

Example 2. (Derived element: partialCost)

� (Rule) The partialCost is the multiplication of the

quantity and the unitPrice of the current product

Strictly speaking, a derived element should not

provide any new information except that derivedfrom other elements in the document. However,

268 O. Dıaz, F.I. Anfurrutia / Electronic Commerce Research and Applications 4 (2005) 264–282

such an approach is too restrictive to reflect most

of the policies that regulate current business trans-

actions. Most policies consult a business state that

is not always reflected in the document itself but in

other data resources (e.g. databases, other XMLdocuments such as catalogs, etc). Hence, we extend

the potential deriving elements to any data re-

source available to the document. In this example,

the unitPrice is not defined in the document itself

but retrieved from the company�s database.This externality prevents the document receiver

from having a complete picture of how the derived

elements have been calculated, and hence, makesthe verification of the fulfillment of the agreement

(reflected through the deriving function) difficult.

Besides, such external elements evolve indepen-

dently of the document itself so as to impede the

re-creation of the business state at the time the

document was generated (e.g. at the time the order

took place).

This situation is tackled through the deriva-

tionTrack document. Such a document records a

snapshot of the external deriving elements at the

time the derived elements were calculated. For

the partialCost example, this implies that the unit-

Price of the items is recorded in the document-

Track. The receiver of the order can then

consult this attachment to verify the correct

application of the business policies. Section 5 ad-dresses this issue.

Example 3. (Derived element: deliveryTime)

� (Rule A) If the shipping-speed is ‘‘Standard Ship-ping’’ then, the delivery time will be between 4

and 8 days;

� (Rule B) If the shipping-speed is ‘‘Two-day Ship-

ping’’, and at least three items are available

within 24 hours, then the delivery time will be 3

days;

� (Rule C) If the shipping-speed is ‘‘One-day Ship-

ping’’ and at least three items are available within24 hours, then the delivery time will be 2 days;

� (Rule D) If the shipping-speed is either ‘‘One-day

Shipping’’ or ‘‘Two-day Shipping’’, then the

delivery time will be 4 days;

� (Priority Rule) If both Rules B and D apply, then

rule B ‘‘wins’’, i.e. the delivery time will be 3 days;

� (Priority Rule) If both Rules C and D apply, then

rule C ‘‘wins’’, i.e. the delivery time will be 2 days.

Here, the delivery policy is realized as a set of

rules. We say a rule ‘‘applies’’ when the rule�s body(i.e. the antecedent) is satisfied. If more than one

rule applies simultaneously then, we can combine

the values returned by each rule (value-based

schema) or give priority to one of the rules (rule-based schema). The latter can be realized by

assigning a priority to each rule, and then, the rule

with the highest priority wins. However, this sche-

ma requires a global prioritization schema that is

difficult to maintain for large or evolvable rule

sets. Therefore, in a B2B setting we favour a par-

tial prioritization schema where conflicts are re-

solved among smaller sets. This is the role playedby the Priority Rule in this example.

Example 4. (Derived element: insuranceCost)

� (Rule A) If any item exists whose category is

‘‘cd’’, then the insurance cost will be 1.99€;

� (Rule B) If any item exists whose category is

‘‘book’’, then the insurance cost will be 2.99€;� (Rule C) If any item exists whose category is

‘‘electronic’’, then the insurance cost will be

5.99€;

� (Rule D) If any item exists whose category is

‘‘electronic’’ and needs special hanging, then the

insurance cost will be 6.99€;

� (Priority Rule) If more than one rule applies, the

final insurance cost will be the maximum of thedistinct values returned.

These rules realize the insurance policy. An or-

der can contain distinct categories of items. Eachcategory has a distinct insurance cost. Hence, sev-

eral rules can apply simultaneously (e.g. the order

contains ‘‘cd’’ and ‘‘book’’). Unlike the previous

example, now a value-based prioritization schema

is followed. That is, the final insurance cost is

not calculated as the result of a single rule but as

the combination of the values returned by all

applicable rules as stated by the Priority Rule.Summing it up, derived data can be local or re-

mote to the document itself, while the prioritiza-

tion schema can be rule-based or value-based.

O. Dıaz, F.I. Anfurrutia / Electronic Commerce Research and Applications 4 (2005) 264–282 269

The latter is expressed in the form of meta rules.

Such meta-rules also convey important business

policies. From an engineering viewpoint, a priori-

tized schema enables significantly easier modifica-

tion and more modular revision, both keyfeatures to face in the evolvable nature of the

B2B setting. Indeed, it has been reported that busi-

ness policies are among the most evolvable soft-

ware artifacts found in current information

systems. New rules (policies) can be added, or

meta-rules (strategies) can be updated with no im-

pact to the rest of the rules. Such aspects enhance

modularity and facilitate ‘‘locality in revision’’ [2].

3. An overview of XML Schema

XML is intended to be a self-describing data

format, allowing authors to define a set of ele-

ments and attribute names that describe the con-

tent of a document. As XML allows the authorsuch flexibility, a means is needed to restrict the

number of valid element and attribute names for

Fig. 2. A schema for o

documents, so that the processing application

knows what elements to expect and how to process

them. XML Schema serves this purpose.

An XML Schema document is an XML docu-

ment that describes structural and content-basedconstraints for a class of instance documents, e.g.

which elements can occur and how they can be

nested in the XML document [3], much in the

same way as the ‘‘create table’’ statement defines

the structure of the table tuples in SQL. Validating

parsers can then be used to check conformance of

documents claiming to be instances of a given

schema.Fig. 2 presents a schema for order documents.

The schema starts by declaring the distinct name-

spaces used within the document. In this example,

the ‘‘http://www.w3.org/2001/XMLSchema’’ name-

space is used. The rest of the document consists of

two distinct constructs: element declarations and

type definitions. The orderDate element exempli-

fies an element of a simple type that holds a date

value. The minOccurs attribute indicates the mini-

mum number of orderDate elements that must

rder documents.

270 O. Dıaz, F.I. Anfurrutia / Electronic Commerce Research and Applications 4 (2005) 264–282

appear in an order document. Simple types define

types of elements that have no attributes and only

contain text. By contrast, complex types define

types of elements that can have attributes and

other elements as part of their content [3,4]. Theorder element is a case in point. It holds the order-

Date, orderNumber, customer, etc., elements. The

sequence construct indicates that these (sub)

elements should be provided in the strict order

specified by the schema.

This work addresses how XML Schema can be

extended to specify derived elements. To this

end, available approaches include providing newconstructs to XML Schema, or keeping XML

Schema untouched and use the appinfo element

to add new schema information.

Extending XML Schema with new constructs

is possible through the <redefine> tag. An exam-

ple of this approach can be found at [5]. This is

the cleaner solution but, unfortunately, can re-

frain the schema from being shared with otherpartners unless their parsers have been similarly

upgraded.

By contrast, the <annotation> tag is found in

the standard XML Schema to allow the designer

to provide further information about elements.

Schematron [6] illustrates this approach. Annota-

tions may contain documentation or appinfo ele-

ments. The fomer is for human consumptionwhereas appinfo provides instructions targeted at

the processing application. Fig. 2 shows a docu-

mentation element which holds the author and

the purpose of this schema. This information is in-

tended for humans, hence a documentation element

is used. By contrast, derived element semantics are

targeted at the XML parser, so they will be en-

closed within appinfo tags.This second solution allows to share the schema

with other partners. Even if their parsers are not

derivation-aware, they can still parse the standard

part of the schema. This somehow preserves a kind

of backward-compatibility.

1 Another alternative would have been to raise an error that

indicates this situation.

4. Extending XML Schema with derived elements

This section addresses how deriving elements

are described (i.e. the knowledge model) and

how this specification is interpreted at run-time

(i.e. the execution model). The knowledge model

is an extension of a preliminary account reported

at [7].

4.1. The knowledge model

When describing deriving elements, the first is-

sue is the cardinality of derived elements. XML

Schema provides the minOccurs and maxOccurs

attributes to indicate the allowable occurrence

interval of a given element. However, it is not

clear what this interval should be for derivedelements. Consider the insuranceCost element, if

minOccurs is set to 1 the user is forced to intro-

duce the insuranceCost. But this is not the

expected behaviour. On the other hand, if min-

Occurs is set to 0 the insuranceCost becomes

optional, which is also not the right interpreta-

tion, particularly for the processing application.

As a result, the schema validator of a ‘‘deriva-tion aware’’ parser should re-interpret the meaning

of minOccurs when a deriving function has been

defined inside the element�s definition. In our

implementation, derived elements are set to a min-

Occurs of zero. This means that an XML text edi-

tor would allow the user to introduce a value for a

derived element. At parsing time however, this va-

lue is overridden by the output of the derivingfunction.1

As for the specification of the deriving function,

a rule-based approach has been followed. As an

overall representative approach, rules capture well

many aspects of what one would like to describe in

automated contracts where business policies are

reflected [2]. Broadly speaking, a rule includes an

antecedent or condition describing the situationwhere the rule applies, and an action, which indi-

cates how to calculate the derived element if the

antecedent is met.

When more than one rule can be applicable, a

prioritization schema must be provided. In this

case, the literature favours two approaches,

namely, ‘‘value-based’’ and ‘‘rule-based’’ [8,9].

O. Dıaz, F.I. Anfurrutia / Electronic Commerce Research and Applications 4 (2005) 264–282 271

The former provides a combining function to

‘‘merge’’ the values returned by the distinct

applicable rules. This means that all the applica-

ble rules are executed. By contrast, a ‘‘rule-

based’’ prioritization schema selects only one ruleto execute.

A namespace, XDerive, is introduced to specify

all the above aspects. This namespace is presented

here through examples. The schema can be found

in Appendix A. All XDerive expressions are kept

as content of the <appinfo> tag.

4.1.1. A simple example with local data: the

subTotal element

The value of subTotal is calculated from the

item�s costs found in the order. Its specification

using XDerive is as follows:

<xs:element name=‘‘subTotal’’ type=‘‘xs:deci-

mal’’ ‘‘minOccurs=‘‘0’’>

2 Th

defined

<xs:annotation><xs:appinfo>

<xd:derivingFunction>

<xd:rule id=‘‘rule-1’’ test= ‘‘//

partialCost’’>

<xs:documentation>

subTotal sums all the partial-Cost of the bought items.

</xs:documentation><xd:action>

<xsl:value-of select=‘‘sum(//par-

tialCost)’’/>

</xd:action>

</xd:rule>

</xd:derivingFunction>

</xs:appinfo>

</xs:annotation></xs:element>

A deriving function is described through its

namesake tag.2 Its content includes a set of rules

that realize the function (i.e. <xd:rule> tag). A rule

specification includes a description of its purpose

(i.e. the <xd:documentation> tag), the rule�s ante-

e prefix xd: indicates the namespace where this tag is

.

cedent (i.e. the ‘‘test’’ attribute) and the rule�s con-sequent (i.e. the <xd:action> tag). The ‘‘test’’

attribute holds an XPath expression, i.e. a predi-

cate on the content of the instance document. In

this case, the test checks that partialCost elementsexist. The <xd:action> tag comprises a set of stan-

dard XSLT instructions. The example uses

<xsl:value-of select= ‘‘. . ...’’> to obtain the value

of the derived element using the ‘‘sum()’’ XPath

function with the list of partialCost elements

as parameter. The latest instruction must be

<xsl:value-of. . .> for simple type values and

<xsl:copy-of. . .> for complex type values. Noticethat no prioritization schema has been included

as only one rule is provided.

4.1.2. An example with remote data and a value-

based prioritization schema: the insuranceCost

element

The value of insuranceCost is the maximum of

the costs associated with the category of the dis-tinct items included in the order. Its specification

in XDerive is shown in Fig. 3.

This derivation policy implies more than one

rule. If the order comprises items of distinct

categories, more than one rule can be applied.

The <prioritizationSchema> tag indicates how

to resolve this ambiguity. In this case, the

designer chooses to combine the values returnedby the distinct applicable rules (held by the

actionResult element of the rule), and takes the

highest one. This is supported by the max

function which is specified at the combiningFunc-

tion attribute. This attribute is held by the

<valueBased> tag.

This example also illustrates the use of remote

data. Here, the rule�s antecedent checks the item�scategory. As this data is not available in the base

document, the track document is consulted (re-

ferred to through the variable $derivationTrack).

The structure and generation of this document is

postponed till Section 5. Notice however, that

the source of the external data should be indicated.

The <source> tag serves this purpose. This tag

holds the name of the external deriving data(‘‘derivingFact’’ attribute), the connection configu-

ration to be used to retrieve this data (i.e. the data-

base�s URL, the driver and the like) (‘‘connection’’

Fig. 3. The specification of insuranceCost using XDerive.

272 O. Dıaz, F.I. Anfurrutia / Electronic Commerce Research and Applications 4 (2005) 264–282

attribute), and the SQL query that retrieves the va-

lue (<query> tag).Although our approach includes this connect-

ing information as part of the schema of the base

document, it is debatable whether this informa-

tion should have been better recorded in the

track document itself. The rationale for our deci-

sion is that the track document is a generated

document, not a specification document. So all

information should be included in the schemaof the base document. A second aspect is modu-

larity and locality. By associating the source of

the deriving data together with the rule that uses

such data, the removal/addition of rules is greatly

facilitated. Even if this leads to repeating the

source several times (if the same external data

is used in distinct rules), the benefits outweigh

this drawback.

4.1.3. An example with remote data and a rule-

based prioritization schema: the deliveryTime

element

The value of deliveryTime depends on the ship-

pingSpeed and the availability of the items in stock.

Its specification in XDerive is shown in Fig. 4.

This derivation policy also comprises several

rules. However, unlike the previous example, only

one rule is applied, even if the antecedent of sev-

eral rules is met. That is, the prioritization schemais now rule-based. This implies that the system

checks the rule�s antecedents, keeps this result in

a special element termed antecedentResult, and fi-

nally, applies the priority rules. The latter resolves

the conflict about which base rule to apply.

To this end, the <prioritizationSchema> tag

now contains a <ruleBased> tag, which in turn

contains the priority rules. Since a priority rule is

Fig. 4. The specification of deliveryTime using XDerive.

O. Dıaz, F.I. Anfurrutia / Electronic Commerce Research and Applications 4 (2005) 264–282 273

a rule, its specification is similar to a base rule. Butnow the subject matter are the base rules. That is, a

priority rule�s test checks whether base rules can be

applied or not, while its consequent indicates the

base rule to be applied. Fig. 4 gives an example.

The first priority rule in Fig. 4 checks whether

rule-17 and rule-18 are both met by the current or-

der. If so, the priority rule resolves the conflict by

applying rule-17.3 Priority rules are evaluatedsequentially till one is met. It is up to the document

designer to provide a coherent set of priority rules,

i.e. rules whose antecedents are disjoint. To ensure

completeness, a last rule is added in Fig. 4 whose

antecedent is always true. This accounts for the sit-

3 In the current version, the rule engine assumes a disjunctive

test for the priority rules.

uation where none of the previous priority rulesapply.

4.2. The execution model

This model addresses when deriving functions

are executed. In a database scenario, two an-

swers are possible, namely, derived attributes

are stored in the database or calculated on de-mand. The designer should balance: (1) the addi-

tional cost of storing the derived data and

keeping it consistent with the base data, with

(2) the cost of calculating the derived data each

time it is required [10].

Unlike derived data (as found in the database

area), a business document should not be kept

continuously consistent with the deriving data.Therefore, the ‘‘calculated-on-demand’’ option

274 O. Dıaz, F.I. Anfurrutia / Electronic Commerce Research and Applications 4 (2005) 264–282

has not been contemplated. The content of derived

elements are obtained as an aside process at pars-

ing time. This process takes the ‘‘base document’’

as an input, and generates the ‘‘derived document’’

plus the ‘‘track document’’ (see Section 5), wherederiving data is kept.4 The next section introduces

the derivationTrack document.

4.3. Framing this work into RuleML

The Rule Markup Initiative is striving to de-

fine ‘‘an open, vendor neutral XML-based rule lan-

guage standard . . . permitting both forward

(bottom-up) and backward (top-down) rules in

XML for deduction, rewriting, and further inferen-

tial-transformational tasks’’ (see www.ruleml.org).

The current version of RuleML 0.86 (in June

2004) only covers a very limited form of deriva-

tion rules, similar to Horn clauses [12]. This how-

ever, would be expressive enough to support

derivation rules (except meta-rules) as the onesused in this paper. In what follows, we highlight

the main differences between using RuleML and

the approach described here.

A RuleML rule contains facts, implications

and queries. It is then a complete description of

a production system as found in the Expert Sys-

tem realm. The query states an hypothesis to be

proved by applying the implications which finallyrest on the facts. In our approach, facts and

implications are decoupled: facts are found in

the business document whereas implications are

described in its schema. Hence, our approach is

better aligned with the database way. That is,

databases make a clear distinction between the

intension (i.e. the schema) and the extension

(e.g. the tuples) where each aspect greatly differsin size and volatility.

Another important difference stems from the

facts being XML business documents. Rather than

using a Prolog-like approach, our rule�s antecedent

4 Due to the static nature of business documents, this work

does not contemplate the possibility of updating the derived

element directly nor the updating propagation issue. The latter

involves how modifications on the derived data are propagated

to the deriving data. The challenge lies in the ambiguity of this

transformation as the propagation is not always unique [11].

is an XPath expression that checks the existence of

some data in the XML document. This leads to

more legible expressions but also makes these

expressions dependent on the document structure.

As for the rule�s consequent, it is expressed as anXSLT statement. It is then procedural rather than

declarative as for RuleML.

5. The derivationTrack document

A business document reflects an agreement be-

tween the distinct partners involved in the trans-action. Paper-based documents usually have an

attachment (e.g. conditions on sale) that describes

the context in which this agreement is set. Such

an attachment plays an important legal role in

case of any demand or complaint about the

transaction.

Likewise, ‘‘virtual documents’’ also need a sim-

ilar attachment. This is particularly so in the pres-ence of derived elements. Here, deriving data can

be updated once the transaction occurs (e.g. the

item price can be modified) so as to prevent the

customer from recreating the context in which

his order was processed.

To this end, this work proposes the derivation-

Track document (see Fig. 6), a document which

tracks the derivation process so that this deriva-tion can be replicated at any moment by any of

the partners involved. This document is automati-

cally generated during the derivation process, and

associated to the business document by means of

the <?xderive-derivationTrack . . . ?> processing

instruction (see Fig. 1, line 1).

Although such an approach can sound bizarre

at first glance, it is similar to current practices toindicate the presentation of an XML document.

In this case, how a document is rendered is sepa-

rately described in an XSLT file. When the docu-

ment is processed according to its ‘‘presentation

document’’, the rendering is obtained.

Likewise, how a document has been derived can

also be separately indicated in a distinct XML file:

the derivationTrack file. When this document isprocessed according to its ‘‘track document’’, both

the distinct business policies and the consulted

deriving values are made explicit.

Fig. 5. Document life-cycle: the activating, parsing and explanation stages.

O. Dıaz, F.I. Anfurrutia / Electronic Commerce Research and Applications 4 (2005) 264–282 275

The envisioned situation is depicted in Fig. 5

where the following stages are contemplated:

(1) Activating stage. The activator (e.g. either a

customer or an application that sends the order

automatically if the stock goes below a certain

threshold) kicks off the whole process by sub-

mitting the ‘‘base document’’,

(2) Parsing stage. At the issuer place, the deriva-

tion-aware parser takes the base document as

an input and generates both the ‘‘derived doc-ument’’ and the ‘‘track document’’.

(3) Explanation stage. Finally, either the activator,

the issuer or the trusted third party, can pro-

cess the ‘‘derived document’’ in explanation

mode so as to ascertain both the policies and

deriving facts applied.

Therefore, the derivationTrack file records thevalues of the external deriving data (the <derivin-

gElements> tag), and the rules which have been

applied during the derivation process (the <deriv-

edElements> tag). Fig. 6 gives an example.

The derivationTrack document keeps each

external deriving data as an element. It is worth

emphasizing that the parser only records the deriv-

ing data that is used in the derivation process. Thefirst deriving element of the sample in Fig. 6

follows:

<category ref= ‘‘TV-1’’> electronic </category>

An attribute, ‘‘ref’’, is added to relate the ele-

ment on the derivationTrack document with the

associated element in the base document. Here,the attribute ‘‘ref’’ indicates that this is the cate-

gory of the item ‘‘TV-1’’ to be found in the order

document. Therefore, this work is based on the

understanding that the base document should

provide the means to univocally identify the enti-ties participating in the transaction. In our run-

ning example, the base document should clearly

indicate the customer and the items bought. This

is normally achieved through a reference or code.

In our example, the customer is identified

through the <id> tag while an item holds a

<ref> tag.

Besides deriving elements, the track documentkeeps information about the derivation process

through the <derivedElement> tag. An example

follows:

<xdt:derivedElement select=‘‘/order/shipping-

Data/insuranceCost’’>

<xdt:activated rules=‘‘rule-9 rule-10’’/>

<xdt:decision type=‘‘combining function’’/><xdt:executed rule=‘‘rule-9’’>

<xdt:actionResult>5.99</

xdt:actionResult>

</xdt:executed>

<xdt:executed rule=‘‘rule-10’’>

<xdt:actionResult>7.99</

xdt:actionResult>

</xdt:executed></xdt:derivedElement>

The data recorded includes the element derived

(‘‘select’’ attribute), the rules activated (<acti-

vated> tag), the decision taken, if more than one

rule was applicable (the <decision> tag), and the

rules that were finally executed (<executed> tag).

The example above states that insuranceCost hasbeen calculated after applying a combining func-

tion to the results of the rules rule-9 and rule-10.

Fig. 6. A possible derivationTrack document which is attached to the order document in Fig. 1.

276 O. Dıaz, F.I. Anfurrutia / Electronic Commerce Research and Applications 4 (2005) 264–282

The rationale for having a derivationTrack doc-

ument is to allow business partners to validate the

conformance of the business transaction. As an

example, consider an application that renders

pending orders (e.g. similar to the one offered by

Amazon). Partnership trustworthiness can be en-

hanced by allowing the customer not only to check

the status and data about the order, but also easethe way for the customer to understand how the

delivery time or insurance cost has been calculated.

Currently, this involves the user reading through,

potentially, several pages at the retailer�s site. Thisis lengthy and cumbersome, and can put the user

off. The rationale here is similar to the one found

for the explanation mechanisms provided by some

Expert Systems: improving the confidence of the

user on the transparency and accuracy of the sys-

tem. This is also so in a B2B setting.

Enhancing customer trustworthiness can be as

easy as rendering the track document for the cus-

tomer to replicate the derivation process. Fig. 7 de-

picts a straightforward rendering of the track

document shown in Fig. 6. An index of order num-

bers is shown on the left. Once an order has beenselected, its track document is rendered. In this

case, the system presents another index of the dis-

tinct derived elements found in the selected order.

The user can then select one of these elements, and

both the business policy and derived elements are

shown on the right hand side.

More elaborate displays can be provided that al-

low partners to track, for instance, the number of

Fig. 7. Browsing how derived elements have been calculated.

O. Dıaz, F.I. Anfurrutia / Electronic Commerce Research and Applications 4 (2005) 264–282 277

orders that benefit from a certain offer, and in gen-

eral, the number of cases where a certain businesspolicy (realized as a deriving function) has been ap-

plied. In this way, partners can not only check the

fulfillment of the agreements, but also whether

these business policies have been beneficial.

6. Architecture

This section looks at supporting derived ele-

ments in current XML validators.

The <appinfo> tag allows the provision of addi-

tional information in the document schema. The

questions are now who will and how to interpret

this information. A possibility is to use a two-step

approach whereby the semantics of the new con-

structs are first compiled into a run-time valuator,normally using XSLT. Next, this validator is used

to check the compliance of the instance document

against the new constructs.

This work proposes an alternative way. Rather

than building a new validator that complements

the XML Schema parser, the XML parser itself

is extended to become ‘‘derivation aware’’. Appli-

cations can use the extended parser to parse the in-stance document as they did before except that

now, the new constructs (i.e. XDerive) are also

interpreted.

The main benefit is ‘‘parser transparency’’, that

is, parser enhancements do not imply changes to

the applications using the parser: the application

ignores whether the processed documents containsthe new constructs (in this case, derived elements)

or not. Of course, this option is feasible only for

open and modular parsers. However, such parsers

have seemed to increase in popularity since the

publication of JAXP (Java API for XML Parsing)

[13]. The XML Parser of the Apache Project,

Xerces2 [14], and Oracle�s XML Parser V2 for

Java [15], are examples of JAXP-based parsers.The latter has been used for this work.

JAXP advocates for configurable XML parsers.

In its most basic form, an XML parser just gener-

ates a DOM tree for the input document. Further

enhancements should be reflected through configu-

ration parameters. Specifically, JAXP provides two

initial configuration parameters, namely, name-

SpaceAware and validating (see Fig. 8). The formerallows a parser to tackle documents where distinct

namespaces are used, whereas validating indicates

whether the parser should also check the confor-

mance of the document against a DTD or XML-

Schema. Hereafter, companies can enhance their

ownXMLparsers by providing additional features.

Aligned with this approach, this work is about

enhancing an XML parser with derived elements.Thus, the parser has been extended with an addi-

tional configuration parameter, derivationAware,

that allows parsers to face derived elements. The

next subsections examine how this has been

achieved.

Fig. 8. Extensions that make a JAXP-compliant parser ‘‘derivation aware’’.

278 O. Dıaz, F.I. Anfurrutia / Electronic Commerce Research and Applications 4 (2005) 264–282

6.1. An outline on the JAXP standard

The JAXP architecture follows the ‘‘factory

method’’ pattern for the creation of both docu-

ment builders and nodes.5 This patterns is used

when ‘‘a class can�t anticipate the class of objects

it must create’’ or ‘‘a class wants its subclasses to

specify the objects it creates’’ [16]. The former situ-ation is found for instantiating document builders

whereas the latter arises for node creation. Fig. 8

shows this situation.

The creation of document builders is handled

through both the DocumentBuilderFactory and

the DocumentBuilder classes. Both classes are ab-

stract. DocumentBuilderFactory defines an abstract

method, newDocumentBuilder(), that knows whena new DocumentBuilder should be created, but it

ignores what kind of DocumentBuilder should be

created. This aspect is delegated to its subclasses

(e.g. the JXDocumentBuilderFactory class) that

redefines that abstract method to indicate the

appropriate DocumentBuilder to be used (e.g. the

JXDocumentBuilder class). The newDocument-

Builder() method is called a factorymethod becauseit is responsible for ‘‘manufacturing’’ an object.

5 A document is realized as a set of nodes arranged in an

arborescent way: the DOM tree.

As for nodes, the NodeFactory class plays the

factory role. Unlike the previous example, Node-

Factory is a concrete class, that is, this class al-

ready provides a default implementation. This

implementation is realized through the methods

createAttribute(), createElement(), etc., one for

each kind of node that can be found in an XML

document. These methods can then be specializedto account for special requirements.

This architecture can now be completed by par-

ser vendors. Fig. 8 shows this extension for Ora-

cle�s XML Parser V2 for java. Specifically, the

parser comprises the following classes:

� JXDocumentBuilder, which realizes the Docu-

mentBuilder JAXP class. It is in charge of (1)instantiating the XML DOM parser (imple-

mented by the DOMParser class), (2) starting

the parsing process and, (3) instantiating a

DOM Document object to build a DOM tree

(through the newDocument method).

� JXDocumentBuilderFactory, which realizes the

DocumentBuilderFactory JAXP class. It instan-

tiates a JXDocumentBuilder object. Once theJXDocumentBuilder instance creates the XML

DOM parser, the factory configures the just cre-

ated parser according to a previously estab-

lished configuration. As an example, consider

the way to set a parser to be ‘‘validation

O. Dıaz, F.I. Anfurrutia / Electronic Commerce Research and Applications 4 (2005) 264–282 279

aware’’. To this end, the factory has a setVali-

dating() function to set this feature. If this

property is set to ‘‘true’’, the parsers generated

from then on will be ‘‘validation aware’’.

� DOMParser, which supports the XML parseras such. It has an XML document as an input,

and generates a DOM tree in memory, by

means of the NodeFactory class. This process

is lead by the parse() method. The DOM tree

is an instance of the XMLDocument class.

� NodeFactory, which is responsible for creating

DOM objects during the parsing process.

Hence, distinct methods are available to tackleeach of the node types available in XML: creat-

eAttribute(), createElement(), createDocu-

ment() and the like.

These classes support an XML DOM parser.

Our aim is to make this parser ‘‘derivation-aware’’.

6.2. Making the parser ‘‘derivation aware’’

A JAXP parser is ‘‘derivation aware’’ if it (1)

interprets deriving functions (specified through

XDerive) and (2) generates the track document as

a byproduct of the parsing. To this end, the previ-

ous classes have been extended as follows. First,

the JXDocumentBuilder class is specialized into

the XDADocumentBuilder. Besides parsing the‘‘base document’’, this subclass also interprets

the deriving functions, and generates both the ‘‘de-

rived document’’ and the track document.

Second, a new configuration parameter is added

(i.e. derivationAware) by specializing the JXDocu-

mentBuilderFactory into the XDADocumentBuild-

erFactory (i.e. the setDerivationAware() method).

If this parameter is set, the factory will create aninstance of XDADocumentBuilder rather than an

JXDocumentBuilder object.

Finally, the NodeFactory class is extended by

the XDANodeFactory. Besides parsing the docu-

ment, an XDADocumentBuilder invokes the XDA-

DerivationEngine when a derived element is found.

This class supports the interpreter of XDerive tags.

Moreover, the XDADerivationEngine simulta-neously records the rules being applied, building

up the track document. This means that now the

parser has to cope with two documents: the de-

rived document and the track document. XDAN-

odeFactory specializes the createDocument()

method so as to handle the two documents simul-

taneously. The rest of the behaviour (i.e. the crea-

tion of the distinct node types available in XML) isleft untouched. The specialized createDocument()

constructs a DOM tree which can hold another

DOM tree. The first is for the derived document,

and the other is for the track document. The track

document is attached by the setDerivationTrack()

method. And finally, when the document is saved,

both documents are linked by adding the <?xde-

rive-derivationTrack . . .?> processing instructionto the derived document. This is achieved by the

writeNode() method (in the XDADocument class).

Fig. 9 depicts an interaction diagram that re-

flects the ‘‘derivation aware’’ parser at work:

� Firstly, the clientApplication configures the Doc-

umentBuilderFactory to create a ‘‘derivation

aware’’ XML DOM. To this end, the applica-tion creates a new instance of XDADocument-

BuilderFactory and sets the corresponding

configuration parameter, namely, sets XDAN-

odeFactory as the value of the parameter

NODE_FACTORY (using the setAttribute()

method) and assigns ‘‘true’’ to the DerivationA-

ware parameter (using the setDerivationAware()

method). After this, an instance of XDADocu-

mentBuilder is created which in turn, creates a

new instance of a DOMParser.

� Secondly, once the parser is in place, the

parse() method of XDADocumentBuilder is

invoked to initialize the parsing process of the

given document. This requirement is passed

over to the DOMParser. The DOMParser cre-

ates both the document and the elements, whichare appended to the document.

� Next, the XDADocumentBuilder retrieves the

created document. If this document has an

attached schema then, a new instance of XDA-

DerivationEngine is created. The derivation

engine loads the schema and compiles the deriv-

ing functions into an XSLT document (using

the load() method).� If there exists derivation rules and this docu-

ment has not been parsed yet (i.e. no track doc-

ument is attached) then, the deriveAllElements()

Fig. 9. A ‘‘derivation aware’’ parser at work.

280 O. Dıaz, F.I. Anfurrutia / Electronic Commerce Research and Applications 4 (2005) 264–282

method is invoked. This method both calculates

and appends all derived elements. Then, the

XDADerivationEngine passes the track docu-

ment to the XDADocument instance.

� Finally, the application saves the parsed

document by invoking the writeNode(aURL)

method. This method attaches the derivation-

Track document to the derived documentby appending the processing instruction

<?xderive-derivationTrack. . .?>. This instruc-

tion holds the URL where the track document

is stored (see Fig. 1). Finally, writeNode saves

both documents in the given URL.

7. Conclusions

This work provides some insights on how XML

Schema can be extended with derived elements. To

this end, two issues have been discussed: deriving

function specification and the externality of deriv-

ing data. The former has been addressed by intro-

ducing the XDerive vocabulary, a rule-based

approach to the definition of the deriving function.

As for the problems posed by the externality of

deriving data, this work proposes the existence of

a track document which records the state and busi-

ness policies used at the time the transaction

occurs.

This work also provides one of the first exam-ples of the benefits of the JAXP architecture.

Due to its modularity and openness, one of the fol-

lowers of JAXP, Oracle�s parser, has been ex-

tended to become ‘‘derivation aware’’.

Acknowledgements

This research was supported by the Secretarıa

de Estado de Polıtica Cientıfica y Tecnologica of

the Spanish Government under contract TIC

1999-1048-C02-02. Felipe I. Anfurrutia enjoys a

pre-doctoral grant by the University of the Basque

Country.

O. Dıaz, F.I. Anfurrutia / Electronic Commerce Research and Applications 4 (2005) 264–282 281

Appendix A. The XML Schema document of the XDerive namespace

282 O. Dıaz, F.I. Anfurrutia / Electronic Commerce Research and Applications 4 (2005) 264–282

References

[1] E. Comission, IDA e-procurement protocol XML schemas

initiative. Available from: http://europa.eu.int/ISPO/ida/

export/files/en/1996.pdf, 2004.

[2] B. Grosof, Y. Labrou, H. Chan, A declarative approach to

business rules in contracts: bourteous logic programs in

XML, in: Proceedings of the First ACM Conference on

Electronic Commerce (EC99), 1999, pp. 68–77.

[3] W3C, XML Schema Part 1:Structures. Available from:

http://www.w3.org/TR/xmlschema-1/, 2001.

[4] W3C, XML Schema Part 2:Datatypes. Available from:

http://www.w3.org/TR/xmlschema-2/, 2001.

[5] P. Marinelli, C.S. Coen, F. Vitali, SchemaPath, a

minimal extension to XML schema for conditional

constraints, in: Proceedings of the 13th International

World Wide Web Conference, New York, USA, 2004,

pp. 164–174.

[6] R. Jelliffe, A.S.C. Centre, The Schematron: An XML

Structure Validation Language using Patterns in Trees.

Available from: http://www.ascc.net/xml/resource/schema-

tron/schematron.html, 2001.

[7] F. Ibanez, O. Dıaz, J.J. Rodrıguez, Extending xml schema

with derived elements, in: IFIP WG8.1 Working Confer-

ence on Engineering Information Systems in the Internet

Context, 2002, pp. 53–67.

[8] Y. Ioannidis, T. Sellis, Supporting incosistent rules in

database systems, Journal of Intelligent Information Sys-

tems (1) (1992) 243–270.

[9] R. Hull, et al., Declarative workflows that support easy

modification and dynamic browsing, in: Proceedings ACM

International Joint Conference on Work Activities Coor-

dination and Collaboration (WACC�99), 1999, pp. 69–78.[10] T. Connolly, C. Begg, Database Systems, Addison Wesley,

Reading, MA, 1998.

[11] I.A. Chen, D. McLeod, Derived data update in semantic

databases, in: Proceedings of the 15th International Con-

ference on Very Large Data Bases, Amsterdam, The

Netherlands, 1989, pp. 225–235.

[12] D. Hirtle, H. Boley, C. Damassio, B. Grosof, S. Tabet, G.

Wagner, Specification of RuleML 0.86. Available from:

http://www.ruleml.org/0.86/, 2004.

[13] S.M. Inc., JavaTM API for XML Processing (JAXP) 1.2.

Available from: http://java.sun.com/xml/jaxp/.

[14] T.A.S. Foundation, Xerces2 Java Parser 2.0.1 Release.

Available from: http://xml.apache.org/xerces2-j/index.html,

2001.

[15] O. Corporation, The Oracles XML Parser for Java.

Available from: http://otn.oracle.com/tech/xml/xdk_java/

content.html, 2002.

[16] E. Gamma, R. Helm, R. Johnson, J. Vlissides, Design

Patterns, Addison-Wesley, Reading, MA, 1995.