The JEDI Event-Based Infrastructure and Its Application to the Development of the OPSS WFMS

Gianpaolo Cugola, Elisabetta Di Nitto, and Alfonso Fuggetta

Abstract: The development of complex distributed systems demands the creation of suitable architectural styles (or paradigms) and related runtime infrastructures. An emerging style that is receiving increasing attention is based on the notion of event. In an event-based architecture, distributed software components interact by generating and consuming events. An event is the occurrence of some state change in a component of a software system, made visible to the external world. The occurrence of an event in a component is asynchronously notified to any other component that has declared some interest in it. This paradigm (usually called “publish/subscribe,” from the names of the two basic operations that regulate the communication) holds the promise of supporting a flexible and effective interaction among highly reconfigurable, distributed software components. In the past two years, we have developed an object-oriented infrastructure called JEDI (Java Event-Based Distributed Infrastructure). JEDI supports the development and operation of event-based systems and has been used to implement a significant example of a distributed system, namely, the OPSS workflow management system (WFMS). The paper illustrates the main features of JEDI and how we have used them to implement OPSS. Moreover, the paper provides an initial evaluation of our experiences in using the event-based architectural style and a classification of some of the event-based infrastructures presented in the literature.

Index Terms: Event-based systems, distributed systems, software architectures, workflow, business processes, object-orientation, publish/subscribe middleware.
1 INTRODUCTION

Convergence between telecommunication, broadcasting, and computing is opening new opportunities and challenges for a potentially large market of innovative network-wide services. The class of users affected by this revolution is significantly large: families, professionals, large organizations, government agencies, and administrations. The services range from home banking and electronic commerce, to coordination and workflow support for large dispersed teams, within the same company or even across multiple companies. Many research and industrial activities are currently being carried out to identify feasible strategies to develop and operate these services in an effective and economically viable way. The requirements and technical problems that have to be addressed are complex and critical:

. Services must be able to operate on a wide area network with acceptable performance.

. The software technology used to implement these services must be “light,” i.e., it should be scalable in terms of the number of both components and users involved and of their distribution.

. The technology must enable a “plug and play” approach to support dynamic reconfiguration and introduction of new service components.

. Finally, it is essential to support openness and interoperability between different platforms since the services are usually implemented in a heterogeneous hardware infrastructure.

To foster the diffusion of network-wide applications, we need to identify proper architectural styles and supporting infrastructures able to cope with the above requirements and challenges. Actually, there is a wide range of distributed architectural styles and middleware infrastructures that have purposely been conceived to address the above issues. Most of these existing styles and infrastructures are based on a point-to-point communication model.
For instance, the basic service offered by CORBA [36], RMI [51], and DCOM [20] is the synchronous invocation of a remote service offered by some server over the network. The wide diffusion of the point-to-point communication model has been fostered by the availability of RPC, which is certainly an effective mechanism to implement a wide range of distributed systems. RPC is characterized by a tight conceptual coupling between the component that requests a service (i.e., the client) and the component that satisfies such request (i.e., the server). Before invoking a service, the client has to know of the existence of a server capable of satisfying its request and it has to obtain a reference to such server. Even extensions and new facilities of advanced middleware infrastructures such as CORBA Naming Service [37] and CORBA Dynamic Invocation Interface do not depart significantly from the underlying RPC paradigm.

Despite the effectiveness and conceptual simplicity of the point-to-point communication model, many situations require the availability of a more decoupled model. In particular, the communication among the components of a distributed system may involve more than two parties, and may be driven by the contents of the information being exchanged rather than by the identity of information producers and consumers. As an example, let us consider a network management system. In this system, whenever a network node signals a failure, a procedure has to be started to fix the failure. By using an event-based approach, the node is simply required to notify the “external world” of the detected failure and can therefore ignore how the failure will be handled. The “external world” might be constituted by a single application placed at a fixed location on the net in charge of executing the complete recovery procedure. Alternatively, it can be composed of different applications dynamically dispersed across the network and in charge of different steps of the recovery procedure (e.g., logging the failure, reconfiguring a subsystem, etc.). As another example, consider a distributed workflow management system, where, as soon as an activity A terminates, other activities A1, ..., An have to be launched. In this case, it is useful to have a mechanism that hides the existence of A1, ..., An from A and allows A to simply notify the “external world” of its termination. The effect of this notification is hidden from A, thus increasing information hiding and reducing the coupling among activities. The two scenarios presented above are not unique in their communication requirements. In [4], other scenarios that are likely to emerge in the near future are presented.

IEEE TRANSACTIONS ON SOFTWARE ENGINEERING, VOL. 27, NO. 9, SEPTEMBER 2001

The authors are with the Politecnico di Milano, Department of Electronics and Information, Piazza Leonardo da Vinci, 32, 20133 Milano, Italy. E-mail: {cugola, dinitto, fuggetta}@elet.polimi.it.

Manuscript received 21 Sept. 1998; revised 26 Apr. 1999; accepted 7 Apr. 2000. Recommended for acceptance by M. Jazayeri. For information on obtaining reprints of this article, please send e-mail to: [email protected], and reference IEEECS Log Number 107430.

0098-5589/01/$10.00 © 2001 IEEE

A promising approach to address the above issue is the event-based paradigm. The components of an event-based system cooperate by sending and receiving events, a particular form of messages. The sender delivers an event to an event dispatcher. The event dispatcher is in charge of distributing the event to all the components that have declared their interest in receiving it. Thus, the event dispatcher supports a high degree of decoupling between the sources and the recipients of an event.

The relevance and potential impact of the event-based paradigm have been acknowledged by the OMG, which has recently defined an event service on top of the CORBA framework (see Section 5). Nevertheless, we are still far from a satisfactory solution able to address in a coherent and comprehensive way all the issues and problems related to the creation of an effective, network-wide event distribution infrastructure [45]. This observation can be easily verified by checking the large number of initiatives being launched in the area. Several new draft proposals have been submitted to the IETF (Internet Engineering Task Force). Furthermore, the event-based paradigm has been the focus of the first workshop of the series TWIST (The Workshop on Internet-scale Software Technologies) [60]. The workshop gathered researchers from leading software companies and from academia to compare existing approaches and steer future research work on the topic.

As a contribution to the ongoing research work, we have developed an event-based, object-oriented infrastructure called JEDI (Java Event-based Distributed Infrastructure) that has been applied, among others, to the development of a WorkFlow Management System (WFMS) called OPSS (ORCHESTRA Process Support System).¹ A WFMS [3], [23] is an environment for developing and executing a process-based application, i.e., a coordinated set of activities involving both humans and computerized tools. Typical examples of the activities supported by a WFMS are business services, such as customer care, interoffice procedures, and software development processes.

This paper presents JEDI and OPSS by highlighting their main features and functionality. It also illustrates some lessons we have derived from the development and operation of JEDI. This paper significantly extends a previously published paper [15] by providing more details on the design choices that guided the development of both JEDI and OPSS, and by introducing new features that were not presented in the previous paper. It also significantly enriches the analysis of the state of the art and the comparison and evaluation of the related work. The contributions of this paper can be summarized as follows:

. It describes JEDI, an event-based infrastructure suitable to develop a wide range of distributed systems.

. It introduces OPSS and discusses the OPSS features that mostly benefit from the adoption of an event-based communication infrastructure.

. It presents our experiences in using the event-based paradigm and provides a comprehensive comparison of our work with the state of the art in the field.

Consequently, the paper is organized as follows: Section 2 presents the basic concepts and implementation of JEDI. Section 3 provides an overview of the architecture of OPSS. Section 4 provides an evaluation of our experience. Section 5 presents the related work. Finally, Section 6 draws some conclusions and proposes future research activities.

2 JEDI: A JAVA EVENT-BASED DISTRIBUTED INFRASTRUCTURE

2.1 High-Level Architecture of JEDI

Fig. 1 describes the logical architecture of JEDI. The infrastructure is based on the notion of active object² (AO). An AO is an autonomous computational unit performing an application-specific task. Each active object has its own thread of control and interacts with other AOs by explicitly producing and consuming events.³ Events are a particular type of message. Conventional messages are sent from a source to one or more recipients, as specified by the source itself. Conversely, events do not include any information about their recipients. A JEDI event is an ordered set of strings: the first string is the event name, while the remaining strings are the values of the event parameters.⁴ It was a deliberate choice to keep the structure of JEDI events quite simple. We will discuss this choice in Section 4.6 and Section 4.7.

Fig. 1. A logical view of JEDI architecture.

1. OPSS has been developed as part of the ORCHESTRA project [34].

2. We have not used the term “component” since it is heavily overloaded and could have induced some confusion.

3. In this paper, we often use the term “event” with the meaning of “event notification.” We believe that the precise interpretation of the term can be easily derived from the context.

An event is generated by an AO and sent to a component called the event dispatcher (ED). The ED notifies the event to those AOs that have explicitly declared their interest in receiving it (event recipients). An AO declares the classes of events it is interested in by invoking an event subscription operation. It can also stop accepting events of a given class by invoking the unsubscribe operation. Event subscription and unsubscription can be invoked at any time during the AO lifetime. The notification of events is accomplished asynchronously with respect to their generation.

2.2 Main Features of JEDI

2.2.1 Event Patterns

AOs can subscribe either to a specific event or to an event pattern. An event pattern is an ordered set of strings representing a very simple form of regular expression. The first string of the pattern is the pattern name, while the others are the pattern parameters. Each string of the pattern may end with an asterisk. Given a pattern p, an event e matches the pattern iff the following conditions hold:

. The name of e is equal to the name of p, if the latter does not end with an asterisk. Conversely, if the name of p ends with an asterisk, then the name of e must start with the same characters as the name of p (excluding the asterisk). In other words, the asterisk has the same semantics adopted by the Unix and DOS shells.

. e and p have the same number of parameters.

. Each parameter of e matches the parameter of p having the same position, using the asterisk semantics used for event names. That is, for each i, letting e_i be the ith parameter of event e and p_i the ith parameter of pattern p, either e_i equals p_i (if p_i does not end with an asterisk) or e_i starts with the same characters as p_i, excluding the asterisk.

For instance, pattern foo(aa*, bb) matches all the events whose name is foo and that have two parameters, the value of the first one starting with “aa” and the value of the second one being exactly “bb.” Another example of a pattern is the following: *(*, *, *). This pattern matches all the events having three parameters, regardless of their names and the values of their parameters.
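The matching rules above can be sketched in a few lines of Java. This is only an illustration of the stated semantics, not JEDI code: the class EventPatternSketch and its methods are hypothetical names, and events and patterns are modeled as plain lists of strings whose first element is the name.

```java
import java.util.Arrays;
import java.util.List;

// Illustrative helper; the class and method names are assumptions, not JEDI API.
final class EventPatternSketch {

    // A trailing '*' in the pattern string means "the value starts with this prefix";
    // otherwise the two strings must be equal.
    static boolean matchesString(String pattern, String value) {
        if (pattern.endsWith("*")) {
            return value.startsWith(pattern.substring(0, pattern.length() - 1));
        }
        return pattern.equals(value);
    }

    // Events and patterns are ordered lists of strings: element 0 is the name,
    // the remaining elements are the parameters.
    static boolean matches(List<String> pattern, List<String> event) {
        if (pattern.size() != event.size()) {
            return false; // different number of parameters
        }
        for (int i = 0; i < pattern.size(); i++) {
            if (!matchesString(pattern.get(i), event.get(i))) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        List<String> pattern = Arrays.asList("foo", "aa*", "bb");
        System.out.println(matches(pattern, Arrays.asList("foo", "aaxyz", "bb")));  // true
        System.out.println(matches(pattern, Arrays.asList("foo", "aaxyz", "bbb"))); // false
        System.out.println(matches(Arrays.asList("*", "*", "*", "*"),
                                   Arrays.asList("open", "foo.c", "read", "now"))); // true
    }
}
```

Because the asterisk is only honored at the end of a string, each comparison reduces to a prefix test, which keeps matching cheap for a dispatcher that must check every event against many subscriptions.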

2.2.2 Reactive Objects

An active object can invoke the basic operations offered by JEDI (e.g., event generation and subscription) in any order. According to our experience, however, some active objects often operate according to a quite standard sequence of operations. Upon activation, an AO subscribes to some events and then starts waiting for their occurrence. When one of these events is notified, the AO performs some operations (possibly generating new events and subscribing or unsubscribing to events) and then starts waiting again. Therefore, it executes a standard loop: to wait for any event among those it has subscribed to and then process it. We use the term reactive object to indicate this particular kind of active object.

The JEDI framework provides programmers with standard classes supporting the implementation of both active and reactive objects (see Section 2.4). The JEDI class used to implement reactive objects (i.e., the ReactiveObject class) exports an abstract method (called processMessage) that is automatically invoked each time the reactive object has to be notified of an event it has subscribed to. Programmers who want to implement a reactive object should subclass the ReactiveObject class and implement the processMessage method.
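The wait-and-process loop of a reactive object might look roughly as follows. Only the names ReactiveObject and processMessage come from the text; the stand-in class ReactiveObjectSketch, the signature of processMessage, the notifyEvent and shutdown methods, and the queue-based inbox are assumptions made to keep the example self-contained.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

// Hypothetical stand-in for JEDI's ReactiveObject; signatures are assumed.
abstract class ReactiveObjectSketch implements Runnable {
    // Sentinel used to stop the loop; compared by identity, never delivered.
    private static final List<String> STOP = new ArrayList<>();
    private final BlockingQueue<List<String>> inbox = new LinkedBlockingQueue<>();

    // Would be called by the dispatcher (not shown) to notify a subscribed event.
    void notifyEvent(List<String> event) { inbox.add(event); }

    void shutdown() { inbox.add(STOP); }

    // Application-specific reaction, supplied by subclasses.
    protected abstract void processMessage(List<String> event);

    // The standard loop: wait for any notified event, process it, wait again.
    @Override
    public void run() {
        try {
            while (true) {
                List<String> event = inbox.take();
                if (event == STOP) {
                    return;
                }
                processMessage(event);
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }
}

// Example reactive object that records every event it is notified of.
final class FailureLoggerSketch extends ReactiveObjectSketch {
    final List<String> log = new ArrayList<>();

    @Override
    protected void processMessage(List<String> event) {
        String params = String.join(", ", event.subList(1, event.size()));
        log.add(event.get(0) + "(" + params + ")");
    }

    public static void main(String[] args) throws InterruptedException {
        FailureLoggerSketch logger = new FailureLoggerSketch();
        Thread thread = new Thread(logger); // each AO has its own thread of control
        thread.start();
        logger.notifyEvent(Arrays.asList("nodeFailure", "router7"));
        logger.shutdown();
        thread.join();
        System.out.println(logger.log); // [nodeFailure(router7)]
    }
}
```

The subclass only states what to do with an event; the waiting, dequeuing, and threading stay in the base class, which is the division of labor the paragraph above describes.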

2.2.3 Distribution of the Event Dispatcher

The event dispatcher is a logically centralized component since it must have a global knowledge of all the events that are generated and of all the subscription requests that are issued. However, a centralized implementation of the event dispatcher can become a critical bottleneck for a distributed system. This happens, in particular, when the system is composed of several Internet-wide distributed AOs that are engaged in an intense communication. In this situation, it is worthwhile to decompose the event dispatcher into several distributed and cooperating components, in order to guarantee an acceptable level of performance. This decomposition, however, requires some coordination protocol to be defined among the event dispatcher components. They, in fact, need to share information about generated events and subscriptions in order to guarantee that agents connected to different event dispatcher components communicate properly. Such a coordination protocol has to be carefully designed in order to limit the network load generated by the intradispatcher coordination activity. In some cases, in fact, this coordination traffic could grow larger than the traffic generated by AOs, thus resulting in undesired and unacceptable performance degradation.

In JEDI, we provide two implementations of the event dispatcher: centralized and distributed. The centralized version is constituted by a single (operating system) process and has been developed to address the requirements of simple systems, composed of few AOs, running over a local area network, and exchanging a limited number of events. The distributed version addresses the need of “network-intensive” applications by exploiting a set of dispatching servers (DSs) interconnected in a tree structure. Each DS is located on a different node of the network and is connected to one parent DS (except for the root DS) and to zero or more descendant DSs. Each AO is connected to a DS (not necessarily to the leaves of the tree).

4. In the remainder of the paper, an event will be represented using a notation similar to function calls in traditional programming languages, e.g., open(foo.c,read), where open is the name of the event and foo.c and read are its parameters.

There are several strategies that can be exploited to distribute events across the hierarchy of DSs (see Fig. 2). Two key issues have to be considered in defining such strategies: handling of subscription and unsubscription requests, and distribution of events. A first strategy (called local subscription) exploits a very simple approach. Each subscription request is recorded locally by the DS that has received it from the issuing AO. When an event is generated, it is distributed to all the DSs in the tree and each DS decides autonomously to which AOs the event has to be sent. A somewhat dual strategy (called distributed subscription) is based on a radically different approach: subscriptions are distributed to all the DSs in the tree. In particular, the DS that has received the subscription from the issuing AO registers itself with its parent and descendants, which, in turn, register with their parent and descendants (with the exclusion of the DS the subscription comes from), etc. In the distributed version of the JEDI ED, we are exploiting an intermediate solution that we call hierarchical subscription strategy. In this strategy, subscriptions are propagated only upward in the DS tree, so that only the ancestors of the DS that has accepted the event subscription request from an AO will eventually receive it. Consequently, when a DS receives an event from one of the objects that are connected to it (either an AO or another DS), it dispatches the event to the following entities:

1. its parent, if this is not the one that has propagated the current event;

2. the subset of its descendants that are subscribed to an event pattern that matches the received event;

3. the AOs that are directly connected to the DS and that are subscribed to an event pattern that matches the received event.
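The three dispatching rules, together with the upward propagation of subscriptions, can be sketched as a small in-memory tree. This is a simplified model under stated assumptions, not the JEDI implementation: DispatchingServerSketch and all of its members are hypothetical names, events are reduced to a name string, AOs are identified by a label, and a DS is assumed not to send an event back to the child it came from (the text leaves this case implicit).

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical in-memory model of a dispatching server (DS) in the tree.
final class DispatchingServerSketch {
    private final DispatchingServerSketch parent; // null for the root DS
    // Patterns that each child DS has propagated upward on behalf of its subtree.
    private final Map<DispatchingServerSketch, List<String>> childPatterns = new LinkedHashMap<>();
    // Patterns of the AOs attached directly to this DS, keyed by AO label.
    private final Map<String, List<String>> aoPatterns = new LinkedHashMap<>();
    // Shared record of deliveries, used only to observe the behavior.
    private final List<String> deliveryLog;

    DispatchingServerSketch(DispatchingServerSketch parent, List<String> deliveryLog) {
        this.parent = parent;
        this.deliveryLog = deliveryLog;
    }

    // Hierarchical subscription: record locally, then propagate upward only.
    void subscribe(String ao, String pattern) {
        aoPatterns.computeIfAbsent(ao, k -> new ArrayList<>()).add(pattern);
        if (parent != null) {
            parent.subscribeChild(this, pattern);
        }
    }

    private void subscribeChild(DispatchingServerSketch child, String pattern) {
        childPatterns.computeIfAbsent(child, k -> new ArrayList<>()).add(pattern);
        if (parent != null) {
            parent.subscribeChild(this, pattern);
        }
    }

    // Entry point for an event generated by an AO attached to this DS.
    void publish(String event) {
        dispatch(event, null);
    }

    private void dispatch(String event, DispatchingServerSketch from) {
        // Rule 1: forward to the parent unless the event came from it.
        if (parent != null && parent != from) {
            parent.dispatch(event, this);
        }
        // Rule 2: forward to descendants with a matching subscription
        // (assumption: never back to the child that sent the event).
        for (Map.Entry<DispatchingServerSketch, List<String>> e : childPatterns.entrySet()) {
            if (e.getKey() != from && anyMatch(e.getValue(), event)) {
                e.getKey().dispatch(event, this);
            }
        }
        // Rule 3: notify directly connected AOs with a matching subscription.
        for (Map.Entry<String, List<String>> e : aoPatterns.entrySet()) {
            if (anyMatch(e.getValue(), event)) {
                deliveryLog.add(e.getKey() + " <- " + event);
            }
        }
    }

    // Trailing-asterisk matching on the event name only, for brevity.
    private static boolean anyMatch(List<String> patterns, String event) {
        for (String p : patterns) {
            boolean ok = p.endsWith("*")
                    ? event.startsWith(p.substring(0, p.length() - 1))
                    : p.equals(event);
            if (ok) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        List<String> log = new ArrayList<>();
        DispatchingServerSketch root = new DispatchingServerSketch(null, log);
        DispatchingServerSketch left = new DispatchingServerSketch(root, log);
        DispatchingServerSketch right = new DispatchingServerSketch(root, log);
        left.subscribe("logger", "fail*");
        right.publish("failure"); // travels right -> root -> left -> logger
        System.out.println(log);  // [logger <- failure]
    }
}
```

In this model an event always climbs to the root, even when every interested AO sits in the publisher's own subtree; subscriptions, in contrast, cost at most one message per ancestor.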

One may argue that JEDI events are dispatched upward to the top of the ED hierarchy even if, in principle, this might turn out to be unnecessary. For instance, the event shown in the example of Fig. 2 (bottom diagram) is generated and has to be received by the AOs attached to the same subtree. Nevertheless, it has to be propagated up to the top of the hierarchy since in some other subtrees a subscription matching the event could have been issued and this is unknown to the intermediate DSs handling the event.

The other strategies, however, have their own advantages and disadvantages too. For instance, the distributed subscription strategy allows events to be distributed through the minimal path since each DS is able to build the path that events have to follow to reach all the parties interested in receiving them (see [7] for the presentation of an optimized algorithm to calculate this path). This approach, however, has the disadvantage of requiring a potentially high number of messages to be exchanged each time a new subscription (or unsubscription) request is issued. Therefore, this strategy is effective when the number of events is significantly larger than the number of subscriptions and unsubscriptions. The approach adopted in JEDI represents a reasonable compromise that is expected to operate satisfactorily in a variety of situations. Colleagues at the University of Colorado at Boulder and the University of California, Irvine, are conducting a detailed and quantitative analysis of possible alternative strategies [45].

As a concluding remark, notice that in JEDI, the behavior of AOs is not influenced by the implementation strategy chosen for the ED. The decision of exploiting the centralized or the distributed version only affects the overall performance of the system, but it does not have any influence on the way AOs are implemented.

2.2.4 Preservation of Event Ordering

In general, in a distributed system, a crucial issue is to establish a relationship between the order according to which messages are generated and the order in which they are received. Actually, none of the communication mechanisms traditionally used over the Internet guarantees a total ordering of messages since latency is extremely variable. With RPC or RMI, for instance, two clients might have invoked the same remote method in an order that is different from the one seen by the component that receives the method invocations and implements the corresponding method. Consider for instance two clients, A and B, that invoked a method M at time t1 and t2, respectively, with t1 < t2. Let us call d1 and d2 the time needed to deliver the respective method invocation requests to the machine storing the method code (i.e., the server). Due to the variable latency of the network, it may happen that d1 > (t2 - t1 + d2). If this is the case, the server observes two method invocation requests in an order that is not consistent with the original ordering of the clients' operations. In most cases, developers live with this problem since its solution would require the implementation of a global clock that would be really expensive to manage. Also, most distributed systems are built around a centralization element that naturally introduces a serialization of the messages it receives, thus defining a total ordering that is not necessarily the actual order in which messages are generated, but that can still be considered acceptable.

Fig. 2. Alternative strategies for distributed event dispatching.
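The reordering condition in the two-client example is simply a comparison of arrival times at the server. The short derivation below writes t1, t2, d1, d2 with subscripts; the millisecond values are invented purely for illustration.

```latex
% Arrival times at the server: A's request at t_1 + d_1, B's at t_2 + d_2.
% The server observes B's request before A's exactly when
\[
  t_1 + d_1 > t_2 + d_2
  \quad\Longleftrightarrow\quad
  d_1 > t_2 - t_1 + d_2 .
\]
% Illustrative (made-up) values, in milliseconds:
\[
  t_1 = 0,\; t_2 = 10,\; d_1 = 50,\; d_2 = 20
  \;\Longrightarrow\;
  t_1 + d_1 = 50 > 30 = t_2 + d_2 ,
\]
% so B's invocation is observed first although A issued its request earlier.
```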

The ordering problem, however, becomes critical when a distributed event dispatcher comes into play. In this case, two events coming from different sources can be delivered to some recipients without passing through the same serialization point. This means not only that the order in which events are generated may differ from the order in which they are received by a recipient, but also that different active objects can receive these events in different orders. The order of event delivery can be guaranteed only if events are tagged with a timestamp, a global clock among all the DSs is assumed, and the communication network provides a guaranteed fixed latency time. The problem with such an approach is that it is not suited to wide-area networks, which have a largely variable latency time. All these problems are considered and discussed in detail in [35].

As discussed in Section 1, one of the main goals of JEDI is to support the development of distributed applications composed of a large number of components, and this contrasts with the assumption of having a global clock. Therefore, we have chosen to guarantee only a particular form of partial ordering among events, i.e., causal ordering [30]. Events e1 and e2 are delivered according to a causal ordering policy when the following holds: if event e1 has been caused by the generation of event e2, then any AO registered to receive both e1 and e2 will receive e1 after e2 and not vice versa.

A special case of causality is the relationship among events generated by the same AO. Thus, JEDI ensures that the events generated by a given source are delivered to all the interested recipients in the order they have been published. While ordering among causally related events is guaranteed, JEDI users should not make any assumption on the ordering of events not related by causality.

As a final remark, our experience in implementing several event-based applications, including OPSS, has shown that causal ordering guarantees that pairs of components can synchronize through the generation of events. Therefore, we argue that our choice of guaranteeing only causal ordering is acceptable compared to what is offered by most applications, where the order in which events (or messages) from different sources are received does not provide a trustworthy indication of the order in which they have been generated.

2.2.5 Mobility

The ability to move running application components across the nodes of a network is currently a hot topic in software engineering research [22]. Mobility can be used to reduce network traffic since applications can be moved (or can autonomously move) close to the resources they need for their execution. It can also be used to implement applications whose graphical front-ends (and their state) follow nomadic users during their migration from site to site.

Mobility is fairly orthogonal to the event-based paradigm. At the same time, our experience in using JEDI convinced us that the event-based communication paradigm is particularly suited to support communication among mobile components (some basic form of event communication is also provided by the Aglets mobility platform [31]). In fact, the decoupling among the components introduced by the event dispatcher allows each component to operate independently of the physical location of the other components.

Supporting mobile AOs imposes specific requirements on event-based infrastructures. In particular, if an AO can move, it is natural to require that, while it is moving from one place to another, the event-based infrastructure stores the subscriptions it has issued and the events that are generated in the meanwhile and that match the subscriptions. JEDI offers two operations to handle mobility of active objects: moveOut and moveIn. By invoking the moveOut operation, an AO is able to temporarily disconnect from the event dispatcher. Through the moveIn operation, the AO can reconnect to the dispatcher at a later time. While the AO is disconnected, the event dispatcher stores the event patterns the AO is subscribed to, so that, when it reconnects, it does not have to resubscribe. Moreover, at moveOut time, the AO can request the event dispatcher to store all the events it has subscribed to for the time it will be disconnected. When the AO reconnects, it receives all the events generated during the disconnection that match its subscriptions. The event dispatcher delivers these events according to the causal ordering rule presented in Section 2.2.4. When the event dispatcher is distributed, the AO can either reconnect to the dispatching server it was initially connected to or it can connect to another dispatching server. In this last case, the new dispatching server engages a direct communication with the old dispatching server in order to obtain information about all the subscriptions issued by the AO and all the events that have been buffered on behalf of it. Moreover, the new dispatching server communicates with its parent dispatching server to notify that all the new events addressed to the AO have to be routed through a new path.
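The buffering behavior of moveOut and moveIn described above can be illustrated with a small, centralized toy dispatcher. This is only a sketch under stated assumptions: the class ToyDispatcher, its method signatures, the use of plain strings as AO identifiers, and exact-name matching instead of event patterns are all hypothetical and are not taken from the JEDI API.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Queue;
import java.util.Set;

class MobilitySketch {
    // Hypothetical toy dispatcher, NOT the JEDI API. Events are matched by exact name.
    static class ToyDispatcher {
        private final Map<String, Set<String>> subscriptions = new HashMap<>(); // AO id -> event names
        private final Map<String, Queue<String>> buffers = new HashMap<>();     // AO id -> stored events
        private final Map<String, List<String>> delivered = new HashMap<>();    // AO id -> received events
        private final Set<String> disconnected = new HashSet<>();

        void subscribe(String ao, String eventName) {
            subscriptions.computeIfAbsent(ao, k -> new HashSet<>()).add(eventName);
        }

        // Temporarily disconnect. Subscriptions are always kept; events only if requested.
        void moveOut(String ao, boolean storeEvents) {
            disconnected.add(ao);
            if (storeEvents) buffers.put(ao, new ArrayDeque<>());
        }

        // Reconnect and flush the stored events in the order they were dispatched.
        void moveIn(String ao) {
            disconnected.remove(ao);
            Queue<String> stored = buffers.remove(ao);
            if (stored != null) {
                for (String e : stored) deliver(ao, e);
            }
        }

        void dispatch(String eventName) {
            for (Map.Entry<String, Set<String>> s : subscriptions.entrySet()) {
                if (!s.getValue().contains(eventName)) continue;
                String ao = s.getKey();
                if (!disconnected.contains(ao)) {
                    deliver(ao, eventName);
                } else if (buffers.containsKey(ao)) {
                    buffers.get(ao).add(eventName);
                } // otherwise the AO is away without buffering and misses the event
            }
        }

        private void deliver(String ao, String e) {
            delivered.computeIfAbsent(ao, k -> new ArrayList<>()).add(e);
        }

        List<String> received(String ao) {
            return delivered.getOrDefault(ao, new ArrayList<>());
        }
    }

    public static void main(String[] args) {
        ToyDispatcher ed = new ToyDispatcher();
        ed.subscribe("ao1", "Ping");
        ed.moveOut("ao1", true);
        ed.dispatch("Ping");
        ed.dispatch("Pong"); // no matching subscription
        System.out.println("while away: " + ed.received("ao1"));
        ed.moveIn("ao1");    // no resubscription is needed
        System.out.println("after moveIn: " + ed.received("ao1"));
    }
}
```

The sketch deliberately ignores the distributed case, in which the new dispatching server would have to fetch subscriptions and buffered events from the old one.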

The moveOut and moveIn operations enable the exploitation of JEDI in conjunction with frameworks for building mobile agents, such as Aglets [31] and µCode [42]. Therefore, it is possible to implement generic mobile active objects (agents in the mobility community) that interact through events. In this case, the code mobility environment handles the state of the moving AOs, while JEDI deals with enqueueing and redistributing events on behalf of temporarily disconnected active objects. We are currently experimenting with the integrated usage of JEDI and µCode [41].

CUGOLA ET AL.: THE JEDI EVENT-BASED INFRASTRUCTURE AND ITS APPLICATION TO THE DEVELOPMENT OF THE OPSS WFMS 831

Event-based infrastructures themselves can be profitably enriched with some basic mobility features. These features could be exploited by programmers who do not need all the features (security, management of remote resources, naming services, etc.) provided by a full-fledged mobility framework, but can still benefit from moving components from one site to the other. To address this need, JEDI offers a specific mechanism to move reactive objects. In JEDI, a reactive object can move to a different host by invoking the move operation, which causes the following actions to occur:

1. The reactive object is temporarily disconnected from the ED (i.e., the moveOut operation is invoked) and the thread of control executing the reactive object main loop is stopped.

2. The state of the reactive object (i.e., the value of its attributes) is serialized and stored using standard Java facilities [50].

3. The reactive object is moved to the new location through a network connection. At the destination host, the reactive object is restarted and it is reconnected to the ED (i.e., the moveIn operation is invoked).
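Steps 2 and 3 rely on standard Java serialization. The following minimal sketch shows only that part, assuming a hypothetical state class CounterState and hypothetical helper names freeze and thaw; the network transfer and the moveOut/moveIn calls are omitted, and nothing here is taken from the JEDI sources.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.io.UncheckedIOException;

class MoveSketch {
    // Hypothetical state of a reactive object; the fields are illustrative only.
    static class CounterState implements Serializable {
        private static final long serialVersionUID = 1L;
        int eventsSeen;
        String lastEvent;
    }

    // Step 2: turn the attribute values into bytes with standard Java serialization.
    static byte[] freeze(Serializable state) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(state);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    // Step 3, destination side: rebuild the object before it is restarted.
    static CounterState thaw(byte[] data) {
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(data))) {
            return (CounterState) in.readObject();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        CounterState before = new CounterState();
        before.eventsSeen = 3;
        before.lastEvent = "AgentAssigned";
        byte[] wire = freeze(before);   // these bytes would travel over the network connection
        CounterState after = thaw(wire);
        System.out.println(after.eventsSeen + " " + after.lastEvent);
    }
}
```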

2.3 Summary of the JEDI Operations and Features

In summary, JEDI offers the following set of operations that can be invoked by any AO:

. open. It opens a connection with the event dispatcher. This is the first operation that any active object has to invoke.

. close. It closes the connection with the event dispatcher.

. subscribe. It subscribes the issuing AO to the set of events that matches a given event pattern.

. unsubscribe. It removes an existing subscription.

. dispatch. It allows an AO to generate an event notification.

. getEvent. It retrieves the first event addressed to the AO from the queue of events associated to the AO.

. hasEvents. It checks if the queue of events associated to the AO contains any event.

. moveOut. It is used to temporarily disconnect from the event dispatcher.

. moveIn. It is used to reconnect after a moveOut.

. move. It is used by reactive objects to move to another location.

The event-based communication style used in JEDI is characterized by the following properties:

. It is asynchronous.

. Only the subscribers of an event will receive it.

. The source of a communication does not specify the destination of the communication.

. The destination of a communication does not necessarily know the identity of the source.

. Events are guaranteed to be received according to the causal relationships that hold among them. This property is guaranteed even in presence of mobile AOs.

. An AO can disconnect from the dispatching server it is connected to and reconnect at a later time from a different host to a different dispatching server. JEDI stores the AO's subscriptions and, if required, the events addressed to the AO while it is disconnected.

. Reactive objects are provided with a special operation that allows them to autonomously move from one host to another without losing any of the events they have subscribed to.

2.4 The Implementation of JEDI

JEDI has been implemented as a framework of Java classes. The framework includes the event dispatcher and the classes needed to develop active and reactive objects (organized as two Java packages). Package polimi.jedi contains the classes needed to implement active and reactive objects. Package polimi.jedi.dispatcher includes the classes that implement the event dispatcher. Fig. 3 and Fig. 4 describe the UML logical design of the two packages.

Each active object communicates with the event dispatcher through the methods offered by interface ConnectionToED (Fig. 3). This interface includes all the operations listed in Section 2.3. It hides the implementation details of the communication between the AO and the event dispatcher. By taking advantage of this design choice, it is possible to change the implementation of the ED (e.g., to move from the centralized to the distributed ED) without impacting on existing AOs. Currently, the infrastructure provides two implementations for interface ConnectionToED through classes RMIConnectionToED and SocketConnectionToED. The former uses RMI to connect to the event dispatcher, while the latter uses standard TCP/IP sockets. An ad hoc, eight bit protocol has been developed to send and receive events and event subscriptions over plain sockets. Plain TCP/IP communication can be used to implement components that have to run in an environment that does not support RMI (e.g., a Java 1.0 virtual machine). Furthermore, TCP/IP connections allow non-Java active objects to exploit the features of the JEDI event dispatcher.

832 IEEE TRANSACTIONS ON SOFTWARE ENGINEERING, VOL. 27, NO. 9, SEPTEMBER 2001

Fig. 3. Package polimi.jedi.

Fig. 4. The event dispatcher (package polimi.jedi.dispatcher).

JEDI provides an abstract class called ReactiveObject to implement reactive objects. Users may easily implement new reactive objects by creating subclasses of ReactiveObject. These subclasses have to provide a suitable implementation for the abstract method processMessage. This method is called each time a new event is received. Each reactive object uses a RMIConnectionToED instance to communicate with the event dispatcher.
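The subclassing pattern can be sketched as follows. The nested ReactiveObject class below is a local stand-in written for this example, not the real polimi.jedi.ReactiveObject: the parameter type of processMessage (a plain string), the deliver helper, and the LoginLogger subclass are all assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.List;

class ReactiveSketch {
    // Local stand-in, NOT the real JEDI class; the signature of processMessage is assumed.
    abstract static class ReactiveObject {
        // Called once for every event handed to this object.
        protected abstract void processMessage(String event);

        // Stand-in for the infrastructure's main loop passing on one received event.
        final void deliver(String event) {
            processMessage(event);
        }
    }

    // A hypothetical reactive object that records the users named in Login events.
    static class LoginLogger extends ReactiveObject {
        final List<String> users = new ArrayList<>();

        @Override
        protected void processMessage(String event) {
            // React only to events shaped like "Login(<user>)"; ignore everything else.
            if (event.startsWith("Login(") && event.endsWith(")")) {
                users.add(event.substring("Login(".length(), event.length() - 1));
            }
        }
    }

    public static void main(String[] args) {
        LoginLogger logger = new LoginLogger();
        logger.deliver("Login(alice)");
        logger.deliver("Logout(alice)");
        logger.deliver("Login(bob)");
        System.out.println(logger.users);
    }
}
```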

Fig. 4 illustrates the most important Java classes used to implement the event dispatcher (package polimi.jedi.dispatcher) and their relationships. An instance of class EventQueue stores the queue of events that have been received and not yet dispatched, while an instance of class Register contains all the subscriptions that have been received by the ED. RMIBasedED is the main class. Each RMIBasedED instance constitutes what we called a dispatching server in Section 2.2.3. The relation between RMIBasedEDs shown in Fig. 4 models the connection among different dispatching servers to create a distributed event dispatcher. Each RMIBasedED instance is an RMI server that exports the services used to publish, receive, subscribe to, and unsubscribe from events. Moreover, each RMIBasedED instance acts also as a TCP/IP daemon, waiting for TCP/IP connections from the AOs that are interested in using TCP/IP to communicate with the event dispatcher. Each time a new TCP/IP connection is opened, a CommunicationThread instance is created to manage such connection.

As discussed in more detail in Section 5.1, a key distinctive feature of an event-based infrastructure is the set of mechanisms used to observe and notify events. There are basically two approaches: push and pull. In a push approach, events are pushed from the source to the event dispatcher (observation) or from the event dispatcher to the recipient (notification). The pull model assumes that it is the event dispatcher that “pulls” events from the source (observation) or the recipient that “pulls” them from the event dispatcher (notification). In JEDI, event observation is always based on a push approach, i.e., it is always the producer that contacts the event dispatcher to deliver an event. Conversely, event notification can be accomplished using either a pull or a push approach. Indeed, JEDI active objects exploit a pull approach, while reactive objects are based on a push behavior.
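The difference between pull and push notification can be made concrete with two tiny recipient endpoints. This is a sketch only: the classes PullEndpoint and PushEndpoint and the enqueue method are hypothetical, and the hasEvents/getEvent names are borrowed from the operation list in Section 2.3 without claiming to match the real signatures.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.List;
import java.util.Queue;
import java.util.function.Consumer;

class NotificationSketch {
    // Pull: events wait in a per-recipient queue until the recipient asks for them.
    static class PullEndpoint {
        private final Queue<String> queue = new ArrayDeque<>();

        void enqueue(String event) { queue.add(event); }   // called by the dispatcher side
        boolean hasEvents() { return !queue.isEmpty(); }
        String getEvent() { return queue.remove(); }       // oldest event first
    }

    // Push: the dispatcher side invokes the recipient's handler as soon as the event arrives.
    static class PushEndpoint {
        private final Consumer<String> handler;

        PushEndpoint(Consumer<String> handler) { this.handler = handler; }
        void enqueue(String event) { handler.accept(event); }
    }

    public static void main(String[] args) {
        PullEndpoint active = new PullEndpoint();
        active.enqueue("A");
        active.enqueue("B");
        while (active.hasEvents()) {                        // the active object decides when to read
            System.out.println("pulled " + active.getEvent());
        }

        List<String> seen = new ArrayList<>();
        PushEndpoint reactive = new PushEndpoint(seen::add); // the handler runs on arrival
        reactive.enqueue("C");
        System.out.println("pushed " + seen);
    }
}
```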

3 OPSS

WFMSs (WorkFlow Management Systems) support human beings in the execution of processes (also called workflows). There are several examples of processes in many domains of our society, ranging from traditional accounting and IS processes to engineering processes such as those used to develop software. Usually, WFMSs exploit some form of process model, i.e., a formal description of the steps to be carried out to pursue the business objective. The core component of a WFMS is the process engine. It enacts the process model and, by doing so, it guides and supports human actors in the accomplishment of their activities. It also guarantees that the actions performed by human actors are coherent with the constraints and properties mandated by the process model. Finally, it automates the execution of repetitive tasks. Process engines usually exploit a database that persistently stores the current state of the process. Another important element of WFMSs is the user interaction environment. It is usually composed of a number of tools such as an agenda, a mailing client, and other specialized elements. These tools allow human actors to be notified of their assignments and to proactively perform actions that push forward the state of the process.

OPSS (ORCHESTRA Process Support System) is a WFMS that has been developed as part of the ORCHESTRA project. ORCHESTRA (Open aRCHitecture for supporting Enhanced Services in inTegRAted broadband networks) is a retailing infrastructure supporting the development, deployment, and operation of multimedia services [34], [17]. It allows users distributed over a wide-area network to transparently access services from several types of terminals. It also supports nomadic users: They can access the ORCHESTRA environment without being constrained by their physical location. In ORCHESTRA, services can be distributed and replicated across the network, depending on load balancing needs. Users do not need to be aware of such distribution and replication, since ORCHESTRA is in charge of locating and executing services on their behalf.

Within the ORCHESTRA context, OPSS has been specifically conceived to support the design and operation of sophisticated process-based services. Examples of such services are electronic commerce, customer care, and remote education. We call them business services. Basic requirements for these services can be summarized as follows:

. Services have to be scalable with respect to the distribution of the involved users and operators.

. Services have to cope with a number of users that changes dynamically. Depending on particular circumstances, the number of customers can vary from tens to thousands of people. Services must be able to cope with this variation. This is quite unusual in traditional process-based activities where the number and the identity of actors are quite stable.

. The user interaction environment provided by services has to be dynamically deployed onto the customer terminals. In ORCHESTRA, in fact, we make the assumption that each service is allowed to install on the user terminal all the components needed to support service fruition.


To address these requirements, we implemented OPSS on top of the JEDI framework. In the remainder of this section, we present the main characteristics of OPSS and its architecture. (See Fig. 5.)

3.1 The Architecture of OPSS

In OPSS, the activities that constitute a process can be executed by human agents or by some computerized support. The executors of process activities are collectively called agents. Each agent receives an activity description (i.e., a process model fragment) and executes it. An activity description may be specified in any language that can be understood by the agent that is supposed to execute it. OPSS exploits three types of agents: software agents, human agents, and external tools.

Software agents are computerized interpreters of executable activity descriptions. In the current implementation of OPSS, we have taken a very simplistic approach: Activity descriptions for software agents are simply coded in Java (exploiting a set of classes offering specific process semantics, see later on) and are defined as subclasses of ReactiveObject. Thus, software agents are Java interpreters.5

Human agents are persons executing creative, human-specific activities (e.g., customer service operators). Activity descriptions executed by human agents can be written in natural language or in any simple graphical format that is understood by the agent. Activity descriptions for human agents are received and visualized by the Agenda tool.

External tools are components that execute business-specific activities (e.g., a configuration management procedure). The activity description for an external tool is just the set of information needed to launch and operate the tool (e.g., the initial parameters). External tools can be either OPSS-compliant or off-the-shelf tools. The latter have to be interfaced with OPSS through some proper gateways/wrappers. JEDI class ConnectionToED supports the programmer in the implementation of both tools and gateways.

3.1.1 State Server

As any other WFMS, OPSS has a persistent repository storing the state of the enacting process. This component is called State Server and mirrors the state of all the process entities. According to the formalization proposed by the Workflow Coalition [60], the key entities of a process are activities, agents in charge of executing them, resources (e.g., tools and devices) used to carry out the activities, and artifacts used as inputs or produced as results by activities. These entities are represented in the State Server as a set of objects, called process entity representatives, each containing a detailed description of a specific process entity. These objects constitute a reification of the process state [53].

The State Server subscribes to events such as login of users and creation of new activities, artifacts, or resources. When one of these events occurs (e.g., a new activity needs to be started), it creates the corresponding process entity representative. Process entity representatives show a reactive behavior themselves. In particular, they have a state, subscribe to events, and react to them according to rules that define the set of admissible transitions between states. Process entity representatives are organized in a class hierarchy rooted at ProcessElement (see Fig. 6), that, in turn, is a subclass of ReactiveObject. The subclasses of ProcessElement are the following ones:

. AgentInfo. This class defines the possible states of process agents. They are Available (i.e., the agent can be assigned to the execution of an activity) and NotAvailable.

. ActivityInfo. This class is used to maintain information on the activities of the process. An activity can be in one of the following states: Defined, Assigned, OnGoing, Suspended, Terminated, Aborted. These states will be presented in more detail later on.

. ArtifactInfo. This class defines the information concerning documents and data manipulated in the process. The possible states are Created, OnEdit, Edited, and Destroyed.

. ResourceInfo. This class defines information about the tools that can be invoked or used by OPSS (e.g., the executable code of the Java interpreters or of an external tool, devices such as a printer or an audio device). The possible states are Available and NotAvailable.


5. In principle, it is possible to introduce additional software agents for full-fledged process modeling languages without impacting on the general architecture of the environment.

Fig. 5. The ORCHESTRA process support system.

Fig. 6. Process entity representatives and the State Server structure.

Each of the above classes is associated with a finite state machine (FSM) called life cycle. A life cycle defines the set of events the process entity is interested in and the set of admissible transitions between states. A transition is defined by a triple: triggering event, condition, and action. With this respect, transitions are similar to ECA rules in active databases (see Section 5.1 for a brief description of ECA rules). When an object receives an event Ei in a state Sj, all the transitions having Sj as initial state and Ei as triggering event are evaluated for firing. One of the transitions whose condition evaluates to true is nondeterministically fired. The firing of the transition causes the execution of the action part and moves the instance to the target state. The execution of the action part of a state transition can produce new events that may influence the behavior of agents and the state of other objects in the State Server.
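A life cycle of this kind can be sketched as a small table-driven state machine. The class LifeCycle, its nested Transition type, and all method names below are hypothetical; states and events are plain strings for brevity. Where the text allows a nondeterministic choice among enabled transitions, this sketch simply fires the first one in insertion order, which is one admissible resolution of that choice.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.BooleanSupplier;

class LifeCycle {
    // One transition: (triggering event, condition, action) plus source and target states.
    static final class Transition {
        final String from;
        final String event;
        final BooleanSupplier condition;
        final Runnable action;
        final String to;

        Transition(String from, String event, BooleanSupplier condition, Runnable action, String to) {
            this.from = from;
            this.event = event;
            this.condition = condition;
            this.action = action;
            this.to = to;
        }
    }

    private final List<Transition> transitions = new ArrayList<>();
    private String state;

    LifeCycle(String initialState) { this.state = initialState; }

    void add(Transition t) { transitions.add(t); }

    String state() { return state; }

    // Fires one enabled transition for the event; returns false if none is enabled.
    boolean receive(String event) {
        for (Transition t : transitions) {
            if (t.from.equals(state) && t.event.equals(event) && t.condition.getAsBoolean()) {
                t.action.run();   // the action may produce new events
                state = t.to;
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        List<String> produced = new ArrayList<>();
        boolean[] agentAvailable = { true };   // stands in for the AgentInfo state

        LifeCycle activity = new LifeCycle("Defined");
        activity.add(new Transition("Defined", "AssignAgent",
                () -> agentAvailable[0],
                () -> produced.add("AgentAssigned"),
                "Assigned"));

        activity.receive("AssignAgent");
        System.out.println(activity.state() + " " + produced);
    }
}
```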

As an example, Fig. 7 shows the life cycle associated with class ActivityInfo. Upon creation, the state of an object AI of this class is set to Defined. In this state, AI is characterized solely by a unique identifier and by an activity description. AI can enter state Assigned when it receives event AssignAgent(activityID,agentID), i.e., an agent has been selected to execute the activity. The transition to state Assigned can only be executed if the instance of class AgentInfo representing agent agentID is in state Available. This state transition triggers the production of event AgentAssigned(activityID, agentID) and the transition of the AgentInfo instance into the state NotAvailable, if the agent turns out to be completely booked after the current assignment. Agendas subscribe to event AgentAssigned to provide human agents with information about their assignments. In the state Assigned, when AI receives event WillingToStartActivity(agentID, activityID), it checks if the preceding activities have been terminated. If this is the case, it moves to state OnGoing, and produces event ActivityStarted(activityID, AD-URL). This event must be subscribed to by the agent assigned to the activity or, if he/she is a human agent, by his/her Agenda. Parameter AD-URL contains the location of the activity description to be executed. If for any reason the activity cannot be started when event WillingToStartActivity is received (e.g., it has to wait for the termination of some other activity), AI produces an event to warn the requesting agent.

Besides its event-based interface, the State Server exports a set of RMI services through which any OPSS component can query the state of the running process (i.e., of the process entity representatives). These services constitute a synchronous interaction mechanism that is not directly supported by JEDI.

In the current OPSS prototype, the State Server is implemented as a centralized component. This can have negative effects on the scalability of the system: The State Server can become a bottleneck for the operation of the system, especially when agents are distributed over a wide area. We are working on developing a new distributed State Server. Since process entities within the State Server are autonomous objects that evolve according to their own FSM and communicate with all the other entities (including the ones that are running in the same State Server) through events, it is relatively easy to distribute them over a number of State Servers. The main issue to be dealt with concerns the creation of these process entities. This activity is currently performed by the centralized State Server based on the events received from agents. In the distributed implementation, State Servers would have to be coordinated in order to avoid that more than one State Server creates a copy of the same entity. Otherwise, multiple copies of the same entity would generate duplicated events that would have to be handled by the other components of OPSS.

3.1.2 OPSS Viewer

OPSS Viewer is a monitoring tool that provides information on the state of the process. When it is launched, it sends the event StartMonitor to notify other OPSS components of its creation. Each process entity representative has been implemented to subscribe to this event and to react to its occurrence by generating a proper response event. The response event carries information on the current state of the process entity representative. The Viewer collects all these events and exploits them to provide human agents with an initial visualization of the process state. After terminating this initial setup, the Viewer listens to all the events that notify specific state changes occurring during the normal execution of the process, and uses their contents to update the information offered to the human agent.

It is interesting to note that multiple viewers can coexist without interfering with each other and with the process being executed. They can subscribe to the same events and, based on the information carried by such events, can provide human agents with different and independent representations of the same process. Fig. 8 and Fig. 9 show the process representation of two different viewers we have implemented so far.

In the viewer shown in Fig. 8, the process is represented in terms of the process entities stored in the State Server. The rightmost window in the figure illustrates the set of process entity representatives of the technology advisor process that will be presented in more detail in Section 3.2, while the leftmost window describes the lifecycle of a particular process entity representative and its current state.

Fig. 7. The Activity life cycle.

In the viewer shown in Fig. 9, the process is represented in terms of the sequence of activities that constitute the process and of the input-output and control-flow relationships. The description is given in a standard notation called IDEF0 [18]. The diagram is animated by changing the color of the activities being executed. The control signals represent the events received by the activity representatives.

3.2 An Example of a Business Process Implemented in OPSS

To validate the JEDI/OPSS approach, we have implemented an ORCHESTRA service called technology advisor. This service provides users with information and recommendations about technological problems. In particular, a user can login to the service and can browse through the subjects supported by the technology advisor. Each subject is associated with several multimedia documents and services. In general, documents are automatically downloaded and displayed on the user's computer. Services include the possibility to set up synchronous conversations with experts and to send them asynchronous multimedia messages. The technology advisor manages the interaction between users and experts, according to the subject and to the experts' workload.

We have started the implementation of the technology advisor process by identifying its main entities and their representatives in the state server. The artifacts used or produced during the process are technical papers, presentations, and videos. The agents operating the process are the human experts and the Java interpreters in charge of executing the automated activities. The resources exploited during process enactment are the remote conferencing system provided by ORCHESTRA, a search engine that makes it possible for customers to browse the information provided by the service, and some agendas provided both to customers and human experts. There are three process activities, one executed by human agents and two by software agents. The automated activities are manageUserAccess, in charge of authenticating users who access the service, and manageUserInteraction, in charge of reacting to the requests of a specific user. A new instance of manageUserInteraction is created each time a new user enters the service. The activity executed by the human experts is called manageMeeting. When executing it, an expert instructs one or more customers on a specific subject. He/she interacts with customers through the ORCHESTRA remote conferencing service.

Fig. 8. The user interface of the basic OPSS viewer.

As an example of execution of the technology advisor, consider the case in which a customer requests through his/her agenda to have a synchronous conversation with an expert. As a result, the agenda generates an event AskForMoreInfo. This event is handled by the manageUserInteraction activity description, which, in turn, queries the State Server (through its synchronous RMI interface) to check if at least one expert skilled in the subject specified by the user is available. If this is the case, it generates an event CreateNewActivity that, in turn, causes a representative of the manageMeeting activity to be created in the State Server. As soon as manageUserInteraction is notified of the creation of the new activity, it generates the AssignAgent event, specifying the identifier of one of the available agents as a parameter. This event is received by the manageMeeting representative that, if the agent is still available at that time, changes its state to Assigned (as specified in Section 3.1.1) and generates an AgentAssigned event. Upon receiving it, the expert's agenda issues a WillingToStartActivity event, which, in turn, causes the manageMeeting representative to change its state to OnGoing and to request ORCHESTRA to start a remote conferencing service session between the customer and the expert. Notice that if the delegated agent is no longer available when the manageMeeting representative receives the event AssignAgent, an event notifying the error is generated. This is received by manageUserInteraction that tries to delegate a new agent.

4 EVALUATION

The development of OPSS has demonstrated that the main advantage of the event-based paradigm supported by JEDI is the easy reconfigurability of the system. However, our experience has also identified some problems and open issues that will be discussed hereafter.

4.1 Synchronous vs. Asynchronous Communication

Fig. 9. The user interface of the IDEF0-based OPSS viewer.

In JEDI, active objects communicate using a pure event-based style. Namely, the only means for an active object to send (receive) information is to generate (receive) an event. Events are sent and received in an asynchronous way. We have noticed that in many situations an active object, after generating an event, needs some response from the recipient(s) of the event in order to continue its processing. For instance, consider the case in which an agent needs to notify the State Server that a new activity has to be created and that this activity has to be assigned to a certain agent. The agent executes the following code fragment:

sendEvent("CreateNewActivity(ActID, ActType)");
sendEvent("AssignAgent(ActID,AgentID)");

The execution of this code might be erroneous because of possible race conditions. For instance, the State Server might be unable to react to event CreateNewActivity properly. This may happen if the State Server fails in creating the corresponding ActivityInfo object before the event AssignAgent is produced. As a result, the event AssignAgent is lost since the ActivityInfo object would be late in subscribing to it. Thus, in this case, the State Server would not be able to properly keep track of the agent assignment.

To avoid this situation, it is useful for the agent to receive the confirmation of the creation of the ActivityInfo object before generating the next event. In OPSS, we have obtained this behavior by programming the event recipient to produce an event that acts as a “response” to the initial event. This way, the source of the initial event can explicitly subscribe to this event and wait for the event occurrence before producing the AssignAgent event. This solution is quite cumbersome and expensive since it requires the exchange of a high number of messages between the event source, the recipient(s), and the event dispatcher.
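The race and the response-event workaround can be reproduced in a small single-threaded simulation. Everything here is an assumption made for illustration: the Bus class, the response event name ActivityCreated, and the modelling of the State Server's delay as a backlog of deferred work. Events carry no parameters, and events without subscribers are simply lost, as in the scenario described above.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class HandshakeSketch {
    // Hypothetical minimal publish/subscribe bus, not the JEDI dispatcher.
    static class Bus {
        private final Map<String, List<Runnable>> handlers = new HashMap<>();

        void subscribe(String name, Runnable handler) {
            handlers.computeIfAbsent(name, k -> new ArrayList<>()).add(handler);
        }

        // Notifies the current subscribers only; returns how many were notified.
        int dispatch(String name) {
            List<Runnable> current = new ArrayList<>(handlers.getOrDefault(name, List.of()));
            current.forEach(Runnable::run);
            return current.size();
        }
    }

    // Returns true if the AssignAgent event reached the activity representative.
    static boolean run(boolean waitForResponse) {
        Bus bus = new Bus();
        boolean[] assigned = { false };
        List<Runnable> serverBacklog = new ArrayList<>(); // work the State Server has not done yet

        // State Server: creating the representative takes time, modelled as deferred work.
        bus.subscribe("CreateNewActivity", () -> serverBacklog.add(() -> {
            bus.subscribe("AssignAgent", () -> assigned[0] = true);
            bus.dispatch("ActivityCreated");              // the "response" event
        }));

        if (waitForResponse) {
            // Agent: produce AssignAgent only after the creation has been confirmed.
            bus.subscribe("ActivityCreated", () -> bus.dispatch("AssignAgent"));
            bus.dispatch("CreateNewActivity");
        } else {
            bus.dispatch("CreateNewActivity");
            bus.dispatch("AssignAgent");                  // nobody has subscribed yet: lost
        }
        serverBacklog.forEach(Runnable::run);             // the State Server catches up
        return assigned[0];
    }

    public static void main(String[] args) {
        System.out.println("back-to-back events: assigned = " + run(false));
        System.out.println("with response event: assigned = " + run(true));
    }
}
```

The handshake costs one extra subscription and one extra event per request, which is the message overhead the text refers to.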

An alternative solution would be to explicitly define in JEDI the concept of “return value” from the event recipient(s) back to the agent that has generated the event, and to provide the programmers with mechanisms to easily manage these values. In particular, we are currently introducing, in JEDI, an additional synchronous operation for event generation that requires a “return value” from the recipient(s) of the event. The execution of this operation allows an active object to send an event to the dispatcher and wait until some information is returned from the event recipient(s) or, if no object is interested in the event, from the event dispatcher. When the event has multiple recipients, several policies can be envisaged to manage the return values. For instance, the source can wait for the first return value, or it can wait until all the recipients have provided a response. In this latter case the event dispatcher should inform the source of the number of return values that it should receive. Notice that this additional synchronous mechanism still preserves the anonymity of the recipient(s) of the event since the exchange of return values can still be managed by the event dispatcher. More generally, the mechanism preserves the basic semantics of events (multicast dispatching and anonymity of both source and recipients) and introduces a significant amount of flexibility and optimization in the management of complex agent interaction patterns.

4.2 Event Granularity

We have experienced a significant problem in identifying the events to be exchanged among agents. If the granularity of events is very fine, many events have to be generated since each of them has a poor or limited meaning. This choice might significantly complicate the programming activity, reduce the performance of the system, and make it difficult to test and monitor the system. On the other hand, a too coarse-grained definition of events might hide inside agents significant operations that must be made visible to the rest of the system. For instance, consider the example presented in the previous section. In that case, the events CreateNewActivity and AssignAgent (that gave us several synchronization troubles) could have been replaced by a unique event carrying the information about both the creation of the activity and its assignment to the specified agent. This design choice reduces the number of exchanged events but modifies the semantics of activities: Any activity can be created only if a proper executing agent has been already designated and if the creator of the activity is aware of this designation.

There is no universal solution to this event design problem. It is the designer's responsibility to evaluate the trade-off and select the most suitable solution based on the constraints and requirements of the problem being addressed. Still, event-based infrastructures can support designers in this decision by providing suitable event composition languages and mechanisms that allow higher-level events to be synthesized from lower-level events. In [13], we approach this problem by introducing into the event dispatcher a new component called event filter. The event filter captures all the events generated by components and uses them to synthesize new events according to the guidelines provided by a set of filtering rules. A filtering rule is composed of two main parts: an event expression and an event generator. The event expression defines the specific combination of input events to be recognized in order to produce a new event called filtered event. The event generator indicates how to compute the filtered event. Filtering rules can be posted to the event filter by components according to their specific needs. Given a certain level of granularity of the events generated by components, the event filter allows developers to increase the granularity of the events received by other specific components by defining proper filtering rules. We are currently assessing the approach sketched above. Similar approaches are presented in Section 5.
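To make the two parts of a filtering rule concrete, the sketch below models an event expression as an ordered sequence of event names and an event generator as a function over the matched events. This shape is purely an assumption for illustration: the rule language of [13] is not reproduced here, and the names SequenceRule, offer, and the generated event string are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Optional;
import java.util.function.Function;

class FilterSketch {
    // Hypothetical filtering rule: recognizes a fixed sequence of input events.
    static class SequenceRule {
        private final List<String> expected;                     // event expression
        private final Function<List<String>, String> generator;  // event generator
        private final List<String> matched = new ArrayList<>();

        SequenceRule(List<String> expected, Function<List<String>, String> generator) {
            this.expected = new ArrayList<>(expected);
            this.generator = generator;
        }

        // Feeds one input event; returns the filtered event once the sequence is complete.
        Optional<String> offer(String event) {
            if (!event.equals(expected.get(matched.size()))) {
                return Optional.empty();   // unrelated events are ignored, progress is kept
            }
            matched.add(event);
            if (matched.size() < expected.size()) {
                return Optional.empty();
            }
            String filtered = generator.apply(new ArrayList<>(matched));
            matched.clear();               // ready to recognize the next occurrence
            return Optional.of(filtered);
        }
    }

    public static void main(String[] args) {
        SequenceRule rule = new SequenceRule(
                List.of("CreateNewActivity", "AssignAgent"),
                inputs -> "ActivityCreatedAndAssigned");

        for (String e : List.of("CreateNewActivity", "Login", "AssignAgent")) {
            rule.offer(e).ifPresent(f -> System.out.println("filtered event: " + f));
        }
    }
}
```

A recipient that subscribes only to the filtered event sees one coarse-grained notification, while the producers keep generating the fine-grained events.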

4.3 Remote Procedure Call vs. Event-Based Design Paradigms

The event-based paradigm represents a significant shift with respect to the traditional synchronous remote procedure call approaches. In a remote procedure call approach, interaction between components occurs when a component is not able to perform some operation and asks some other component to do it on its behalf. In an event-based approach, components are autonomous entities that inform the "external world" of the main changes occurred in their internal state or in the state of the components and devices they can observe. The notification of an event is seen by a component as an external stimulus that can determine a change in its internal state. Thus, collaboration among components is indirect.

838 IEEE TRANSACTIONS ON SOFTWARE ENGINEERING, VOL. 27, NO. 9, SEPTEMBER 2001

Based on this consideration, a main step in understanding the advantages and drawbacks of the remote procedure call and event-based design paradigms should be the identification of the classes of systems that better suit each approach. Since they address different requirements, we are convinced that event-based and remote procedure call approaches are not mutually exclusive. Instead, they can be profitably integrated even in the same system. In OPSS, we have tried to use the event-based approach to guarantee autonomy of process agents and reconfiguration of the system, and we also exploited the remote procedure call approach to query the global state of the process maintained by the State Server. We are aware, however, that a systematic study and characterization of the problem is definitely needed.

4.4 Network-Wide Event Distribution

The development of OPSS has emphasized the need for powerful and efficient mechanisms to support the notification and distribution of events on a network-wide scale (e.g., on the Internet). The event-based infrastructure must guarantee that the services implemented on top of it are made available to users dispersed over the Internet. The hierarchical ED we implemented may represent an initial solution to the problem. However, there are still a number of issues to be addressed. First, as we have underlined in Section 2.2.3, several other event routing strategies can be envisaged. Second, connection topologies alternative to the hierarchical one have to be evaluated. Finally, the impact of the expressive power provided by the subscription mechanism on the performance of the system has still to be analyzed. Colleagues at the University of Colorado at Boulder and the University of California, Irvine, are addressing these issues by defining and assessing new architectures for distributed EDs [44], [45].

4.5 Mobility

We argue that the features for event buffering and forwarding provided by JEDI to support the mobility of active objects represent a powerful mechanism for implementing sophisticated applications. However, these features may introduce several problems when combined with ED distribution. The ED, in fact, has to provide specific mechanisms to guarantee that mobile objects do not receive duplicated events and that the original ordering of events is respected. We provided a specific solution for our hierarchical ED, but the impact of this issue on alternative ED architectures is still to be understood. Also, we still lack an extensive experimentation of mobility since it was not exploited in the OPSS implementation.
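One generic way to meet both requirements (no duplicates, original ordering preserved) is to tag each event with a sequence number and let the receiving side discard repeats and release events strictly in order. The Java sketch below illustrates this idea only. It is an assumption introduced for illustration and is not the solution adopted in the hierarchical ED of JEDI; the class name ReorderingInbox and the use of per-recipient sequence numbers starting at 0 are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeMap;

// Hypothetical inbox of a mobile object that has reconnected to the dispatcher.
final class ReorderingInbox {
    // Events received out of order, keyed by sequence number.
    private final TreeMap<Long, String> pending = new TreeMap<>();
    // Sequence number of the next event that may be delivered.
    private long nextExpected = 0;

    // Accepts an event and returns the events that can now be delivered, in order.
    List<String> receive(long sequenceNumber, String event) {
        List<String> deliverable = new ArrayList<>();
        if (sequenceNumber < nextExpected || pending.containsKey(sequenceNumber)) {
            return deliverable; // duplicate: already delivered or already buffered
        }
        pending.put(sequenceNumber, event);
        // Release the longest run of consecutive events starting at nextExpected.
        while (!pending.isEmpty() && pending.firstKey() == nextExpected) {
            deliverable.add(pending.pollFirstEntry().getValue());
            nextExpected++;
        }
        return deliverable;
    }
}
```

The sketch ignores the harder part of the problem discussed above, namely how sequence numbers are assigned consistently when several dispatching servers forward events to the same mobile object.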

4.6 Event Structure

A JEDI event is a simple sequence of strings. An alternative solution could have been to define events as Java objects. In principle, by exploiting the Java serialization API, it is possible to transmit any Java object through a network connection. As a consequence, we could have defined a class Event with a minimal set of attributes and methods and allowed programmers to specialize it and to use the resulting subclasses to instantiate their own events. A similar solution has been adopted for the development of the C2 event-based infrastructure (see Section 5.1). We have chosen a simple event structure for the sake of flexibility and interoperability. Indeed, even if a more sophisticated solution had enriched the semantics of events significantly, it would have also introduced several constraints on the ability of exploiting different languages to develop active objects. Moreover, the event structure we have selected, even if simple, makes it possible to implement an easy to use and expressive event subscription operation. Other structures for events are presented in Section 5.1.1.
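To make the trade-off concrete, the following Java sketch shows how an event represented as a plain sequence of strings can be matched against a subscription pattern. The class names (StringEvent, EventPattern) and the matching rules are assumptions chosen for illustration, not a specification of the JEDI subscription operation: a literal field must match exactly, and a pattern field ending in "*" matches any value with the given prefix (so "*" alone matches any value).

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical event: an ordered sequence of strings; field 0 plays the role of the event name.
final class StringEvent {
    final List<String> fields;

    StringEvent(String... fields) {
        this.fields = Arrays.asList(fields);
    }
}

// Hypothetical subscription pattern: one pattern string per event field.
final class EventPattern {
    private final List<String> fieldPatterns;

    EventPattern(String... fieldPatterns) {
        this.fieldPatterns = Arrays.asList(fieldPatterns);
    }

    // An event matches when it has the same number of fields and every field matches.
    boolean matches(StringEvent e) {
        if (e.fields.size() != fieldPatterns.size()) {
            return false;
        }
        for (int i = 0; i < fieldPatterns.size(); i++) {
            if (!matchField(fieldPatterns.get(i), e.fields.get(i))) {
                return false;
            }
        }
        return true;
    }

    // A trailing "*" turns the pattern into a prefix match; otherwise equality is required.
    private static boolean matchField(String pattern, String value) {
        if (pattern.endsWith("*")) {
            return value.startsWith(pattern.substring(0, pattern.length() - 1));
        }
        return pattern.equals(value);
    }
}
```

Because both events and patterns are just strings, a component written in any language can produce or consume them, which is the interoperability argument made above. The price is that all typing is left to conventions among components.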

4.7 Global vs. Local Type System

A critical issue in the development of JEDI has been the selection of the type system exploited to create and distribute events. There are at least two basic alternatives. In a first scenario, the space of event types exchanged through the event dispatcher is global. This means that all the components in the event-based infrastructure see and use the same set of event types. A second alternative is based on the assumption that there is no global type space. Each constituent of the event-based infrastructure can produce events without referring to a specific type. A subscriber is supposed to know the structure of the event being received, while this structure is completely hidden to the event-based infrastructure.

Certainly, a global type system may be desirable since, in general, it makes it possible to perform significant consistency and compatibility checks on the information being exchanged. Still, we have preferred to adopt the second solution. This choice is justified by the fact that an important requirement underlying the development of JEDI is its ability to operate over the Internet effectively. The experiences of the past years have demonstrated that it is extremely difficult to define type systems at the Internet level, where it is necessary to cross company boundaries and involve millions of independent users. The issue is not merely technical: It is also related to scalability and ease of operation. The Internet is inherently decentralized and based on autonomous and independent operators. Type compatibility cannot be enforced by an explicit, network-wide (and, thus, conceptually centralized) type system; rather, it is the result of a set of simple and voluntary conventions. MIME is a typical example of such an approach. It does not define the structure of the different files being exchanged over the Internet. MIME is used just to label the documents being exchanged so that each party can access them according to agreed procedures and tools (e.g., a "text" file is what you can usually open with an editor). This position is far from being consolidated and accepted in the community, as demonstrated by the debate that took place at WISEN 98 [60].

As a last remark, we argue that the two issues related to event structure and event type system are fairly orthogonal. For instance, it is possible to offer the ability to create complex, structured events without using any global type space. Conversely, one may use a very simple structuring mechanism and enforce a global type system. Certainly, the overall complexity of the system tends to increase significantly as we integrate different features. This issue is further discussed in the next section.

CUGOLA ET AL.: THE JEDI EVENT-BASED INFRASTRUCTURE AND ITS APPLICATION TO THE DEVELOPMENT OF THE OPSS WFMS 839

4.8 Putting Everything Together

Several colleagues have argued that event-based infrastructures are not particularly new. These infrastructures have been around for years now and, therefore, they might be considered consolidated technologies. Indeed, the growing number of commercial systems being introduced in the market seems to support this claim.

We argue that this observation is only partially true. We are certainly in a phase where event-based systems have reached a significant level of maturity. This makes them suitable to implement complex and critical applications such as a trading system for the stock market [57]. Still, our experience in the development of JEDI/OPSS and the analysis we have conducted of the state of the art in the field (see Section 5) have identified two critical open issues:

. As discussed in the previous sections, there are many facets and features that characterize an event-based infrastructure. The critical point is not to support these features individually; rather, it is the identification of a reasonable compromise to integrate them in a feasible and viable way. For instance, the interaction of features such as full support for mobility, distributed event dispatching, and a global, object-oriented type system may result in a very complicated and inefficient solution that is not applicable in real, Internet-wide applications. JEDI is an attempt to identify a reasonable compromise among these often conflicting requirements and features.

. It is not altogether clear if it is reasonable to envision a single general-purpose event-based infrastructure. Given the wide variety of features and application domains, one may even argue that it is necessary to create different event-based infrastructures, each of them offering only the features needed by a specific application domain. JEDI partially supports this vision. For instance, we have purposely limited the level of abstraction of some of the JEDI features to make it suitable to operate at the Internet scale (e.g., by avoiding a global and powerful type system in favor of scalability and flexibility). Certainly, we argue that it is necessary to perform a systematic evaluation of the correlation among the technical features of event-based infrastructures and the characteristics of the application domains.

The next section is an initial attempt to address the two issues above, by providing criteria and concepts to compare and analyze existing approaches and systems.

5 RELATED WORK

This section surveys event-based infrastructures and compares them with JEDI. Also, it compares OPSS with similar "Internet-wide" WFMSs.

5.1 Event-Based Infrastructures and Frameworks

As pointed out in Section 4.8, the idea of exploiting events in software systems is not new and has been adopted in several contexts. For instance, in the area of active databases, events are generated when updates are performed on data. These events may trigger the execution of actions, depending on the structure of some Event-Condition-Action rules (ECA rules). These rules specify the set of triggering events (Event part), the condition that has to be checked upon triggering (Condition part), and the set of operations that are executed if the condition is true (Action part) [21]. In active databases, both the generation of events and the reaction to their occurrence are local to the DBMS. In this respect, they differ significantly from event-based infrastructures that are devoted to support communication among distributed components. For this reason, even issues that appear to be similar are, in reality, addressed in the two domains with different emphasis and scope. For instance, the issues related to the distribution and scalability of the dispatching mechanism are irrelevant in active databases. Conversely, they are of primary importance in event-based infrastructures. Indeed, while approaches to support the analysis and testing of ECA rules have been proposed in DBMSs (see, for instance, [1]), the same issues have not been considered so far in event-based infrastructures. For the above reasons, we will not discuss active databases further. The focus of the remainder of this section will be on understanding and comparing characteristics and peculiarities of event-based infrastructures devoted to support communication among distributed components.

The first event-based infrastructures were proposed in the 1980s to solve specific problems such as the development of extensible CASE tools [43] and the integration of applications running on mainframes [40]. In the past years, the interest in this communication paradigm has exploded due to the dramatic diffusion of distributed, component-based systems. Since we started the development of JEDI, a number of new event service infrastructures have been proposed either in academia or in industry. Moreover, several attempts to define the event-based architectural style and to classify the event-based infrastructures according to well-defined frameworks have been proposed. One of these efforts is presented in [9]. That paper focuses on the identification of the main functional components of an event-based middleware and defines a type system that can be further specialized to describe specific event-based middlewares.

Fig. 10 summarizes the functional components of an event-based infrastructure as they have been identified in [9]. Participants can either send or receive messages that represent the occurrence of some event. Before sending or receiving any message, a participant has to inform the Registrar of its intention of doing so. The Router is in charge of delivering the messages. It may contain a number of internal components, the Message Transforming Functions (MTFs) and the Delivery Constraints (DCs). MTFs are in charge of filtering the messages on behalf of some listener, while DCs define some constraints on the order in which events are received.

A more general framework is proposed in [44]. In this case, an event-based system is described by seven models:

. The object model characterizes the components of the system.

. The event model focuses on the characteristics of events.

. The naming model defines how components refer to the other components and to events for the purpose of subscribing to them.

. The observation model focuses on the mechanisms through which the occurrences of events are observed.

. The time model is related to the temporal aspects of events.

. The notification model focuses on the mechanism to notify consumers of the occurrence of events.

. The resource model defines how computational resources are allocated and accounted.

In this section, we introduce a classification that can be considered a pragmatic specialization of the two frameworks mentioned above. The objective is to provide the reader with a guide for a practical comparison between our work and other efforts presented in the literature. We classify systems depending on their event model, subscription approach, observation and notification model, and, finally, on the basis of their architecture.

5.1.1 The Classification Framework

As for the event model, we identify three different classes of event-based systems:

. Tuple-based notifications are defined as a set of strings. For example, (UsefulStuff, 4, http://...) is a tuple-based event notification that indicates the availability of Release 4 of product "UsefulStuff," which can be downloaded from site "http://...."

. Record-based notifications are defined as sets of typed fields characterized by a name and a value. For instance,

    Struct NewRelease {
        string ProductName = "UsefulStuff"
        integer ProductRelease = 4
        string DownloadURL = "http://..."
    }

is a record-based notification composed of three typed fields. Note that within the record-based category, different event-based infrastructures could be further classified depending on the richness of the type system they offer.

. Object-based notifications have both a state and a set of methods. For instance, the following code defines a class of events called NewSoftwareRelease:

    class NewSoftwareRelease extends Event {
        public String ProductName;
        public String ProductRelease;
        private String DownloadURL;

        // Constructor and method bodies are omitted.
        NewSoftwareRelease(String name, String Release, String URL);
        public void DownloadAndInstall();
    }

In this case, an event is created through the invocation of the class constructor NewSoftwareRelease:

    NewSoftwareRelease NewProduct = new NewSoftwareRelease("UsefulStuff", "4", "http://...");

Event NewProduct represents a new product being delivered. It provides a method, DownloadAndInstall, that can be invoked by the receiver of the event to get the product downloaded from the address contained in the variable DownloadURL and installed on the local machine (see footnote 6).

The subscription approaches can be classified as follows:

. Content-free. Subscription is accomplished by specifying a channel. The subscriber receives all the messages that are posted to the channel.

. Subject-based. Each event is labeled with a subject. Subscriptions are specified by indicating the subject of interest. Notice that the subject-based approach is a variation of the content-free concept. We introduce this distinction because it reflects a market trend. In practice, both subjects and channels can be used to represent the "key" of the events the subscriber wants to receive. Both approaches enable the exploitation of multicast communication infrastructures and guarantee the high level of performance needed in several critical application domains (e.g., thousands of events per second in stock market applications). The drawback is in the limited freedom that subscribers have in expressing the event categories they want to receive.

. Content-based. Subscriptions are specified as expressions evaluated over the event contents. Within the content-based category, subscription language constructs can be further classified depending on the expressive power they provide to specify predicates:

- Disjoint elementary expressions. It is possible to specify the value or the range of values for each event parameter. For instance, in a system supporting a record-based event model, we could issue the following subscription: subscribe(name = "UsefulStuff", release > 4), where name and release are names of event fields and "UsefulStuff" and 4 are two constant values.

- Compound expressions. It is possible to compare different event parameters. For instance, in a system supporting a tuple-based event model, the following subscription could be issued: subscribe(I parameter > 4, II parameter < I parameter).

- Regular expressions. The subscription request is expressed using regular expressions. For instance, in a system supporting a tuple-based event model, the following subscription could be issued: subscribe("*Staff", "*", "*.it").

. Event combination. It is possible to define subscription expressions that require the combined occurrence of more than one event. For instance, in a system supporting a record-based event model, we could issue the following subscription:

    subscribe(A followed by B
        where A.share = "IBM"
        and B.share = "IBM"
        and B.value = A.value+25).

This subscription will be issued by a component that wants to be notified when the IBM share value increases by 25.

6. Notice that DownloadURL has been defined as private because we want to ensure that the user of a NewSoftwareRelease instance downloads the corresponding software only by invoking the method DownloadAndInstall.

Fig. 10. Functional components of an event-based infrastructure.
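As an illustration of the simplest content-based class, the following Java sketch evaluates a subscription made of disjoint elementary expressions over a record-based notification, mirroring subscribe(name = "UsefulStuff", release > 4). The class and method names (RecordEvent, Subscription, equalTo, greaterThan) are hypothetical and do not correspond to the API of any of the infrastructures surveyed here. Each constraint refers to a single field and a constant, so constraints can be checked independently of one another.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Predicate;

// Hypothetical record-based notification: named fields with typed values.
final class RecordEvent {
    private final Map<String, Object> fields = new HashMap<>();

    RecordEvent with(String name, Object value) {
        fields.put(name, value);
        return this;
    }

    // Returns the field value, or null when the field is absent.
    Object get(String name) {
        return fields.get(name);
    }
}

// Hypothetical subscription: one independent constraint per constrained field.
final class Subscription {
    private final Map<String, Predicate<Object>> constraints = new HashMap<>();

    // Constraint of the form: field = constant.
    Subscription equalTo(String field, Object constant) {
        constraints.put(field, value -> constant.equals(value));
        return this;
    }

    // Constraint of the form: field > constant (integer fields only).
    Subscription greaterThan(String field, int constant) {
        constraints.put(field, value -> value instanceof Integer && (Integer) value > constant);
        return this;
    }

    // An event matches only if every constrained field is present and satisfies its constraint.
    boolean matches(RecordEvent e) {
        for (Map.Entry<String, Predicate<Object>> c : constraints.entrySet()) {
            if (!c.getValue().test(e.get(c.getKey()))) {
                return false;
            }
        }
        return true;
    }
}
```

Compound expressions and event combination cannot be expressed in this form: the former need predicates over several fields of one event, and the latter need state that spans several events.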

Two characteristics of event-based infrastructure that can influence the design of applications are the observation and the notification model. These models can follow two different communication styles, push or pull. In general, in the push style, the provider of data (i.e., the event source in the observation model and the event dispatcher in the notification model) starts a communication with the receiver (i.e., the event dispatcher in the observation model and the event recipient in the notification model). Conversely, in the pull style, the receiver explicitly polls the provider. The adopted communication style has an impact on the way active objects are implemented and also on the performance of the event dispatcher. For instance, if the observation mechanism is pull, a producer of events should offer to the event service a "polling service" through which it can be queried. In turn, the event dispatcher should periodically exploit this polling service, thus increasing its workload.

An important factor that has an impact on the performance and scalability of event-based infrastructures is the internal architecture of the event dispatcher. We classify event-dispatcher architectures according to the following categories:

. Direct Connection. No explicit event dispatcher exists. Events are dispatched by the sources to the interested parties that are directly connected to the sources themselves. In other words, the sources act as event dispatchers of the events they want to notify.

. Broadcast. This is a special case of direct connection in which sources exploit IP multicast [26] to deliver the events to the destinations.

. Centralized. A single event dispatcher performs the dispatching of events.

. Distributed. A number of interconnected dispatching servers cooperate to deliver events.

. Mixed. The mixed approach exploits broadcast messages to deliver events within a LAN and a centralized or distributed event dispatcher to forward them across different LANs. This way it is possible to take advantage of the broadcasting mechanisms while still overcoming their limitations in WAN communications.

As a final comment, notice that, while support for component mobility is a distinctive feature of JEDI, it does not appear in the above classification since none of the other event-based infrastructures supports it.

5.1.2 A Comparison of Representative Event-Based Infrastructures

Table 1, Table 2, and Table 3 classify representative event-based systems with respect to the characteristics we have identified in Section 5.1.1. In the remainder of this section, we describe all the infrastructures listed in the tables.

JavaBeans are reusable components written in Java [48]. They support a simplified event-based model in which communication is permitted among components (called beans) running in the same process. Despite this simplification, JavaBeans provide a powerful event model, in which events are instances of the class EventObject. A JavaBeans developer can define new classes of events by specializing this class. The JavaBeans approach does not explicitly provide an event dispatcher. Instead, the sources of events are in charge of explicitly notifying each of the JavaBeans that has expressed interest in receiving an event. JavaBeans can subscribe to classes of events. This means that they will receive all the events belonging to that class or to one of its subclasses. Instead, they cannot express requirements on the specific content of the events they will be notified of. Subscriptions are issued by directly contacting the sources of the events of interest. The observation and notification models follow the push approach and exploit the Java method invocation protocol.

OMG has defined a standard for the implementation of an event service on top of the CORBA ORB [37]. In particular, the standard defines the IDL interfaces for three types of components that are involved in an event-based interaction. These are the event supplier, the event consumer, and the event channel. Event consumers may be directly connected to event suppliers. Alternatively, the distribution of events can be mediated through an event channel that allows multiple suppliers to communicate with multiple consumers asynchronously, thus providing a true event distribution mechanism. A component of the system (either supplier or consumer) may be connected to several event channels. According to our classification, the CORBA event service supports a record-based event model. The channel can either be aware of the structure of events (typed approach) or not (untyped approach). Current implementations of the event service specification mainly support the untyped approach. In the untyped approach, a consumer connected to an event channel receives all the events that suppliers forward to that event channel. Thus, the subscription approach is content-free. Conversely, observation and notification models are quite sophisticated. In fact, CORBA supports both the push and pull approaches. They can be combined in different ways, by having both push and pull event observation and notification in the same system. The CORBA-compliant event channels that are currently available on the market mostly present a centralized architecture. Event channels can be pipelined. This constitutes a sort of distributed event dispatching architecture. However, such distribution is not transparent to application developers. In fact, they have to explicitly manage it. In order to enhance the capabilities of the event service, OMG is currently working on specifying the interfaces of a notification service [38]. It enriches the event service with a content-based subscription mechanism supporting compound expressions and event combination.

C2 is an event-based architectural style that has been designed to support the development of GUI software [56]. C2 is currently applied also to other classes of applications. The distinctive characteristic of C2 is that it provides support for the dynamic reconfiguration of an application [39]. In C2, multiple software components can communicate through connectors (called C2 buses) that manage the routing of events. Each C2 bus has two terminations (called bottom and top). Components connected to the bottom of a connector are enabled to generate "requests." The connector forwards these requests to the components connected to its top that are able to serve the request. In turn, components connected to the top can generate "notifications" that the connector forwards to all the components connected to its bottom. A component can be connected to the bottom of one C2 bus and to the top of another one. This means that components and buses can define a kind of hierarchical architecture. The constraints imposed on this architecture are that components cannot be interconnected directly and that it is not possible to have either direct or indirect cyclic connections. According to our classification, the event model supported by the C2 infrastructure is object-based, as the parameters of events can be Java serialized objects. The subscription approach is content-free for notifications since all the components connected to the bottom of a bus receive all the notifications that have been generated by the components connected to the top of the same bus. The internal architecture of each bus is centralized, but the architectural style provides guidelines to compose these buses. Finally, the observation and the notification models are push-based, and the protocol used in both cases is RMI.

TABLE 1. Classification of the Event Models

TABLE 2. Classification of the Subscription Approaches

Smartsockets [55] is a commercial event-based infrastructure developed by Talarian. It provides a rich environment for the development of event-based applications and supports monitoring of the events exchanged among the components of an application. Also, it provides APIs for synchronous communication and supports fault tolerant connections. As for the event-based communication mechanism, Smartsockets supports a record-based event model and predefines a set of commonly used event types. Developers can either create events as instances of these types or define their own application-specific types. The subscription approach adopted by Smartsockets is subject-based. Subjects can be organized in hierarchies (e.g., "/stock/computer" is a subject defined within the context of the broader subject "stock"). Subscriptions refer to subjects. They can contain wildcards. For instance, "/*/computer" matches all the subjects containing a subsubject called "computer." Subjects are orthogonal to event types, in the sense that events of the same type can belong to different subjects and, vice versa, a subject can be associated with events of different types. Therefore, a consumer cannot express requirements on the content of the event it wants to receive. The internal structure of the Smartsockets event dispatcher is distributed. Each dispatching server is aware of all the subscriptions that have been issued at some point in the system and is able to dynamically route events based on the cost of network connections and on their load. While this approach to distribution provides several advantages on a local area network in terms of increased performance, load sharing, and reliability, its applicability to wide-area network scenarios has to be analyzed according to the discussion of Section 2.2.3. Smartsockets supports a push observation model and both push and pull approaches for the notification model.
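Subject-based matching with wildcards over a hierarchy of subjects can be sketched in a few lines of Java. The sketch is generic and does not reproduce the Smartsockets API or its exact wildcard semantics: the class name SubjectMatcher is hypothetical, and the rule that "*" stands for exactly one level of the hierarchy is an assumption made for illustration.

```java
// Hypothetical matcher for hierarchical subjects such as "/stock/computer".
final class SubjectMatcher {
    private SubjectMatcher() {
    }

    // Returns true when the subject matches the pattern level by level.
    // Assumption of this sketch: "*" matches exactly one level.
    static boolean matches(String pattern, String subject) {
        String[] patternLevels = pattern.split("/");
        String[] subjectLevels = subject.split("/");
        if (patternLevels.length != subjectLevels.length) {
            return false;
        }
        for (int i = 0; i < patternLevels.length; i++) {
            boolean wildcard = patternLevels[i].equals("*");
            if (!wildcard && !patternLevels[i].equals(subjectLevels[i])) {
                return false;
            }
        }
        return true;
    }
}
```

Note that the match depends only on the subject label and never inspects the fields of the event, which is why subject-based subscription is classified separately from content-based subscription.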

TIB/Rendezvous is a commercial infrastructure developed by TIBCO for creating and maintaining large, distributed, event-based applications [57]. It has been used over the past years to integrate financial and banking applications (especially, trading services for financial markets). It offers several interesting features including reliable and scalable distribution of events. It exploits a three-level hierarchical event dispatcher: Each node running one or more agents must also run a TIB/Rendezvous daemon, which is in charge of filtering events for the agents running in that node. TIB/Rendezvous daemons communicate among themselves by means of broadcast messages. The delivery of events among nodes that do not belong to the same subnet is achieved through two kinds of "routing daemons": a subnet routing daemon and a wide-area routing daemon. The combination of TIB/Rendezvous daemons, subnet routing daemons, and wide-area routing daemons defines the three-level hierarchical structure mentioned above. TIB/Rendezvous events are composed of a set of typed data fields. The subscription approach is similar to the one adopted by Smartsockets: Each event has an associated subject, which plays the role of a special field. Agents may subscribe to one or more subjects. Observation and notification models follow the push approach.


TABLE 3. Classification of Observation and Notification Models and of Event-Service Architectures

ToolTalk [47] is a product derived from FIELD [43]. It was originally conceived to support tool integration in software engineering environments. In ToolTalk, events are composed of a name and a set of attributes. Each attribute can be of type integer, string, or byte, and is associated with a textual comment that is used to describe its semantics to application developers. For the purpose of our classification, we can consider these events as record-based. The components of a ToolTalk system subscribe to events either statically, at installation time, or dynamically, during their execution. If there is no dynamic subscription for a specific event, ToolTalk exploits static subscriptions to start a component able to receive and handle the event. Also, subscriptions can have two different scopes: session and file. A session is defined as the set of all the tools served by the same ToolTalk server. If a component performs a subscription with a session scope, it can receive the messages that are sent to that session and that match the subscription. When a component performs a subscription having a file scope, it receives all the messages that match the subscription and that refer to that file independently of the session in which they are generated. According to our classification, the ToolTalk subscription approach is based on disjoint elementary expressions. As we have mentioned before, multiple ToolTalk servers can be instantiated. They interact when events associated with files have to be delivered. Therefore, the event service architecture is distributed. The observation and notification models are both push. ToolTalk also supports events with return values, as described in Section 4.1.

Elvin [46] is an event-based infrastructure that has been developed at the University of Queensland (Australia). The focus of this project is on defining a rich event subscription language and on achieving scalability by supporting the federation of event dispatchers. The event model supported by Elvin is record-based. The types supported for fields are string, integer, float, and date. The subscription mechanism that is currently supported exploits compound expressions. The standard comparison operators can be used to compare a field with other fields of the same event or with constant values. The observation and notification models support the push approach. Distribution of the ED is ongoing work in the latest version of Elvin (Elvin4). In [4], the authors distinguish between local and wide area federation of EDs. The first one takes place within the boundaries of a single organization or business unit and is aimed at providing reliability (if a fault on an ED occurs, the others take its place) and scalability with respect to the number of users (some form of load balancing allows new users to be connected to the least loaded ED). The second type of federation is closer to the distributed approach we propose in JEDI and focuses on issues such as minimization of coordination messages among dispatchers and ordering of messages. An interesting characteristic of Elvin that is not shared with the other event-based infrastructures we are aware of is called quenching. It is the ability of components attached to the Elvin dispatcher to know (from the dispatcher itself) if some other components have issued a subscription compatible with some quench expression. Using this feature, a component can autonomously decide not to generate events in which no one is interested. At first glance, this approach seems to optimize the network traffic. However, a deeper evaluation is needed when we consider applications in which components can subscribe to events at any time during their execution. In this case, the generator of an event cannot issue a quenching request once and for all, but has to renew it every time it generates the event. When a distributed event dispatcher is considered, the management of such quenching requests may be quite expensive in terms of network load.
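
The cost of quenching can be made concrete with a small sketch. The code below is not Elvin's API: the `Dispatcher` class, the `has_interest` method, and the representation of subscriptions and quench expressions as dictionaries of equality constraints are all assumptions made for illustration (Elvin supports richer compound expressions). It shows that a producer must query the dispatcher before every publication, because a subscription may arrive at any time.

```python
class Dispatcher:
    """Toy centralized dispatcher (hypothetical, not Elvin's interface).
    A subscription is a dict of field -> required value."""

    def __init__(self):
        self.subscriptions = []  # list of (constraints, callback)

    def subscribe(self, constraints, callback):
        self.subscriptions.append((dict(constraints), callback))

    def publish(self, event):
        # Deliver the event to every subscription whose constraints all hold.
        for constraints, callback in self.subscriptions:
            if all(event.get(k) == v for k, v in constraints.items()):
                callback(event)

    def has_interest(self, quench):
        """Quench query: True if some subscription is compatible with the
        field values fixed by `quench` (it does not contradict any of them)."""
        for constraints, _ in self.subscriptions:
            if all(constraints.get(k, v) == v for k, v in quench.items()):
                return True
        return False


def publish_if_wanted(dispatcher, event, quench):
    """The producer renews the quench query on every publication, since
    subscriptions may have changed since the last one."""
    if not dispatcher.has_interest(quench):
        return False  # nobody is listening: skip generating the event
    dispatcher.publish(event)
    return True
```

In a centralized dispatcher the query is a local lookup. With a distributed dispatcher, each call to `has_interest` would have to be answered from subscription knowledge spread over several nodes, which is where the network load discussed above would arise.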

Yeast is an event-action system [29]. It offers a powerful mechanism for specifying and detecting the combined occurrence of events. The main component of Yeast is a centralized server that observes event sequences and reacts to their occurrence according to some action specification. It differs from the other event-based infrastructures discussed so far. Events in Yeast can either be operating system events (e.g., file changes) or messages produced by the components of the system. According to our classification, the structure of events is record-based. The users of Yeast control its execution by defining and posting event-action specifications. These specifications define an event pattern and the actions that have to be performed when the event pattern is matched. Actions include any command that can be executed by the Unix shell (e.g., sending an e-mail message or creating a new file). An event pattern can be composed of a number of event descriptors combined using logical and temporal operators. For instance, "file foo mtime changed then in 10 minutes" is an event pattern composed of two event descriptors. The first one is of type file, is called foo, and has an attribute called mtime. The second one is a temporal event descriptor. The event pattern is matched if file foo has changed and ten minutes have elapsed since then. Event patterns represent the Yeast subscription mechanism. According to our classification, they support event combination. As for event observation, Yeast supports a mixed approach. In particular, operating system events (the authors call them predefined events) can be observed either through a pull or a push approach, while messages produced by components (called user-defined events) are observed only through a push approach. Yeast does not have an explicit notification model since it is not supposed to automatically notify components of the occurrence of events. However, since its action language is the Unix shell language, it allows the programmer to notify components by exploiting the standard e-mail mechanism. Yeast and JEDI (or any other event-based infrastructure) are quite different and complementary. Yeast does not offer any event dispatching functionality, but provides sophisticated mechanisms for defining and observing event sequences and for reacting to their occurrence (see Section 4.2). Thus, it could be implemented as a component on top of many event-based infrastructures.
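
As an illustration of how the pattern "file foo mtime changed then in 10 minutes" combines two descriptors, here is a minimal sketch. It does not reproduce Yeast's specification language or implementation: the `ChangedThenDelay` class and its method names are hypothetical, the action is a Python callable rather than a Unix shell command, and time is passed in explicitly as seconds so that the behavior is deterministic.

```python
class ChangedThenDelay:
    """Matches when a given attribute has changed and `delay_seconds`
    have elapsed since that change (hypothetical sketch)."""

    def __init__(self, kind, name, attribute, delay_seconds, action):
        self.descriptor = (kind, name, attribute)  # first event descriptor
        self.delay_seconds = delay_seconds         # temporal descriptor
        self.action = action
        self.changed_at = None
        self.fired = False

    def on_change(self, kind, name, attribute, now):
        # Record the time of the first change that matches the descriptor.
        if self.changed_at is None and (kind, name, attribute) == self.descriptor:
            self.changed_at = now

    def on_tick(self, now):
        # Fire the action once, when the delay has elapsed after the change.
        if self.fired or self.changed_at is None:
            return False
        if now - self.changed_at < self.delay_seconds:
            return False
        self.fired = True
        self.action()
        return True
```

A server in this style would call `on_change` when it observes a change (by push or pull) and `on_tick` from its clock; the pattern above corresponds to `ChangedThenDelay("file", "foo", "mtime", 600, action)`.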

GEM (Generalized Event Monitor) [35] has been developed to support network management. The GEM architecture is composed of three types of nodes: event generators, event monitors, and event disseminators. Event generators emit event notifications. Event monitors process the event notifications they receive from the other nodes of the network. In particular, they filter and compose incoming notifications and emit resulting notifications. They operate by interpreting filtering and composition rules that are defined in a proper language. Event disseminators forward event notifications to the clients that subscribe to them. In GEM, the focus is on the definition of the language for event filtering and composition. This language can be considered similar to the one implemented in Yeast. The main difference is that the temporal aspects related to the evaluation of filtering and composition rules are managed in GEM by assuming that event monitors are distributed. This requires the definition of mechanisms for guaranteeing the existence of a global clock. The event model provided by GEM is record-based. The observation model can be either push or pull. The notification model is push. The event service architecture is distributed. From the documentation available, we could not determine whether the components that use the system (the monitored objects and the final users of the event notifications) need to be aware of this distribution. The subscription approach is not explicitly described in [35]. The language used for defining filtering and composition rules can potentially support an event combination approach to subscription.

Gryphon [7] is a research project by IBM that is currently focusing on defining efficient algorithms to match events against content-based subscriptions. The system supports a record-based event model and a compound expression-based subscription approach. The architecture of the event dispatcher is distributed. When receiving an event, a dispatcher executes a matching algorithm that, based on subscriptions, allows it to determine the set of neighbors (either other dispatchers or application agents) that need to receive the event. As we mentioned before, the main focus of the project, at the moment, is on defining an efficient algorithm for performing such matching and limiting the network traffic concerning event delivery. No attention is currently paid to how subscriptions are distributed to all event dispatchers. Event dispatchers are assumed to be somehow informed of the subscriptions issued by all the connected active objects. As we discussed in Section 2.2.3 and as other researchers have pointed out [4], the issue of subscription distribution cannot be disregarded since it can dramatically increase the network traffic in a wide area network.
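
The per-dispatcher matching step can be sketched as follows. This is a naive linear scan standing in for the efficient algorithm studied in [7], which is not reproduced here. The function name `neighbors_for`, the table layout, and the sample field names are assumptions made for illustration; subscriptions are modeled as Python predicates over a record-like event.

```python
def neighbors_for(event, subscriptions_by_neighbor):
    """Return the set of neighbors (other dispatchers or application agents)
    that have at least one subscription matching `event`."""
    return {
        neighbor
        for neighbor, predicates in subscriptions_by_neighbor.items()
        if any(predicate(event) for predicate in predicates)
    }


# Hypothetical subscription table held by one dispatcher. The table is assumed
# to be already known: how it gets distributed is exactly the open issue.
table = {
    "dispatcher-B": [lambda e: e["issue"] == "IBM" and e["price"] < 120],
    "client-1": [lambda e: e["volume"] > 1000],
    "client-2": [lambda e: e["issue"] == "SUN"],
}

event = {"issue": "IBM", "price": 119, "volume": 500}
targets = neighbors_for(event, table)  # only "dispatcher-B" matches
```

Forwarding the event only to `targets` is what limits delivery traffic; the sketch says nothing about the traffic needed to keep `table` up to date on every dispatcher.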

The Java Messaging Service (JMS) [49] is an API developed by Sun Microsystems. It aims at representing the standard, common interface for Java messaging products. Sun does not provide any implementation of this interface, and assumes that other tool vendors will adopt it. According to the specification, messaging products can be broadly classified as either point-to-point or publish-subscribe systems. Point-to-point (PTP) products are built around the concept of message queues. Each message is addressed to a specific queue; clients extract messages from the queue(s) established to hold their messages. Publish-subscribe systems are what we defined as event-based infrastructures. Consistent with this classification, JMS interfaces can be ideally split in two subsets: one tailored for point-to-point messaging products, the other tailored for publish-subscribe systems. To be "JMS compliant," a tool vendor has to adopt either one of the two sets of interfaces. In this paper, we will discuss only the interfaces related to the publish-subscribe paradigm. JMS messages are composed of a set of standard headers (each one characterized by a name and a value); a set of properties, which can be user-defined or vendor-specific; and a body, which may include any stream of data. JMS messages are addressed to a topic. Topics have a name and are organized in a hierarchy. JMS clients subscribe to messages addressed to a given topic by specifying a message selector. Any string that conforms to a subset of the standard SQL92 conditional expressions syntax can be used as a selector. It can reference message headers and properties but cannot reference message bodies. Only messages whose headers and properties match the selector are delivered to the subscribers. Messages can be received both synchronously and asynchronously. Observe that, being a pure API, JMS does not specify how the dispatcher is implemented, i.e., as a centralized server or through a set of collaborating, distributed components.
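
The message structure and the role of the selector can be illustrated with a small model. The sketch below is not the JMS API: `Message`, `TopicBroker`, and their methods are hypothetical names, the selector is a Python predicate rather than an SQL92-style string, and the topic hierarchy is ignored (topics are compared by exact name). It only mirrors the rule that a selector sees headers and properties but never the body.

```python
from dataclasses import dataclass, field


@dataclass
class Message:
    topic: str
    headers: dict = field(default_factory=dict)     # standard name/value headers
    properties: dict = field(default_factory=dict)  # user-defined or vendor-specific
    body: bytes = b""                               # opaque to selectors


class TopicBroker:
    """Toy publish-subscribe broker (hypothetical, centralized)."""

    def __init__(self):
        self._subscriptions = []  # (topic, selector, listener)

    def subscribe(self, topic, selector, listener):
        self._subscriptions.append((topic, selector, listener))

    def publish(self, message):
        # The selector is evaluated on headers and properties only;
        # the body is deliberately not passed to it.
        visible = {**message.headers, **message.properties}
        for topic, selector, listener in self._subscriptions:
            if topic == message.topic and selector(visible):
                listener(message)


# In JMS the condition would be written as a selector string, for example
# something of the shape: region = 'EU' AND amount > 100
def eu_large_orders(fields):
    return fields.get("region") == "EU" and fields.get("amount", 0) > 100
```

Because the broker class hides how matching and delivery are carried out, the same client code would work whether the dispatcher is a single server or a set of cooperating components, which is the point made above about JMS being a pure API.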

5.1.3 Other Related Approaches

In this section, we present some systems that we do not consider true event-based infrastructures. Nevertheless, we believe that they offer specific functionality that is relevant to the discussion presented in this section.

Multicast RPC [10], [58], [59] (also known as group RPC) allows a client to invoke a service on a group of servers that export the same interface. The servers "register" to a class of messages (service requests) by joining a group and by exporting the common interface defined for the group. This is quite different from the approach taken by JEDI. In JEDI, event consumers use a more powerful declarative approach to "register" to a class of messages and they do not need to export any common interface. Moreover, multicast RPC is a synchronous communication mechanism in which an answer is required, while JEDI implements an asynchronous communication mechanism without answer. From this viewpoint, multicast RPC is complementary to the JEDI approach, and could be similar to the synchronous mechanism we advocated in Section 4.

Linda [12] is the precursor of a generation of languages aiming at describing and supporting cooperative computations. The basic idea is that different autonomous computations can cooperate by reading and writing information through a shared repository (or space) of information tuples. Each Linda program can read a tuple from the repository on the basis of its contents, using a pattern matching mechanism. A read operation does not remove the tuple from the repository. Linda also offers a consume operation that reads the tuple and removes it from the repository. There are several differences between Linda and JEDI. First, JEDI makes it possible to "declare," through the subscribe operation, the class of events in which an application is interested. As a consequence, the application will receive all the events that conform to the subscribe declaration without requesting them further. In other words, events are distributed by the ED to the application as they are produced and asynchronously with respect to the main control flow of the application. Conversely, in Linda, each read/consume operation is independent of the others and is synchronously executed by the Linda program. Second, JEDI (as any other true event-based approach) guarantees that all the parties that have declared their interest in an event will eventually receive it. This is enforced by the JEDI runtime support based on subscription requests. In Linda, the only way to achieve a similar effect is to work at the application level. For instance, before removing the tuple, a Linda program might check for some global information to be sure that all the other interested parties have already read it. Another possibility is that each event producer writes multiple copies of a tuple, one for each interested party. This means that the producer must know the number of interested parties. In both cases, the correctness of the event distribution semantics is left to the programmer's responsibility.
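
The second difference can be shown with two toy classes. Neither is Linda or JEDI code: `TupleSpace` and `EventDispatcher` are hypothetical, patterns use `None` as a wildcard, and `read`/`consume` return `None` instead of blocking when nothing matches, which is a simplification assumed here to keep the example sequential.

```python
def matches(pattern, tup):
    """A pattern matches a tuple of the same length; None is a wildcard."""
    return len(pattern) == len(tup) and all(
        p is None or p == v for p, v in zip(pattern, tup)
    )


class TupleSpace:
    """Shared repository: each read/consume is an independent request."""

    def __init__(self):
        self._tuples = []

    def write(self, tup):
        self._tuples.append(tuple(tup))

    def read(self, pattern):
        # Return the first matching tuple without removing it.
        return next((t for t in self._tuples if matches(pattern, t)), None)

    def consume(self, pattern):
        # Return the first matching tuple and remove it from the space.
        found = self.read(pattern)
        if found is not None:
            self._tuples.remove(found)
        return found


class EventDispatcher:
    """Subscription-based delivery: every matching subscriber is notified."""

    def __init__(self):
        self._subscriptions = []

    def subscribe(self, pattern, callback):
        self._subscriptions.append((pattern, callback))

    def publish(self, event):
        for pattern, callback in self._subscriptions:
            if matches(pattern, event):
                callback(event)
```

With the tuple space, one party calling `consume` leaves nothing for a second interested party, unless the application coordinates the readers or the producer writes one copy per reader. With the dispatcher, a single `publish` reaches every subscriber whose pattern matches.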

JavaSpaces by Sun [52] and TSpaces by IBM [33] both follow the Linda approach to support distributed cooperation. In both cases, the Linda paradigm is enhanced with a subscription mechanism that allows agents to be notified when new tuples matching their subscriptions are inserted into the shared repository (TSpaces also supports subscriptions to other kinds of events, such as the elimination of a tuple from the shared repository). Still, the style of interaction supported by such systems is different from the one supported by event-based infrastructures since, as we discussed above, they do not really support multicast communication of tuple contents. This has to be managed at the application level by ensuring that components do not remove a tuple from the shared repository before all the interested parties have read it.

Event-based systems can be considered as an evolution of a well-established class of products often called MOMs (Message-Oriented Middleware) [40] (see footnote 7). In MOMs, messages are sent to explicit queues, which guarantee location transparency. Depending on the specific MOM, messages can be tuples or records. In several MOMs, there can be multiple consumers for the same message queue. This approach makes MOMs similar to Linda. As a consequence, we argue that MOMs exhibit the same problem as Linda. In fact, even if a MOM made it possible to just "read" a message from the queue without removing it, this would be a decision left to the consumer. The delivery of the event to all the interested parties (and only to them) cannot be guaranteed by the platform. Notice that some new generation MOMs (see, for instance, Oracle AQ) take advantage of both the persistency properties of the MOM approach and the ability to multicast messages that is typical of event-based infrastructures.

5.1.4 A Word of Wisdom

The list of systems we have presented is not meant to be exhaustive. Instead, it is supposed to provide an overview of the issues that are being addressed by the existing approaches. New systems are being released every few months. For instance, Sun, HP, Toshiba, Oracle, and Microsoft have recently delivered their event-based infrastructures.

Compared with the other approaches, JEDI can be considered an interesting representative of the event-based infrastructure category. Even if some other systems provide more powerful event models and subscription approaches, JEDI is interesting for its unique ability to deal with agent mobility and for its attempt to support Internet-wide applications. As we have discussed in the previous sections, many of the systems we considered have a distributed architecture. However, the approaches adopted to distribute subscriptions have been designed to support fault tolerance and load balancing among pools of dispatchers located on the same LAN, while their applicability to an Internet-wide setting is not discussed. As an exception, the researchers of the Elvin project explicitly consider and discuss this problem, but they do not present any concrete solution. JEDI provides an initial proposal. We are aware that this proposal needs to be improved and to take into account critical issues such as fault tolerance and security. Still, we feel that it contributes to the identification and analysis of the problem.

5.2 WFMSs

There are a number of WFMSs that, like OPSS, use event-based communication. These environments differ in the level of pervasiveness of event-based communication in their architecture. SPADE [6], [8] is the first WFMS that has been developed in our group. It uses the event-based communication mechanisms provided by ToolTalk (see Section 5.1.2) and DEC FUSE [19] (another FIELD spin-off) to support communication between the engine executing the process and the external tools. In SPADE, events are used in a quite limited way: they are exploited only to support the interaction between the centralized process engine and the tools, and are not used to support other interactions in the environment, such as those occurring between the process engine and the process state repository. Nevertheless, event-based communication proved to be a valuable and nonintrusive mechanism for controlling external tools in a flexible way [5].

ProcessWall and APEL introduce the idea of exploiting a state server to store the relevant information on the process. ProcessWall [25] is a server that provides storage for process state and operations for defining and manipulating the structure of the state. The applications that actually execute the process operate as clients of this server. Clients execute the process activities and invoke the ProcessWall operations to modify the state of the process according to the result of their processing. An event dispatching system is used to notify the interested clients of changes that occurred in the state of the process. Unlike in OPSS, the communication in the opposite direction, from clients to the ProcessWall server, is point to point. This limits the possibilities of reconfiguring the system. It is not easy, for instance, to replicate or distribute the ProcessWall state server without affecting its clients.

APEL [16] is an environment developed at IMAG (Grenoble, France). It exhibits several interesting features such as a high-level graphical interface that incorporates different paradigms for process modeling (activity-based and document-based). The aspect that is particularly relevant within the context of this paper is the APEL architecture and its underlying technology. APEL is centered on an event server and a state server that jointly offer a service similar to ProcessWall. The event server distributes events to the other components of the architecture. The requests from these components to the process server are accomplished by exploiting CORBA facilities. A similar architectural approach is also implemented in PEACE+ [32].

7. Some authors consider MOMs as including both message queues and publish/subscribe (i.e., event-based) infrastructures. In this paper, we refer to MOMs as including only message queues, as in [40].

Two other interesting environments are Serendipity [24] and PROSYT [14]. Both of them allow the execution of the process to be distributed over a wide area network and use the event-based approach to support communication among distributed process engines. Unlike OPSS, neither Serendipity nor PROSYT stores the state of the process separately from the process engines. As a result, when process engines connect (or reconnect) to the system, they explicitly synchronize with all the other process engines. The management of this kind of synchronization can be heavy and cumbersome as the number of engines grows.

Serendipity is more sophisticated than OPSS as far as the mechanisms for defining distributed process models are concerned. Also, it supports a temporary disconnection of process engines. Even if, in principle, this is also possible in OPSS, we have not implemented this feature yet. As for the event-based communication, in Serendipity it is implemented in an ad hoc way, since system elements are connected by point-to-point communication channels and all the features concerning event publication and delivery are implemented as part of Serendipity itself.

Like OPSS, PROSYT exploits the JEDI framework to distribute process model enactment. PROSYT has been implemented at Politecnico di Milano in parallel with OPSS and has helped to highlight the pros and cons of the event-based communication paradigm described in the previous sections. Unlike OPSS, PROSYT focuses on providing the proper mechanisms to allow humans to deviate from the modeled process and to keep track of such deviations. The exploitation of JEDI as an underlying infrastructure allows PROSYT to better monitor the actions performed by users on tools, thus limiting the occurrence of deviating actions outside the control of the system.

The interest in exploiting events in WFMSs is growing even in the industrial context. Companies such as Netscape and Oracle are actively participating in the IETF SWAP (Simple Workflow Access Protocol) working group, which aims at defining a protocol for supporting communication between process engines in an Internet-wide environment [54]. The protocol is based on HTTP and defines two main roles for components: the process instance and the observer. The process instance is any process fragment that is being executed. A process instance allows external components to start, stop, resume, and terminate its execution. Observers control the execution of process instances and are notified of their termination. The protocol assumes that such notification can be delivered using a general-purpose event-based infrastructure whose specification is considered to be outside the scope of SWAP. Besides the obvious technological differences, the communication protocol defined in OPSS is more complex than the one proposed by SWAP. In OPSS, events are not limited to notifying the termination of an activity, but are generated in several other situations that can be defined by process modelers.

Endeavors and OzWeb are two interesting web-based WFMSs. They share with OPSS the possibility of distributing process execution even if they are not based on the event-based communication paradigm. Endeavors [11], [28] supports distribution of process execution, lightweight installation and reconfiguration, and easy integration of process fragment interpreters with tools and hyperwebs of artifacts. Its architecture is composed of three main levels: the user level, which is in charge of managing the interaction with users; the system level, which defines the main process abstractions (e.g., activities, artifacts, ...); and the foundation level, which manages object persistency and distribution. The foundation level may interact with a number of HTTP servers (through the HTTP protocol) to operate on distributed process artifacts. Different Endeavors installations can interact with the same HTTP server. The server exploits a locking policy to prevent the installations from accessing an artifact when it is in an inconsistent state. Both Endeavors and OPSS provide a decentralized execution of processes, i.e., they exploit multiple process engines. The main difference is that Endeavors does not rely on the event-based approach to coordinate the interaction of different engines (interpreters in the Endeavors terminology): they interact by sharing the artifacts and information stored in a passive repository controlled by HTTP servers.

In OzWeb [27], workflow support is introduced in the context of a subweb. A subweb is a collection of hyperlinks to web documents. Each hyperlink is associated with information such as the content type and the access mode of the corresponding document. Users access the subweb documents using standard web browsers configured to use a subweb proxy as a mediator for all their communications. The proxy forwards the requests concerning the subweb to a subweb server. This server acts as a workflow service and checks whether the operation corresponding to a request can be performed based on the current state of the process. The execution of an operation in the subweb server can also trigger the automatic execution of other process fragments. The interesting aspect of OzWeb is its capability of enhancing the behavior of web-based systems while still maintaining their simplicity and worldwide accessibility.

6 CONCLUSION

In this paper, we have illustrated the experiences and lessons learned from the development of JEDI, an event-based infrastructure for the development of complex distributed systems. JEDI exploits the notion of event and adopts standard Internet technologies to provide the software developer with a programming framework where multiple active objects cooperate by generating and consuming events. JEDI offers a simple set of mechanisms to create mobile active objects that interoperate by exchanging events on the Internet scale. The entire architecture is based on very simple and orthogonal concepts. Events are asynchronously distributed to subscribers. All the operations related to event subscription and event notification are managed in a highly dynamic and flexible way.

JEDI has been used to implement the OPSS Process Support System, a significant example of a distributed system whose development has greatly benefited from the availability of an event-based infrastructure. By exploiting JEDI features, OPSS can offer an extremely flexible and dynamically changeable support for workflow management.

The main lessons we have learned from the work described in this paper indicate that the event-based approach nicely complements traditional RPC and conventional point-to-point communication techniques, and that it is suited to situations in which distributed components need to interact asynchronously and preserve anonymity. These advantages are also demonstrated by the growing interest in this technology in both academia and industry. Nevertheless, a number of technological issues concerning event-based architectures have to be explored. In this respect, we argue that the most critical issue is the identification of appropriate design and implementation strategies that make it possible to integrate different (and sometimes conflicting) features such as Internet-wide scalability, enhanced event model (e.g., object-oriented), synchronous and asynchronous event handling mechanisms, and event filtering. Moreover, we still lack effective methodological guidelines to guide and support the design of event-based systems. With respect to these issues, we are currently addressing several aspects that we consider critical impediments to the effective exploitation of the event-based architectural style. In particular, we are introducing extensions to the existing JEDI event model and operations. The main purpose of these extensions is to support return values. Also, we are enriching the structure of events in order to provide a more flexible way to specify the information associated with an event. The impact of these extensions is particularly critical since they have to be combined with other existing features of JEDI: mobility and distribution of the event dispatcher. Indeed, we have identified different strategies to implement these extensions and we are currently evaluating their implementation cost and performance. To achieve this goal, we plan to reuse and further extend the work carried out at the University of Colorado at Boulder on the evaluation of different architectures for the event dispatcher [45].

ACKNOWLEDGMENTS

The authors would like to thank Antonio Carzaniga, Carlo Ghezzi, Dennis Heimbigner, David Rosenblum, and Alex Wolf for their important contribution to the accomplishment of the work described in this paper. They also wish to thank S. Beretta, C. Colombo, F. Coda, S. Montaruli, S. Sargenti, E. Tracanella, and F. Vadalà, who provided essential support in the development and implementation of JEDI and OPSS. OPSS development has been funded by Telecom Italia under a contract managed by Armando Limongiello. The views and the conclusions contained in this document are those of the authors and should not be interpreted as representing the official policies, either expressed or implied, of Telecom Italia. Elisabetta Di Nitto and Alfonso Fuggetta have been partially supported by the University of California, Irvine. Alfonso Fuggetta has also been supported by CNR.

REFERENCES

[1] A. Aiken, J.M. Hellerstein, and J. Widom, "Static Analysis Techniques for Predicting the Behavior of Active Database Rules," ACM Trans. Database Systems (ACM TODS), vol. 20, no. 1, pp. 3-41, 1995.

[2] K. Alho, C. Lassenius, and R. Sulonen, "Process Enactment Support in a Distributed Environment," Proc. IEEE Fourth Workshop Enabling Technologies: Infrastructure for Collaborative Enterprises, WET ICE '95, Apr. 1995.

[3] V. Ambriola, R. Conradi, and A. Fuggetta, "Assessing Process-Centered Environments," ACM Trans. Software Eng. and Methodology, vol. 6, no. 3, July 1997.

[4] D. Arnold, B. Segall, J. Boot, A. Bond, M. Lloyd, and S. Kaplan, "Discourse with Disposable Computers: How and Why You Will Talk to Your Tomatoes," Proc. Usenix Workshop Embedded Systems (ES '99), Mar. 1999.

[5] S. Bandinelli, E. Di Nitto, and A. Fuggetta, "Supporting Cooperation in the SPADE-1 Environment," IEEE Trans. Software Eng., vol. 22, no. 12, Dec. 1996.

[6] S. Bandinelli, A. Fuggetta, and C. Ghezzi, "Process Model Evolution in the SPADE Environment," IEEE Trans. Software Eng., Dec. 1993.

[7] G. Banavar, T. Chandra, B. Mukherjee, J. Nagarajarao, R.E. Strom, and D.C. Sturman, "An Efficient Multicast Protocol for Content-Based Publish-Subscribe Systems," Proc. ICDCS '99 - Int'l Conf. Distributed Computing Systems, 1999.

[8] S. Bandinelli, A. Fuggetta, C. Ghezzi, and L. Lavazza, "SPADE: An Environment for Software Process Analysis, Design, and Enactment," Software Process Modelling and Technology, A. Finkelstein, J. Kramer, and B. Nuseibeh, eds., 1994.

[9] D.J. Barrett, L.A. Clarke, P.L. Tarr, and A.E. Wise, "A Framework for Event-Based Software Integration," ACM Trans. Software Eng. and Methodology, vol. 5, no. 4, Oct. 1996.

[10] K.P. Birman and T.A. Joseph, "Reliable Communication in Presence of Failures," ACM Trans. Computer Systems, vol. 5, no. 1, Feb. 1987.

[11] G.A. Bolcer and R.N. Taylor, "Endeavors: A Process System Integration Infrastructure," Proc. IRUS Conf. Software Process Improvement, Practice, and Experience, Jan. 1997.

[12] N. Carriero and D. Gelernter, "Linda in Context," Comm. ACM, vol. 32, no. 4, Apr. 1989.

[13] S. Ceri, E. Di Nitto, A. Discenza, F. Fuggetta, and G. Valetto, "DERPA: A Generic Distributed Event-Based Reactive Processing Architecture," technical report, Centre for Research and Training in Information Technology, Milan, Italy, 1998.

[14] G. Cugola, "Tolerating Deviations in Process Support Systems Via Flexible Enactment of Process Models," IEEE Trans. Software Eng., vol. 24, no. 11, Nov. 1998.

[15] G. Cugola, E. Di Nitto, and A. Fuggetta, "Exploiting an Event-Based Infrastructure to Develop Complex Distributed Systems," Proc. 20th Int'l Conf. Software Eng. (ICSE 98), Apr. 1998.

[16] S. Dami, J. Estublier, and M. Amiour, "APEL: A Graphical Yet Executable Formalism for Process Modeling," Automated Software Eng. J., vol. 5, no. 1, Jan. 1998.

[17] M. Decina, E. Di Nitto, A. Fuggetta, V. Trecordi, and J. Wojtowicz, "ORCHESTRA: A Retailing Infrastructure for Network-Wide Services," internal report, Centre for Research and Training in Information Technology, Milan, Italy, 1998.

[18] P.S. deWitte and C. Pourteau, "IDEF Enterprise Engineering Methodologies Support Simulation," Magazine Manufacturing Systems: Information Technology for Manufacturing Managers, pp. 70-75, Mar. 1997.

[19] Digital Equipment Corporation, "DEC FUSE Handbook - Version 1.1," Maynard, Mass., Dec. 1991.

[20] G. Eddon and H. Eddon, Inside Distributed COM. Redmond, Wash.: Microsoft Press, 1998.

[21] P. Fraternali and L. Tanca, "A Structured Approach for the Definition of the Semantics of the Active Databases," ACM Trans. Database Systems, 1995.

[22] A. Fuggetta, G.P. Picco, and G. Vigna, "Understanding Code Mobility," IEEE Trans. Software Eng., May 1998.

[23] D. Georgakopoulos, M. Hornick, and A. Sheth, "An Overview of Workflow Management: From Process Modeling to Workflow Automation Infrastructure," Distributed and Parallel Databases, no. 3, pp. 119-153, 1995.

[24] J.C. Grundy, M.D. Apperley, J.G. Hosking, and W.B. Mugridge,ªA Decentralized Architecture for Software Process Modeling andEnactment,º IEEE Internet Computing, Sept./Oct. 1998.

[25] D. Heimbigner, ªThe ProcessWall: A Process Server Approach toProcess Programming,º Proc. Fifth ACM/SIGSOFT Conf. SoftwareDevelopment Environments, pp. 9-11, Dec. 1992.

[26] M. Ramalho, ªIntra and Inter-Domain Multicast Routing Proto-cols: A Survey and Taxonomy,º IEEE Comm. Surveys and Tutorials,Vol. 3, no. 1, Jan./Mar. 2000.

[27] G.E. Kaiser, S.E. Dossick, W. Jiang, J.J. Yang, and S.X. Ye, ªWWW-Based Collaboration Environments with Distributed Tool Ser-vices,º World Wide Web, vol. 1, pp. 3-25, Jan. 1998.

[28] P.J. Kammer, G.A. Bolcer, R.N. Taylor, and A.S. Hitomi,ªSupporting Distributed Workflow Using HTTP,º Proc. Fifth Int'lConf. Software Process (ICSP' 5), June 1998.

[29] B. Krishnamurthy and D.S. Rosenblum, ªYeast: A GeneralPurpose Event-Action System,º IEEE Trans. Software Eng., vol. 21,no. 10, Oct. 1995.

[30] L. Lamport, ªTime, Clocks, and the Ordering of Events in aDistributed System,º Comm. ACM, vol. 21, no. 7, pp. 558-565, 1978.

[31] D.B. Lange and D.T. Chang, ªIBM Aglets WorkbenchÐProgram-ming Mobile Agents in Java,º IBM White Paper, Feb. 1997.

[32] S. Latrous and F. Oquendo, ªA Reflective Multi-Agent System forSoftware Process Enaction and Evolution,º Proc. First Int'l Conf.Practical Application of Intelligent Agents and Multi-Agent Technol-ogy, Apr. 1996.

[33] T.J. Lehman, S.W. McLaughry, and P. Wyckoff, "TSpaces: The Next Wave," Proc. Hawaii Int'l Conf. System Sciences (HICSS-32), Jan. 1999.

[34] A. Limongiello, R. Melen, M. Roccuzzo, V. Trecordi, and J. Wojtowicz, "An Experimental Open Architecture to Support Multimedia Services Based on CORBA, Java and WWW Technologies," IS&N '97, May 1997.

[35] M. Mansouri-Samani and M. Sloman, "GEM: A Generalized Event Monitoring Language for Distributed Systems," IEE/IOP/BCS Distributed Systems Eng. J., vol. 4, no. 2, June 1997.

[36] Object Management Group, "CORBA/IIOP 2.2 Specification," ftp://ftp.omg.org/pub/docs/formal/98-07-01.pdf, Feb. 1998.

[37] Object Management Group, "CORBAservices: Common Object Services Specification," ftp://ftp.omg.org/pub/docs/formal/97-07-04.pdf, July 1997.

[38] Object Management Group, "Notification Service," OMG TC Document telecom/99-07-01, http://www.omg.org/docs/telecom/98-06-17.pdf, Aug. 1999.

[39] P. Oreizy, N. Medvidovic, and R.N. Taylor, "Architecture-Based Runtime Software Evolution," Proc. 20th Int'l Conf. Software Eng. 1998 (ICSE '98), Apr. 1998.

[40] OVUM, "OVUM Evaluates: Middleware," OVUM Ltd., 1996.

[41] D. Piantanida and E. Sanvito, "JAMES - Java Meeting Scheduler," Master Thesis (in Italian), Politecnico di Milano, Dipartimento di Elettronica e Informazione, 1999.

[42] G.P. Picco, "µCode: A Lightweight and Flexible Mobile Code Toolkit," Proc. Second Int'l Workshop Mobile Agents (MA '98), K. Rothermel and F. Hohl, eds., vol. 1477, pp. 160-171, Sept. 1998.

[43] S.P. Reiss, "Connecting Tools Using Message Passing in the Field Environment," IEEE Software, July 1990.

[44] D.S. Rosenblum and A.L. Wolf, "A Design Framework for Internet-Scale Event Observation and Notification," Proc. Sixth European Software Eng. Conf., (Joint with SIGSOFT '98, Foundations of Software Eng.), Sept. 1997.

[45] D.S. Rosenblum, A.L. Wolf, and A. Carzaniga, "Critical Considerations and Designs for Internet-Scale, Event-Based Compositional Architectures," Workshop Compositional Software Architectures, Jan. 1998.

[46] B. Segall and D. Arnold, "Elvin has Left the Building: A Publish/Subscribe Notification Service with Quenching," Proc. Australian UNIX and Open Systems User Group Conf. AUUG '97, Sept. 1997.

[47] SunSoft Inc., "Integrating Applications with the SPARCworks 3.0.1 Toolset," SPARCworks Technical White Paper, Technical Report 801-4629-01, Jan. 1993.

[48] Sun Microsystems, "JavaBeans," technical report, Sun Microsystems, July 1998.

[49] Sun Microsystems, "Java Message Service Specification," technical report, Sun Microsystems, Nov. 1999.

[50] Sun Microsystems, "Java Object Serialization Specification," technical report, ftp://ftp.javasoft.com/docs/jdk1.2/serial-spec-JDK1.2.pdf, Nov. 1998.

[51] Sun Microsystems, "Java Remote Method Invocation Specification," ftp://ftp.javasoft.com/docs/jdk1.1/rmi-spec.pdf, Feb. 1997.

[52] Sun Microsystems, "The JavaSpaces Specification," http://www.sun.com/jini/specs/js101.html, Nov. 1999.

[53] S.M. Sutton, D. Heimbigner, and L.J. Osterweil, "APPL/A: A Language for Software-Process Programming," ACM Trans. Software Eng. Methodology, vol. 4, no. 3, July 1995.

[54] K. Swenson, "Simple Workflow Access Protocol (SWAP)," Internet Draft, Aug. 1998, http://www.ietf.org/internet-drafts/draft-swenson-swap-prot-00.txt.

[55] Talarian Corporation, "Mission Critical Interprocess Communications - An Introduction to Smartsockets," white paper, 1997.

[56] R.N. Taylor, N. Medvidovic, K.M. Anderson, E.J. Whitehead Jr., J.E. Robbins, K.A. Nies, P. Oreizy, and D.L. Dubrow, "A Component-Based Architectural Style for GUI Software," IEEE Trans. Software Eng., vol. 22, no. 6, June 1996.

[57] TIBCO, "TIB/Rendezvous," white paper, http://www.rv.tibco.com/rvwhitepaper.html.

[58] K.S. Yap, P. Tripathi, and S. Tripathi, "Fault Tolerant Remote Procedure Call," Proc. Eighth Int'l Conf. Distributed Computing System, June 1988.

[59] X. Wang, H. Zhao, and J. Zhu, "GRPC: A Communication Cooperation Mechanism in Distributed Systems," ACM Operating System Review, vol. 27, no. 3, 1993.

[60] WISEN 98, 1998 Workshop Internet Scale Event Notification, Irvine Research Unit on Software (IRUS), http://www.ics.uci.edu/wisen, July 1998.

[61] The Workflow Management Coalition, "The Workflow Reference Model," WFMC-TC-1003, ver. 1.1, http://www.aiim.org/wfmc/DOCS/refmodel/rmv1-16.html, Nov. 1994.

Gianpaolo Cugola received the PhD degree in information and automation engineering from Politecnico di Milano with a thesis titled "Inconsistencies and Deviations in Process Support Systems." The thesis was awarded in August of 1998 by the Dimitri N. Chorafas Foundation. Since October 1997 he has been an assistant professor at the University of Italian Swiss in Lugano.

Elisabetta Di Nitto received the PhD degree in information and automation engineering from Politecnico di Milano, in February 1996, with a thesis on software engineering, in particular, on process-centered software engineering environments. She is currently an assistant professor at Politecnico di Milano and a researcher at CEFRIEL in the software engineering area.

Alfonso Fuggetta graduated with a degree in electronic engineering from Politecnico di Milano in 1982. He is a full professor at Politecnico di Milano and the deputy director of CEFRIEL, a research and education institute established by major information technology industries, universities, and public administrations. Currently, he is a visiting professor at the University of California at Irvine.


850 IEEE TRANSACTIONS ON SOFTWARE ENGINEERING, VOL. 27, NO. 9, SEPTEMBER 2001