Spatio-Temporal Composition and Indexing for Large Multimedia Applications

16
See discussions, stats, and author profiles for this publication at: https://www.researchgate.net/publication/225745585 Spatio-temporal composition and indexing for large multimedia applications Article in Multimedia Systems · July 1998 DOI: 10.1007/s005300050094 · Source: DBLP CITATIONS 142 READS 55 3 authors, including: Yannis Theodoridis University of Piraeus 239 PUBLICATIONS 6,194 CITATIONS SEE PROFILE Timos Sellis Swinburne University of Technology 356 PUBLICATIONS 10,059 CITATIONS SEE PROFILE All content following this page was uploaded by Yannis Theodoridis on 04 December 2016. The user has requested enhancement of the downloaded file. All in-text references underlined in blue are added to the original document and are linked to publications on ResearchGate, letting you access and read them immediately.

Transcript of Spatio-Temporal Composition and Indexing for Large Multimedia Applications

Seediscussions,stats,andauthorprofilesforthispublicationat:https://www.researchgate.net/publication/225745585

Spatio-temporalcompositionandindexingforlargemultimediaapplications

ArticleinMultimediaSystems·July1998

DOI:10.1007/s005300050094·Source:DBLP

CITATIONS

142

READS

55

3authors,including:

YannisTheodoridis

UniversityofPiraeus

239PUBLICATIONS6,194CITATIONS

SEEPROFILE

TimosSellis

SwinburneUniversityofTechnology

356PUBLICATIONS10,059CITATIONS

SEEPROFILE

AllcontentfollowingthispagewasuploadedbyYannisTheodoridison04December2016.

Theuserhasrequestedenhancementofthedownloadedfile.Allin-textreferencesunderlinedinblueareaddedtotheoriginaldocument

andarelinkedtopublicationsonResearchGate,lettingyouaccessandreadthemimmediately.

Multimedia Systems 6: 284–298 (1998) Multimedia Systemsc© Springer-Verlag 1998

Spatio-temporal composition and indexingfor large multimedia applicationsMichael Vazirgiannis, Yannis Theodoridis, Timos Sellis

Computer Science Division, Department of Electrical and Computer Engineering, National Technical University of Athens, Zographou, GR-15773 Athens,Greece; e-mail:{mvazirg, theodor, timos}@cs.ntua.gr

Abstract. Multimedia applications usually involve a largenumber of multimedia objects (texts, images, sounds, etc.).An important issue in this context is the specification of spa-tial and temporal relationships among these objects. In thispaper we define such a model, based on a set of spatial andtemporal relationships between objects participating in mul-timedia applications. Our work exploits existing approachesfor spatial and temporal relationships. We extend these rela-tionships in order to cover the specific requirements of multi-media applications and we integrate the results in a uniformframework for spatio-temporal composition representation.Another issue is the efficient handling of queries related tothe spatio-temporal relationships among the objects duringthe authoring process. Such queries may be very costly andappropriate indexing schemes are needed so as to handlethem efficiently. We propose efficient such schemes, basedon multidimensional (spatial) data structures, for large mul-timedia applications that involve thousands of objects. Eval-uation models of the proposed schemes are also presented,as well as hints for the selection of the most appropriate one,according to the multimedia author’s requirements.

1 Introduction

A multimedia application (MAP) involves a variety of indi-vidual multimedia objects presented according to the MAPscenario. The multimedia objects that participate in a MAPare transformed either spatially or temporally in order to bepresented according to the author’s requirements. Moreover,the author has to define the spatial and temporal ordering ofobjects within the application context and define the relation-ships among them. Finally, the way that users will interactwith the application as well as the way that the applicationwill treat application or system events have to be defined.

Real-world MAPs may be very large and complex withrespect to the number of involved objects, transformations ofthe objects in the scope of an application, and relationshipsamong them. We consider a MAP as a container that includes

Correspondence to: M. Vazirgiannis

objects that are transformed and interrelated in the MAPcontext. It is obvious that in a complex MAP it may be verydifficult to describe all possible functionality and paths theuser or the application may follow. Therefore, one can thinkof a MAP as anevent-basedenvironment, in which there isa rich set of events that may occur and define its flow. Forinstance, the end of a video clip, the spatial coincidence oftwo objects in the application window, or the occurrence ofa pattern in a media object are events that may be exploitedto trigger other actions in an application.

A crucial part MAPs’ modeling is related to temporal andspatial composition of objects in the context of the applica-tion. The well-known sets of temporal [Hamb72, Alle83]and topological [Egen91] relationships are not adequate torepresent all different semantics of multimedia objects com-position, since they do not convey this kind of information.For instance, the topological relationship ‘disjoint’ betweentwo spatial objects A and B (as in Fig. 1a) does not suffice torepresent the their relative position as well as their relativedistance (i.e., B is on the right side of A at a distance of8 cm). Another similar example is the successive presenta-tion of two sounds (A1, A2) with a temporal gap of 8 s (seeFig. 1b).

It is desirable that the spatial and temporal specifica-tions, defined by the author in a high-level GUI, are ex-plicitly transformed to a uniform representation that retainsthe spatio-temporal relationships among the objects. Thus,the need for high-level declarative representation of multi-media object composition arises. Authoring complex MAPs(for instance, 3D synthetic movies [Krie96]) that involve alarge number of objects (typically≥ 104) may be a verycomplicated task, keeping in mind the large set of possiblespatio-temporal relationships that may be encountered in theapplication context. Normally, in a 90-min synthetic movie,the number of participating objects is expected to be 104 ormore with respect to the order of magnitude. Taking intoaccount the vast number of possible events and their com-binations based on (user and object) interaction, the numberof the entities that have to be managed by the MAP authorsis considerable.

A powerful authoring procedure should provide the toolsfor declarative high-level complete specification of the MAP.

285

Fig. 1a,b. Simple media spatial and temporal relationships in the contextof a MAP

Such tools are based on languages that reflect the under-lying model primitives. Our objective is the definition ofa model that could fulfill this need and also support thequeries submitted by the authors for verification purposesduring MAP development. Thus, during the development ofa digital movie, the authors/directors would perhaps submitqueries related to:

– spatial (screen) layout at a specific time instance duringthe movie,

– temporal layout of the movie in terms of temporal inter-vals,

– spatio-temporal relationships among objects (actors) (i.e.,“does object A spatially overlap with object B?” or “ whichobjects temporally overlap with object A?”)

In this paper, we define a model for the representation ofspatio-temporal composition in MAPs and we also proposeand evaluate indexing schemes to support queries related tothe spatio-temporal content of a MAP.

The model is based on a set of spatial and temporal re-lationships between media objects. Our work exploits exist-ing approaches for spatial [Papa97] relationships. We extendthose relationships in order to cover the specific require-ments of MAPs and we integrate the results in a uniformframework for spatio-temporal composition representation.Moreover, we define a set of operators for representation oftemporal compositions. These operators aredeclarativeandcomplete(i.e., able to represent any temporal scenario andconvey semantics about the temporal composition).

We also propose indexing schemes (i.e., disk-residentstructures organizing spatio-temporal features of media ob-jects) for large MAPs in order to assist authors:

– manage the large number of objects in the MAP underdevelopment,

– acquire spatial and temporal layouts of the MAP underdevelopment, for verification purposes,

– submit queries regarding spatio-temporal relationshipsamong objects.

The proposed indexing schemes are based on the R-tree in-dex [Gutt84], which is widely used for indexing of spatialdata in several applications, such as geographic informa-tion systems (GIS), CAD and VLSI design, etc. We adaptR-trees in order to index either spatial, temporal or spatio-temporal occurrences of objects and relationships betweenthem. Moreover, we evaluate the proposed schemes againsttwo simple indexing cases: the primitive one, based on se-rial storage of objects’ spatio-temporal coordinates, and asimple indexing scheme, which keeps disk-resident arraysof pre-sorted object coordinates according to each direction

(i.e., lower x- or y-coordinate and start point at the t-axis).We also provide hints to multimedia database designers, inorder to select the most efficient scheme according to therequirements of MAP authors.

In the literature, there is no previous work, according toour knowledge, on indexing spatio-temporal characteristicsof MAPs. Research has mainly focused oncontent-basedim-age indexing, i.e., fast retrieval of objects using their contentcharacteristics (color, texture, shape). In [Falo94a], a sys-tem, calledQBIC, (coupling several features from machinevision with fast indexing methods from the database area)supports color-, shape- and texture-matching queries. Near-est neighbor queries (based on image content) are addressedin [Chiu94]. In general, indexing multimedia objects’ con-tents is an active research area, while indexing objects’ ex-tents in the spatio-temporal coordinate system sets a newdirection. In this paper, we do not consider some aspects ofMAPs, such as interaction handling, network and distribu-tion requirements and scenario rendering. These issues maybe confronted during the implementation of such a system.Moreover, the proposed model refers mostly to specificationand retrieval rather than to storage, and execution of a MAP.

The paper is organized as follows: In Sect. 2, we presentbackground work on temporal and spatial relationships tobe exploited. Furthermore, we discuss the requirements thata, specific to MAPs, spatio-temporal composition schemeshould fulfill. In Sect. 3, we discuss and define the spatio-temporal relationships and a generic composition model forMAPs. In addition, we present a sample MAP and its rep-resentation according to the proposed model. In Sect. 4, wepropose indexing schemes for MAPs to support queries in-volving the operators introduced in Sect. 3. In particular,we propose a simple one, based on sorted arrays and twomore sophisticated ones, based on the R-tree spatial indexstructure, in order to support these operators. In Sect. 5, weevaluate analytically the proposed schemes for some rea-sonable spatio-temporal MAP configuration. We concludein Sect. 6, by summarizing our work and giving hints forfuture research.

2 Related work

In the past, the termsynchronizationhas been widely used todescribe the temporal ordering of objects in a MAP [Litt93].A MAP specification should describe both temporal and spa-tial ordering of objects in the context of the application. Thespatial ordering (i.e., absolute positioning and spatial rela-tionships among objects) issues have not been adequately ad-dressed. We claim that the term “synchronization” is poor forMAPs. Instead we propose the term “composition” to repre-sent both temporal and spatial ordering of objects. Hereafter,we review existing systems and approaches for temporal andspatial composition.

2.1 Temporal composition

Many existing models for temporal composition of multi-media objects in the framework of a MAP are based on 13relations defined in [Hamb72, Alle83]:before, meets, during,

286

overlaps, starts, ends, equaland the inverse ones (does notapply to equal). These relations are not adequate for tem-poral composition description. They are descriptive, hence,they do not reflect causal dependencies between intervals.They depend on interval durations and may lead to tempo-ral inconsistency. More specifically, the problems that mayarise when trying to use these relations are the following[Duda95].

– The relations are designed to express relationships be-tween intervals of fixed duration. In the case of MAPs, itis required that a relationship holds independently fromthe duration of the related object (i.e., the relationshipshould not change when the duration changes).

– Their descriptive character does not convey the causeand the result in a relationship.

Other models for temporal composition representation maybe classified in two categories [Duda95]:point-basedandinterval-based. In point-based models, the elementary unitsare points in time and space. Each event has an asso-ciated time point. The time points arranged according tosome relations (such as “precede”, “simultaneous” or “af-ter”) form complex multimedia presentations. An exampleof the point-based approach is thetimeline. Interval-basedmodels consider elementary media entities as temporal in-tervals, ordered according to some relations. Existing mod-els are mainly based on the relations defined in [Hamb72,Alle83] for expressing knowledge about time.

An interesting mechanism for temporal composition ispresented in [Duda95]. This work presents a model that takesinto account the semantics of temporal relationships betweenobjects. The resulting set of operators represent the causalrelations between intervals. In [Hirz95], a temporal modelfor interactive scenarios is presented. This model is basedon the timeline approach and provides the primitives forspecification of synchronous and asynchronous interactivemultimedia temporal compositions. The timeline approachis extended to a tree of timelines. Each branch of timelinesrepresents the different scenarios that may be selected by theuser.

Other approaches use Allen’s relations [Alle83] to spec-ify a multimedia database schema. Little and Ghafoor[Litt93] propose an OCPN (object composition Petri nets)model equivalent to Allen’s relations. This approach doesnot take into account the possible unknown durations of in-tervals. Thus, in order to prepare an instantiated presentation,the tree of interval relations must be traversed to obtain dead-lines to be used in the presentation schedule. There are alsoother approaches based on interval temporal logic [King94].Although such formalisms have a solid mathematical back-ground, the specification of multimedia presentations is awk-ward, since the specification does not correspond explicitlyto the author’s perception of the multimedia composition.

In [Hand96], a synchronization model is presented. Thismodel covers many aspects of multimedia synchronization,such as: incomplete timing, hierarchical synchronization,complex graph type of presentation structure with optionalpaths, presentation time prediction and event-based synchro-nization. The events are considered as presentations con-strained by unpredictable temporal intervals. There is neitherthe notion of event semantics nor the notion of a composition

scheme. In [Schn96], a presentation synchronization modelis presented. Important concepts introduced and manipulatedby the model are the object states (“Idle”, “Ready”, “In-process”, “finished”, “complete”). Although events are notexplicitly presented, user interactions are treated. There aretwo categories of interaction envisaged: buttons and userskips (“forward”, “backward”).

As referred to in [Blak96], event-based representationof a multimedia scenario is one of the four categories formodeling a multimedia presentation. There, it is mentionedthat events are modeled in HyTime [Newc91, Bufo96] andHyperODA. Events in HyTime are defined as presentationsof media objects along with their presentation specificationsand FCS coordinates. According to [Erfl93], HyTime mod-eling primitives are sufficient for temporal composition rep-resentation. HyperODA events are instantaneous happeningsmainly corresponding to the start and end of media objectsor timers. All these approaches suffer from poor semanticsconveyed by the events, and moreover they do not provideany scheme for composition and consumption architectures.

2.2 Spatial specification and composition

The issue of spatial composition modeling is rather under-addressed in the current multimedia authoring environmentsand synchronization models. One of the few efforts that inte-grate both spatial and temporal aspects is [Iino94], a modelfor spatio-temporal multimedia presentations. The temporalcomposition is handled in terms of Allen’s relationships,whereas spatial aspects are treated in terms of a set of op-erators for binary and unary operations. The model lacksthe following features: there is no indication of the tempo-ral causal relationships (i.e., what are the semantics of thetemporal relationships between the intervals correspondingto multimedia objects). The spatial synchronization essen-tially addresses only two topological relationships:overlapandmeet, giving no representation means for the directionalrelationships between the objects (i.e., object A is to the rightof object B) and the distance information (i.e., object A is10 cm away from object B). The modeling formalism in thisapproach is oriented more towards execution and renderingof the application rather than to authoring.

Moreover, spatial specification is addressed by documentpreparation and modeling systems like Latex, SGML andHyTime. In Latex [Lamp90], spatial specification is feasibleincluding the direction and the distance (i.e., an image tobe placed 2 cm west to another of 10 cm north to the text).The specifications are expressed in terms of “float parame-ters”. HyTime [Newc91, Bufo96] provides a rich specifica-tion spatio-temporal scheme with the FCS (finite coordinatespace) space. FCS provides an abstract coordinate scheme ofarbitrary dimensions and semantics, where the author maylocate the objects of a MAP and define their relationships.Nevertheless, the formalism is awkward and, as practice hasproved, it is rarely used for real-world applications. Hy-time is not an efficient solution for MAP development, since“there are significant representational limitations with regardto interactive behavior, support for scripting language inte-gration, and presentation aspects” [Bufo96].

287

3 Spatio-temporal composition model

As mentioned in the previous section, there is a lack of an in-tegrated approach for representation of all functional aspectsof multimedia presentations. Such an application involves

– transformation of objects, in order to be aligned with thepresentation specifications,

– specification of the composition of objects in space andtime (spatial and temporal ordering through the definitionof relationships among the media objects),

– definition of the application functionality (i.e., the appli-cation scenario) which is of two kinds:pre-orchestratedandevent-based. The termpre-orchestratedimplies thatcertain actions will take place at specific time instants,while event-basedimplies that actions are triggered byevents that occur in the application context either dueto the user or the system or due to entities participatingin the application (media objects, media compositions,etc.). The fundamental entities of the scenario are calledscenario tuples[Vazi96b] and are triggered by the oc-currence of an event (simple or complex).

3.1 Temporal relationships

The topic of relations between temporal intervals was orig-inally addressed by Hamblin [Hamb72] and, later, by Allen[Alle83], as we have already mentioned in Subsect. 2.1. Inthis section, we define a set of concepts to be exploited forthe representation of temporal composition in the context ofa MAP. We consider the presentation of a multimedia objectas a temporal interval (hereafter multimedia instance). Weexploit the start- and end-points of a multimedia instance asevents and distinguish the end of a multimedia instance innatural (i.e., when the media object finishes its presentation)and forced (i.e., when an event explicitly stops the presen-tation of a media object). Furthermore, we are interested inthe well-knownpause(temporary stop of presentation) andresumeactions (start the presentation from the point wherethe pause operation took place).

An important concept is thetemporal instance: we con-sider it as an arbitrary temporal measurement, relative tosome reference point (i.e., the application temporal startingpoint in our case, hereafterE). Based on the above descrip-tions, we define the following operators attached to the cor-responding events.

Definition 1. Let A be a multimedia instance;A> repre-sents the start of the multimedia instance,A< the naturalend of the instance,A! the forced stop,A|| the pause andA|> the resume actions.

Definition 2. Let A, B be two multimedia instances, thenthe expressionAop1 t Bop2 represents all temporal relation-ships between the two multimedia instances, whereop1 ∈{>, <, ||, |>} andop2 ∈ {>, !, ||, |>} andt is a vacant tem-poral interval.

Definition 3. Let A be a multimedia instance; we define astAop temporal instances corresponding to the eventsAop,whereAop ∈ {>, <, !, ||, |>}.

Table 1.Temporal relationships and the corresponding operator expressions

Temporal relationship Equivalent Constraintsoperator expression

Definition 4. Let A be a multimedia instance; we define asdA the temporal duration of the multimedia instanceA.

The above operators are complete in the sense that all tem-poral relationships can be expressed using these operatorsas long as the appropriate conditions are fulfilled (Table 1).In addition, the proposed operators capture the semantics ofthe temporal relationships among the multimedia instances(i.e., A meets B may be expressed asA< 0B> or B> 0A!).Moreover, the proposed set of operators may be used for ahigh-level mechanism of temporal scenario specification.

3.2 Spatial relationships

Another aspect of composition concerns the spatial orderingand topological features of the participating objects. Spatialcomposition aims at representing three aspects:

– the topological relationships between the objects (dis-joint, meet, overlap, etc.),

– the directional relationships between the objects (left,right, above, above-left, etc.)

– the distance/metric relationships between the objects(outside5 cm, inside2 cm, etc.).

As for the first aspect, a complete set of topological rela-tionships between two objects, called4-intersection model,

288

Fig. 2. Directional relationships between two spatial objects (includingtopological information)

was proposed in [Egen91]. Thus, two objectsp, q may co-incide (equal), intersect (overlap), touch externally (meet),touch internally (coversand the reversecoveredby), be in-side (and the reversecontains), or bedisjoint.

Concerning directional relationships, there is a com-plete set of relationships defined in [Papa97] (see Fig. 2).This set of 169 (132) relationshipsRi j (i = 1, . . . , 13 andj = 1, . . . , 13) arises from exhaustive combination of the13 relations defined in [Hamb72, Alle83] regarding rela-tionships between temporal intervals. This set also coverstopological relationships, since any topological relationshipof the 4-intersection model could be expressed as a subsetof the set of 169 relationships [Papa95].

In the context of a MAP, an author would like to placespatial objects (text windows, images, video clips, anima-tion) in the application window in such a way that theirrelationships are clearly defined in a declarative way, i.e.,“ text window A is placed at the location (100, 100), text win-dow B appears 8 cm to the right and 12 cm below the upperside of A” (see Fig. 3). This declarative definition shouldbe transformed into an internal representation that capturesthe topological and directional relationships, as well as thedistance between the objects in a uniform and correct way.In the next subsection, we propose a definition model tosupport these needs.

3.3 The model definition

Current MAP modeling schemes do not provide powerfultools for the complete description of the spatial and temporalcomposition that takes place in a complex application (anoverview of related work was presented in Sect. 2).

We define a set of operators for representing temporaland spatial composition. Here, we have to make the distinc-tion between pre-orchestrated and interactive applications.The term “pre-orchestrated” implies that certain actions willtake place at specific time and/or spatial instants (i.e., tempo-ral location relative to the applications start or spatial loca-tion in the application window), while “event-based” impliesthat actions are triggered by events that occur in the appli-cation context either by the user or the system or by entities

Fig. 3. Spatial composition generalized modeling

participating in the application (media objects, media com-positions, etc.)

The resulting requirement is for a set of operators thatallows users to represent any spatio-temporal relationshipbetween objects in the context of a MAP in a declarativeway. As for temporal composition of objects, we exploit theoperators defined above. As far as it concerns spatial com-position, we are based on the complete set of topological-directional relationships illustrated in Fig. 2, and proposethe following generalized methodology for representing thedistance between two spatial objects1. In order to achieve auniform approach, we impose the constraint that the distancewill be expressed in terms of distance between the ‘closestvertices’. For each spatial objectO, we label its verticesas O.vi (i = 1, 2, 3, 4), starting from the bottom left vertexin a clockwise manner. As “closest”, we define the pair ofvertices (A.vi, B.vj) with the minimum Euclidean distance.

The author of a MAP must be able to express spatialcomposition predicates in an unlimited manner. For instance(see Fig. 3), the author could describe the appearing compo-sition as: “object B to appear 12 cm lower that the upper sideof object A and 8 cm to the right”. The model we proposewill translate such descriptions into minimal and uniformexpressions, as imposed by the requirements for correct andcomplete representations.

For uniformity reasons, we define an object namedΘ,that corresponds to the spatial and temporal start of the ap-plication (i.e., the upper left corner of the application win-dow and the temporal start of the application). Another as-sumption we make is that the objects that appear in thecomposition include their spatio-temporal presentation char-acteristics (i.e., size, duration, etc.) [Vazi95]. In the rest ofthis section, we exploit the EBNF formalism to represent themodel primitives.

Definition 1. Assuming two spatial objectsA, B, we definethe generalized spatial relationship between these objects as:S R = (rij , vi, vj , x, y), where rij is the identifier of thetopological-directional relationship betweenA and B (de-rived from Fig. 2),vi, vj are the closest vertices ofA andB, respectively, andx, y are the horizontal and vertical dis-tances betweenvi, vj .

Next, we define a generalized operator expression tocover the spatial and temporal relationships between objectsin the context of a MAP. It is important to stress the factthat, in some cases, we do not need to model a relationshipbetween two objects but have to declare the spatial and/or

1 We assume that spatial objects are rectangles. More complex objectscan also be represented as rectangles by using their minimum boundingrectangle (MBR) approximation.

289

temporal position of an object relative to the application spa-tial and temporal start pointΘ (i.e., object A to appear atthe spatial coordinates (110, 200) on the 10th second of theapplication).

Definition 2. We define a composite spatio-temporal opera-tor that represents absolute spatial/temporal coordinates orspatio-temporal relationships between objects in the applica-tion: ST R(sp rel, temp rel), wheresp rel is a spatial re-lationship (S R), while temp rel is a temporal relationship,as defined in Subsect. 3.1.

The spatio-temporal composition of a MAP consists ofseveral independent fundamental compositions. The term‘independent’ implies that objects participating in them arenot related implicitly (either spatially or temporally), exceptfor their implicit relationship to the start pointΘ. Thus,all compositions are explicitly related toΘ. We call thesecompositionscompositiontuples, and these include spatiallyand/or temporally related objects.

Definition 3. We define the compositiontuple in the contextof a MAP as:composition tuple = Ai[{ST RAj}], whereAi, Aj are objects participating in the application,ST R isa spatio-temporal relationship (as defined in Definition 2).

Definition 4. We define the composition of multimedia ob-jects in the context of MAPs as a set of compositiontuples:composition = Ci{, Cj}, whereCi, Cj are compositiontuples.

The EBNF definition of the spatio-temporal compositionbased on the above definition follows:

composition ==:compositiontuple {[,compositiontuple]}

compositiontuple ==:Θ {[spatio temporalrelationship action]}

action ==:object [{spatiotemporalrelationship object}]

— “(“ object spatiotemporalrelationship object “)”— object— spatiotemporalinstance

spatiotemporalrelationship ==:“[(“[spatial operator — spatialinstance“)”,“(“temporal operator — temporalinstance”)]”

temporaloperator ==:Θ — t event tinterval TAC operationt event ==: “>” — “ <” — “!” — “— >” — “——”TAC operation ==: “>”—“!”—“— >”—“——”spatiotemporalinstance ==:

Θ— (spatialinstance, temporalinstance)

spatialinstance ==:“(“x “,” y “)”

temporalinstance ==:TIME — event

spatialoperator ==:(rij , vi, vj , x, y)

x ==:INTEGER

y ==:INTEGER

Θ ==:application start: (0, 0, 0)

Fig. 4. Spatial composition of the ‘Newsclip’ MAP

whererij denotes a topological-directional relationship (fromFig. 2) between two objects andvi, vj denote the closestvertices of the two objects (see definition above).

In this section, we proposed a model for representingspatio-temporal composition in pre-orchestrated MAPs. Inthe next subsection, we illustrate the potential of the modelby means of a sample application.

3.4 A sample multimedia composition

In this section, we describe a composite MAP correspondingto a TV news clip in terms of spatio-temporal relationshipsas defined above. The high-level scenario of the applicationis the following.

“The News clip starts with presentation of image A (lo-cated at point50, 50 relative to the application originΘ).At the same time, a background music E starts. Ten secondslater a video clip B starts. It appears to the right side (18 cm)and below the upper side of A (12 cm). Just after the end ofB, another MAP starts. This MAP (calledFashion clip)is related to fashion. The Fashionclip consists of a videoclip C that presents the highlights of a fashion show andappears 7 cm below (and left-aligned to) the position of B.Three seconds after the start of C, a text logo D (e.g., the de-signer’s logo) appears inside C, 8 cm above the bottom sideof C, aligned to the right side. D will remain for 4 s on thescreen. Meanwhile, at the 10th second of the News clip, theTV channel logo (F) appears at the bottom-left corner of theapplication window. F disappears after 3 s. The applicationends when music background E ends.”

The spatial composition (screen layout) of the above sce-nario is illustrated in Fig. 4, while the temporal one is illus-trated in Fig. 5.

The objects to be included in a composition tuple of aMAP are those that are spatially and/or temporally related. Inour example (Newsclip), A and B and Fashionclip shouldbe in the same composition tuple, since A relates to B andB relates to Fashionclip. On the other hand, F is not relatedto any other object, neither spatially nor temporally, so itcomposes a different tuple. The above spatial and temporalspecifications defined by the author in a high-level GUI aretransformed into the following representation according tothe model primitives defined in Subsect. 3.3.

// News Clip

290

Fig. 5. Temporal composition of the ‘Newsclip’ MAP

composition ={r1, r2}r1 = Θ [( , , , , ), (>0>)]

E [( , , , , ), (<0!)]News

r2 = Θ [(r1 1, , v2, 50, 50), (>0>)]A [(r11 13, v3, v2, 18, 12), (>10>)]B [(r13 6, v1, v2, 0,−7), (>0>)]Fashionclip

r3 = Θ [( , , v1, 0, 300), (>10>)]F

// Fashion clipcomposition ={r4}r4 = Θ [( , , v2, 0, 0), (>0>)]C [(r9 10, v4, v4, 0, 8), (>3>)]D

It is important to stress thatΘ in composition tupler4 rep-resents the spatio-temporal origin of the Fashion clip. Inthis example, we have a composition of MAPs. It has to bestressed that, when the host MAP (i.e., Newsclip) ends,all the MAPs started by it are also stopped (i.e., Fash-ion clip). There is an issue regarding the mapping of thespatio-temporal specifications into the composition tuples:the classification of involved objects. The proposed proce-dure is the following. For each objectAi, we check whetherit is related to objects already classified in an existing tuple.If the answer is positive,Ai is classified in the appropriatecomposition tuple (a procedure that possibly leads to reorga-nization of the tuples). Otherwise, a new composition tuple,composed byΘ andAi, is created.

The composition model should satisfy the following cri-teria:

– completeness: i.e., the available operators of the modelsuffice for representing any spatio-temporal relationshipbetween objects, and

– correctness: i.e., each specification of a spatial compo-sition si leads to a different representation expressionri.

We claim that the proposed model iscompleteand correct.In particular, it is complete because the set of operators thatwas exploited may represent all spatio-temporal relationshipsamong objects in a MAP. However, we do not provide for-mal proof in this paper. This is an issue of our current re-search.

The objects to be included in a composition tuple arethose that are spatially and/or temporally related to each

other. During the application development process, it is prob-able (especially in the case of complex and large applica-tions) that authors would need information related to thespatio-temporal features of the MAP (TV clip in the case ofthe example). The related queries, depending on the spatio-temporal relationships that are involved, may be classifiedin the following categories.

– pure spatial or temporal query: only a temporal or a spa-tial relationship is involved in the query. For instance,“which objects temporally overlap the presentation of testlogo D?”, “ which objects spatially lie above object D inthe application window?”,

– spatio-temporal query: where such a relationship is in-volved. For instance, “which objects spatially overlapwith object D during its presentation?”.

– layout query: spatial or temporal layouts of the applica-tion. For instance, “what is the screen layout at the 22ndsecond of the application?”, “ which objects are presentedbetween the 10th and the 20th second of the application?”(temporal layout).

A simple serial storage scheme which includes the ob-jects’ spatial and temporal coordinates is an inefficient so-lution, since typical MAPs include thousands of objects.Hence, indexing techniques that would efficiently handlespatial and temporal characteristics of objects need to beadopted. In the next section, we propose such efficient in-dexing mechanisms, in order to support such queries in thecontext of MAP authoring. Here, it should be stressed that,in this research work, we do not deal with queries related tothe content of the objects (content-based queries), but onlyto their spatio-temporal extents and relationships.

4 Indexing techniques for large MAPs

As discussed in previous sections, MAPs usually involvea large number of media objects, such as images, video,sound and text. The quick retrieval of a qualifying set,among the huge amount of data, that satisfies a query basedon spatio-temporal relationships is necessary for the effi-cient construction of a MAP. The multimedia scenario rep-resented in terms of composition tuples essentially representsthe spatio-temporal relationships among the multimedia ob-jects according to the authors requirements. From these re-lationships the absolute objects’ spatio-temporal coordinatesmay be determined. Spatial and temporal features of objectsare then identified by six coordinates: the projections on x-(pointsx1, x2), y- (pointsy1, y2), and t- (pointst1, t2) axes2.A serial storage scheme, maintaining the objects character-istics as a set of seven values (id,x1, x2, y1, y2, t1, t2) andorganizing them into disk pages, is not an efficient solution,since lack of ordering leads to the access of all pages foranswering any query, like the example queries of Sect. 3.However, this scheme will be used as the baseline for theevaluation of our proposals in Sect. 5.

A more efficient, but still simplistic, solution (as will bepresented next) is based on the maintenance of three disk

2 We adopt a unified 3D workspace for space (two dimensions) and time(one dimension) features.

291

Fig. 6. A simple (spatial and temporal) indexing scheme

arrays that keep low coordinates of objects (i.e.,x1, y1, andt1) separately in a sorted order3. Several queries involvingspatio-temporal operators, among the ones presented at theend of Sect. 3, require the retrieval of one array only, using“divide-and-conquer” techniques. Temporal layout queries(such as query 5) belong to this group. However, the major-ity of queries involves information about more than one axis.Hence, the retrieval of more than one array and the subse-quent combination of the answer sets is necessary for suchcases. As a conclusion, efficient indexing mechanisms thatcould combine spatio-temporal characteristics of objects inorder to efficiently support a wide range of spatio-temporaloperators need to be present in a MAP authoring tool. Inthe next subsections, we propose two indexing schemes andtheir retrieval procedures.

4.1 Indexing schemes

4.1.1 A simple spatial and temporal indexing scheme

A simple indexing scheme that could be able to handle spa-tial and temporal characteristics of media objects consists oftwo indices:

– a spatial (2D) indexfor spatial characteristics (id, andx1, x2, y1, y2 values) of the objects, and

– a temporal indexfor temporal characteristics (id, andt1,t2 values) of the objects.

In the literature concerning the area ofspatial databases,several data structures have been proposed for the manipu-lation of spatial data (a survey can be found in [Same90]).Among others, R-trees [Gutt84, Beck90] seem to be themost efficient ones. On the other hand, the manipulationof temporal information can be supported either by one-dimensional versions of the above data structures (since allof them have been designed forn-dimensional space in gen-eral) or by specialized temporal data structures (e.g., segmenttrees [Bent75]). For uniformity reasons, we select a singlemulti-dimensional data structure (R-tree) to play the role ofthe spatial (2D R-tree) and temporal (1D R-tree) index. Theabove indexing scheme is illustrated in Fig. 6.

3 Instead of using low- coordinates one can select high- coordinates (orsix arrays with low- and high- coordinates). It is a decision that affectsneither the discussion that will follow nor its conclusions.

The R-tree is a height-balanced tree which consists ofintermediate and leaf nodes. Objects’ approximations (com-monly MBRs) are assumed to be stored in the leaf nodesof the tree. Intermediate nodes are built by grouping rectan-gles (or hyper-rectangles, in general) at the lower level. Anintermediate node is associated with some rectangle whichencloses all rectangles that correspond to lower level nodes.Formally,

– a leaf nodeis of the form (oid, RECT ), whereoid is anobject identifier and is used to refer to an object in thedatabase andRECT is the MBR approximation of thedata object, i.e., it is of the form (pl−1, pl−2, . . . , pl−n,pu−1, pu−2, . . . , pu−n) which represents the 2n coordi-nates of the lower left (pl) and the upper right (pu) cornerof an n-dimensional (hyper-) rectanglep, and

– an intermediate node is of the form (ptr, RECT ), whereptr is a pointer to a lower level node of the tree andRECT is a representation of the rectangle that enclosesspatially the children nodes.

Currently, the R-tree index is integrated in commercialDBMSs such as ILLUSTRA [Ubel94]. In the case of othertraditional database systems (like ORACLE, SYBASE, etc.),the R-tree code cannot be integrated, thus an interface layershould be implemented so that the R-tree implementationcould communicate with the database. Concerning perfor-mance, in principle, the architecture does not affect the per-formance, since R-tree is a disk-resident structure and theonly factor that is taken into account for estimating perfor-mance is the number of disk accesses.

We claim that the adoption of the above indexing schemeimproves the retrieval of spatio-temporal operators comparedto the “sorted arrays” scheme. Even for complex operators,where both tree indices need to be accessed (e.g., for theoverlapduring operator), the cost of the two indices’ re-sponse times are expected to be lower than the retrieval costof the (three) arrays.

A weak point of the above scheme has been alreadymentioned. The retrieval of objects according to their spatio-temporal relationships (e.g., theoverlapduring one) withothers demands access to both indices and, in a second phase,the computation of the intersection set between the two an-swer sets. Access to both indices is usually costly and, inmany cases, most of the elements of the two answer setsare not found in the intersection set. In other words, mostof the disk accesses to each index separately are useless.An efficient solution to that problem is the merging of thetwo indices (the spatial and the temporal one) in a unifiedmechanism. This scheme is proposed in the next subsection.

4.1.2 A unified spatio-temporal indexing scheme

The proposed unified indexing scheme consists of only oneindex: aspatial (3D) indexfor the complete spatio-temporalinformation (location in space and time coordinates) of theobjects. If we assume that the R-tree is an efficient spatialindexing mechanism, then the unified scheme is illustratedin Fig. 7.

The main advantages of the proposed scheme, whencompared to the previous one, are the following:

292

Fig. 7. A unified (spatio-temporal) indexing scheme

– the indexing mechanism is based on a unified framework.Only one spatial data structure (e.g., the R-tree) needs tobe implemented and maintained.

– Spatio-temporal operators are more efficiently supported.Using the appropriate definitions, spatio-temporal oper-ators are implemented as 3D queries and retrieved usingthe 3D index. So the need for the intersection procedureis eliminated.

The evaluation of the two proposed indexing schemes againsteach other, against the “sorted-arrays” and the serial storageones, will be described in Sect. 5, where analytical modelsthat predict the performance of each scheme will be pre-sented. In the rest of this section, we will describe the re-trieval process of such operators when the unified indexingscheme is available within a MAP authoring tool.

4.2 Retrieval of spatio-temporal operators using R-trees

The majority of multidimensional data structures, such asthe R-tree family, have been designed as extensions of theclassic alphanumeric index, the B-tree. They usually dividethe plane into appropriate sub-regions and store these sub-regions in hierarchical tree structures. Objects are repre-sented in the tree structure by an approximation (the MBRapproximation being the most common one) instead of theiractual scheme, for simplicity and efficiency reasons.

Unfortunately, the relative position of two MBRs doesnot convey full information about the spatial (topological,direction, distance) relationship between the actual objects.For this reason, spatial queries involve the following two-step strategy [Oren86].

– Filter step.The tree structure is used to rapidly eliminateobjects that could not possibly satisfy the query. Theresult of this step is a set of candidates which includesall the results and possibly some false hits.

– Refinement step.Each candidate is examined (by usingcomputational geometry techniques). False hits are de-tected and eliminated.

In order to retrieve objects that belong to the answer setof a spatio-temporal operator, with respect to a referenceobject q, we have to specify the MBRs that could enclosesuch objects and then search the R-tree nodes that couldcontain such MBRs. This technique was proposed and im-plemented in [Papa97], in order to support spatial operatorsof high resolution (e.g.,meet, contains) that are popular inGIS applications.

As an example, Fig. 8 shows how the MBRs correspond-ing to the representations of the objects are grouped andstored in the 3D R-tree of our unified scheme. We assume abranching factor of 4, i.e., each node contains at most fourentries. At the lower level, MBRs of objects are groupedinto two nodesR1 andR2, which, in turn, compose the rootof the index. Assume a spatio-temporal query involving theoverlapduring operator, with D being the reference objectq. In order to answer this query, onlyR2 is selected forpropagation. Among the entries ofR2, objects C and (ob-viously) D are the ones that constitute the qualified answerset. Note that only the right sub-tree of the R-tree index ofFig. 8a was propagated in order to answer the query. Therate of the accessed nodes heavily depends on the size ofthe reference objectq and, of course, on the kind of the op-erator (more selective operators result in smaller number ofaccessed nodes).

Consider now a spatial query involving theoverlapop-erator, with D being the reference objectq. Since the querygives no temporal information on the reference object, theunified scheme transforms it to a large cube that covers thewhole t-axis. In this case, the simple scheme, could be moreefficient, since the 2D R-tree which is dedicated to spatialinformation of objects is able to answer the query. Simi-larly, a query involving theduring operator could also beefficiently supported by the simple scheme.

A special type of queries, which are of interest in MAPauthoring, includesspatial or temporal layout retrieval. Inother words, queries of the type “Find the objects and theirposition on screen at theT0 second” (spatial layout) or “Findthe objects that appear in the application during the(T1, T2)temporal segment and their temporal duration” (temporallayout) need to be supported by the underlying scheme. Aswe will present next, both types of queries are efficientlysupported by the unified scheme, since they correspond totheoverlapduring operator and an appropriate reference ob-ject q: a rectangleq1 that intersects the t-axis at pointT0,or a cubeq2 that overlaps the t-axis at the (T1, T2) segment,respectively. The reference objectsq1 andq2 are illustratedin Fig. 9a. In a second step, the objects that compose the an-swer set are filtered in main memory in order to design theirpositions on the screen (spatial layout) or the intersection oftheir t-projections to the given temporal segment (temporallayout).

The ‘layout’ type of queries could be processed as de-scribed above. In particular, the screen layout at the 22ndsecond of the application, could be obtained by exploitingthe reference objectq1 at the specific time instanceT0 = 22 s.The result would be a list of objects (the identifiers of theobjects, their spatial and temporal coordinates) that are dis-played at that temporal instance on the screen. This resultmay be visualized as a screen snapshot with the objects thatare included in the answer set drawn in that (Fig. 9b). As for

293

a

b

Fig. 8a,b. Retrieval ofoverlapduring operator using 3D R-trees

temporal layout query of Subsect. 3.4, it could be answeredusing a 3D cubeq2 with area (Xmax−0)·(Ymax−0)·(T2−T1)as the reference object, whereXmax · Ymax is the screen areaand (T2 − T1) is the requested temporal interval;T1 = 10and T2 = 20 in our example. The result would be a list ofobjects (the identifiers of the objects, their spatial and tem-poral coordinates) that are included or overlapped with cubeq2. This result can be visualized towards a temporal layoutby drawing the temporal line segments of the retrieved ob-jects that lie within the requested temporal interval (T2−T1)(Fig. 9c).

On the other hand, the simple indexing scheme (consist-ing of two index structures), as well as the ‘sorted-arrays’scheme, are not able to give straightforward answers to theabove layout queries, since information stored in multipleindices needs to be retrieved and combined.

In this section, we proposed several schemes for the in-dexing of objects that appear in MAPs and presented theretrieval procedure that concerns spatio-temporal operatorson these objects. In the next section, all schemes will be an-alytically evaluated and compared to each other. Their com-parison will result in general conclusions on the advantagesand disadvantages of each solution.

5 Estimation of the retrieval cost

We present an analytical model that estimates the perfor-mance of R-trees on the retrieval ofn-dimensional queries.The analytical formula is applicable to both R-tree-basedindexing schemes, if we keep in mind that the simple oneconsists of one 2D R-tree and one 1D R-tree, while the uni-fied one consists of one 3D R-tree. Using this model, wecan estimate the performance of both schemes and comparetheir efficiency using several spatio-temporal operators.

a

b

c

Fig. 9a–c.Spatial and temporal layout retrieval using 3D R-trees.a Querywindows for spatial and temporal layout.b Spatial layout.c Temporal layout

5.1 Cost analysis of R-trees

Most of the work in the literature has dealt with the expectedperformance of R-trees for processingoverlap queries, i.e.,the retrieval of data objectsp that share a common area witha query windowq [Page93, Falo94b, Theo96b]. More par-ticularly, let N be the total number of data objects indexedin an R-tree,D the density of the data objects in the globalspace andf the average capacity of each R-tree node. If weassume that the average size of a query windowq is

∏ni=1 qi,

then the expected retrieval cost (number of disk accesses) ofan overlapquery using R-trees is [Theo96b]:

C(q) = 1+

1+dlogf Nf e∑

j=1

{N

f j·

n∏i=1

((Dj · f j

N

)1/N

+ qi

)},(1)

294

where the average density of the R-tree nodesDj at eachlevel j is given by

Dj =

{1 +

(Dj − 1)1/N − 1f1/N

}n

. (2)

Hence,Dj can be computed recursively usingD0 whichdenotes the densityD of the data MBRs, or, in other words,node density at each level of the R-tree is a function ofthe density of the dataset. Qualitatively, this means that theretrieval costC(q) of an overlapquery is estimated by onlyusing knowledge of the dataset (numberN and densityDof data) and the query windowq with no need to constructthe tree structure.

Since Eq. 1 expresses the expected performance of R-trees onoverlap queries using a query windowq, in orderto estimate the retrieval cost of a spatio-temporal operatorR(p, q), we need the following transformation:R(p, q) ⇒overlap(p, Q). In other words, the retrieval of a spatio-temporal operator using R-trees is equivalent (in terms ofcost) to the retrieval of anoverlap query using an appro-priate query windowQ. The necessary transformation Qfor each operatorR should take into consideration the cor-responding constraint of the intermediate nodes, becauseonly these nodes are important when estimating the re-trieval cost [Papa95]. For the spatio-temporal operators thatwe consider in this paper, the appropriate query windowQ = (Qx1, Qx2, Qy1, Qy2, Qt1, Qt2) for the unified scheme(or Q = (Qx1, Qx2, Qy1, Qy2), Q = (Qt1, Qt2) for the simplescheme) is defined in Table 2, as a function of the originalquery windowq = (qx1, qx2, qy1, qy2, qt1, qt2).

Using information from Table 2 and Eq. 1 we can es-timate the expected cost for the query windowQ, whichequals the expected costC(R) for the retrieval of a spatio-temporal operatorR. The accuracy of the above analyticalmodel has already been evaluated on spatial relationships ofvarying selectivity (e.g.,inside, near, northeast, and com-binations) in [Theo95]. Intuitively, we assume that the uni-fied scheme should be the most efficient solution when bothspatial and temporal information are included in the query,while, in the rest of the cases, the simple scheme seems tobe preferable. The accuracy of these intuitive conclusionswill be examined in the next subsection, where the aboveanalytical model will be used as a basis for the analyticalcomparison of the proposed schemes.

5.2 Analytical comparison of the indexing schemes

In order to compare the efficiency of each proposed schemeon the retrieval of spatio-temporal operators, we assumed aMAP including 10,000 objects of the following distribution:

– a portion of 75% characterized by small projections onthe three axes (x, y, t), e.g., text or video that cover asmall space on the screen and last a short time,

– a portion of 15% characterized by zero projection on thetwo axes (x, y) and small projection on the third axis (t),e.g., sounds that cover zero space on the screen and lasta short time,

– a portion of 5% characterized by small projections onthe two axes (x, y) and large projection on the third axis

(t), e.g., heading titles or logos that cover a small spaceon the screen and last a long time, and

– a portion of 5% characterized by large projections on thetwo axes (x, y) and small projection on the third axis (t),e.g., full text or background patterns that cover a largespace on the screen and last a short time.

We consider the above distribution to be a typical distri-bution of media objects in a MAP and we use it for thecomparison of the alternative indexing schemes. Differentdistributions of objects are also supported in a similar wayby adapting their densityD.

For the analytical estimates we used Eq. 1 and the fol-lowing values: amount of data objectsN = 10, 000 (8,500)for the 1D and 3D (2D) R-tree indices, density of data ob-jects D = 145, 145, 1.6 for the 1D, 2D, and 3D indices,respectively, and average node capacityf = 0.67·M , whereM = 84, 50, 35 for 1D, 2D, and 3D R-trees, respectively4.The sizes of the reference objectsq varied from 0% up to50% of the global space per axis, while the correspondingquery windowsQ for each combination of R-tree index andoperator were formulated in Table 2. Table 3 summarizes thecomparative results for the operators discussed in the paper.For uniformity reasons, we set the cost of serial retrieval tobe 100% and express the costs of the “sorted-arrays” schemeand the indexing schemes proposed in Sect. 4 as portions ofthat value.

The cost of serial retrieval is computed as follows. Eachobject representation requires a space of 28 bytes (4 bytes× 7 numbers). If we set the size of a disk page to be1024 bytes, then a page contains 36 (= 1024/28) objects.Hence, 278 pages are required to store 10,000 objects. Allof these pages should be accessed in order to answer anyspatio-temporal operator.

The cost of the ‘sorted-arrays’ scheme is computed asfollows. The scheme consists of three arrays which containthe id plus the low (as primary key) and high (as secondarykey) coordinate of each object per axis. Hence, each objectrepresentation requires a space of 12 bytes (4 bytes× 3 num-bers). Since a page of 1024 bytes contains 85 (= 1024/12)objects, each array includes 118 (= 10, 000/85) pages. Theretrieval cost per operator is a ratio of the total amount of118 pages and is computed by using classic “divide-and-conquer” techniques with respect to the constraints that char-acterize each operator (i.e., logarithmic cost per array for se-lective almost exact match queries, such asduring and about50% of the total cost per array for non-selective queries, suchasoverlap, above, before, etc.).

The costs of the indexing schemes have been alreadydiscussed in Subsect. 5.1, with Eq. 1 being used for theircomputation.

Several conclusions arise from the analytical comparisonresults presented in Table 3.

4 The number of data objects stored in the 2D index is less than thosestored in the 1D and 3D indices, because zero-space objects (e.g., sounds)are not included in the dataset of the 2D index. TheD values are impliedfrom the above distribution if we assume that small (large) space corre-sponds to 5% (50%) of the screen and a short (long) period of time cor-responds to 1% (10%) of the whole duration of the application. The 67%capacity is a typical value for R-trees and variants, while theM valuesrepresent the maximum node capacity for pages of 1024 bytes.

295

Table 2. Query windowsQ for spatio-temporal operators

Operator 1D R-tree 2D R-tree 3D R-treeoverlap – Q = (qx1, qx2, qy1, qy2) Q = (qx1, qx2, qy1, qy2, 0, 1)above – Q = (0, 1, qy2, 1) Q = (0, 1, qy2, 1, 0, 1)during Q = (qt1, qt2) – Q = (0, 1, 0, 1, qt1, qt2)before Q = (0, qt1) – Q = (0, 1, 0, 1, 0, qt1)overlapduring – – Q = (qx1, qx2, qy1, qy2, qt1, qt2)overlapbefore – – Q = (qx1, qx2, qy1, qy2, 0, qt1)aboveduring – – Q = (0, 1, qy2, 1, qt1, qt2)abovebefore – – Q = (0, 1, qy2, 1, 0, qt1)

Table 3. Comparison of indexing schemes (with respect to serial storagecost)

Operator “sorted-arrays” Simple scheme Unified schemescheme (one 1D plus (one 3D R-tree)

one 2D R-tree)overlap 40%–45% 5%–10% 5%–15%above 20%–25% 45%–50% 80%–95%during 1% 2%–10% 25%–45%before 20%–25% 25%–35% 80%–95%overlapduring 40%–45% 5%–20% 1%–5%overlapbefore 60%–70% 35%–40% 3%–10%aboveduring 20%–25% 55%–60% 15%–25%abovebefore 40%–50% 70%–85% 50%–65%

– The intuitive conclusion that the simple R-tree schemewould outperform the unified one when dealing with op-erators that keep only temporal or spatial information,while the opposite would be the case for spatio-temporaloperators, is valid. The first four operators are more effi-ciently supported by the simple scheme, while the cost ofthe unified scheme is usually two or three times higher.The reverse situation appears for the last four operators.

– Both schemes based on R-trees are much more efficientthan the serial storage scheme for all operators. For themost selective ones (overlap, during, overlapduring),the improvement is at a level of one or even two ordersof magnitude, compared to the serial cost. For the leastselective ones (above, before, abovebefore), the cost ofthe most efficient scheme is a 1/4 up to a 1/2 portion ofthe serial cost.

– The ‘sorted-arrays’ scheme is shown to be a compet-itive solution. It always outperforms the serial storagescheme (its cost being usually a 1/5 up to a 3/5 portionof the serial cost). In comparison with the two indexingschemes based on R-trees, it is the winner when opera-tors of very low selectivity (above, before, abovebefore)are involved, while, for the rest of the cases, it remainsan efficient alternative solution.

A graphical comparison of the three schemes as comparedto the serial one appears in Fig. 10.

The above conclusions are, more or less, expected. How-ever, in real-world cases, a mixture of temporal, spatial andspatio-temporal operators needs to be supported. Then, se-lecting the most efficient scheme for such mixed require-ments arises. In [Theo96a], we propose guidelines for deal-ing with this issue. The average costs of the alternative in-dexing scheme based on R-trees are evaluated when: all eightoperators are used, or only the most selective (inclusive)ones are involved, or when only the least selective (exclu-sive) operators are involved. The main conclusion from this

discussion is the following. If we distinguish between highand low selective operators, then the thresholds shift right(high selective operators) or left (low selective operators).In other words, when dealing with selective operators, thesimple scheme is sometimes preferable, even if the majority(up to 65%) of the queries involve spatio-temporal infor-mation. It is a choice of the multimedia database designerto select the most preferable solution, with respect to therequirements of the MAP author.

6 Conclusion

Authoring complex MAPs that involve a large number ofmedia objects is a complicated task, keeping in mind thelarge set of possible relationships and events that may beencountered in the application context, as well as the var-ious potential combinations of these parameters. Thus, theneed for a scheme that will support the authors in manag-ing the large number of objects and spatio-temporal rela-tionships among them is required. Current authoring toolsdo not provide such facilities. The mechanism we proposeprovides support for queries, before application execution,related to the application scenario, and more specifically,to spatio-temporal relationships among media objects (i.e.,“does object A spatially overlap with object B in the appli-cation?” or “ which objects temporally overlap with objectA?”). Moreover, authors may request spatio-temporal lay-outs of the application at specific spatial and/or temporalinstances (i.e., “which objects appear in the application at aspecific time instance”, or “ what is the spatial (screen) layoutat a specific time instance during the application”, or “ whatis the temporal layout of the application in terms of temporalintervals”).

In this paper, we presented

– a model for the declarative representation of spatio-temporal composition in the context of large MAPs,

– an efficient indexing mechanism for such applicationsbased on R-trees.

With regard to the MAP model, the motivation for thisresearch work was the lack of a complete declarative ap-proach for representation of spatio-temporal compositionof objects in current multimedia document standards (Hy-Time [Newc91], MHEG [ISO93]) and authoring tools (Ap-ple / SCRIPTX [Scri96], Assymetrix / ToolBook [Assy94],Macromedia / MacroMind Director [Makr94]). An inte-grated high-level model to facilitate the above-mentionedrequirements would be of benefit to application designersand developers and presents the following advantages.

296

Fig. 10.Comparison of the retrievalcost of the three indexing schemes(% of the serial storage cost)

– Explicit mapping between author high-level spatio-tem-poral specifications into a declarative uniform specifica-tion. Spatio-temporal relationships (instead of their ab-solute coordinates) are retained. This enables answeringqueries based on spatio-temporal relationships among theobjects.

– Formal specification of a MAP, which will allow theuse of software-engineering methodologies (quality andmaintenance) in this area.

– Separation of application specification from the appli-cation content, which enables reusability of the MAPfunctionality specifications for other cases with similarfunctionality but different content.

As for the indexing schemes, we are based on indexing spa-tial and temporal presentation features of the media objectsduring the application. We propose two indexing schemesbased on the R-tree data structure; the first scheme includesone 1D and one 2D R-tree that separately index temporal andspatial characteristics of objects, respectively, while the sec-ond scheme includes one 3D R-tree that indices the spatio-temporal characteristics of objects, considering time to bethe third axis of the coordinate system. We evaluated thetwo schemes against the serial storage scheme and a schemeusing disk-resident sorted arrays, and presented guidelinesthat help one to select the most appropriate solution.

The composition model we proposed has a considerablelimitation: it does not support interaction handling in termsof events, while it covers the case of pre-orchestrated scenar-ios (i.e., the spatio-temporal ordering of objects in the appli-cation is pre-defined). Specifically for the indexing mecha-nism, a limitation of our approach is that it does not supportinteractive scenarios, due to the non-deterministic spatial andtemporal occurrences of the objects.

We claim that there is a lot of potential in this area. Weplan to address the following research issues in the future:

– automatic composition tuple extractionfrom spatio-tem-poral specifications.

– design of a GUIthat fulfills the requirements of design-ing, verifying and testing complex spatio-temporal com-position of objects in the context of MAPs,

– modeling of events, in order to provide the ability of han-dling user interactivity. Modern applications are heavilybased on user interaction, which may be represented interms of events. Embedding of such events in our modelwould result in a powerful tool for authors of interactivemultimedia applications,

– constraint checking: a multimedia scenario involves a lotof objects which may be related in various ways. Theserelations may lead to inconsistencies. Such issues shouldbe further investigated.

In a parallel way, the proposed unified indexing schemecould be further extended towards:

– scenario rendering based on the indexing scheme: themodel we proposed could also be used during the exe-cution phase of the scenario. In this case, the appropriatemedia would be quickly located on the basis of the sce-nario,

– indexing of interactive scenarios: the indexing schemeshould be modified, in order to cover the case of inter-active scenarios, where the spatio-temporal presence ofan object depends on the occurrence of events.

297

References

[Alle83] Allen JF (1983) Maintaining Knowledge about Temporal Inter-vals. Comm ACM 26(11):832–843

[Assy94] Assymetrix (1994) Assymetrix ToolBook User’s Manual[Beck90] Beckmann N, Kriegel H-P, Schneider R, Seeger B (1990) The

R∗-tree: An Efficient and Robust Access Method for Points andRectangles. In: Garcia-Molina H, Tagadish HV (eds) Proceed-ings of ACM SIGMOD International Conference on Manage-ment of Data, ACM-Press

[Bent75] Bentley JL (1975) Multidimensional Binary Search Trees Usedfor Associative Searching. Comm ACM 18:509–517

[Blak96] Blakowski G, Steinmetz R (1996) A Media SynchronisationSurvey: Reference Model, Specification, and Case Studies.IEEE J Sel Areas Commun 14(1):5–35

[Bufo96] Buford J (1996) Evaluating HyTime: An Examination and Im-plementation Experience. In: Proceedings of ACM Hypertext’96 Conference, ACM-Press

[Chiu94] Chiueh T (1994) Content-Based Image Indexing. In: Pro-ceedings of the 20th International Conference on Very LargeDatabases (VLDB)

[Duda95] Duda A, Keramane C (1995) Structured Temporal Compositionof Multimedia Data. In: Proceedings of the 1st IEEE Interna-tional Workshop for MM-DBMSs, IEEE Computer Society, LosAlamitos, Calif

[Egen91] Egenhofer M, Franzosa R (1991) Point-Set Topological SpatialRelations. Int J Geogr Inf Sys 5(2):161–174

[Erfl93] Erfle R (1993) Specification of temporal constraints in multime-dia documents using HyTime. Electronic Publishing 6(4):397–411

[Falo94a] Faloutsos C, Equitz W, Flickner M, Niblack W, Petkovic D,Barber R (1994) Efficient and Effective Querying by ImageContent. J Int Inf Sys 3:1–28

[Falo94b] Faloutsos C, Kamel I (1994) Beyond Uniformity and Inde-pendence: Analysis of R-trees Using the Concept of FractalDimension. In: Proceedings of the 13th ACM Symposium onPrinciples of Database Systems (PODS), ACM-Press

[Gutt84] Guttman A (1984) R-trees: A Dynamic Index Structure forSpatial Searching. In: Yormark B (ed) Proceedings of ACMSIGMOD International Conference on Management of Data,ACM-Press

[Hamb72] Hamblin C (1972) Instants and Intervals. Fraser JT (ed) TheStudy of Time. Springer, Berlin Heidelberg, pp 325–331

[Hand96] Handl M (1996) A New Multimedia Synchronisation Model.IEEE J Sel Areas Commun 14(1):73–83

[Hirz95] Hirzalla N, Falchuck B, Karmouch A (1995) A Temporal Modelfor Interactive Multimedia Scenarios. IEEE Multimedia Maga-zine 2(3):24–31

[Iino94] Iino M, Day YF, Ghafoor A (1994) An Object Oriented Modelfor Spatio-Temporal Synchronization of Multimedia Informa-tion. In: Proceedings of the 1st IEEE Conference on Multime-dia Computing and Systems (ICMCS), IEEE Computer Society,Los Alamitos, Calif

[ISO93] ISO/IEC (1993) Information Technology – Coded represen-tation of Multimedia and Hypermedia Information Objects(MHEG)

[Krie96] Krieg P (1996) Digital Hollywood: The turbulent Marraige ofComputer, Telecom and Media Industry. In: Proceedings ofInternational Workshop on Multimedia Software Development(MMSD), IEEE Computer Society, Los Alamitos, Calif

[King94] King PR (1994) Towards a temporal logic-based formalismfor expressing temporal constraints in multimedia documents.Technical Report 942, LRI, Universite de Paris-Sud, Orsay,France

[Lamp90] Lamport L (1990) LaTEX: A Document Preparation System.Addison-Wesley, Reading, Mass.

[Litt93] Little T, Ghafoor A (1993) Interval-Based Conceptual Modelsfor Time Dependent Multimedia Data. IEEE Trans Knowl DataEng 5(4):551–563

[Macr90] Macromind Inc. (1990) Macromind Director, Interactivity Man-ual

[Newc91] Newcomb S, Kipp N, Newcomp V (1991) The HyTime, Hy-permedia Time based Document Structuring Language. CommACM 34(11):67–83

[Oren86] Orenstein J (1986) Spatial Query Processing in an Object-Oriented Database System. In: Zanido C (ed) Proceedings ofACM SIGMOD International Conference on Management ofData, ACM-Press

[Page93] Pagel B-U, Six H-W, Toben H, Widmayer P (1993) Towardsan Analysis of Range Query Performance. In: Proceedings ofthe 12th ACM Symposium on Principles of Database Systems(PODS), ACM-Press

[Papa95] Papadias D, Theodoridis Y, Sellis T, Egenhofer M (1995) Topo-logical Relations in the World of Minimum Bounding Rectan-gles: a Study with R-trees. In: Carey M, Schneider D (eds)Proceedings of ACM SIGMOD International Conference onManagement of Data, ACM-Press

[Papa97] Papadias D, Theodoridis Y (1997) Spatial Relations, MinimumBounding Rectangles, and Spatial Data Structures. Int J GeogrInf Sys 11(2):111–138

[Same90] Samet H (1990) The Design and Analysis of Spatial Data Struc-tures. Addison-Wesley, Reading, Mass.

[Schn96] Schnepf J, Du DH-C (1996) Doing FLIPS: Flexible Interac-tive Presentation Synchronisation. IEEE J Sel Areas Commun14(1):114–125

[Scri96] ScriptX Technical Overview (1996) available at:http://dev.info.apple.com/scriptx/techdocs.html

[Theo95] Theodoridis Y, Papadias D (1995) Range Queries InvolvingSpatial Relations: A Performance Analysis. In: Proceedings ofthe 2nd International Conference on Spatial Information Theory(COSIT), IEEE Computer Society, Los Alamitos, Calif

[Theo96a] Theodoridis Y, Vazirgiannis M, Sellis T (1996) Spatio-Temporal Indexing for Large Multimedia Applications. In: Pro-ceedings of the 3rd IEEE Conference on Multimedia Computingand Systems (ICMCS), IEEE Computer Society, Los Alamitos,Calif

[Theo96b] Theodoridis Y, Sellis T (1996) A Model for the Prediction of R-tree Performance. In: Proceedings of the 15th ACM Symposiumon Principles of Database Systems (PODS), ACM-Press

[Ubel94] Ubell M (1994) The Montage Extensible Datablade Architec-ture. In: Snodgrass R, Winslet M (eds) Proceedings of ACMSIGMOD International Conference on Management of Data,ACM-Press

[Vazi95] Vazirgiannis M, Hatzopoulos M (1995) Integrated MultimediaObject and Application Modeling Based on Events and Scenar-ios. In: Proceedings of the 1st IEEE International Workshop forMM-DBMSs, IEEE Computer Society, Los Alamitos, Calif

[Vazi96a] Vazirgiannis M, Theodoridis Y, Sellis T (1996) Spatio-Temporal Composition in Multimedia Applications. In: Pro-ceedings of International Workshop on Multimedia SoftwareDevelopment (MMSD), IEEE Computer Society, Los Alami-tos, Calif

[Vazi96b] Vazirgiannis M, Sellis T (1996) Event And Action Represen-tation And Composition For Multimedia Application ScenarioModelling. In: Proceedings of ERCIM Workshop on InteractiveDistributed Multimedia Systems and Services, Springer Verlag,Heidelberg

298

Michael Vazirgiannis received hisDiploma degree in Physics in 1986from the University of Athens, Athens,Greece. In 1989, he received the M.Sc.degree from Heriot Watt University, Ed-inburgh (UK) and in 1994 the Ph.D.degree from the University of Athens,Athens, Greece. In 1995, he joined theKnowledge & Data Bases Laboratoryin the Computer Science Division ofthe National Technical University ofAthens, Athens, Greece, where he is apostdoctoral researcher. His research in-terests include multimedia informationsystems, spatio-temporal databases andfuzzy logic. He has published articles in

refereed journals and international conferences in the above areas. Dr. Vazir-giannis worked as guest scientist in GMD/IPSI in Darmstadt, Germany(1996) and in Fernuniversitaet in Hagen, Germany (1997). Dr.Vazirgiannisis a member of ACM and IEEE.

Yannis Theodoridis received hisDipl. Eng. (1990) and Dr. Eng. (1996)degrees from National Technical Uni-versity of Athens (NTUA), Greece. Heis currently a Research Engineer atthe Knowledge and Data Bases Sys-tems Laboratory, NTUA. His researchinterests include geographical databases,multimedia systems, spatial data struc-tures and algorithms. He is a member ofACM and IEEE.

Timos Sellis received his Diploma de-gree in Electrical Engineering in 1982from the National Technical Universityof Athens, Athens, Greece. In 1983, hereceived the M.Sc. degree from HarvardUniversity and in 1986 the Ph.D. de-gree from the University of Californiaat Berkeley, where he was a member ofthe INGRES group, both in ComputerScience. In 1986, he joined the depart-ment of Computer Science of the Uni-versity of Maryland, College Park as anAssistant Professor, and became an As-sociate Professor in 1992. Between 1992and 1996 he was an Associate Professorat the Computer Science Division of the

National Technical University of Athens, in Athens, Greece, where he iscurrently a Full Professor. His research interests include extended relationaldatabase systems, active database systems, and spatial, image and multime-dia database systems. He has published several articles in refereed journalsand international conferences in the above areas. Prof. Sellis is a recipientof a Presidential Young Investigator (PYI) award for 1990–1995. He is amember of the Editorial Boards of the International Journal on IntelligentInformation Systems: Integrating Artificial Intelligence and Database Tech-nologies, and Geoinformatica. Dr. Sellis is a member of IEEE and ACM.