cTraj: efficient indexing and searching of sequences containing multiple moving objects

Journal of Intelligent Information Systems: Integrating Artificial Intelligence and Database Technologies
ISSN 0925-9902
J Intell Inf Syst
DOI 10.1007/s10844-011-0180-5

Zaher Al Aghbari

Your article is protected by copyright and all rights are held exclusively by Springer Science+Business Media, LLC. This e-offprint is for personal use only and shall not be self-archived in electronic repositories. If you wish to self-archive your work, please use the accepted author's version for posting to your own website or your institution's repository. You may further deposit the accepted author's version on a funder's repository at a funder's request, provided it is not made publicly available until 12 months after publication.


Received: 13 December 2010 / Revised: 4 August 2011 / Accepted: 26 September 2011
© Springer Science+Business Media, LLC 2011

Abstract  Indexing sequences containing multiple moving objects by all features of these objects captured at every clock tick results in huge index structures, due to the large number of features extracted over all sampled instances. Thus, the main problems with current systems that index sequences containing multiple moving objects are huge storage requirements for index structures, slow search time, and low accuracy due to the lack of representation of the time-varying features of objects. In this paper, a technique called cTraj is proposed to address these problems. For each object in a sequence, cTraj captures the features at sampled instances. Then, it maps the object's features at each sampled instance from high-dimensional feature space into a point in low-dimensional distance space. The sequence of points of an object in low-dimensional space is considered the time-varying feature trajectory of the object. To reduce the storage requirements of the index structure, the sequence of points in each trajectory is represented by a minimum bounding box (MBB). cTraj indexes a sequence by the MBBs of its objects using a spatial access method (SAM), such as an R-tree, greatly reducing the storage requirements of the index and speeding up the search time. The cTraj technique does not result in any false dismissals, but the result might contain a few false alarms, which are removed by a two-step refinement process. The experiments show that the proposed cTraj technique produces effective results comparable to those of a sequential method, while being much more efficient.

Keywords  Indexing sequences · Representing moving objects · Time-varying features · Searching sequences of multiple objects · Feature trajectory

Z. Al Aghbari (B)
Department of Computer Science, University of Sharjah, P.O. Box 27272, Sharjah, United Arab Emirates
e-mail: [email protected]

Author's personal copy


1 Introduction

Huge numbers of digital sequences with multiple objects have been produced recently and made easily accessible to users on the Internet, in digital libraries, etc. In addition, several new applications that produce and utilize sequences with multiple moving objects have emerged, such as medical imaging, multimedia messaging, surveillance systems, distance learning, news-on-demand, and public visual information systems, just to name a few. For practical reasons, these systems require fast and effective indexing and search algorithms.

Content-based search of sequences is a complex task, and commercial systems, such as YouTube and Google Video, largely rely on pure textual search (Broilo et al. 2010). However, some research works in video indexing and retrieval, such as (Flickner et al. 1995; La Cascia and Ardizzone 1996; Smith and Chang 1996), have concentrated on extracting visual features (color, motion, shape, etc.) from video shots to represent the visual contents of these shots. These systems do not represent a shot by the contents of all its frames, but rather by the contents of a few selected frames, called keyframes. Then either the visual features of each keyframe as a whole (the global feature representation approach), or the visual features of individual objects that exist in the keyframes (the local feature representation approach), are extracted. In either of these representation approaches, the cross-keyframe correspondence between objects is not captured. That is, the relationships between the contents of neighboring keyframes are not considered, such as which object in keyframe ki corresponds to which object in keyframe ki+1. Thus, a shot is inadequately abstracted as multiple still images. Moreover, these systems represent the sequences by the extracted low-level features, ignoring the challenging issue of representing the sequences by their semantics. To bridge this semantic gap between low-level features and user semantics, some researchers use motion information analysis (Morris and Trivedi 2008) or feature trajectories (Aghbari et al. 2003).

The main difference between still images and videos stems from the motion of objects in videos. As objects move, the visual features (e.g., colors) of individual objects may change over time. Therefore, an effective video representation should capture the time-varying visual features of individual objects. Effective indexing and searching of large-scale video databases has long captured the attention of many research works. For example, in (Flickner et al. 1995; Kobla et al. 1998; Korn et al. 1998), a feature vector of an object (e.g., a color histogram) is mapped into a point in a low-dimensional space. A spatial access method (SAM), such as an R-tree (Guttman 1984), R+-tree (Sellis et al. 1987), R*-tree (Beckmann et al. 1990), or SR-tree (Katayama and Satoh 1997), is then used to index and retrieve relevant shots efficiently.

In this paper, a technique called cTraj that extracts the features of individual objects in a sequence (e.g., a video) at sampled instances is proposed. To reduce dimensionality, the extracted features of an object at each sampled instance are mapped from high-dimensional feature space into a point in low-dimensional distance space. Thus, every object is represented by a point in low-dimensional space at every instance in which it exists. In a sequence, say a video, an object is represented by a chain of points in low-dimensional space, and this chain of points is considered the time-varying feature trajectory of the object. The chain of points in each trajectory is represented by a minimum bounding box (MBB). Then, cTraj indexes a sequence by the MBBs of its objects. These MBBs are inserted into a spatial access method (SAM), such as an R-tree, instead of the individual points, greatly reducing the storage requirements of the index and hence speeding up the search time.

Using this framework, a query in high-dimensional feature space of the form “Find objects similar to the query object Q” becomes a query in low-dimensional distance space of the form “Find points that are in the proximity of the query point q”. That is, a query becomes a range query or a K-nearest-neighbor query in the low-dimensional space (Korn et al. 1998; Faloutsos 1996).

Definition 1  A range query is of the form “Find points that are within distance ε from the query point q.”

Definition 2  A K-nearest-neighbor query is of the form “Find the closest K points to the query point q.”
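These two query types can be sketched over mapped points as follows. This is a minimal illustration, not the paper's implementation: the point data are made up, and plain Euclidean distance stands in for the paper's distance function (19).

```python
import math

def dist(p, q):
    """Euclidean distance between two points of equal dimension."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))

def range_query(points, q, eps):
    """Definition 1: all points within distance eps of the query point q."""
    return [p for p in points if dist(p, q) <= eps]

def knn_query(points, q, k):
    """Definition 2: the K points closest to the query point q."""
    return sorted(points, key=lambda p: dist(p, q))[:k]

# Toy mapped points in a 3-dimensional distance space.
points = [(0.10, 0.20, 0.30), (0.90, 0.80, 0.70), (0.15, 0.25, 0.35)]
q = (0.10, 0.20, 0.30)
print(range_query(points, q, 0.2))  # the two points near q
print(knn_query(points, q, 1))      # the single closest point
```

With an index such as an R-tree, the same two queries are answered by descending only the subtrees whose bounding boxes intersect the query range, rather than scanning every point as this sketch does.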

The proposed technique, cTraj, provides a user interface through which a user formulates a query that describes the time-varying features (trajectories) of objects in the wanted sequence(s). cTraj therefore supports two types of queries. The first type is a range query that finds sequences containing trajectories within some ε distance from the given user-formulated trajectories. This type retrieves all sequences that contain objects whose features vary in time in a manner similar to those formulated by the user. The second type is a K-nearest-neighbor query that retrieves the K sequences whose object trajectories are most similar to the trajectories formulated by the user.

Using a SAM to index and retrieve points, which represent features of objects, poses two main problems:

• Curse of dimensionality  Most SAMs scale exponentially for high-dimensional data; eventually their performance degrades to that of sequential scanning (Faloutsos 1996).

• Correctness of results Guaranteeing that the result of a query does not miss anyqualifying points.

Below, the main contributions of cTraj are presented.

• Capturing time-varying features  cTraj captures the time-varying color feature of an object; however, the idea can be easily extended to other features, such as motion and shape. The colors of an object captured by a camera might change over time for several reasons: (1) movement of an object in front of a camera, where the front view and back view of the same object could result in two different color histograms (see Fig. 1); (2) movement of the camera, where different angles of view of the same object might result in different color histograms; (3) a change in the state of objects, such as the reaction of two substances (say, gases); and (4) the occurrence of an accident to an object, such as the start of a fire in a car. Capturing these time-varying changes in the colors of an object allows users to simply formulate queries describing complex color trajectories of objects. Such queries improve the effectiveness of the search.


Fig. 1  An example of time-varying changes in color as an object moves: a the colors' proportions of a moving car change as it turns in front of the camera and a fire starts at its back; b colors' proportions change and new colors appear as a bike-rider flips over in the air, because the colors on the bike-rider's shirt are different: dark brown at the front and white at the back

• Reducing dimensionality of features  A novel dimensionality reduction method, called topological mapping, is proposed. This method maps the features, such as a color histogram, of an object captured at each instance from high-dimensional feature space into a point in a 3-dimensional distance space.

• Guaranteeing correctness of query results  A new Lemma is presented to prove that the topological mapping method results in a contractive mapping of objects. That is, distances between objects in the low-dimensional space become smaller than, or at least equal to, their distances in the high-dimensional feature space. Thus, when issuing a range query, or a K-nearest-neighbor query, the result does not contain any false dismissals, but it might contain a few false alarms that are easily removed by a subsequent filter step, where:

Definition 3  False dismissals are the points that satisfy the user query but are not returned in the result of a query.

Definition 4  False alarms are the points that do not satisfy the user query but are returned in the result of a query.

• Optimizing storage requirements  If a sequence S contains NO objects and the features are extracted at NK sampled instances, then S is represented by NO × NK points. Without loss of generality, it is assumed here that all objects exist in all sampled instances. Therefore, S is represented by a large number of points. In a very large sequence database environment, inserting such a large number of points into a SAM, such as an R-tree, to index each Si in the database would result in a huge R-tree and, consequently, a great degradation in performance. In cTraj, the number of points representing a sequence is greatly reduced, since each object is represented by an MBB that bounds the chain of points representing the object's feature trajectory ζi in low-dimensional space. Thus, each Si is represented by NO MBBs instead of NO × NK points; as a result, the size of the R-tree is greatly reduced.

• Optimizing search efficiency  The efficiency of a search is largely related to the problem of optimizing the storage requirements of the index structure (R-tree). That is, reducing the size of the R-tree speeds up the search time, as shown by our experiments (see Section 4).

The rest of this paper is organized as follows. In Section 2, the object trajectory computation, the mapping mechanism, and the indexing of the mapped data are presented. The cTraj matching algorithm is presented in Section 3. Section 4 presents and analyzes the conducted experiments. In Section 5, a survey of related work is discussed. Finally, the paper is concluded in Section 6.

2 Object trajectory: representation and indexing

This section describes the sequence structure, the time-varying feature extraction, the object trajectory computation, the dimensionality reduction achieved by mapping object trajectories into low-dimensional space, and the indexing of these trajectories using a SAM such as an R-tree. The notation used in this paper is summarized in Table 1 (see below).

2.1 Sequence structure

In this paper, video shots collected from the Internet are used as sequences that contain multiple moving objects. The video shot structure of (Aghbari et al. 2003) is used, where a shot is represented by an acyclic graph that consists of four layers: a shot layer (upper layer), an event layer, a keyframe layer, and an object layer (lower layer). As discussed in (Aghbari et al. 2003), a shot undergoes several processing steps: event detection, keyframe selection, and object segmentation.

• Event detection  Each shot constitutes one or more events, E1, E2, . . . , ENe. An event, Ei, is a subsequence of consecutive frames that expresses a particular activity and contains a fixed number of semantically meaningful objects. Thus, the start and end of an event are detected by the appearance of a new object in the scene or the disappearance of an existing object from the scene. The main goal of dividing a shot into events is to identify the frames that are most similar to each other. Then, by extracting keyframes from each event, those keyframes will best represent the content of a shot.

Table 1  Notation used in the paper

Symbol   Description
S        A sequence, such as a video
Oi       An object in a sequence
NO       Number of objects in a sequence
Ki       A sampled instance of a sequence, such as a keyframe
NK       Number of sampled instances
ζi       Feature trajectory of an object
Ci       Color histogram of Oi captured at one sampled instance
Q        User query
ζqi      Query object trajectory
Nqo      Number of queried objects
Cqj      Query color histogram of an object at one instance

• Keyframe selection  Due to the close similarity between the frames of an event Ei, only two keyframes (the first and last frames of the event) are extracted to represent the event; but if the event is longer than two seconds (60 frames), one keyframe per second is extracted.

• Object segmentation  From each keyframe, a set of semantically meaningful objects (O1, O2, . . ., ONo) is extracted. An object Oi is a semantically meaningful physical entity within a frame. There are many methods, e.g., (Chang et al. 1997; Ferman et al. 1998), for segmenting and tracking objects in videos that are coded by non-object-based encoders, such as MPEG-1 or MPEG-2. The system in (Chang et al. 1997) segments and tracks image regions based on a fusion of color, edge, and motion information. On the other hand, the system in (Ferman et al. 1998) uses a contour-based method to segment and track objects. However, the approach of the system in (Aghbari et al. 2003) is followed here, which assumes that shots are coded by an object-based video encoder such as MPEG-4 (Overview of MPEG-4 Standard 1999). In MPEG-4, objects with arbitrary shapes are encoded apart from their background; therefore, the segmentation and tracking information of objects is provided in the video input stream.
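The event-boundary rule above (an event starts or ends whenever an object appears in or disappears from the scene) can be sketched as follows. The per-frame sets of object IDs are assumed to be available from the object-based (e.g., MPEG-4) input stream, as the paper describes; the function name and data are illustrative.

```python
def split_into_events(frames):
    """Split a shot into events: an event is a maximal run of consecutive
    frames whose set of visible object IDs stays unchanged (Section 2.1).
    `frames` is a list of sets of object IDs, one set per frame; the result
    is a list of (start_frame, end_frame) index pairs, one per event."""
    events = []
    start = 0
    for i in range(1, len(frames)):
        if frames[i] != frames[i - 1]:  # an object appeared or disappeared
            events.append((start, i - 1))
            start = i
    events.append((start, len(frames) - 1))
    return events

# Object 3 enters at frame 2 and leaves at frame 4, yielding three events.
shot = [{1, 2}, {1, 2}, {1, 2, 3}, {1, 2, 3}, {1, 2}]
print(split_into_events(shot))  # [(0, 1), (2, 3), (4, 4)]
```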

2.2 Feature extraction

Generally, color distribution is a useful feature for representing the objects of a sequence. The color distribution of an object is expressed by a color histogram. The advantage of color histograms is that they are invariant under changes in angle of view, changes in scale, and occlusion (Furft et al. 1996). Therefore, a color histogram is a powerful means to represent the color feature of an object.

In this paper, the RGB color space is quantized into 64 equally spaced colors, where each color consists of 3 components: red (r), green (g), and blue (b). By examining the characteristics of the color histograms that represent objects, it is found that the color feature of an object is largely defined by the 10 major colors in its color histogram. Mathematically, the color histogram Cj of object Oi at a sampled instance (e.g., keyframe Kj) is defined by:

Cj : {(r1, g1, b1, p1), (r2, g2, b2, p2), . . . , (rl, gl, bl, pl)}   (1)

where pi is the percentage of color ci, which is represented by the ri, gi, and bi color components. In the experiments, l = 10 is used, which is the maximum number of major colors in the color histogram of an object.
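A minimal sketch of building such a major-color histogram, assuming 8-bit RGB pixels quantized to 4 levels per channel (4 × 4 × 4 = 64 colors); the pixel data and helper names are hypothetical, not taken from the paper.

```python
from collections import Counter

def quantize(r, g, b):
    """Quantize an 8-bit RGB pixel into one of 64 colors (4 levels per channel)."""
    return (r // 64, g // 64, b // 64)

def major_color_histogram(pixels, l=10):
    """Build the histogram of (1): the l dominant quantized colors of an
    object, each reported as ((r, g, b), percentage)."""
    counts = Counter(quantize(*p) for p in pixels)
    total = sum(counts.values())
    return [(color, count / total) for color, count in counts.most_common(l)]

# A toy object that is 3/4 red-ish and 1/4 blue-ish:
pixels = [(250, 10, 10)] * 3 + [(10, 10, 250)]
print(major_color_histogram(pixels))
# [((3, 0, 0), 0.75), ((0, 0, 3), 0.25)]
```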

2.3 Object trajectory

Each color histogram Cj, 1 ≤ j ≤ NK, of object Oi represents the instantaneous colors of Oi at keyframe Kj. A feature trajectory ζi of Oi is viewed as the sequence of its color histograms extracted at the different instances (e.g., keyframes) at which Oi exists. More formally,

Definition 5  The feature trajectory ζi represents the time-varying colors of object Oi at the different time instances within the sequence at which the object exists:

ζi : C1, C2, . . . , Cj,   1 ≤ i ≤ NO   (2)

where j is the number of sampled instances (keyframes) at which Oi exists, j ≤ NK. Furthermore, the sequence 1, 2, . . . , j does not necessarily have a 1-to-1 correspondence with the sequence of extracted keyframes, because an object may disappear at some keyframe and reappear at a later time in the shot.

Since each Oi is represented by one ζi, each sequence S is represented by a number of object trajectories equal to the number of objects in the sequence, NO:

S : ζ1, ζ2, . . . , ζNO   (3)

2.4 Dimensionality reduction

The increase in the number of accessible digital videos on the World Wide Web (WWW) and the emergence of new applications that utilize them require fast and effective methods to index and retrieve shots. Therefore, the proposed technique clusters the shot database using a SAM in order to narrow down the search space and speed up the retrieval of shots. As shown in Fig. 2, the database of shots is clustered using an R-tree, so that when a query is issued, the cluster most similar to the user query is retrieved. The retrieved cluster, which is a small subset of the database of shots, contains all shots similar to the user query.

In this paper, an R-tree is used to index the object trajectories, where each object trajectory consists of several color histograms (see (2)). However, color histograms are highly dimensional and thus cannot be efficiently indexed by an R-tree, or any SAM (Faloutsos 1996). As discussed in Section 2.2, a color histogram Cj of object Oi is a point in a 64-dimensional color space. According to Korn et al. (1998), Katayama and Satoh (1997), Faloutsos (1996), and Yi et al. (1998), an R-tree and its variants are only efficient when used to index low-dimensional data. Therefore, the apparent solution is to first reduce the dimensionality of color histograms by mapping them into points in low-dimensional space; then these mapped points can be indexed by an R-tree efficiently.

Fig. 2  Clustering the database of shots to narrow down the search space and speed up retrieval

Special care must be exercised when choosing the mapping method, because the mapping method should guarantee the ‘correctness of results’. That is, when a query, say ‘find shots that are similar to this query’, is issued, the result should be comparable to that of a sequential scanning method. Thus, cTraj uses a topological mapping to map high-dimensional color histograms into points in a low-dimensional distance space.

Below, the topological mapping method is explained and a Lemma proving the correctness of results is presented.

2.4.1 Topological mapping

The topological mapping method maps a color histogram into a 3-dimensional distance space and guarantees the correctness of results. The 3 dimensions are the red (Rd), green (Gd), and blue (Bd) distance spaces (or topological spaces). As shown in Fig. 3, mapping a color histogram into the RdGdBd distance space is achieved by first computing the distance between the color histogram C and a virtual object that is assumed to be at the origin of the topological axis (distance space); then a point representing the color histogram is placed on the topological axis based on the computed distance.

For example, the distance D(C, OVR) between C and a virtual red object OVR, which is assumed to be at the origin of the red topological axis, is computed, and then a point representing C is placed on Rd based on the computed distance D(C, OVR); the distance function is discussed in Section 3 (19). Similarly, the distance D(C, OVG) between C and a virtual green object OVG is computed, as is the distance D(C, OVB) between C and a virtual blue object OVB. Combining the three distance points, as shown in Fig. 3, results in a point representing C in a 3-dimensional distance space (the RdGdBd distance space).

Color histograms of objects are mapped into a 3-dimensional distance space for the following reasons: (1) R-trees are more efficient when used to cluster low-dimensional data (such as 3 dimensions); (2) 3 dimensions or fewer are easier to visualize; and (3) the R, G, and B colors are orthogonal, so the mapped points will be well dispersed, resulting in smaller and more distinguishable clusters.
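The mapping can be sketched under stated assumptions: the paper's distance function (19) is defined later and is not reproduced here, so plain Euclidean distance between 64-bin vectors stands in for it, and the virtual objects OVR, OVG, and OVB are taken to be histograms of 100% pure red, green, and blue (the paper does not spell out their exact form). The bin layout (index = 16r + 4g + b over 4 quantization levels per channel) is likewise an assumption.

```python
import math

def hist_dist(h1, h2):
    """Stand-in distance between two 64-bin histograms (plain Euclidean);
    the paper's actual distance function (19) appears in Section 3."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(h1, h2)))

def to_vector(major_colors, bins=64):
    """Expand a list of (bin_index, percentage) major colors into a full vector."""
    v = [0.0] * bins
    for idx, p in major_colors:
        v[idx] = p
    return v

BIN_RED, BIN_GREEN, BIN_BLUE = 48, 12, 3  # pure (3,0,0), (0,3,0), (0,0,3)
O_VR = to_vector([(BIN_RED, 1.0)])    # virtual red object at the Rd origin
O_VG = to_vector([(BIN_GREEN, 1.0)])  # virtual green object at the Gd origin
O_VB = to_vector([(BIN_BLUE, 1.0)])   # virtual blue object at the Bd origin

def topological_map(major_colors):
    """Map a color histogram C to a point (Rd, Gd, Bd) in 3-D distance space:
    each coordinate is the distance from C to one virtual object."""
    c = to_vector(major_colors)
    return (hist_dist(c, O_VR), hist_dist(c, O_VG), hist_dist(c, O_VB))

# A mostly-red object maps to a point close to the origin of the red axis.
rd, gd, bd = topological_map([(BIN_RED, 0.9), (BIN_BLUE, 0.1)])
print(rd < gd and rd < bd)  # True
```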

Fig. 3 Mapping a color histogram C into a point in Rd, Gd and Bd (3-dimensional) distance space


Indexing the mapped points in low-dimensional space using an R-tree results in a much faster search compared with a sequential scanning method. However, it is necessary to ensure that the result of a search using the R-tree is comparable to that of a sequential scanning method; that is, our topological mapping method should not result in any false dismissals. In Faloutsos (1996), an image is mapped into a low-dimensional feature space by computing the averages of the R, G, and B components of each pixel in the image. The proposed mapping mechanism is different, and a lower-bounding Lemma is presented to prove that the topological mapping guarantees the correctness of results (no false dismissals). However, because of the contractive nature of the topological mapping (distances in low-dimensional space become smaller), the result might contain some false alarms, which can be removed by a subsequent filtering process (see Section 3).

Lemma 1  The correctness of the result of a range query Q is achieved if the distance DRGB(Q, C) between Q and a color histogram C in the low-dimensional (3-dimensional) distance space is smaller than, or equal to, the distance Dcolor(Q, C) between them in the high-dimensional (64-dimensional) color space. Mathematically, (4) needs to be proved:

DRGB(Q, C) ≤ Dcolor(Q, C)   (4)

Proof  Since the same distance function (19) is used to compute DRGB(Q, C) and Dcolor(Q, C), the proof becomes a geometrical problem, as shown in Fig. 4. Another way of viewing the topological mapping method is that it collapses some dimensions of the high-dimensional space to map a point (color histogram) into a low-dimensional space. That is, the low-dimensional distance space is a subspace of the high-dimensional color space.

Fig. 4  a Projection of points Q and C from a 2-dimensional space into a 1-dimensional space; b projection of the two points from a 3-dimensional space into a 2-dimensional space

First, let us consider a 2-dimensional case to prove that the distance between Q and C in a 2-dimensional distance space is larger than, or at least equal to, the distance between the projections of these two points on a 1-dimensional space. Consider Fig. 4a: by applying the Pythagorean theorem to triangle QCL (with the right angle at L), line QC (representing D2(Q, C) in the 2-dimensional distance space) is always larger than, or at least equal to, line CL. Now consider rectangle CLL′C′, where CL equals C′L′, which is the projection of points Q and C on the Bd axis and also represents DB(Q, C) in a 1-dimensional distance space (the Bd distance space). Therefore, it can be concluded that

D1 (Q, C) ≤ D2 (Q, C) (5)

That is, the distance between Q and C in a 1-dimensional distance space is less than or equal to their distance in a 2-dimensional distance space. Now, let us consider the 3-dimensional case. From triangle QCL (with the right angle at L) of Fig. 4b, it can be concluded that line QC (representing D3(Q, C) in the 3-dimensional distance space) is always larger than, or at least equal to, line LC, which represents the projection of points Q and C on the 2-dimensional distance space (the RdBd distance space). Formally,

D2 (Q, C) ≤ D3 (Q, C) (6)

Similarly, this logic can be extended to higher dimensions. From (5) and (6), it can be concluded that

Dk−1 (Q, C) ≤ Dk (Q, C) (7)

Since DRGB(Q, C) is the distance between Q and C in the 3-dimensional distance space and Dcolor(Q, C) is the distance between them in the 64-dimensional color space, by referring to (7), it is concluded that DRGB(Q, C) ≤ Dcolor(Q, C). □

Thus, the topological mapping method does not result in any false dismissals, because the distance between objects in the 3-dimensional (RdGdBd) distance space underestimates, or at most matches, the distance between the objects in the 64-dimensional color space.
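The projection argument of the proof can be checked numerically: dropping coordinates of a point can only shrink or preserve Euclidean distances. The sketch below verifies the lower-bounding direction of (4) on random 64-dimensional pairs projected to 3 dimensions; axis projection is used here, as in the proof's geometric view, rather than the full topological mapping.

```python
import math
import random

def euclid(p, q):
    """Euclidean distance between two points of equal dimension."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))

def project(p, dims):
    """Keep only the listed coordinate axes, collapsing the rest
    (the geometric view used in the proof of Lemma 1)."""
    return tuple(p[d] for d in dims)

random.seed(0)
for _ in range(200):
    q = tuple(random.random() for _ in range(64))
    c = tuple(random.random() for _ in range(64))
    low = euclid(project(q, (0, 1, 2)), project(c, (0, 1, 2)))
    assert low <= euclid(q, c)  # Eq. (4): the projected distance lower-bounds
print("lower-bounding property held on 200 random pairs")
```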

2.5 Indexing object trajectories

Since R-trees are efficient when used to index points in low-dimensional space, the points mapped by the topological mapping method into the 3-dimensional distance space can be efficiently indexed by an R-tree. However, the problem of how to optimally store the mapped points is of great importance, especially in very large sequence database environments. Before discussing how cTraj addresses this problem, and for the purpose of comparison, a straightforward solution called naive-I, where I stands for index, is presented.

2.5.1 naive-I method

In the naive-I method, the individual mapped points are stored in the R-tree. However, in a very large sequence database environment, the naive-I method will result in a huge R-tree and thus will greatly increase the search time.

Without loss of generality, it is assumed that an object exists in all sampled instances of a sequence Si. Let NK be the number of sampled instances, NO the number of objects in Si, and NS the number of sequences. Then the number of points NPnaive stored in an R-tree is huge and can be approximated by the following equation:

NPnaive = ∑_{i=1}^{NS} NK[Si] × NO[Si]   (8)


Fig. 5  An example shot with three color trajectories, each represented by a cluster of mapped points: a the naive-I method stores individual points in the R-tree; b cTraj bounds the points of each color trajectory by an MBB and then stores those MBBs in the R-tree

2.5.2 cTraj method

Even though the color histograms in an object trajectory may change over time, as discussed in Section 1, they are relatively similar compared to those of other objects. An obvious solution to the problem of the naive-I method is to exploit the similarity between the color histograms in an object trajectory. The idea is to group the mapped points that belong to the same object trajectory by an MBB; that is, to bound the points of each object trajectory by an MBB, as shown in Fig. 5. Then, cTraj stores the MBBs in the R-tree instead of the individual points.

Since the number of trajectories is equal to the number of objects in a sequence, the number of entries NPcTraj stored in the R-tree is:

NPcTraj = ∑_{i=1}^{NS} NO[Si]   (9)

By comparing (8) and (9), it can be noticed that cTraj reduces the number of points stored in the R-tree by a ratio of up to 1/NK; in reality, 1/NK ≤ ratio ≤ 1, since not every object exists in every sampled instance. Such optimization of storage results in a smaller R-tree and a faster search time, as shown by our experiments (see Section 4).

The cTraj technique is employed to index object trajectories due to its great optimization of storage requirements and search efficiency. Thus, each ζi is bounded by an MBB. Let the bounded ζi be denoted by ζMBBi. Hence, a sequence Si is represented in the RdGdBd distance space by:

Si = ζMBB1, ζMBB2, . . . , ζMBBNo (10)
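The bookkeeping above can be sketched for one hypothetical sequence: each trajectory's mapped 3-D points collapse into a single MBB, so the index holds NO entries instead of NO × NK, per (8) and (9). The trajectory coordinates below are made up for illustration.

```python
def mbb(points):
    """Minimum bounding box of a trajectory's mapped 3-D points,
    returned as (min corner, max corner)."""
    lo = tuple(min(p[d] for p in points) for d in range(3))
    hi = tuple(max(p[d] for p in points) for d in range(3))
    return lo, hi

# Hypothetical sequence with N_O = 2 objects seen in N_K = 4 keyframes each.
traj1 = [(0.10, 0.20, 0.30), (0.12, 0.18, 0.31), (0.15, 0.22, 0.29), (0.11, 0.20, 0.30)]
traj2 = [(0.80, 0.70, 0.60), (0.82, 0.71, 0.58), (0.79, 0.69, 0.61), (0.81, 0.70, 0.60)]

entries_naive = len(traj1) + len(traj2)  # naive-I: N_O x N_K = 8 points, Eq. (8)
entries_ctraj = 2                        # cTraj:   N_O = 2 MBBs, Eq. (9)
print(mbb(traj1))
print(entries_naive, entries_ctraj)      # 8 2
```

Because the points of one trajectory are mutually similar, each MBB stays small, so inserting MBBs rather than points shrinks the R-tree without spreading an object's trajectory across distant index entries.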

3 Searching for sequences

To search for sequences such as videos, it is not practical for users to formulate exact-match queries, due to the difficulty of remembering the exact features of objects, specifically when formulating a query that investigates the colors, or color trajectories, of objects that exist in the queried sequences. Normally, it is difficult for a user to remember the exact shades of colors, the number of colors, or to what extent these colors changed as an object moved from one instance to another. An obvious solution to this problem is similarity retrieval, which is supported in cTraj.

Below, a tool (query interface) that allows users to formulate queries investigating the time-varying colors of objects is presented in Section 3.1, the color similarity support in Section 3.2, and the cTraj algorithm for matching and retrieving sequences in Section 3.3.

3.1 Query interface

To formulate a query that describes the trajectories of objects in the wanted sequence(s), the user interface shown in Fig. 6 was developed. With this interface, a user can specify the colors of objects at the different time instances (or keyframes) at which these objects exist. First, a user selects an object from the ‘Current Object’ menu and a keyframe from the ‘Current Frame’ menu, then specifies the colors of the selected object and their percentages using the color palette and the text area, respectively. Then, the user clicks on the ‘Include Specified Colors’ button to accept the currently specified colors for the current object at the current frame. The ‘Include Previous Colors’ button allows a user to include the object's colors specified at a previous frame into the current frame. That is, if the colors of an object are the same as in the previous frame, a user can copy them into the current frame without the need to respecify them.

Users can visualize the specified color histograms of objects in a separate pop-up window (see Fig. 7) by clicking on the ‘DrawColorHists’ button. For example,Fig. 7 shows the trajectories of 3 objects. Under each color histogram, a tag ‘OnFm’indicating the object number (On) and the queryFrame number (Fm). The top row ofcolor histograms belong to O1, it can be noticed that the color histogram changed atF3. In the middle row of color histograms, O2 does not exist in F2. In the bottom rowof color histograms, O3 does not exist in F1. The user interface allows users to modifythe colors of objects at a certain frame without the need to redraw the whole colortrajectory of an object. Let a color histogram and a trajectroy of a queried object

Fig. 6 A user interface for formulating queries investigating trajectories of objects in the wantedsequences


Fig. 7 A pop-up window that shows the trajectories (time-varying colors) of queried objects

in the color space (high-dimensional space) be denoted by C^q and ζ^q_i, respectively. Hence,

Q : ζ^q_1, ζ^q_2, ..., ζ^q_Nqo    (11)

Where,

ζ^q_i : C^q_1, C^q_2, ..., C^q_j    1 ≤ i ≤ Nqo; 1 ≤ j ≤ Nqk    (12)

Where Nqo and Nqk are the number of queried objects and the number of queryFrames, respectively. Finally, by clicking on the 'Search' button, each color histogram of the queried objects is mapped using the topological mapping method into a point in the RdGdBd distance space, in the same way as discussed in Section 2.4.1. The set of points of a queried object in the RdGdBd distance space is bounded by an MBB, which represents the time-varying color features of the queried object. Thus, a query Q is represented by the trajectories of all Nqo specified objects.

Q : ζ^q_MBB1, ζ^q_MBB2, ..., ζ^q_MBBNqo    (13)
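As a sketch of this step, the mapped points of a queried object can be bounded by an MBB as follows (illustrative Python; `query_points` stands in for the output of the topological mapping of Section 2.4.1, which is not reproduced here):

```python
# Sketch: represent a queried object's trajectory of mapped points by a
# minimum bounding box (MBB), as in (13). Names are illustrative, not the
# paper's implementation.

def mbb(points):
    """Minimum bounding box of a list of (Rd, Gd, Bd) points,
    returned as (min corner, max corner)."""
    lo = tuple(min(p[d] for p in points) for d in range(3))
    hi = tuple(max(p[d] for p in points) for d in range(3))
    return lo, hi

# Suppose the query object's histograms were already mapped to points in the
# RdGdBd distance space at Nqk = 3 queryFrames:
query_points = [(0.2, 0.5, 0.1), (0.3, 0.4, 0.2), (0.25, 0.45, 0.15)]
print(mbb(query_points))  # → ((0.2, 0.4, 0.1), (0.3, 0.5, 0.2))
```

The MBB, rather than the individual points, is what gets inserted into the R−tree, which is the source of the storage savings reported later.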

3.2 Similarity support

The proposed technique supports similarity retrieval at the object level and the trajectory level.


Object level At this level, the degree of perceptual similarity between the different colors is defined. As mentioned in Section 2.2, the quantized RGB color space, which is used to express the color(s) of an object, consists of 64 colors. Thus, a 64 × 64 matrix of metric values representing the degree of perceptual similarity between the different colors is created. These values are based on our perceptual similarity between colors, but were further refined during the experimentation of the system.

Trajectory level The number of mapped points Nqk in the query object trajectory ζ^q_i is normally less than, or equal to, the number of mapped points Nk in the target object trajectory ζ_i. Users are not expected to have prior knowledge about Nk; therefore, exact match, which requires Nqk = Nk, is not possible in this case, and thus cTraj supports similarity retrieval. That is, the distance between ζ^q_i and ζ_i should be based on how many color histograms in ζ^q_i have a match, within tolerance ∈, with color histograms in ζ_i. The details of how to match a query with a target sequence are discussed below.

3.3 cTraj matching

As illustrated in Fig. 8, when matching Q with the database of sequences, such as videos, each sub-query ζ^q_MBBi in Q is matched with the MBBs in the R−tree separately; as a result, a cluster (a small subset of the MBBs in the R−tree) that is most similar to ζ^q_MBBi is retrieved. Hence, Nqo clusters are retrieved. These clusters are combined and a list is generated, where each item in the list contains a sequence identifier (SID) and the number of retrieved MBBs (Nb) that belong to this sequence, where 1 ≤ Nb ≤ Nqo. The retrieved list of candidate sequences contains all the qualifying

Fig. 8 Block diagram illustrating the steps of retrieving sequences similar to a user query


sequences; that is, the result of matching has no false dismissals. However, it might contain false alarms, as has been proved by Lemma 1. To remove these false alarms, a two-step filtering process is proposed.

3.3.1 Filtering: first step

This step makes use of the number of retrieved MBBs, Nb, that belong to each shot in the retrieved cluster to compute a lower-bound distance Dlb(Q, S) between the query Q and a sequence S. Thus, Dlb(Q, S) is defined as follows:

Dlb(Q, S) = (Nqo − Nb) / Nqo    (14)

The Dlb(Q, S) is a rough distance that underestimates the actual distance D(Q, S) between Q and S. Based on the computed Dlb(Q, S), sequences whose distance from Q is greater than, or equal to, σ are removed because they are not considered similar. Here, σ is a variable that can be set by a user depending on the application and the desired strictness of similarity required by a search. In this paper, σ is set to 0.5; that is, sequences that are distant (dissimilar) by more than 50% from Q are removed in this first step of filtering. Now, it is necessary to prove that Dlb(Q, S) lower-bounds (is smaller than, or at least equal to) the actual distance D(Q, S); that is, that Dlb(Q, S) is the minimum distance between Q and S.
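The first filtering step can be sketched as follows (a minimal Python illustration; `candidates`, which maps each SID to its count Nb of retrieved MBBs, is an assumed input, not a structure from the paper):

```python
def first_step_filter(candidates, n_qo, sigma=0.5):
    """Keep candidate sequences whose lower-bound distance (14) is below sigma.

    candidates: dict mapping sequence id (SID) -> Nb, the number of the
    query's Nqo object-trajectory MBBs retrieved for that sequence.
    """
    survivors = {}
    for sid, nb in candidates.items():
        d_lb = (n_qo - nb) / n_qo          # Eq. (14)
        if d_lb < sigma:                   # distances >= sigma are removed
            survivors[sid] = d_lb
    return survivors

# A query with 4 object trajectories; candidates matched 4, 2, and 1 of them:
print(first_step_filter({"S1": 4, "S2": 2, "S3": 1}, n_qo=4))  # → {'S1': 0.0}
```

With σ = 0.5, a candidate survives only if more than half of the query's object-trajectory MBBs were retrieved for it.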

Lemma 2 The lower-bound distance Dlb(Q, S) (14) between Q and S is smaller than, or at least equal to, their actual distance D(Q, S). Mathematically, (15) should be proven:

Dlb(Q, S) ≤ D(Q, S)    (15)

Proof By contradiction. The Dlb(Q, S) (14) is computed solely from the number of unretrieved MBBs; thus, Dlb(Q, S) assumes that the collective distance contributed by the retrieved Nb trajectories (MBBs) is:

∑_{i=1..Nb} D(ζ^q_MBBi, ζ_MBBi) = 0    (16)

However, the actual distance between corresponding object trajectories is:

D(ζ^q_MBBi, ζ_MBBi) = ∑_{v=1..Nqk} D(C^q_i[v], C_i[v]) ≥ 0    (17)

Where Nqk is the number of mapped points, which represent color histograms of the queried object Oi, in ζ^q_MBBi. Therefore, the collective distance of the retrieved Nb object trajectories is:

∑_{i=1..Nb} D(ζ^q_MBBi, ζ_MBBi) ≥ 0    (18)

which contradicts the assumption of Dlb(Q, S) and proves that the Nb retrieved object trajectories may contribute to the actual distance; thus, Dlb(Q, S) is the minimum distance between Q and S. □


3.3.2 Filtering: second step

In the first step of filtering, sequences with lower-bound distances greater than, or equal to, σ are removed from the list of candidate sequences. Although the list of candidate sequences becomes shorter, it may still contain false alarms. Therefore, a second step of filtering is necessary to remove them.

As a second step of filtering, a cTraj_match algorithm, which matches a query with the list of candidate sequences, is proposed. The list of candidate sequences is only a small subset of the whole database, so the cTraj_match algorithm tries as much as possible to speed up the search. The cTraj_match algorithm is shown below:

Algorithm cTraj_match
Input: Q and the list of candidate sequences
Output: a sorted list of the top K sequences most similar to Q

1. for each Sx, x = 1 .. number of candidate sequences
2.   for each ζ^q_i, i = 1 .. number of object trajectories in Q
3.     for each ζ_j, j = 1 .. number of color trajectories in Sx
4.       compute the distances between each color histogram C^q_k in ζ^q_i and Cl in ζ_j
5.       correspond(C^q_k, Cl)  // assign each C^q_k in ζ^q_i to its most similar Cl in ζ_j
6.     compute D(ζ^q_i, ζ_j)
7.   correspond(ζ^q_i, ζ_j)  // assign each ζ^q_i in Q to its most similar ζ_j in Sx
8.   compute D(Q, Sx)
9. sort all sequences based on their D(Q, Sx) in ascending order and select the first K sequences

Here, the cTraj_match algorithm is explained. Each sequence Sx in the list of candidate sequences is matched with the query Q by first computing the distance D(C^q_k, Cl) between each color histogram C^q_k in the queried object trajectory ζ^q_i and every color histogram Cl in the target object trajectory ζ_j (see lines 2–4). The D(C^q_k, Cl) is computed by the following distance function:

D(C^q_k, Cl) = | ∑_{i=1..a} [ (C^q_k[i] − Cl[i]) + ∑_{j=1..b, i≠j} aij (C^q_k[j] − Cl[j]) ] |    (19)

Where a and b are the numbers of colors in the color histograms C^q_k and Cl, respectively, and aij is the perceptual similarity coefficient between the different colors (Aghbari et al. 2003).
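A minimal sketch of (19), using a small illustrative 3 × 3 similarity matrix in place of the paper's 64 × 64 matrix of Section 3.2 (the histograms and coefficients below are made up for the example):

```python
def hist_distance(cq, cl, a):
    """Cross-similarity distance (19) between two color histograms cq, cl.
    a[i][j] holds the perceptual similarity coefficient between colors i, j."""
    n = len(cq)
    total = 0.0
    for i in range(n):
        total += cq[i] - cl[i]           # direct difference for color i
        for j in range(n):
            if j != i:                   # weighted differences of similar colors
                total += a[i][j] * (cq[j] - cl[j])
    return abs(total)

# Tiny 3-color example with a symmetric similarity matrix:
a = [[1.0, 0.5, 0.0],
     [0.5, 1.0, 0.5],
     [0.0, 0.5, 1.0]]
cq = [0.6, 0.3, 0.1]
cl = [0.2, 0.4, 0.4]
print(round(hist_distance(cq, cl, a), 4))  # → 0.05
```

Unlike a plain bin-by-bin difference, the cross terms let perceptually similar colors partially cancel, so two histograms with mass in neighboring colors come out close.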

Commonly, the number of queryFrames Nqk formulated by a user is much smaller than the number of sampled instances Nk from a sequence. Therefore, it is necessary to correspond the few color histograms in ζ^q_i to their most similar color histograms


in ζ_j. The correspondence between two color histograms C^q_k and Cl is defined as follows:

Definition 6 Two color histograms C^q_k and Cl, in ζ^q_i and ζ_j respectively, correspond to each other if, and only if: (1) D(C^q_k, Cl) ≤ τ; (2) there is no other Cm whose D(C^q_k, Cm) < D(C^q_k, Cl), where m > l; and (3) chronological correctness is guaranteed.

This definition suggests that three conditions must be met to correspond any pair of color histograms C^q_k and Cl: (1) D(C^q_k, Cl) ≤ τ; in our implementation, τ is set to 0.3, which means that two corresponding color histograms should be dissimilar by at most 30%; (2) D(C^q_k, Cl) should be smaller than the distance D(C^q_k, Cm) between C^q_k and any other color histogram in ζ_j, where m > l; (3) the correspondence shall guarantee chronological correctness, because the color histograms in both ζ^q_i and ζ_j depict the chronological (time-varying) changes in the colors of an object. For example, assume ζ^q_i contains two color histograms and ζ_j contains six; by comparing the distances of the two sets of color histograms, it was found that C^q_1 ↔ C3 and C^q_2 ↔ C3, where ↔ denotes correspondence. This is a situation where the correspondence is not chronologically correct: since C^q_1 already corresponds to C3, the later query histogram C^q_2 should correspond to one of the color histograms in [C4 .. C6]. Hence, the function in line 5, correspond(C^q_k, Cl), guarantees chronological correspondence between the color histograms of the object trajectories ζ^q_i and ζ_j. Then, the distance between the corresponded object trajectories, D(ζ^q_i, ζ_j), is computed (see line 6).
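The behavior of correspond(C^q_k, Cl) can be sketched as a greedy forward scan (an illustrative Python reading of Definition 6, not the paper's implementation; `dist` stands in for the distance (19)):

```python
def correspond(query_hists, target_hists, dist, tau=0.3):
    """Assign each query histogram to its most similar target histogram
    within tolerance tau while preserving chronological order (Definition 6).
    Returns, per query histogram, a target index or None."""
    matches = []
    start = 0  # later query histograms may only match later target histograms
    for cq in query_hists:
        best, best_d = None, None
        for l in range(start, len(target_hists)):
            d = dist(cq, target_hists[l])
            if d <= tau and (best_d is None or d < best_d):
                best, best_d = l, d
        matches.append(best)
        if best is not None:
            start = best + 1  # enforce chronological correctness
    return matches

# 1-D toy "histograms" with absolute difference as the distance:
print(correspond([0.5, 0.8], [0.1, 0.5, 0.5, 0.8], dist=lambda a, b: abs(a - b)))
# → [1, 3]
```

In the toy run, the second query histogram is only allowed to match targets after index 1, which rules out the chronologically incorrect double assignment described above.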

Commonly, the number of queried objects Nqo is smaller than, or equal to, the number of object trajectories NO in the target sequence. Therefore, the cTraj_match algorithm finds a correspondence between the object trajectories of Q and those of Sx. Fortunately, finding this correspondence is simple: each ζ^q_i is corresponded to the ζ_j that gives the smallest distance D(ζ^q_i, ζ_j). Notice that our algorithm only supports one-to-one correspondence; that is, no object trajectory in Q is corresponded to more than one object trajectory in Sx. The sum of all the distances between the corresponded object trajectories is considered the distance between Q and Sx. Finally (line 9), all the sequences are sorted in ascending order based on the computed distances. The top K shots in the sorted list, which are considered the most similar to Q, are returned to the user as the result of a search.

4 Experimental results

The experiments shown below are performed on a collection of 490 video sequences categorized into sports (bike racing, car racing, skiing, soccer and football), movie, and animation. The system is implemented using Visual C++ on an HP laptop with a 1.83 GHz Intel Core(TM)2 CPU, and the query interface is developed using Java. Several experiments are conducted to evaluate the effectiveness and efficiency of the cTraj method.

The requirements for formulating queries for the experiments below are quite simple. For each experiment, target sequences that contain moving objects are randomly selected. Then, for each target sequence, a user chooses one or more


objects based on which to formulate the query. Using the given user interface, a user specifies the features of each selected object at different time instances before submitting the query for execution. The specified object features may vary at different time instances. In the experiments, we made sure that queries investigate different numbers of selected objects and different numbers of time instances.

4.1 cTraj effectiveness

In information retrieval systems, the effectiveness of retrieval is normally measured by two metrics: recall and precision. Specifically, for our sequence collection, recall measures the capacity of cTraj to retrieve all relevant sequences from the collection. It is defined as the ratio between the number of retrieved relevant sequences ρ and the total number of relevant sequences γ in the collection.

recall = ρ / γ    (20)

On the other hand, precision measures the retrieval accuracy, that is, the ability of cTraj to reject false alarms. It is defined as the ratio between the number of retrieved relevant sequences ρ and the total number of returned sequences δ.

precision = ρ / δ    (21)

To compute the recall and precision of the cTraj technique, 20 sample queries are issued to search for 20 randomly selected sequences. These sample queries vary in the number of object trajectories, the number of queryFrames in each color trajectory, and the number of colors used to describe each object's color histograms. Then, the returned list of shots of each sample query is compared with its ground truth, where the ground truth of each sample query is determined manually beforehand by examining all the sequences in the collection and selecting the ones relevant to the sample query. For each sample query, (20) and (21) are used to compute the recall and precision of the cTraj technique. By varying the number of returned sequences from 1 to 10, a curve representing the average recall and precision is generated, as shown in Fig. 9. From Fig. 9, it can be noticed that precision is quite high when the number of returned sequences is small and then starts to slope down as the number of returned sequences increases. On the other hand, recall starts low and smoothly increases as the number of returned sequences increases. That is due to the fact that, during the process of trying to retrieve all sequences relevant to a sample query, some irrelevant sequences (false alarms) are also retrieved, reducing the precision. However, both recall and precision are high on the average, which proves the effectiveness of the cTraj technique.

4.2 cTraj efficiency

To evaluate the efficiency of the cTraj technique, 256 queries are issued with the following coefficients: the number of queried object trajectories ranging from 1 to 4 and the number of instances (referred to as queryFrames in these experiments) ranging from 1 to 4.


Fig. 9 The precision and recall curves averaged over 20 sample queries

4.2.1 Response time

To evaluate the efficiency of the cTraj technique, the response time (from the start of executing a query till the end of displaying the result) is compared to that of a sequential scanning method. By fixing one of the coefficients (either the number of object trajectories or the number of queryFrames), the average response time of the queries is computed versus the other coefficient using the cTraj technique and the sequential scanning method. The sequential scanning method supports object trajectories; however, it does not index them using a SAM. Figures 10 and 11 compare the average response time of cTraj and the sequential scanning method.

Figure 10 shows the average response time in seconds versus the number of object trajectories, where each graph measures the averages at a certain number of queryFrames (ranging from 1 to 4). It can be noticed that the response time of both methods increases as the number of object trajectories in the query increases, because the database is accessed once for every object trajectory in the query. Notice that as the number of queryFrames increases, the response time curve of the sequential method is shifted up with greater steps (indicating a greater increase in response time), while the response time curve of the cTraj technique increases only slightly (showing some robustness to varying the number of queryFrames as compared to the sequential method).

Fig. 10 The average response time in seconds (vertical axis) versus the number of object trajectories (horizontal axis) for the Sequential and cTraj methods, where each graph shows the comparison at a certain number of queryFrames (ranging from 1 to 4)


Fig. 11 The average response time in seconds (vertical axis) versus the number of queryFrames (horizontal axis) for the Sequential and cTraj methods, where each graph shows the comparison at a certain number of object trajectories (ranging from 1 to 4)

Figure 11 shows the average response time in seconds versus the number of queryFrames, where each graph measures the average response time at a certain number of object trajectories (ranging from 1 to 4) in the query. Notice that the average response time of both the sequential method and the cTraj technique increases as the number of queryFrames increases; that is because the number of color histograms increases, which results in additional color histogram comparisons between the query and the target sequences. By inspecting these graphs, it is clear that as the number of object trajectories increases, the response time curve of the sequential method increases with greater steps, while the response time curve of the cTraj technique increases only slightly (indicating some robustness to varying the number of color trajectories as compared to the sequential method). The overall reduction in response time by the cTraj technique is computed to be about 64% as compared to that of the sequential method.

4.2.2 Retrieved fraction of the database

To evaluate the selectivity of the filtering steps, the retrieved fraction of the sequence database is measured after the first step of filtering and after the second step of filtering, the latter being the fraction returned as the result of a query. The results shown below are the averages over all of the 256 sample queries. Tables 2 and 3 show the percentage of the database that is retrieved after each filtering step.

Table 2 records the fraction (percentage) of the database remaining after the first and second steps of filtering as the number of object trajectories varies from 1 to 4. These results are the averages of the sample queries issued for each number of object trajectories. Notice that as the number of object trajectories increases, the retrieved fraction after the first filtering step increases (in the range of 26.8–46.1%) due to the fact that

Table 2 Retrieved fraction of the database of shots as the number of object trajectories is varied from 1 to 4

# object trajectories   % of DB after 1st filtering   % of DB after 2nd filtering
1                       26.8                          15.8
2                       35.8                          7.9
3                       43.7                          7.8
4                       46.1                          7.4


Table 3 Retrieved fraction of the database of shots as the number of queryFrames is varied from 1 to 4

# queryFrames   % of DB after 1st filtering   % of DB after 2nd filtering
1               36                            12
2               40.5                          10.2
3               35.4                          7.9
4               39                            8.7

for each object trajectory, one cluster of sequences is retrieved from the database, and these clusters are then merged and filtered (see Section 3.3). Thus, an increase in the number of object trajectories increases the number of retrieved clusters, which in turn increases the number of candidate sequences retrieved. On the other hand, the retrieved fraction after the second step of filtering decreases (in the range of 15.8–7.4%) as the number of object trajectories increases, because an increase in the number of object trajectories increases the strictness of the selectivity of the second step of filtering. That is, in order for a candidate shot to be selected, it should match a greater number of object trajectories.

Table 3 shows the retrieved fractions of the database after the first and second steps of filtering as the number of queryFrames varies from 1 to 4. Here, as the number of queryFrames increases, the number of color histograms of an object increases, which in turn increases the number of mapped points of an object trajectory. Thus, the MBB of these mapped points may grow larger to include the added points, which results in an increase in the retrieved fraction of sequences (in the range of 36–39%), because the larger the MBB, the more it may overlap with other MBBs in the R−tree. However, the retrieved fraction after the second step of filtering decreases (in the range of 12–7.9%) as the number of queryFrames increases, due to the fact that an increase in the number of queryFrames increases the strictness of the selectivity of the second step of filtering. That is, an object trajectory in a candidate sequence must match a greater number of color histograms to be selected as a match to the current query trajectory.

Fig. 12 Average response time in seconds versus the number of objects in the query


Table 4 Comparison between the naive-I and cTraj techniques in terms of storage requirements of the R−tree

Description               naive-I   cTraj
Number of shots           490       490
Number of inserted MBBs   4747      1545
Number of nodes           238       53
Size of R−tree in bytes   216938    63525

4.2.3 cTraj vs naive-I

To show the response time optimization achieved by the cTraj technique, the cTraj method is compared with the naive-I method (see Section 2.5). Since naive-I does not support color trajectories, that is, it supports only one-queryFrame queries, one-queryFrame queries are used in comparing the response time of both techniques. In the experiment, 100 sample queries searching for 100 randomly selected shots are formulated and issued on both techniques. 25 of these sample queries contain 1 object, 25 contain 2 objects, 25 contain 3 objects, and 25 contain 4 objects. The average response time of each set of these sample queries is computed for both methods and drawn in Fig. 12. Notice from Fig. 12 that cTraj greatly reduces the response time as compared to the naive-I method. That is because grouping the mapped points that belong to a certain object trajectory by an MBB, and then inserting those MBBs into the R−tree instead of the individual points, results in a smaller R−tree. That, in turn, speeds up the response time. It can also be noticed that the percentage of reduction in response time becomes greater as the number of query objects increases, because the database is accessed once for every object.

cTraj optimizes the storage requirements of the R−tree, and that is the main reason behind the speed-up of the search shown in Fig. 12. Table 4 presents a summary of the optimization of storage requirements by the cTraj technique as compared to the naive-I method. Notice that with the cTraj technique, the 490 shots (the database used for experimentation) contribute 1545 object trajectories, while with the naive-I method, 4747 objects are extracted from those sequences. Thus, the size of the R−tree used by cTraj is 63525 bytes, while the size of the R−tree used by the naive-I method is 216938 bytes. That is, cTraj reduces the size of the R−tree, as compared to the naive-I method, by more than 70%.
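The more-than-70% figure follows directly from the sizes in Table 4:

```python
# Check the R-tree size reduction reported in Table 4.
naive_i_bytes, ctraj_bytes = 216938, 63525
reduction = 1 - ctraj_bytes / naive_i_bytes
print(f"R-tree size reduction: {reduction:.1%}")  # → R-tree size reduction: 70.7%
```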

5 Related work & background

Previous work in indexing and searching for sequences, such as videos, has focused on extracting visual features of shots, computing feature vectors, representing the sequences by the computed feature vectors, and retrieving those sequences. However, the problems mentioned in Section 1 (huge storage requirements, slow search and low accuracy due to lack of representation of time-varying features of objects) have remained without much improvement. In this paper, a cTraj technique that addresses these problems is proposed. The work in this paper touches on four disciplines: video databases, time series databases, spatio-temporal data indexing, and high-dimensional data indexing. In the following subsections, the related works from these disciplines are discussed.

5.1 Video databases

Few works have proposed models to index and search for shots based on the time-varying contents of these shots. An approach to represent shots by the time-varying directions of objects and the time-varying spatial relationships between objects is proposed by the systems in Li et al. (1997), Nabil et al. (1997) and Broilo et al. (2010). For example, in Li et al. (1997), a motion trajectory is considered as a set of points. At each point, an object is represented by a displacement from the previous point and the direction of the object. Also, at each point, the spatial relationships between objects are represented. However, these systems (Broilo et al. 2010; Li et al. 1997; Nabil et al. 1997) only address an exact match algorithm for matching user queries with target shots.

The system in Dagtas et al. (1999) indexes and queries a shot by an object's trail, which is the area covered by the object throughout the shot. However, the model only incorporates the motion feature and does not support querying of multiple objects (using more than one trail) in a scene. Another approach to query shots by the time-varying directions of moving objects is proposed by Chang et al. (1997). The model supports multiple-object queries. However, it does not support queries that explicitly investigate temporal relationships between multiple objects. The above-mentioned models (Li et al. 1997; Nabil et al. 1997; Dagtas et al. 1999; Chang et al. 1997) focus on modeling and representing the video data, but use a sequential search algorithm. Thus, these systems cannot be directly applied in a real-time, or tolerably interactive, environment where the efficiency of a search is essential. In addition, Morris and Trivedi (2008) propose a system that represents the video content by a motion trajectory. In their system, they reduce dimensionality by transforming the trajectory using the DFT, and thus they represent the trajectory with only the first few DFT coefficients.

Several systems, such as Flickner et al. (1995), Kobla et al. (1998) and Oh and Hua (2000), have investigated the efficiency of video database systems. In Oh and Hua (2000), a camera-tracking technique to detect the type of camera operation by utilizing only two values is proposed. However, this method cannot be applied to indexing shots, because representing a shot by only two values considerably underestimates the content of a shot, which contains several moving objects. Nevertheless, this method can be used as a rough filtering step in the process of retrieval.

In Kobla et al. (1998), MPEG-encoded video frames are mapped into a trail of points in low-dimensional distance space using the FastMap method proposed by Faloutsos (1996), where each frame is mapped into a point in low-dimensional distance space. This work is intended to analyze transitions between consecutive shots and to cluster shots' frames in order to visualize the underlying structure of a video. However, representing a frame by one point is an underestimation of the content of the frame, which may contain several objects. Therefore, this method cannot be applied to effective content-based shot retrieval.

The system in Flickner et al. (1995) uses a bounding method to approximate the distance of the high-dimensional color histograms of objects/frames by


computing the average color components of the RGB color space. These averages are computed and stored in the database. Then, a frame, or an object, is represented by a 3-dimensional value (Ravg, Gavg, Bavg), which is inserted into an R*-tree. The disadvantage of this approach is that the average colors of frames/objects must be precomputed and stored in the database, resulting in additional storage overhead.

The systems in Flickner et al. (1995), Kobla et al. (1998) and Oh and Hua (2000) do not capture the time-varying features of objects and do not optimize the storage requirements of the SAMs used in these systems. However, a meaningful description of the feature trajectory of an object can be utilized in activity classification (Naftel and Khalid 2006), which would speed up the search, in analyzing an object's behavior (Bashir et al. 2007; Morris and Trivedi 2008), and in describing an interaction (Parameswaran and Chellappa 2006; Ma et al. 2009).

5.2 Time series databases

In time series databases, the works in Faloutsos (1996) and Faloutsos and Lin (1995) have discussed an ad hoc solution to the problem of high dimensionality and a large number of time sequences. A fast algorithm, called FastMap, is introduced to map time sequences into points in some unknown k-dimensional space, such that the distances between objects are preserved in the k-dimensional space. Then, those mapped points are organized by an R*-tree. This method can be applied to map shots into points in low-dimensional space, but the problem is that it requires that the distances between all shots in the high-dimensional feature space be known in advance.

The work in Faloutsos et al. (1994) proposed a method to efficiently locate subsequences within a collection of time series sequences, such that the subsequences match a given query pattern within a specified tolerance. A sliding window over a time series sequence extracts its features; the result is a trail in feature space. Each trail is divided into sub-trails, which are then represented by their minimum bounding rectangles (MBRs). These MBRs are then indexed using an R*-tree. Dividing the trails into sub-trails is a method of grouping related data and then indexing based on these groups. In spirit, this algorithm is similar to ours in terms of reducing the size of the SAM in order to speed up the search.

Another method, for exact ranked subsequence matching, is proposed in Han et al. (2007). In that method, queries and database sequences are broken into segments, and lower bounds are established using LB_Keogh (Keogh 2002) to prune the majority of candidate matches. In Keogh (2002), an efficient method for retrieving subsequences under Dynamic Time Warping (DTW) is presented. The idea is to speed up the DTW distance computation by reducing the lengths of the query sequence and the database sequences, representing them as ordered lists of segments. The SPRING method for subsequence matching is proposed in Sakurai et al. (2007). In SPRING, optimal subsequence matches are identified by running the DTW algorithm between the query and each database sequence. Subsequences are identified by prepending them to the shorter sequence.

5.3 Spatio-temporal data indexing

Spatio-temporal indexing techniques retrieve all objects that satisfy the spatial constraints at some time point during a specified time interval. The focus of these


techniques is to record the location information of moving objects at past, current and/or future times. The index proposed in Pfoser et al. (2000) stores the historical movements of objects as sequences of connected spatio-temporal line segments. These segments are stored in a Spatio-Temporal R−tree (STR−tree) based on their spatial proximity and trajectory membership. This index experiences degrading performance as the length of the history increases, since it stores all line segments of a moving object. On the other hand, the proposed technique, cTraj, indexes the MBB of the sequence, which results in a great space reduction as compared to Pfoser et al. (2000). This reduction in the space requirement of the index contributes to the increased efficiency of retrieving the query results.

Other techniques, such as Saltenis et al. (2000) and Saltenis and Jensen (2002), index the current and future locations of moving objects. The index proposed by Saltenis et al. (2000), called the TPR−tree, indexes the individual location points as well as the bounding rectangles of moving objects in non-leaf nodes, which are augmented with velocity vectors. These velocity vectors support queries that predict the location of an object. The index proposed by Saltenis and Jensen (2002), called the REXP−tree, builds on the index of Saltenis et al. (2000) and addresses the issue of expiring information. The REXP−tree records the current and anticipated future positions of moving point objects, assuming that their positions expire after specified time intervals. The indexes in Saltenis et al. (2000) and Saltenis and Jensen (2002) focus on storing detailed current location information along with the speed and direction information of moving objects to support predictive queries; therefore, these index structures can grow greatly in size as the number of objects and/or the lengths of the objects' sequences increase. Unlike these indexes, the technique proposed in this paper, cTraj, minimizes the size of the index structure by storing the MBBs of moving objects, which leads to more efficient query processing.

The indexes proposed by Lin et al. (2005) and Sun et al. (2004) record the past, current and future location information of moving objects. While the index in Sun et al. (2004) contains a summary structure that supports the retrieval of approximate aggregate query results, the index in Lin et al. (2005) retrieves exact query results of moving objects by linearizing the timestamped locations using a B+−tree. This group of indexes (Lin et al. 2005; Sun et al. 2004) stores the location information of both of the two previous groups: past locations (Pfoser et al. 2000) and current and future locations (Saltenis et al. 2000; Saltenis and Jensen 2002). Therefore, more information is stored in the indexes to support queries that utilize the past, current and future spatio-temporal information of moving objects, resulting in huge index structures that in turn lead to less efficient query processing. The focus of the cTraj technique proposed in this paper is to store compact temporal feature information of moving objects, such as colors, and to map these features into points in a low-dimensional space. The sequence of points of an object is represented by an MBB that is indexed by an R−tree. Such a representation reduces the space requirement of the index and thus leads to more efficient query processing.
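The MBB representation that cTraj relies on can be illustrated with a short sketch: given the sequence of low-dimensional points of one object's trajectory, the MBB is simply the component-wise minimum and maximum over the points. The function name and the point layout below are assumptions for illustration, not the paper's implementation.

```python
# Illustrative sketch of the cTraj representation step: the sequence of
# low-dimensional points of one object's trajectory is reduced to a single
# minimum bounding box (MBB), stored as a (lower corner, upper corner) pair.

def trajectory_mbb(points):
    """Compute the MBB of a trajectory given as a list of d-dimensional points."""
    dims = range(len(points[0]))
    lower = tuple(min(p[d] for p in points) for d in dims)
    upper = tuple(max(p[d] for p in points) for d in dims)
    return lower, upper

traj = [(0.2, 0.5), (0.4, 0.1), (0.3, 0.9)]
print(trajectory_mbb(traj))  # ((0.2, 0.1), (0.4, 0.9))
```

Whatever the trajectory length, the index stores only the two corner points, which is the source of the space reduction claimed above.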

5.4 High-dimensional data indexing

The R−tree is a high-dimensional indexing structure that was proposed by Guttman (1984). The R−tree and its variants, such as the R+−tree (Sellis et al. 1987) and the R*-tree (Beckmann et al. 1990), are commonly used for indexing and clustering multidimensional data (spatial data, CAD data, etc.). In this paper, the basic R−tree is used to index the color trajectories of objects that exist in shots. Nevertheless, using either the R+−tree or the R*-tree may increase the retrieval performance of a system.

In cTraj, the time-varying colors of an object are mapped into points in a low-dimensional distance space. These points, which belong to one object, are bounded by an MBB, which represents the color trajectory of the object. The MBBs of objects are indexed by an R−tree. As shown in Fig. 13, a non-leaf node N_NL contains:

N_NL : P_1, P_2, ..., P_i,   m ≤ i ≤ M
P_i : (ptr, box)     (22)

where P_i is an entry in the non-leaf node N_NL, ptr is a pointer to a child node in the R−tree, and box is the MBB that covers all MBBs in the child node. Also, m and M are the minimum and maximum numbers of entries in the node, respectively. Based on the work in Beckmann et al. (1990), 2 < m < M/2, and the value of M in a non-leaf node could differ from that of a leaf node.

Fig. 13 a Points (solid dots), which represent the color histograms of objects, in a low-dimensional space are bounded by MBBs (dotted rectangles), which represent color trajectories; the MBBs are organized in an R−tree with fanout = 4. b The R−tree corresponding to a

On the other hand, a leaf node N_L contains an entry:

N_L : P_1, P_2, ..., P_i,   m ≤ i ≤ M
P_i : (cTraj_id, cTraj)     (23)

where P_i is an entry in the leaf node N_L, cTraj_id is an identifier of a color trajectory, and cTraj is a pointer to the color trajectory ζ_i of object i. In R−trees, the MBBs of sibling nodes are allowed to overlap; consequently, R−trees can guarantee good space utilization and remain balanced. Figure 13a shows a 2-dimensional view of the MBBs (dotted rectangles) of color trajectories and their bounded points (solid dots), which represent the individual color histograms of objects, organized in an R−tree with fanout = 4. The file structure for the same R−tree is shown in Fig. 13b, where nodes correspond to disk pages.
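The entry layouts of (22) and (23), together with the MBB overlap test that drives the R−tree search, can be sketched as follows. The field names mirror the equations (ptr, box, cTraj_id, cTraj), but the classes and the overlap predicate are illustrative assumptions, not the paper's implementation.

```python
# Simplified sketch of R-tree entries as in (22) and (23), plus the MBB
# overlap test used during search: only subtrees whose box overlaps the
# query MBB need to be visited.
from dataclasses import dataclass

@dataclass
class NonLeafEntry:   # P_i in (22)
    box: tuple        # MBB covering all MBBs in the child node
    ptr: object       # pointer to the child node

@dataclass
class LeafEntry:      # P_i in (23)
    cTraj_id: int     # identifier of a color trajectory
    cTraj: object     # pointer to the color trajectory itself

def overlaps(a, b):
    """True if two MBBs, each given as (lower, upper) corner tuples, intersect."""
    (al, au), (bl, bu) = a, b
    return all(al[d] <= bu[d] and bl[d] <= au[d] for d in range(len(al)))

q = ((0.1, 0.1), (0.3, 0.3))
print(overlaps(q, ((0.2, 0.2), (0.5, 0.5))))  # True
print(overlaps(q, ((0.4, 0.4), (0.6, 0.6))))  # False
```

Because an MBB may bound points it does not actually contain along the trajectory, this filtering step admits false alarms but no false dismissals, which is why cTraj follows it with the refinement process described earlier.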

6 Conclusion

An effective and efficient technique, called cTraj, is proposed to index and retrieve sequences by specifying the time-varying features of the multiple moving objects that exist in the wanted sequence(s). In addition to supporting queries that investigate the time-varying color contents of sequences, cTraj optimizes the storage requirements of the R−tree and speeds up the response time. In the cTraj technique, a mapping mechanism, called topological mapping, is used to map the object trajectories into points in a low-dimensional distance space while guaranteeing the correctness of the result. The performed experiments show a substantial reduction in the storage requirements of the R−tree and a great speed-up in response time when using the cTraj technique as compared with the sequential scanning method or the naive-I method.

References

Aghbari, Z., Kaneko, K., & Makinouchi, A. (2003). Content-trajectory approach for searching video databases. IEEE Transactions on Multimedia, 5(4), 516–531.

Bashir, F., Khokhar, A., & Schonfeld, D. (2007). Object trajectory-based activity classification and recognition using hidden Markov models. IEEE Transactions on Image Processing, 16(7), 1912–1919.

Beckmann, N., Kriegel, H., Schneider, R., & Seeger, B. (1990). The R*-tree: An efficient and robust access method for points and rectangles. ACM SIGMOD, pp. 322–331.

Broilo, M., Piotto, N., Boato, G., Conci, N., & De Natale, F. (2010). Object trajectory analysis in video indexing and retrieval applications. Video search and mining, Studies in computational intelligence (Vol. 287, pp. 3–32). Berlin Heidelberg: Springer-Verlag.

Chang, S. F., Chen, W., Meng, H. J., Sundaram, H., & Zhong, D. (1997). VideoQ: An automated content-based video search system using visual cues. ACM Multimedia Conf., Seattle, WA.

Dagtas, S., Al-Khatib, W., Ghafoor, A., & Khokhar, A. (1999). Trail-based approach for video data indexing and retrieval. ICMCS'99, Florence, Italy.

Faloutsos, C. (1996). Searching multimedia databases by content. Boston: Kluwer Academic Publishers.

Faloutsos, C., & Lin, K. (1995). A fast algorithm for indexing, data-mining and visualization of traditional and multimedia datasets. ACM SIGMOD.

Faloutsos, C., Ranganathan, M., & Manolopoulos, Y. (1994). Fast subsequence matching in time-series databases. ACM SIGMOD.


Ferman, A. M., Tekalp, A. M., & Mehrotra, R. (1998). Effective content representation for video. IEEE Int'l Conf. on Image Processing, Chicago.

Flickner, M., Sawhney, H., Niblack, W., Ashley, J., Huang, Q., Dom, B., et al. (1995). Query by image and video content: The QBIC system. IEEE.

Furht, B., Smoliar, S. W., & Zhang, H. (1996). Video and image processing in multimedia systems. Norwell: Kluwer Academic Publishers, second printing.

Guttman, A. (1984). R−trees: A dynamic index structure for spatial searching. ACM SIGMOD, pp. 47–57.

Han, W. S., Lee, J., Moon, Y.-S., & Jiang, H. (2007). Ranked subsequence matching in time-series databases. In International Conference on Very Large Data Bases (VLDB) (pp. 423–434).

Katayama, N., & Satoh, S. (1997). The SR−tree: An index structure for high-dimensional nearest neighbor queries. ACM SIGMOD, pp. 369–380.

Keogh, E. (2002). Exact indexing of dynamic time warping. In International Conference on Very Large Data Bases (pp. 406–417).

Kobla, V., Doermann, D. S., & Faloutsos, C. (1998). Developing high-level representations of video clips using VideoTrails. In Proc. of SPIE Conference on Storage and Retrieval for Image and Video Databases VI (pp. 81–92).

Korn, P., Sidiropoulos, N., Faloutsos, C., Siegel, E., & Protopapas, Z. (1998). Fast and effective retrieval of medical tumor shapes. IEEE Transactions on Knowledge and Data Engineering, 10(6), Nov/Dec.

La Cascia, M., & Ardizzone, E. (1996). JACOB: Just a content-based query system for video databases. Proc. ICASSP, Atlanta.

Li, J. Z., Ozsu, M. T., & Szafron, D. (1997). Modeling of moving objects in a video database. IEEE Int'l Conf. on Multimedia Computing and Systems (pp. 336–343).

Lin, D., Jensen, C. S., Ooi, B. C., & Saltenis, S. (2005). Efficient indexing of the historical, present, and future positions of moving objects. In Proceedings of MDM.

Ma, X., Bashir, F., Khokhar, A., & Schonfeld, D. (2009). Event analysis based on multiple interactive motion trajectories. IEEE Transactions on Circuits and Systems for Video Technology, 19(3), 397–406.

Morris, B. T., & Trivedi, M. M. (2008). A survey of vision-based trajectory learning and analysis for surveillance. IEEE Transactions on Circuits and Systems for Video Technology, 18(8), 1114–1127.

Nabil, M., Ngu, A. H., & Shepherd, J. (1997). Modelling moving objects in multimedia databases. Fifth Int'l Conference on Database Systems for Advanced Applications, Melbourne, Australia.

Naftel, A., & Khalid, S. (2006). Classifying spatiotemporal object trajectories using unsupervised learning in the coefficient feature space. Multimedia Systems, 12(3), 227–238.

Oh, J. H., & Hua, K. A. (2000). Efficient and cost-effective techniques for browsing and indexing large video databases. ACM SIGMOD, Dallas, TX, USA.

Overview of MPEG-4 Standard (1999). MPEG-4 subgroups: Requirements, audio, delivery, SYNC, systems, video. ISO/MPEG.

Parameswaran, V., & Chellappa, R. (2006). View invariance for human action recognition. International Journal of Computer Vision, 66, 83–101.

Pfoser, D., Jensen, C. S., & Theodoridis, Y. (2000). Novel approaches to the indexing of moving object trajectories. In Proceedings of the 26th International Conference on Very Large Databases, Cairo, Egypt.

Sakurai, Y., Faloutsos, C., & Yamamuro, M. (2007). Stream monitoring under the time warping distance. In IEEE International Conference on Data Engineering (ICDE).

Saltenis, S., & Jensen, C. S. (2002). Indexing of moving objects for location-based services. In Proc. ICDE (pp. 463–472).

Saltenis, S., Jensen, C. S., Leutenegger, S. T., & Lopez, M. A. (2000). Indexing the positions of continuously moving objects. In SIGMOD Conference.

Sellis, T., Roussopoulos, N., & Faloutsos, C. (1987). The R+−tree: A dynamic index for multi-dimensional objects. 13th VLDB Conference, England, Sept.

Smith, J. R., & Chang, S. F. (1996). VisualSEEK: A fully automated content-based image query system. ACM Multimedia Conference, Boston, pp. 87–98, Nov.

Sun, J., Papadias, D., Tao, Y., & Liu, B. (2004). Querying about the past, the present, and the future in spatio-temporal databases. ICDE, pp. 202–213.

Yi, B., Jagadish, H. V., & Faloutsos, C. (1998). Efficient retrieval of similar time sequences under time warping. ICDE'98, Orlando, Florida, Feb. 23–27.
