
Published in IET Computer Vision. Received on 7th February 2011. Revised on 19th July 2011. doi: 10.1049/iet-cvi.2011.0023

ISSN 1751-9632

Curvelet transform-based technique for tracking of moving objects

S. Nigam, A. Khare
Department of Electronics and Communication, University of Allahabad, Allahabad, India
E-mail: [email protected]; [email protected]

Abstract: This study presents an object tracking method for video sequences based on the curvelet transform. The wavelet transform has been widely used for object tracking, but it cannot describe curve discontinuities well. We therefore use the curvelet transform for tracking, which is performed using the energy of curvelet coefficients across a sequence of frames. The proposed method is simple and does not rely on any parameter other than the curvelet coefficients. Compared with a number of schemes, such as the Kalman filter, particle filter, Bayesian methods, template model, corrected background weighted histogram, joint colour-texture histogram and covariance-based tracking methods, the proposed method effectively extracts the features in the target region, which characterise and represent the target more robustly. The experimental results validate that the proposed method greatly improves tracking accuracy and efficiency over traditional methods.

1 Introduction

Object tracking in video sequences is a very popular problem in the field of computer vision [1]. Object tracking is a process of locating a moving object (or multiple objects) over time using a single camera or multiple cameras [2]. Its objective is to associate target objects in consecutive video frames. Object tracking is the basis of applications in many areas like security, surveillance, clinical applications, biomechanical applications, human-robot interaction, entertainment, education, training and so on.

There are two key steps in the object tracking process:

1. Object detection: detection of an object in a given scenario
2. Object tracking: frame-by-frame tracking of the object

Tracking of moving objects is a complicated task due to the following reasons:

1. The object's shape and size may vary from frame to frame
2. The object may be occluded by other object(s)
3. Presence of noise and blur in the video
4. Luminance and intensity changes
5. The object's abrupt motion
6. Real-time scene analysis requirements

Therefore real-time object tracking is a critical task in computer vision applications, and tracking of moving objects is a topic of great interest for researchers. In order to perform object tracking in video sequences, an algorithm analyses sequential video frames and outputs the movement of the target between frames. Many tracking algorithms have been proposed for this purpose. A good survey of tracking algorithms is provided by Yilmaz et al. [3].


The mean shift algorithm was originally proposed by Fukunaga and Hostetler [4] for data clustering. It was later modified by Cheng [5]. Bradski [6] modified it again and developed a continuously adaptive mean shift algorithm to track a moving face. Another class of mean shift tracking algorithms is based on the Kalman filter [7]. In another technique, by Nummiaro et al. [8], a bootstrap particle filter was used to sample the observation model. Zivkovic et al. [9] used an efficient local search scheme to find the likelihood of the object region and approximated this region using Bayesian filtering. Shen et al. [10] built a robust template model from a large amount of data instead of a single image for tracking. The major drawback of these techniques is handling scale changes. Scale selection is a big concern in mean shift tracking: if the scale is too big or too small, it results in poor localisation. Owing to this, mean shift tracking algorithms are not very promising for object tracking, because one cannot choose an exactly correct scale for changing situations.

A widely used class of object tracking algorithms uses histograms. Lee and Kang [11] developed an area-weighted centroid shifting algorithm that takes colour histograms into account according to the area they cover in the initial target region, and contains more spatial information about the distribution of the colours in the target than the original mean shift-based tracking. A colour histogram is an estimate of the point-sample distribution and is very robust in representing object appearance. However, using only colour histograms in mean shift tracking has some problems. First, the spatial information of the target is lost. Second, when the target has a similar appearance to the background, the colour histogram becomes invalid for distinguishing them. The idea of combining colour and texture features has been exploited by researchers for better target representation [12]. Ning et al. [13] combined corrected background weighted histograms with mean shift for robust object tracking.


Combining colour and texture features is still a difficult problem, because although many texture analysis methods exist, they have high computational complexity and cannot be used directly together with a colour histogram. The local binary pattern (LBP) technique is very effective for tracking objects. Recently, LBP was successfully applied to tracking of moving objects via background subtraction [14]. In LBP, each pixel is assigned a texture value that can be naturally combined with the colour value of the pixel to represent targets. Nguyen et al. [15] employed image intensity and LBP features to construct a 2D histogram representation of targets for tracking in thermographic and monochromatic video. In [16], colour and texture features were combined using LBP to obtain better results. Various other techniques also exist for tracking. Appearance model-based techniques [17] kinematically track an object in video sequences. To retrieve different types of features, covariance tracking [18] is also very useful and is robust against illumination changes, noise and erratic motion.

However, all the techniques described above share a major drawback: they do not make use of multiple image resolution levels. Therefore they are not able to handle motion of variable-size objects. An explicit shortcoming is that they restrict the amount of object displacement that can be measured between consecutive frames. In short, these techniques provide good tracking accuracy only within a specific search area and do not give a correct idea of the actual motion of the target object.

Recently, transformation-based methods have become very popular for object tracking because they can easily overcome the above shortcomings. By transforming images from one domain to another, some information that is difficult to obtain in one domain can be obtained easily and efficiently in the other. Use of the Fourier transform [19], discrete cosine transform [20], and so on was explored at an initial stage for tracking. The wavelet transform is now a very promising tool for object tracking, as wavelets are suitable for representing local features. Several methods exist for object tracking using wavelets [21-25]. The dual-tree complex wavelet transform gives better directional selectivity, and this property has been exploited by Mansouri et al. [21] for object tracking. Khansari et al. [22] developed a noise-robust algorithm for tracking user-defined shapes in noisy video sequences using features generated in the undecimated wavelet packet transform (UWPT). They analysed the adaptation of a feature vector generation and block matching algorithm in the UWPT domain for tracking human objects in crowded video scenes [23], and introduced a modified tracking algorithm that can handle partial or short-term full occlusion [24]. The Daubechies complex wavelet transform has also been used for efficient tracking [25].

However, the wavelet transform does not handle curve discontinuities optimally, and discontinuities across a simple curve affect all the wavelet coefficients on the curve. The ridgelet transform was introduced to overcome the weakness of wavelets in higher dimensions [26]. Ridgelets provide a good representation for line singularities in 2D space. Xiao et al. [27] presented an object tracking system based on the ridgelet transform, which proved to be an alternative to wavelet representation of image data.

The ridgelet transform alone cannot represent curve discontinuities efficiently. The new tight frame of curvelets is an effective non-adaptive representation for objects with edges [28]. The continuous curvelet transform [29, 30] and fast discrete curvelet transform [31] are capable of handling curve discontinuities efficiently. This property of curvelets has been widely used for image denoising [32, 33] and character recognition [34, 35]. Zhang et al. [36] experimentally confirmed that the curvelet transform can be used more effectively to extract feature information. Lee and Chen [37] used the digital curvelet transform to capture high-dimensional features at different scales and different angles of an object. Mandal et al. [38] presented an improvement by reducing the number of coefficients. Recently, in [39, 40], it has been shown that curvelets can be used efficiently for tracking.

Extending previous work on applications of the curvelet transform, in this paper we present a new method for object tracking in a video sequence using the curvelet transform. The approach consists of two steps: use of curvelet coefficients for segmentation, and use of the energy of curvelet coefficients for object tracking. Compared with the traditional Kalman filter method [7], particle filter method [8], Bayesian filter method [9], template model method [10], corrected background-weighted histogram method [13], joint colour-texture histogram-based method [16] and covariance tracking method [18], the proposed algorithm effectively extracts the features in the object region, which characterise and represent the object more robustly.

The rest of the paper is organised as follows: Section 2 describes basic concepts of the curvelet transform. Section 3 deals with the proposed tracking algorithm. Experimental results and conclusions are given in Sections 4 and 5, respectively.

2 Curvelet transform

The curvelet transform is a new transform that provides multiscale geometric analysis. It is based on the parabolic scaling relation width ≃ length². It was proposed by Candes and Donoho [28] in 1999. The curvelet transform is capable of handling curve discontinuities well compared with the wavelet and ridgelet transforms. There are four steps in the implementation of the curvelet transform, given in the following subsections.

2.1 Sub-band decomposition

In this step, an image is decomposed into different sub-bands by using the energy of different layers. Let f be an image that needs to be decomposed, and let P_0 and D_s be the low-pass and high-pass filters, respectively. Then the image can be decomposed as

f \mapsto (P_0 f, D_1 f, D_2 f, \ldots, D_k f) \quad (1)

where 1 ≤ s ≤ k and k is the last sub-band. The sub-bands used in the curvelet transform are localised near the non-standard frequencies [2^{2s}, 2^{2s+2}]. After decomposing into different sub-bands, we obtain different small high-pass sub-images as output.
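To make the step concrete, here is a minimal NumPy/SciPy sketch of such a band-pass split. Repeated Gaussian smoothing stands in for the P_0/D_s filter bank; this particular filter choice is our assumption for illustration, not the paper's.

import numpy as np
from scipy.ndimage import gaussian_filter

def subband_decompose(f, k=4):
    """Split image f into a low-pass band P0*f and band-pass bands D1*f ... Dk*f.

    Each band is the difference between successive Gaussian smoothings, an
    a-trous-style stand-in for the actual curvelet filter bank.
    """
    bands = []
    current = f.astype(float)
    for s in range(1, k + 1):
        smoothed = gaussian_filter(current, sigma=2.0)  # low-pass at this scale
        bands.append(current - smoothed)                # Ds f: detail at scale s
        current = smoothed
    return current, bands                               # P0 f, [D1 f, ..., Dk f]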

2.2 Smooth partitioning

In the second step, the different small images obtained through sub-band decomposition are windowed along a grid of dyadic squares. By doing this, we obtain each image in the domain of dyadic squares. We achieve this as

h_Q = w_Q D_s f \quad (2)

where w_Q is a smooth windowing function applied to each small image obtained through sub-band decomposition. The grid of dyadic squares is defined as

Q = \left[ \frac{k_1}{2^s}, \frac{k_1 + 1}{2^s} \right] \times \left[ \frac{k_2}{2^s}, \frac{k_2 + 1}{2^s} \right] \in Q_s \quad (3)

where Q_s denotes the set of all dyadic squares of the grid, k_1 corresponds to the first high-pass filter and k_2 corresponds to the last high-pass filter. Hence this grid is aligned between the layers of the first and last high-pass filters.
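Continuing the sketch above (and its imports), the windowing of one sub-band into the dyadic squares of eqs. (2)-(3) might look as follows; the separable Hann window is a hypothetical choice for w_Q, and the image side n is assumed to be a power of two.

def smooth_partition(band, s):
    """Window an n x n sub-band into dyadic squares of side n / 2**s."""
    n = band.shape[0]
    side = max(n >> s, 1)                                      # pixels per dyadic square
    w = np.hanning(side)[:, None] * np.hanning(side)[None, :]  # smooth window w_Q
    squares = {}
    for k1 in range(0, n, side):
        for k2 in range(0, n, side):
            squares[(k1 // side, k2 // side)] = w * band[k1:k1 + side, k2:k2 + side]
    return squares                                             # {(k1, k2): h_Q}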

2.3 Renormalisation

The third step is renormalisation. After smooth partitioning, renormalisation of each sub-band is done. In this step, each sub-image is converted from its dyadic square to the unit square [0,1] × [0,1]. This is done to make the further processing easy and to reduce computational complexity.

For renormalisation, an operator T_Q is defined for each Q as

(T_Q f)(x_1, x_2) = 2^s f(2^s x_1 - k_1, 2^s x_2 - k_2) \quad (4)

where x_1, x_2 are ridge lines along the first and last high-pass filters. This operation is used for renormalisation of each sub-band or sub-image. The renormalisation of each sub-band is done as

g_Q = T_Q^{-1} h_Q \quad (5)

Here we obtain g_Q as the renormalised sub-band image.
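On a discrete grid, T_Q^{-1} amounts to resampling each windowed square onto one common reference grid; a minimal sketch follows (the 64 × 64 target grid is an arbitrary choice, and the amplitude factor 2^s is omitted):

from scipy.ndimage import zoom

def renormalise(h_Q, target=64):
    """Resample a windowed dyadic square onto a fixed grid, playing the role
    of g_Q = T_Q^{-1} h_Q in eq. (5)."""
    factor = target / h_Q.shape[0]
    return zoom(h_Q, factor, order=1)  # bilinear resampling onto the unit square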

2.4 Ridgelet analysis

The last and most important step of the curvelet transform is ridgelet analysis of the renormalised sub-bands. In this step, each sub-band is processed in the ridgelet domain. For the space L^2(R^2), the basis elements are denoted ρ_λ [28]. These basis elements are wavelet-like: they are orthonormal and localised within rectangles of side 2^{-2j} × 2^{-j} at scale j, obeying the anisotropic property width ≃ length². We obtain the curvelet coefficients as inner products of g_Q with the basis elements ρ_λ

\alpha_{(Q,\lambda)} = \langle g_Q, \rho_\lambda \rangle \quad (6)

Finally, ridgelet analysis yields the curvelet coefficients \alpha_{(Q,\lambda)}.
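A common discrete realisation of ridgelet analysis is a Radon transform followed by a 1-D wavelet transform along each projection. The sketch below follows that recipe using scikit-image and PyWavelets; 'bior4.4' is PyWavelets' CDF 9/7 wavelet, matching the '9-7' filter mentioned in Section 3.1, though the paper's exact implementation may differ.

import pywt
from skimage.transform import radon

def ridgelet_coeffs(g_Q, n_angles=32):
    """Ridgelet analysis of one renormalised square, eq. (6): Radon transform,
    then a 1-D wavelet transform along each angular projection."""
    theta = np.linspace(0.0, 180.0, n_angles, endpoint=False)
    sinogram = radon(g_Q, theta=theta)  # line integrals; one column per angle
    return [pywt.wavedec(sinogram[:, i], 'bior4.4') for i in range(n_angles)]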

3 Proposed method

3.1 Algorithm for segmentation

The main objective of segmentation is to retrieve the object of interest in the first frame of the video sequence. The segmentation is done in the curvelet domain. The curvelet decomposition is composed of the following steps:

1. Sub-band decomposition of the object into a sequence of sub-bands.


2. Windowing each sub-band into blocks of appropriate size depending on their centre frequency.
3. Applying the ridgelet transform to these blocks.

These steps decompose an n × n original image into sub-bands followed by the spatial partitioning of each sub-band into blocks, as follows

f(x, y) = c_J(x, y) + \sum_{j=1}^{J} w_j(x, y) \quad (7)

where c_J is a coarse or smooth version of the original image f and w_j represents the details of image f at scale 2^{-j}. The decomposition therefore results in J + 1 sub-band arrays of size n × n. In our method, j = 1 corresponds to the finest scale (high frequencies). The ridgelet transform is used to decompose the image into smoothly overlapping blocks of side length b pixels, partitioned so that the overlap between two vertically adjacent blocks is a rectangular array of size b × b/2. The overlap is used to avoid blocking artefacts. An n × n image is decomposed into 2n/b such blocks in each direction. The number of decomposition levels by the '9-7' filter at the finest pyramid scale is 5, which leads to 32 directions. The coefficients in the transform domain are very sparse, and significant coefficients are located around edges and in the right directional sub-bands. After finding the curvelet coefficients, strong edges are detected using the correlation of multilevel curvelet coefficients, and a simple hysteresis-based algorithm is applied for segmentation.
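The paper does not give the hysteresis thresholds, so the sketch below assumes two fractions of the maximum coefficient magnitude as the threshold pair, with SciPy's connected-component labelling propagating strong responses into connected weak ones.

from scipy import ndimage

def hysteresis_segment(coeff_mag, low_frac=0.1, high_frac=0.3):
    """Hysteresis thresholding on a map of curvelet-coefficient magnitudes.

    Pixels above the high threshold seed the object; weaker pixels survive
    only if connected to a seed. The two fractions are hypothetical tuning
    values, not taken from the paper.
    """
    high = coeff_mag > high_frac * coeff_mag.max()
    low = coeff_mag > low_frac * coeff_mag.max()
    labels, _ = ndimage.label(low)           # connected regions of weak response
    keep = np.unique(labels[high])           # regions containing a strong seed
    return np.isin(labels, keep[keep > 0])   # binary object mask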

The result of the segmentation algorithm described here is shown in Fig. 1 for the 'cameraman' image. It is clear that the segmentation algorithm performs well.

Fig. 1 Segmentation result of 'cameraman' image: (a) original image, (b) segmented image

3.2 Algorithm for tracking

The tight frame property of curvelets [28, 32]

\sum_m |\langle f, g_m \rangle|^2 = \| f \|^2_{L^2(R^2)} \quad (8)

allows us to shift attention to the coefficient domain. Thus the magnitude and energy of the curvelet coefficients remain approximately invariant when the object is translated between frames of a video. The proposed tracking algorithm exploits this property.

The tracking algorithm searches for the object in the next frame according to its predicted centroid value, which is computed from the previous four frames.


In all the computations, it has been assumed that the frame rate is adequate and that the size of the object does not change much between adjacent frames. However, our algorithm is capable of tracking an object whose size changes within a range across frames. The tracking algorithm does not require any parameter other than the curvelet coefficients. The complete tracking algorithm is given in Fig. 2.

The complexity of the algorithm described above is just the complexity of computing the curvelet transform. In [31, 41], it has been shown that the complexity of computing the curvelet transform of an N × N image is O(N² log N). Therefore the complexity of the proposed method is also O(N² log N).
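The two main ingredients of the tracker can be sketched as follows. The paper states that the prediction uses the previous four frames; constant-velocity extrapolation is our assumption about how they are combined. curvelet_energy reuses subband_decompose and ridgelet_coeffs from the sketches in Section 2.

def predict_centroid(history):
    """Predict the next centroid from the last four centroids by
    constant-velocity extrapolation (an assumed realisation)."""
    (x3, y3), (x0, y0) = history[-4], history[-1]
    return (x0 + (x0 - x3) / 3.0, y0 + (y0 - y3) / 3.0)

def curvelet_energy(patch):
    """Energy of the curvelet coefficients of a patch, approximated as the
    summed squared ridgelet coefficients over all sub-bands."""
    _, bands = subband_decompose(patch)
    return sum(float(np.sum(c * c))
               for band in bands
               for angle in ridgelet_coeffs(band)
               for c in angle)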

Fig. 2 Algorithm for tracking


4 Experiments and results

In this section, we show the experimental results of the proposed algorithm. We implemented the tracking method described in Section 3.2 and tested it on several video clips with small as well as large object sizes. Here we present results for two representative videos: the child video and the soccer video. The proposed method does not depend on a particular model of object motion, such as the appearance-based model [17]. In addition, no manual intervention is needed in the whole process, as it is a fully automatic technique.

First we segment the object in the first frame according to the method described in Section 3.1. For the tracking part, once the object area is determined in the first frame, the tracking algorithm needs to track the object from frame to frame. A square bounding box is created to cover the object, with centroid at (C1, C2), and the energy of the curvelet coefficients of the square box is computed as in the tracking method described in Section 3.2. A search block is assumed around the object, using the boundaries [top, bottom, left, right] of the object from the previous frame, in such a way that the box can move three pixels in each direction. Starting at the top left corner of that block, we compute within it the energy of the curvelet coefficients for each sub-box whose dimension equals the dimension of the bounding box of the object.

In our implementation, we have kept the search length equal to three. This allows the search window to move three pixels in each of the top, bottom, left and right directions from the predicted centroid value; that is, the next search window can cover six more pixels in the vertical direction and six more pixels in the horizontal direction. Hence the predicted centroid value remains consistent even when two objects cross each other. If the search window is not able to move in these directions, then the centroid remains as predicted, even in the presence of another object. If the search length is kept less than three, it is not sufficient to search in the other directions. Similarly, if the search length is kept greater than three, the search window moves so far in these directions that it takes much longer, increasing the computational time.
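A sketch of this ±3-pixel search, assuming the candidate boxes stay inside the frame (boundary handling omitted for brevity):

def track_next(frame, box_size, pred_cx, pred_cy, target_energy, search=3):
    """Scan candidate boxes within +/- search pixels of the predicted centroid
    and keep the one whose curvelet-coefficient energy best matches the
    object's energy from the previous frame (search = 3, as in the text)."""
    best_diff, best_c = None, (pred_cx, pred_cy)
    h = box_size // 2
    for dx in range(-search, search + 1):
        for dy in range(-search, search + 1):
            cx, cy = pred_cx + dx, pred_cy + dy
            patch = frame[cy - h:cy + h, cx - h:cx + h]
            diff = abs(curvelet_energy(patch) - target_energy)
            if best_diff is None or diff < best_diff:
                best_diff, best_c = diff, (cx, cy)
    return best_c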

Two case studies, on the child video and the soccer video, are discussed here one by one. In both case studies, we have illustrated and tested the proposed method in comparison with the Kalman filter-based method, hereafter method KF [7]; the particle filter method, hereafter method PF [8]; the Bayesian filter method, hereafter method BF [9]; the template model-based method, hereafter method TM [10]; the corrected background weighted histogram method, hereafter method CBWH [13]; the joint colour-texture histogram method, hereafter method JCTH [16]; and the covariance tracking method, hereafter method CT [18].

4.1 Case study 1

In case study 1, experimental results for the child video are shown. This video clip contains 458 frames of frame size 352 × 288, but we show results only for frame numbers 1-440, in steps of 20 frames, in Fig. 3.

Fig. 3 Tracking of child video for frame nos. 1 to 440 in steps of 20 frames: (a) method KF [7], (b) method PF [8], (c) method BF [9], (d) method TM [10], (e) method CBWH [13], (f) method JCTH [16], (g) method CT [18], (h) proposed method

One can observe that this video sequence contains abrupt motion of the child object. The object moves very fast, and the direction of motion changes abruptly. In frame 1, the object is well covered within the bounding box. From frame 20 the object suddenly moves very fast in an abrupt direction and stops in frame 100. Again, from frame 120 the object moves in a backward direction and stops suddenly in frame 200. In all these frames, the object is accurately tracked by the proposed method. After frame 200, the object moves very slowly and takes some rare poses, as in frames 240, 280, 300, 400 and 420. Despite these uncommon walking poses, the object is well tracked. Moreover, from frame 320 to 440 the size of the object increases as it gets closer to the camera, but the object is still correctly tracked by the proposed method. As the camera moves a little, the background of the video is neither rigidly static nor simple in nature.


However, this does not affect the tracking at all. The lighting also changes as the object moves, but the proposed tracking remains unaffected by this.

On the other hand, when method KF [7], method PF [8] and method BF [9] are applied to the child video clip, one can easily observe that the object is well covered in frames 1 and 20. But as the object starts moving in another direction, track loss begins and the bounding ellipse is displaced to some extent in frame 40. In frame 60, the object is again within the area of the bounding ellipse, but this is because the ellipse remains still and the object retraces its path in a circle. From frame 80 onwards, the bounding ellipse remains still and never recovers its track. Although the shape of the ellipse changes in method BF [9] in order to capture the object's position, it does not succeed. In method TM [10], the object is well covered in frame 1 but starts drifting from as early as frame 20, and there is a total track loss in frame 40. In this method, the bounding box remains still and cannot recover its track; the extent of track loss is very high. Also, by applying method CBWH [13], one can easily observe a clear track loss in the position of the bounding box in frame 20, that is, when the object suddenly changes its direction of motion; the object is captured by this bounding box only when it comes inside its search area. Similarly, in method JCTH [16] the bounding box does not move correctly with the target object when it changes its direction of motion abruptly: there is a loss of track in frames 140, 160 and 180. The bounding box remains still during this period and recovers when the object comes within its area. Method CT [18] again does not track the object correctly. There is an obvious track loss in the position of the bounding box in frame 40 as the object changes direction, and this method cannot recover its position up to frame 180, when the object stops.

Hence, by applying the proposed method, it can easily be observed that the object is tracked correctly in each and every frame and there is no track loss. The proposed method does well when the object is moving very fast. Moreover, when the direction of motion changes suddenly, the proposed method is capable of tracking the object correctly; in this situation, even some good tracking techniques [7-10, 13, 16, 18] do not perform well. It is also not affected when the size of the object changes, since the object is tracked correctly even after this. Most tracking techniques take advantage of simple object motion.


In this case study, however, the object not only moves in abrupt directions but also takes some extreme poses. In this situation the proposed algorithm performs better than the others. In addition, background and lighting conditions have no effect on the tracking accuracy of the proposed method.

4.2 Case study 2

In case study 2, we discuss the motion of a player in the soccer video. This video contains 329 frames of size 352 × 288, but we show results only for frame numbers 1-320, in steps of 20 frames. Results for this video are shown in Fig. 4.

Fig. 4 Tracking of soccer video for frame nos. 1 to 320 in steps of 20 frames: (a) method KF [7], (b) method PF [8], (c) method BF [9], (d) method TM [10], (e) method CBWH [13], (f) method JCTH [16], (g) method CT [18], (h) proposed method

From this video, one can easily observe that the sequence contains multiple objects. From frame 10 to 60, the target object is well covered within the bounding box by the proposed method. In frames 70-100 another object partially enters the area of the bounding box, but one can observe that the bounding box captures the target object correctly from frame 110 to 200. Again, in frames 210-280 another object crosses the target and comes fully within the area of the bounding box. Owing to this, however, no track loss occurs, and the target object is correctly covered by the bounding box in frames 290 and 300. The background in this video is complex and the camera is moving, but this does not affect the prediction of the object. In addition, the objects in this video are very small, but the proposed method does not lose track because of their size. The tracking can be applied to different target objects, which also shows that it is suitable for tracking distinct targets.

With the other methods, satisfactory results could not be obtained. The target object is well covered in the initial frame 1. Track loss starts in frame 60 with method KF [7], and the object gets completely outside the bounding ellipse in frame 120; after that the track loss is never recovered. Also, the bounding ellipse contracts and almost vanishes because the object is outside the coverage area. Method PF [8] traces the object up to frame 240, while there is no occlusion, but as the occlusion starts in frame 260 the bounding ellipse completely loses its track and does not recover it again. In method BF [9], the object is within the bounding ellipse in frame 1, and the ellipse starts changing its shape in order to keep the track accurate in frame 20. The ellipse gets bigger and bigger and succeeds in capturing the object up to frame 160.


After that, the object goes outside the ellipse, and the bounding ellipse completely loses the track in the opposite direction. By applying method TM [10], one can easily observe that this method behaves most inaccurately and completely loses its track from frame 40. Method CBWH [13] tracks the object correctly up to frame 240, before occlusion, and loses its track completely after occlusion. Method JCTH [16] accurately tracks the object up to frame 80 and starts losing its track from frame 100; after that, the bounding box remains still for all frames and does not move elsewhere. For this case study, method CT [18] behaves most appropriately among the compared methods.

Therefore, from this video it is clear that the proposed method tracks the target object correctly even if another object occludes the target partially or fully, whereas the other methods [7-10, 13, 16] fail in this situation. Background conditions and object size do not influence the tracking accuracy.

4.3 Performance evaluation

Extensive and representative experiments were performed to illustrate and test the proposed method in comparison with method KF [7], method PF [8], method BF [9], method TM [10], method CBWH [13], method JCTH [16] and method CT [18]. Comparisons of the proposed algorithm with the other methods were done in Matlab 7.01 on a PC with an Intel Pentium Dual 2-GHz processor. To further evaluate the proposed algorithm and compare it with the other existing tracking algorithms, five objective measures are utilised for quantitatively evaluating performance.

4.3.1 Efficiency: The efficiency of the methods is computed in terms of the total time elapsed (in seconds) for the processing of a given number of frames. From Fig. 5a, it is clear that method KF [7] takes a very long time, whereas methods PF [8] and BF [9] take approximately the same time, which is still higher than the proposed method. Although method TM [10] takes less time than the other methods, it is still more costly than the proposed method. One can observe from Fig. 5b that method CBWH [13] takes more time, and method CT [18] much more time, than the proposed method in processing the frames, whereas method JCTH [16] takes much less time. The proposed method can achieve an average tracking speed of 25 frames/s using our own Matlab program.


Fig. 5 Efficiency of different methods: (a) methods KF [7], PF [8], BF [9], TM [10] and the proposed method; (b) methods CBWH [13], JCTH [16], CT [18] and the proposed method

4.3.2 Centroids: Figs. 6a and c show the actual and computed x- and y-direction centroid values, respectively, for methods KF [7], PF [8], BF [9], TM [10] and the proposed method. Similarly, Figs. 6b and d show the actual and computed x- and y-direction centroid values, respectively, for methods CBWH [13], JCTH [16], CT [18] and the proposed technique. From the figure, it is clear that the centroids computed through the proposed method are very near the actual values, whereas the values computed through methods KF [7], PF [8], BF [9], TM [10], CBWH [13], JCTH [16] and CT [18] deviate greatly from the actual centroid. From this figure it can also be observed that the centroid values obtained through the other methods vary by a large amount when the object suddenly changes its direction of track.


Fig. 6 Centroids of object using various methods: (a) and (c) comparison of actual x- and y-direction centroids with centroids computed through methods KF [7], PF [8], BF [9], TM [10] and the proposed method; (b) and (d) the same comparison for methods CBWH [13], JCTH [16], CT [18] and the proposed method

4.3.3 Euclidean distance: The Euclidean distance between the centroid of the tracked object bounding box and the actual centroid is computed as

ED = \sqrt{(x_A - x_C)^2 + (y_A - y_C)^2} \quad (9)

where (x_A, y_A) is the actual centroid value and (x_C, y_C) is the computed centroid value. Fig. 7 shows the Euclidean distance for the seven tracking algorithms and the proposed method. From the figure it is clear that the proposed method has the least Euclidean distance between the centroid of the tracked bounding box and the actual centroid in comparison with the other methods.

Fig. 7 Number of frames against Euclidean distance: (a) methods KF [7], PF [8], BF [9], TM [10] and the proposed method; (b) methods CBWH [13], JCTH [16], CT [18] and the proposed method
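Eq. (9) transcribes directly to code:

def euclidean_distance(actual, computed):
    """Per-frame Euclidean distance between actual and computed centroids, eq. (9)."""
    (xa, ya), (xc, yc) = actual, computed
    return ((xa - xc) ** 2 + (ya - yc) ** 2) ** 0.5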

4.3.4 Bhattacharyya distance: The Bhattacharyya distance is used as a measure of separability between the tracked object region and the actual object region. For two classes of actual and computed values, it is given by

BD = \frac{1}{8} (mean_c - mean_a)^T \left[ \frac{cov_a + cov_c}{2} \right]^{-1} (mean_c - mean_a) + \frac{1}{2} \ln \frac{|(cov_a + cov_c)/2|}{|cov_a|^{1/2} |cov_c|^{1/2}} \quad (10)

where

mean_a = mean vector for actual object region
mean_c = mean vector for computed object region
cov_a = covariance matrix for actual object region
cov_c = covariance matrix for computed object region

These values are shown in Fig. 8. It can easily be noticed that the proposed technique shows the least deviation from the actual object region in comparison with the other tracking methods.

Fig. 8 Number of frames against Bhattacharyya distance: (a) methods KF [7], PF [8], BF [9], TM [10] and the proposed method; (b) methods CBWH [13], JCTH [16], CT [18] and the proposed method
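The paper does not state which features form the mean vectors and covariance matrices; the sketch below assumes each region's per-pixel colour values are the multivariate samples (regions given as H × W × C arrays).

def bhattacharyya_distance(region_a, region_c):
    """Bhattacharyya distance of eq. (10) between actual and computed object
    regions, using per-pixel colour vectors as samples (an assumed choice)."""
    a = region_a.reshape(-1, region_a.shape[-1]).astype(float)
    c = region_c.reshape(-1, region_c.shape[-1]).astype(float)
    mean_a, mean_c = a.mean(axis=0), c.mean(axis=0)
    cov_a, cov_c = np.cov(a.T), np.cov(c.T)
    cov_m = (cov_a + cov_c) / 2.0
    d = mean_c - mean_a
    term1 = d @ np.linalg.inv(cov_m) @ d / 8.0
    term2 = 0.5 * np.log(np.linalg.det(cov_m)
                         / np.sqrt(np.linalg.det(cov_a) * np.linalg.det(cov_c)))
    return term1 + term2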

4.3.5 Mean square error: The mean square error (MSE) between the parameters of the tracked object bounding box and the actual object, over all frames in a video, is defined as

MSE = \frac{1}{N} \sum_{i=1}^{N} \left[ (x^A_{i1} - x^C_{i1})^2 + (y^A_{i1} - y^C_{i1})^2 \right] \quad (11)

where (x^A_{i1}, y^A_{i1}) are the actual centroids, (x^C_{i1}, y^C_{i1}) are the computed centroids and N is the total number of frames in the video. Table 1 shows the MSE, averaged over all frames in the video, for methods KF, PF, BF, TM, CBWH, JCTH, CT and the proposed method. From the table it is clear that the proposed method has the least MSE in comparison with the other methods.

Table 1  Resulting mean square error for different tracking methods

Method               MSE
method KF [7]        4.4076e+003
method PF [8]        5.9405e+003
method BF [9]        5.8033e+003
method TM [10]       1.7554e+003
method CBWH [13]     395.2773
method JCTH [16]     549.5349
method CT [18]       1.3427e+003
proposed method      30.7969
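Eq. (11) as code, over paired centroid tracks:

def mean_square_error(actual_track, computed_track):
    """MSE of eq. (11) between actual and computed centroid tracks of N frames."""
    n = len(actual_track)
    return sum((xa - xc) ** 2 + (ya - yc) ** 2
               for (xa, ya), (xc, yc) in zip(actual_track, computed_track)) / n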

5 Conclusions

In this paper, we have developed and demonstrated a new algorithm for tracking objects in video that exploits the new tight frames of curvelets, which provide a sparse expansion for typical images with smooth contours. We use curvelet coefficients for both segmentation and tracking. The curvelet transform provides near-ideal sparsity of representation for both smooth objects and objects with edges. The proposed algorithm allows the user to track an object in an image or video easily and quickly using the curvelet transform. Experimental results indicate that the proposed method performs very well compared with kernel tracking [7-10], histogram tracking [13, 16] and covariance tracking [18], in terms of both speed and accuracy.


Unlike the other methods, the proposed method does not rely upon many properties of the object, such as size, shape, colour, clothing and so on. In all the computations, it has been assumed that the frame rate is adequate and that the size of the object does not change much between adjacent frames. However, the proposed algorithm is capable of tracking an object whose size changes within a range across frames.

The main contributions of the proposed method are:

1. We have developed and demonstrated a new algorithm based on the curvelet transform to track objects.
2. The algorithm depends only upon curvelet coefficients; no other parameter is required.
3. It does not depend upon particular models of object motion.
4. No manual intervention is needed at any step.

The main advantages of the proposed method are summarised below:

1. The proposed method can track very small objects.
2. It can track objects with extreme poses.
3. Objects moving at fast speed, and whose direction of movement changes abruptly, can be tracked efficiently.
4. It can track objects against a complex background.
5. The occlusion problem is efficiently handled by the proposed method.

The experimental results demonstrate that the proposed algorithm can track moving objects in video clips. Although we use a simple algorithm, one could further develop an algorithm that weighs different tracking methods together to achieve more accurate results.

6 Acknowledgment

This work was supported by the Department of Science and Technology, New Delhi, India under Grant No. SR/FTP/ETA-023/2009.

7 References

1 Sonka, M., Hlavac, V., Boyle, R.: 'Image processing, analysis and machine vision' (Thomson Asia Pvt. Ltd., Singapore, 2001)
2 Utsumi, A., Mori, H., Ohya, J., Yachida, M.: 'Multiple-human tracking using multiple cameras'. Proc. IEEE Int. Conf. Automatic Face and Gesture Recognition, Nara, Japan, 1998, pp. 498–503
3 Yilmaz, A., Javed, O., Shah, M.: 'Object tracking: a survey', ACM Comput. Surveys, 2006, 38, (4), pp. 1–45
4 Fukunaga, K., Hostetler, L.: 'The estimation of the gradient of a density function, with applications in pattern recognition', IEEE Trans. Inf. Theory, 1975, 21, (1), pp. 32–40
5 Cheng, Y.: 'Mean shift, mode seeking, and clustering', IEEE Trans. Patt. Anal. Mach. Intell., 1995, 17, (8), pp. 790–799
6 Bradski, G.R.: 'Computer vision face tracking for use in a perceptual user interface', Intel Technol. J., 1998, 2, (2), pp. 12–21
7 Comaniciu, D., Ramesh, V., Meer, P.: 'Kernel-based object tracking', IEEE Trans. Patt. Anal. Mach. Intell., 2003, 25, (5), pp. 564–575
8 Nummiaro, K., Koller-Meier, E., Van Gool, L.J.: 'An adaptive color-based particle filter', Image Vis. Comput., 2003, 21, (1), pp. 99–110
9 Zivkovic, Z., Cemgil, A.T., Krose, B.: 'Approximate Bayesian methods for kernel-based object tracking', Comput. Vis. Image Understand., 2009, 113, (6), pp. 743–749
10 Shen, C., Kim, J., Wang, H.: 'Generalized kernel-based visual tracking', IEEE Trans. Circuits Syst. Video Technol., 2010, 20, (1), pp. 119–130
11 Lee, S.H., Kang, M.G.: 'Motion tracking based on area and level set weighted centroid shifting', IET Comput. Vis., 2010, 4, (2), pp. 73–84
12 Haritaoglu, I., Flickner, M.: 'Detection and tracking of shopping groups in stores'. IEEE Conf. Computer Vision and Pattern Recognition, USA, 2001, pp. 431–438
13 Ning, J., Zhang, L., Zhang, D., Wu, C.: 'Robust mean shift tracking with corrected background-weighted histogram', to appear in IET Computer Vision. Available online at: http://www4.comp.polyu.edu.hk/~cslzhang/paper/IET_CV_2010.pdf
14 Heikkilä, M., Pietikäinen, M.: 'A texture-based method for modeling the background and detecting moving objects', IEEE Trans. Patt. Anal. Mach. Intell., 2006, 28, (4), pp. 657–662
15 Nguyen, Q.A., Robles-Kelly, A., Shen, C.: 'Enhanced kernel-based tracking for monochromatic and thermographic video'. Proc. IEEE Int. Conf. Video and Signal Based Surveillance, Sydney, Australia, 2006, pp. 28–33
16 Ning, J., Zhang, L., Zhang, D., Wu, C.: 'Robust object tracking using joint color-texture histogram', Int. J. Patt. Recogn. Artif. Intell., 2009, 23, (7), pp. 1245–1263
17 Ramanan, D., Forsyth, D.A., Zisserman, A.: 'Tracking people by learning their appearance', IEEE Trans. Patt. Anal. Mach. Intell., 2007, 29, (1), pp. 65–81
18 Porikli, F., Tuzel, O., Meer, P.: 'Covariance tracking using model update based on Lie algebra'. Proc. IEEE Conf. Computer Vision and Pattern Recognition, USA, 2006, pp. 728–735
19 Elgamel, S.A., Soraghan, J.: 'Enhanced monopulse tracking radar using optimum fractional Fourier transform', IET Radar Sonar Navig., 2011, 5, (1), pp. 74–82
20 Islam, M.M., Alam, M.S.: 'Human motion tracking using mean shift clustering and discrete cosine transform'. Proc. SPIE 6566, 2007, 656616, doi:10.1117/12.717921
21 Mansouri, A., Azar, F.T., Aznaveh, A.M.: 'Face tracking by 3-D dual-tree complex wavelet transform using support vector machine'. Ninth Int. Symp. on Signal Processing and Its Applications, Sharjah, 2007, doi:10.1109/ISSPA.2007.4555386
22 Khansari, M., Rabiee, H.R., Asadi, M., Ghanbari, M.: 'Crowded scene object tracking in presence of Gaussian white noise using undecimated wavelet features'. Int. Symp. on Signal Processing and its Applications, Sharjah, 2007, doi:10.1109/ISSPA.2007.4555609
23 Khansari, M., Rabiee, H.R., Asadi, M., Ghanbari, M.: 'Object tracking in crowded video scenes based on the undecimated wavelet features and texture analysis', EURASIP J. Adv. Signal Process., 2008, 2008, article id 243534, 18 pages, doi:10.1155/2008/243534
24 Khansari, M., Rabiee, H.R., Asadi, M., Ghanbari, M.: 'Occlusion handling for object tracking in crowded video scenes based on the undecimated wavelet features'. IEEE/ACS Int. Conf. Computer Systems and Applications, Amman, 2007, pp. 692–699
25 Khare, A., Tiwary, U.S.: 'Daubechies complex wavelet transform based moving object tracking'. IEEE Symp. on Computational Intelligence in Image and Signal Processing, Honolulu, HI, 2007, pp. 36–40
26 Candes, E.J., Donoho, D.L.: 'Ridgelets: a key to higher dimensional intermittency', Phil. Trans. R. Soc. Lond. A: Math. Phys. Eng. Sci., 1999, 357, (1760), pp. 2495–2509
27 Xiao, L., Wu, H.Z., Wei, Z.H., Bao, Y.: 'Research and applications of a new computational model of human vision system based on Ridgelet transform'. Proc. Int. Conf. Machine Learning and Cybernetics, Guangzhou, China, 2005, vol. 8, pp. 5170–5175
28 Candes, E.J., Donoho, D.L.: 'Curvelets – a surprisingly effective nonadaptive representation for objects with edges', in Schumaker, L.L., et al. (Eds.): 'Curves and surfaces' (Vanderbilt University Press, Nashville, TN, 1999). Available online at: http://www-stat.stanford.edu/~candes/papers/Curvelet-SMStyle.pdf
29 Candes, E.J., Donoho, D.L.: 'Continuous curvelet transform: I. Resolution of the wavefront set', Appl. Comput. Harmon. Anal., 2003, 19, pp. 162–197
30 Candes, E.J., Donoho, D.L.: 'Continuous curvelet transform: II. Discretization and frames', Appl. Comput. Harmon. Anal., 2003, 19, pp. 198–222
31 Candes, E.J., Demanet, L., Donoho, D.L., Ying, L.: 'Fast discrete curvelet transforms', Multiscale Model. Simul., 2006, 5, (3), pp. 861–899
32 Starck, J.L., Candes, E.J., Donoho, D.L.: 'The curvelet transform for image denoising', IEEE Trans. Image Process., 2002, 11, (6), pp. 670–684
33 Binh, N.T., Khare, A.: 'Multilevel threshold based image denoising in curvelet domain', J. Comput. Sci. Technol., 2010, 25, (3), pp. 32–640
34 Majumdar, A.: 'Bangla basic character recognition using digital curvelet transform', J. Patt. Recogn. Res., 2007, 2, (1), pp. 17–26
35 Nigam, S., Khare, A.: 'Multifont Oriya character recognition using curvelet transform', in Singh, C., et al. (Eds.): 'Information Systems for Indian Languages, Communications in Computer and Information Science' (Springer, Berlin Heidelberg, 2011), 139, (1), pp. 150–156
36 Zhang, J., Zhang, Z., Huang, W., Lu, Y., Wang, Y.: 'Face recognition based on curvefaces'. Third Int. Conf. on Natural Computation, 2007, pp. 627–631
37 Lee, Y.C., Chen, C.H.: 'Face recognition based on digital curvelet transform'. Int. Conf. Intelligent Systems Design and Applications, Kaohsiung, 2008, vol. 3, pp. 341–345
38 Mandal, T., Wu, Q.M.J., Yuan, Y.: 'Curvelet based face recognition via dimension reduction', Signal Process., 2009, 89, (12), pp. 2345–2353
39 Binh, N.T., Khare, A.: 'Object tracking of video sequences in curvelet domain', Int. J. Image Graph., 2011, 11, (1), pp. 1–20
40 Nigam, S., Khare, A.: 'Curvelet transform based object tracking'. Proc. IEEE Int. Conf. on Computer and Communication Technology, Allahabad, India, 2010, pp. 230–235
41 Ma, J., Plonka, G.: 'The curvelet transform – a review of recent applications', IEEE Signal Process. Mag., 2010, 27, (2), pp. 118–133