Multiple initial point prediction based search pattern selection for fast motion estimation

Pattern Recognition 42 (2009) 475–486


Pattern Recognition

journal homepage: www.elsevier.com/locate/pr


Humaira Nisar, Tae-Sun Choi∗

Department of Mechatronics, Gwangju Institute of Science and Technology, 261 Cheomdan-Gwagiro, Oryong Dong, Buk Gu, Gwangju 500-712, Republic of Korea

ARTICLE INFO

Article history: Received 5 July 2007; received in revised form 25 June 2008; accepted 5 August 2008

Keywords: Motion estimation; Block matching; Motion vectors; Correlation; Spatial; Temporal; Video coding

ABSTRACT

A novel, computationally efficient and robust scheme for multiple initial point prediction is proposed in this paper. A combination of spatial and temporal predictors is used for initial motion vector prediction, for determining the magnitude and direction of motion, and for search pattern selection. Initially, three predictors are selected from the spatio-temporal neighboring blocks. If all these predictors point to the same quadrant, a simple search pattern based on the direction and magnitude of the predicted motion vector is selected. If the predictors belong to different quadrants, the search starts from multiple initial points to get a clearer idea of the location of the minimum point; in this case multiple rood search patterns are selected, and local minimum elimination criteria are defined to avoid being trapped in a local minimum. The predictive search center is closer to the global minimum, which reduces the reliance on the monotonic error surface assumption and its impact on the motion field. Moving the search closer to the global minimum also increases the computation speed. A further speedup is obtained by applying a zero-motion threshold to blocks with no motion. The image quality, measured in terms of PSNR, also shows good results.

© 2008 Elsevier Ltd. All rights reserved.

1. Introduction

Block matching motion estimation and compensation is an essential part of several video coding standards such as MPEG-1/2/4 [1,2], ITU-T H.261/263/263+ [3] and the newest video coding standard H.264/AVC [4]. A variety of motion estimation algorithms have been developed for video coding; block matching algorithms are the most popular and the simplest in concept, design and implementation. They exploit the temporal correlation between the frames of a video sequence and reduce the redundancy that exists between them, which leads to higher compression. In block based motion estimation (BBME) the current frame is partitioned into square blocks of pixels, and the best match for each block is found inside the reference frame using a predefined distortion criterion. The best match is then used as a predictor for the block in the current frame, and the displacement between the two blocks is defined as the motion vector (MV) associated with the current block. The encoder only needs to send the MV and the residue block, defined as the difference between the current block and its predictor, which requires fewer bits than directly coding the original block.

∗ Corresponding author. E-mail address: [email protected] (T.-S. Choi).

0031-3203/$ - see front matter. doi:10.1016/j.patcog.2008.08.010

Full search (FS) is the most straightforward block matching algorithm (BMA), and it is optimal because it searches exhaustively inside the search window to find the MV. Despite its heavy computational cost, FS is still widely used in video coding applications because of its simplicity and ease of hardware implementation. However, its large computational requirements have fuelled much research activity, and designing the most efficient BMA remains an open research problem. Several fast block matching motion estimation algorithms [5–21] have been proposed so far. They rely on approaches such as the unimodal error surface assumption (UESA), a variable instead of a fixed search range, multi-resolution methods, the spatial and temporal correlation of MVs, and pixel decimation.
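To make the cost of FS concrete, here is a minimal pure-Python sketch of exhaustive block matching with the SAD criterion. It assumes frames stored as lists of rows of 8-bit luma values, an n × n block and a search range of ±p; the defaults n = 16 and p = 7 match the preliminary experiments in Section 2.1, and the function names are hypothetical. With p = 7 it evaluates up to (2p + 1)² = 225 candidate blocks per macroblock, which is exactly the work the fast algorithms try to avoid.

```python
import random
from typing import List, Optional, Tuple

Frame = List[List[int]]  # rows of 8-bit luma samples


def block_sad(cur: Frame, ref: Frame, top: int, left: int,
              dx: int, dy: int, n: int) -> int:
    """SAD between the n x n block of `cur` at (top, left) and the block
    of `ref` displaced by the candidate motion vector (dx, dy)."""
    return sum(abs(cur[top + i][left + j] - ref[top + dy + i][left + dx + j])
               for i in range(n) for j in range(n))


def full_search(cur: Frame, ref: Frame, top: int, left: int,
                n: int = 16, p: int = 7) -> Tuple[Tuple[int, int], int]:
    """Check every displacement in [-p, p] x [-p, p]; return (MV, SAD)."""
    h, w = len(ref), len(ref[0])
    best_mv: Tuple[int, int] = (0, 0)
    best_cost: Optional[int] = None
    for dy in range(-p, p + 1):
        for dx in range(-p, p + 1):
            y, x = top + dy, left + dx
            if y < 0 or x < 0 or y + n > h or x + n > w:
                continue  # candidate block would fall outside the reference frame
            cost = block_sad(cur, ref, top, left, dx, dy, n)
            if best_cost is None or cost < best_cost:
                best_mv, best_cost = (dx, dy), cost
    if best_cost is None:
        raise ValueError("no candidate block fits inside the reference frame")
    return best_mv, best_cost


# Synthetic test: the current frame is the reference shifted so that the
# true motion vector of an interior block is (dx, dy) = (3, -2).
rng = random.Random(0)
H = W = 48
ref = [[rng.randrange(256) for _ in range(W)] for _ in range(H)]
cur = [[ref[(i - 2) % H][(j + 3) % W] for j in range(W)] for i in range(H)]
print(full_search(cur, ref, 16, 16))  # → ((3, -2), 0)
```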

Some well known examples are the three step search (TSS) [6], new three step search (NTSS) [7], four step search (FSS) [9], diamond search (DS) [13], simple and efficient search (SES) [18] and improved adaptive rood pattern search (ARPS-2) [22]. Experimental results show that these algorithms reduce the computational requirement significantly by checking only some of the points inside the search window, while keeping good error performance compared with the FS algorithm. However, most of these algorithms can get trapped in a local minimum, yielding a significant loss in MV estimation performance. DS, which uses a large and a small diamond search pattern, is an outstanding algorithm that was adopted by the MPEG-4 verification model (VM) [14] because of its superiority to other methods in the class of fixed search pattern algorithms. However, fixed search patterns cannot consistently match dynamic motion content and may incur redundant searches, which not only wastes computational power but can also lead to local minimum trapping and large prediction errors. ARPS-2 is an adaptive rood pattern search algorithm. In its initial search stage it uses different arm lengths in the horizontal and vertical directions, with the center point placed on the predicted position; in the refinement stage it repeatedly applies a unit-size rood pattern until the MV of the current macroblock is found. In low bit rate coding applications such as video telephony, the motion field is highly structured and varies slowly, and the MVs are usually limited in magnitude. This suggests that significant computational gains can be achieved by taking advantage of the strong spatio-temporal dependencies that exist between MVs.

[Fig. 1: the current block and its spatial region of support (ROS) in the current frame.]
Fig. 1. Commonly used regions of support (ROS) for motion prediction (spatial domain only).

[Fig. 2: the current block X in frame n with its spatial neighbours MVSL (left) and MVSA (above) in the current frame, and the co-located block MVP in the previous frame (temporal).]
Fig. 2. Proposed region of support (ROS) in spatial and temporal domain.

In this paper we present a predictive motion estimation technique that employs both spatial and temporal correlation to find the initial search center. If the prediction goes wrong, the initial search point could be misleading. To avoid this we use multiple predictors: at the first step, instead of choosing only one initial search center, we choose multiple initial search centers from which to start the search. The one with the minimum error is assumed to be closer to the global minimum. By accurately predicting the location of the best MV candidate, we can search a relatively small area in the neighborhood of the predicted MV. This is possible especially in low bit rate video applications, owing to the large amount of motion field redundancy within the same frame as well as across consecutive frames.

In the proposed algorithm, a dynamic search pattern is constructed for each block based on the MVs of its spatio-temporal neighboring blocks. This pattern, which depends on the magnitude and direction of the predicted MV, quickly identifies the best center point for the refined search, which reduces unnecessary intermediate searches and avoids being trapped at a local minimum. The proposed technique reduces the number of computations, produces a smooth motion field and yields good image quality. It is simple and computationally efficient, and its estimation performance is comparable to that of the FS BMA. In the following section we present the proposed motion estimation algorithm, followed by the experimental environment, the results and a discussion of the results.

[Fig. 3: block distortion (z) plotted over the search window, with x and y from −7 to 7; panels: Hall Monitor, Football, Flower.]
Fig. 3. Overall characteristics of distortion surfaces for (a) Hall Monitor sequence, (b) Football sequence and (c) Flower garden sequence.

2. Proposed algorithm

Motion estimation is a multi-step process that combines several techniques: the choice of the motion starting point, motion search patterns, adaptive control to curb the search, avoidance of search in stationary regions and avoidance of local minima. The collective efficiency of these techniques makes a motion estimation algorithm robust and efficient. The main objective of the proposed algorithm is to decrease the computational burden while keeping good predicted image quality. Its important aspects are (1) using spatio-temporal neighborhood information to predict the initial search center, (2) multiple initial starting point selection, (3) an adaptive search pattern, and (4) local minimum elimination criteria. All of these help to reduce the computational complexity and to find the true minimum error point. In the proposed algorithm, the spatial and temporal correlation is used to adjust the size of the rood shaped search pattern to match different motion magnitudes and directions, which improves both search speed and accuracy.

[Fig. 4: spatial predictor PS and temporal predictor PT plotted in quadrants I–IV of the search space.]
Fig. 4. Spatial and temporal predictors: (a) lie in same quadrant and (b) lie in different quadrants.

[Fig. 5: distortion D plotted against displacement x, with initial points 1 and 2; point 1 is closer to the global minimum.]
Fig. 5. Multiple initial points selected on distortion surface in one-dimensional space.

[Fig. 6: motion vector counts (z) over the search window, with x and y from −7 to 7.]
Fig. 6. Motion vector distribution for Table Tennis Sequence.

Fig. 7. Various search patterns employed in proposed algorithm: (a) large rood pattern and (b) small rood pattern.

Fig. 8. Search pattern for stationary blocks: (a) if SAD < threshold and (b) if SAD > threshold.

[Fig. 9: search points of the 1st and 2nd steps around the initial search point.]
Fig. 9. Single point prediction for: (a) small motion blocks and (b) medium and large motion blocks.

2.1. Prediction

The MV prediction is based on spatio-temporal correlation. Spatio-temporal neighboring blocks often belong to the same object and have MVs similar to that of the current block. Commonly used regions of support (ROS) for motion prediction are shown in Fig. 1; normally only spatial neighbouring blocks are used. In the proposed algorithm, however, we use neighbouring blocks from both the spatial and the temporal domain, as shown in Fig. 2. The blocks from the spatial domain are the left and top neighbours of the current block. The left neighbouring block is not always correlated with the current block and is unavailable for blocks on the left margin, so we also choose the top block to compensate when the left block is not available. It has been observed that the majority of MVs point purely in the horizontal or vertical direction; in other words, oblique MVs are less common [24,25].


[Fig. 10: initial search centers and the search patterns of the 1st and 2nd steps for each case.]
Fig. 10. Multiple point prediction search pattern: (a) case 1, (b) case 2 and (c) case 3.

In the temporal domain, MV correlation is affected not only by the speed of the moving objects but also by the frame rate, so it is less stable than spatial correlation. Furthermore, temporal neighboring vectors are not used in MV differential coding. Therefore only one temporal neighbouring block, i.e., the block at the same location in the previous (reference) frame, is used for prediction. In preliminary experiments we tested some of the commonly used regions of support (ROS) shown in Fig. 1 with our proposed algorithm. The test sequences Hall Monitor, Claire, Car Phone, Mobile, News, Miss America and Football were used for these experiments. Each pixel in the image sequences is uniformly quantized to 8 bits. The block size was 16 × 16 pixels and the maximum motion displacement in the search area was ±7 in both the horizontal and vertical directions. The sum of absolute differences (SAD) was used as the block distortion measure (BDM). The results showed that the proposed ROS is superior to ROS composed of other immediate neighbouring blocks in terms of average PSNR and search speed. We therefore have three initial MVs: two provided by the spatial neighbours and one by the temporal neighbouring block.

The first step of our algorithm uses these neighbouring MVs to predict an initial search point that is closer to the global optimum. The motion vectors MVSL(n) (spatial left), MVSA(n) (spatial above) and MVP(n − 1) (previous) act as the candidates for the predicted motion vector P(Xn) of the current block X in frame n. If the predictor accuracy is high, the optimal MV (inside a given search window) can be reached faster, enabling computational savings for fast searches [16]. The commonly used median X/Y schemes have their own problems, since their performance can deteriorate dramatically for sequences with high or mixed irregular motion. We therefore calculate the predicted MV using the weighted mean method:

$$P(X_n) = \sum_{k=1}^{K} \alpha_k \, MV_k \qquad (1)$$

where α_k is the weighting coefficient of the kth candidate MV and K is the number of blocks. The x and y components of the weighted mean predicted MV are computed independently. MV correlation decreases rapidly as we move away from the current macroblock along the spatial axis [16], and our experiments confirmed that only the vectors of the closest blocks should be used: including poorly correlated MVs in the computation adversely affects prediction accuracy. Even with simple predictors such as the mean and median, searching a 4 × 4 area around the initial search center would generally produce more than 90% of the MVs obtained by the FS algorithm. This is not surprising, since many low resolution video sequences, such as Claire and Miss America, exhibit very small and slow motion and non-motion related variations. Once the predicted MV is obtained, the first step of the algorithm is to move the initial search center to the predicted MV location.
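Eq. (1) is a component-wise weighted sum. The following is a minimal sketch, assuming the weights α_k are supplied by the caller; the paper does not give their values here, so the 0.5/0.25/0.25 weights in the example are purely illustrative. It also assumes that the predicted MV is rounded to integer-pel accuracy with round-half-away-from-zero before it is used as a search center; that rounding rule is our assumption.

```python
import math
from typing import Sequence, Tuple

MV = Tuple[float, float]


def weighted_mean_predictor(mvs: Sequence[MV], weights: Sequence[float]) -> MV:
    """Eq. (1): P(X_n) = sum_k alpha_k * MV_k, with x and y handled independently."""
    if len(mvs) != len(weights):
        raise ValueError("one weight is needed per candidate MV")
    px = sum(a * mv[0] for a, mv in zip(weights, mvs))
    py = sum(a * mv[1] for a, mv in zip(weights, mvs))
    return (px, py)


def to_integer_pel(mv: MV) -> Tuple[int, int]:
    """Round each component half away from zero (assumed rounding rule)."""
    return tuple(int(math.copysign(math.floor(abs(c) + 0.5), c)) for c in mv)


# Illustrative candidates (left, above, co-located) with hypothetical weights.
p = weighted_mean_predictor([(4, 0), (2, 2), (2, -2)], [0.5, 0.25, 0.25])
print(p, to_integer_pel((2.5, -2.5)))  # → (3.0, 0.0) (3, -3)
```

Computing x and y separately, as the text states, means the predictor can produce oblique vectors even when every candidate is purely horizontal or vertical.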

Most error surfaces encountered in real world video sequences are not truly unimodal, as can be seen from Fig. 3. However, the overall characteristics of the distortion surfaces are unimodal, and the region neighboring the global optimum (minimum error point) can be considered a unimodal distortion surface. It has been noted that localizing the search origin through appropriate predictors reduces the probability of getting trapped in a local minimum [12,26,27]. Therefore our proposed algorithm uses multiple predictors acting as multiple initial search points: in the case of irregular motion, checking multiple points within the search window increases the chance of locating the true MV. This gives our method a good tradeoff between search speed and prediction error. We also use the correlated motion of the local neighborhood to decide between different search strategies.

2.2. Multiple initial point prediction

In our algorithm we use two spatial neighboring blocks (left and above) and one temporal block (the co-located block in the previous frame) for initial point prediction. The two initial point predictors are obtained as follows:

1. From the spatial frame (weighted mean of the MVs of the two spatial neighbors):

$$P_S(X_n) = \alpha \, MV_{SL} + \alpha_1 \, MV_{SA} \qquad (2)$$

2. From the temporal frame (MV of the reference block):

$$P_T(X_n) = MV_P \qquad (3)$$

where PS(Xn) and PT(Xn) are the spatial and temporal predicted MVs, respectively; MVSL, MVSA and MVP are the MVs of the spatial left, spatial above and temporal reference blocks, respectively; and α and α1 are prediction coefficients.

For the top left (starting corner) block no spatial predictors are available, so we use the zero MV instead. For the left column the left block is not available, and for the top row the above block is not available:

$$P_S(X_n) = 0 \quad \text{for the top left block} \qquad (4)$$


Table 1
Simulation results for Claire, QCIF, 30 fps, and 150 frames

QP | BMA | PSNR (dB) | Bit rate (kbits/s) | Total time (s) | ME time (s) | Total speedup | ME speedup | PSNR gain (dB) | Bit rate increase (%)
28 | FS | 39.86 | 32.10 | 21.43 | 17.95 | 1.00 | 1.00 | – | –
28 | TSS | 39.81 | 33.03 | 10.13 | 6.60 | 2.12 | 2.72 | −0.048 | 2.90
28 | FSS | 39.82 | 33.13 | 9.35 | 6.04 | 2.29 | 2.97 | −0.038 | 3.22
28 | DS | 39.82 | 33.87 | 9.26 | 5.43 | 2.32 | 3.31 | −0.037 | 5.52
28 | Proposed | 39.84 | 31.51 | 6.99 | 3.54 | 3.07 | 5.08 | −0.016 | −1.81
32 | FS | 36.90 | 18.24 | 19.52 | 16.18 | 1.00 | 1.00 | – | –
32 | TSS | 36.84 | 18.11 | 10.33 | 7.22 | 1.89 | 2.24 | −0.058 | −0.71
32 | FSS | 36.48 | 18.10 | 9.52 | 6.28 | 2.05 | 2.58 | −0.414 | −0.75
32 | DS | 36.88 | 18.14 | 9.35 | 6.25 | 2.09 | 2.59 | −0.02 | −0.54
32 | Proposed | 36.87 | 17.88 | 6.88 | 3.22 | 2.84 | 5.03 | −0.027 | −1.93
36 | FS | 34.48 | 11.08 | 17.44 | 13.65 | 1.00 | 1.00 | – | –
36 | TSS | 34.46 | 10.99 | 10.33 | 6.92 | 1.69 | 1.97 | −0.019 | −0.78
36 | FSS | 34.48 | 11.03 | 9.69 | 6.64 | 1.80 | 2.06 | −0.001 | −0.43
36 | DS | 34.50 | 11.25 | 9.46 | 6.32 | 1.84 | 2.16 | 0.013 | 1.50
36 | Proposed | 34.38 | 10.82 | 6.66 | 3.50 | 2.62 | 3.90 | −0.103 | −2.34
40 | FS | 31.43 | 7.53 | 16.73 | 13.23 | 1.00 | 1.00 | – | –
40 | TSS | 31.44 | 7.66 | 10.79 | 7.07 | 1.55 | 1.87 | 0.002 | 1.81
40 | FSS | 31.38 | 7.56 | 9.86 | 6.66 | 1.70 | 1.99 | −0.053 | 0.43
40 | DS | 31.42 | 7.54 | 9.69 | 6.46 | 1.73 | 2.05 | −0.015 | 0.25
40 | Proposed | 31.43 | 7.21 | 6.64 | 3.29 | 2.52 | 4.02 | −0.005 | −4.21

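Table 2
Simulation results for Car Phone, QCIF, 30 fps, and 300 frames

QP | BMA | PSNR (dB) | Bit rate (kbits/s) | Total time (s) | ME time (s) | Total speedup | ME speedup | PSNR gain (dB) | Bit rate increase (%)
28 | FS | 36.55 | 158.47 | 68.02 | 60.48 | 1.00 | 1.00 | – | –
28 | TSS | 36.43 | 160.37 | 21.48 | 14.07 | 3.17 | 4.30 | −0.113 | 1.20
28 | FSS | 36.44 | 158.97 | 19.41 | 12.07 | 3.50 | 5.01 | −0.103 | 0.31
28 | DS | 36.45 | 158.88 | 19.02 | 11.38 | 3.58 | 5.32 | −0.093 | 0.25
28 | Proposed | 36.44 | 158.85 | 16.35 | 9.32 | 4.16 | 6.49 | −0.109 | 0.24
36 | FS | 30.93 | 46.52 | 54.46 | 47.88 | 1.00 | 1.00 | – | –
36 | TSS | 30.80 | 46.96 | 22.86 | 14.27 | 2.38 | 3.35 | −0.129 | 0.94
36 | FSS | 30.81 | 46.32 | 20.12 | 13.55 | 2.71 | 3.53 | −0.115 | −0.42
36 | DS | 30.84 | 46.68 | 19.52 | 12.61 | 2.79 | 3.80 | −0.09 | 0.36
36 | Proposed | 30.81 | 46.50 | 15.64 | 8.93 | 3.48 | 5.36 | −0.116 | −0.05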


$$P_S(X_n) = MV_{SA} \quad \text{for all left column blocks except the top left block} \qquad (5)$$

$$P_S(X_n) = MV_{SL} \quad \text{for all top row blocks except the top left block} \qquad (6)$$

The temporal reference blocks are not available for the first frame:

$$P_T(X_n) = 0 \quad \text{for the first frame} \qquad (7)$$
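Eqs. (2)–(7) amount to a small lookup with fallbacks at the frame borders. The sketch below assumes the MVs of already coded blocks are kept in dictionaries keyed by (block row, block column); the dictionary layout, the function names and the default coefficients α = α1 = 0.5 are illustrative assumptions, not values from the paper.

```python
from typing import Dict, Optional, Tuple

MV = Tuple[float, float]
MVField = Dict[Tuple[int, int], MV]  # (block row, block col) -> MV
ZERO: MV = (0.0, 0.0)


def spatial_predictor(row: int, col: int, cur_field: MVField,
                      alpha: float = 0.5, alpha1: float = 0.5) -> MV:
    """P_S(X_n) from the left and above neighbours in the current frame."""
    if row == 0 and col == 0:
        return ZERO                              # Eq. (4): no spatial neighbours
    if col == 0:
        return cur_field[(row - 1, col)]         # Eq. (5): only the above block exists
    if row == 0:
        return cur_field[(row, col - 1)]         # Eq. (6): only the left block exists
    left = cur_field[(row, col - 1)]             # MV_SL
    above = cur_field[(row - 1, col)]            # MV_SA
    return (alpha * left[0] + alpha1 * above[0],  # Eq. (2), x and y separately
            alpha * left[1] + alpha1 * above[1])


def temporal_predictor(row: int, col: int,
                       prev_field: Optional[MVField]) -> MV:
    """P_T(X_n): co-located MV of the previous frame (Eq. (3)); zero for frame 0 (Eq. (7))."""
    if prev_field is None:
        return ZERO
    return prev_field[(row, col)]


field = {(0, 0): (1, 0), (0, 1): (2, 2), (1, 0): (4, 0)}
print(spatial_predictor(1, 1, field))   # → (3.0, 1.0)
print(temporal_predictor(1, 1, None))   # → (0.0, 0.0)
```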

We divide the search space into four quadrants and check whether both predicted vectors lie in the same quadrant. The angles dividing the search space are defined as follows:

Direction I: −45° ≤ ang < 45°
Direction II: 45° ≤ ang < 135°
Direction III: 135° ≤ ang < 225°
Direction IV: 225° ≤ ang < 315°

Note that the division of the search space into four quadrants takes into account the nature of video sequences, whose motion tends to be horizontal or vertical. So instead of defining the first quadrant from 0° to 90°, we define it from −45° to 45°. This places the dominant horizontal and vertical directions at the centers of the quadrants rather than on their boundaries.
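The quadrant test only needs the angle of each predicted MV. A minimal sketch follows, assuming the angle is measured with atan2 in a y-up coordinate system (with image coordinates, where y grows downwards, negate the y component first) and assuming that a zero MV, whose angle is undefined, is assigned to Direction I by convention. Both conventions are our assumptions.

```python
import math
from typing import Tuple


def direction(mv: Tuple[float, float]) -> int:
    """Return 1..4 for Directions I..IV.

    Direction I covers [-45, 45) degrees, II [45, 135), III [135, 225)
    and IV [225, 315), so the horizontal and vertical axes fall in the
    middle of a quadrant rather than on a boundary.
    """
    dx, dy = mv
    ang = math.degrees(math.atan2(dy, dx))   # in (-180, 180]; atan2(0, 0) == 0
    shifted = (ang + 45.0) % 360.0           # Direction I now starts at 0
    # Float modulo can round a tiny negative value up to exactly 360.0;
    # clamping keeps such angles (just below -45 degrees) in Direction IV.
    return min(int(shifted // 90.0), 3) + 1


for mv in [(3, 0), (0, 2), (-1, 0), (0, -5), (2, -1), (1, 2)]:
    print(mv, direction(mv))
```

Two predictors are then "in the same quadrant" exactly when `direction` returns the same value for both.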

Table 3
Simulation results for Mobile, QCIF, 30 fps, and 100 frames

QP | BMA | PSNR (dB) | Bit rate (kbits/s) | Total time (s) | ME time (s) | Total speedup | ME speedup | PSNR gain (dB) | Bit rate increase (%)
28 | FS | 33.14 | 456.35 | 31.90 | 29.37 | 1.00 | 1.00 | – | –
28 | TSS | 33.11 | 454.66 | 7.01 | 4.15 | 4.55 | 7.08 | −0.027 | −0.37
28 | FSS | 33.14 | 454.47 | 6.25 | 3.69 | 5.11 | 7.97 | −0.004 | −0.41
28 | DS | 33.14 | 454.46 | 6.00 | 3.50 | 5.31 | 8.39 | 0.001 | −0.41
28 | Proposed | 33.13 | 454.66 | 5.33 | 2.57 | 5.99 | 11.43 | −0.01 | −0.37

Table 4
Simulation results for News, QCIF, 30 fps, and 100 frames

QP | BMA | PSNR (dB) | Bit rate (kbits/s) | Total time (s) | ME time (s) | Total speedup | ME speedup | PSNR gain (dB) | Bit rate increase (%)
32 | FS | 33.50 | 46.29 | 66.61 | 58.13 | 1.00 | 1.00 | – | –
32 | TSS | 33.45 | 47.36 | 21.47 | 14.94 | 3.10 | 3.89 | −0.051 | 2.31
32 | FSS | 33.49 | 46.87 | 19.58 | 13.05 | 3.40 | 4.45 | −0.008 | 1.27
32 | DS | 33.47 | 46.23 | 19.35 | 12.16 | 3.44 | 4.78 | −0.029 | −0.12
32 | Proposed | 33.49 | 46.26 | 16.12 | 9.17 | 4.13 | 6.34 | −0.007 | −0.05
36 | FS | 30.70 | 27.94 | 54.46 | 47.88 | 1.00 | 1.00 | – | –
36 | TSS | 30.60 | 28.08 | 22.86 | 14.27 | 2.38 | 3.35 | −0.095 | 0.50
36 | FSS | 30.57 | 27.92 | 20.12 | 13.55 | 2.71 | 3.53 | −0.13 | −0.07
36 | DS | 30.62 | 28.09 | 19.52 | 12.61 | 2.79 | 3.80 | −0.075 | 0.53
36 | Proposed | 30.67 | 27.96 | 15.64 | 8.93 | 3.48 | 5.36 | −0.03 | 0.05
40 | FS | 28.07 | 17.17 | 52.79 | 45.47 | 1.00 | 1.00 | – | –
40 | TSS | 27.97 | 17.31 | 21.88 | 14.83 | 2.41 | 3.07 | −0.093 | 0.83
40 | FSS | 28.01 | 17.58 | 20.35 | 12.98 | 2.59 | 3.50 | −0.053 | 2.39
40 | DS | 28.07 | 17.39 | 20.02 | 13.33 | 2.64 | 3.41 | 0.002 | 1.28
40 | Proposed | 28.05 | 17.29 | 14.97 | 8.27 | 3.53 | 5.50 | −0.016 | 0.72

2.2.1. Case 1 (same quadrant)

When both the spatial and temporal predicted MVs lie in the same quadrant, we assume that the dominant motion is in this quadrant and start the search there. This case is shown in Fig. 4(a). Because this is a simple case, we calculate P(Xn) as a weighted mean of the two spatial MVs and the one temporal MV (through PS(Xn) and PT(Xn)) and start the search from this single point. P(Xn) is calculated as follows:

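$$P(X_n) = \alpha_2 \, P_T(X_n) + \alpha_3 \, P_S(X_n) \qquad (8)$$

$$P(X_n) = P_T(X_n) \quad \text{for the top left block} \qquad (9)$$

$$P(X_n) = \alpha_2 \, P_T(X_n) + \alpha_4 \, P_{SA}(X_n) \quad \text{for all left column blocks except the top left block} \qquad (10)$$

$$P(X_n) = \alpha_2 \, P_T(X_n) + \alpha_4 \, P_{SL}(X_n) \quad \text{for all top row blocks except the top left block} \qquad (11)$$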

where PSA(Xn) and PSL(Xn) in Eqs. (10) and (11) denote the predicted MVs from the spatial above and spatial left blocks, i.e., the blocks lying above and to the left of the current block in the current frame, and α2, α3 and α4 are prediction coefficients.

2.2.2. Case 2 (different quadrants)

When the spatial and temporal predicted MVs lie in different quadrants, we use multiple predictors, i.e., two initial predicted MVs, and start the search from two separate initial points. This is shown in Fig. 4(b); the two points are:

1. the spatial predictor PS(Xn), as defined by Eqs. (2) and (4)–(6);
2. the temporal predictor PT(Xn), as defined by Eqs. (3) and (7).

This choice of multiple points decreases the risk of missing the actual motion and reduces the chance of being trapped in a local minimum.
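Sections 2.2.1 and 2.2.2 together reduce to one decision per block. The sketch below covers interior blocks only (the border variants, Eqs. (9)–(11), are omitted), uses a local quadrant helper that follows the angle ranges of Section 2.2 and treats a zero MV as Direction I (an assumption), and uses illustrative coefficients α2 = α3 = 0.5, which are not values given in the paper.

```python
import math
from typing import List, Tuple

MV = Tuple[float, float]


def _direction(mv: MV) -> int:
    """Directions I..IV: [-45, 45), [45, 135), [135, 225), [225, 315) degrees."""
    shifted = (math.degrees(math.atan2(mv[1], mv[0])) + 45.0) % 360.0
    return min(int(shifted // 90.0), 3) + 1


def initial_search_points(ps: MV, pt: MV,
                          alpha2: float = 0.5, alpha3: float = 0.5) -> List[MV]:
    """Return the initial search center(s) for an interior block.

    Same quadrant      -> one point, Eq. (8): alpha2 * P_T + alpha3 * P_S.
    Different quadrant -> two points, P_S and P_T, searched separately.
    """
    if _direction(ps) == _direction(pt):
        return [(alpha2 * pt[0] + alpha3 * ps[0],
                 alpha2 * pt[1] + alpha3 * ps[1])]
    return [ps, pt]


print(initial_search_points((2, 0), (4, 1)))   # same quadrant: one center
print(initial_search_points((2, 0), (0, 3)))   # different quadrants: two centers
```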

Table 5
Simulation results for Hall Monitor, CIF, 30 fps, and 300 frames

QP | BMA | PSNR (dB) | Bit rate (kbits/s) | Total time (s) | ME time (s) | Total speedup | ME speedup | PSNR gain (dB) | Bit rate increase (%)
32 | FS | 35.31 | 119.69 | 196.8 | 169.58 | 1.00 | 1.00 | – | –
32 | TSS | 35.31 | 123.06 | 89.09 | 60.83 | 2.21 | 2.79 | 0.0 | 2.82
32 | FSS | 35.31 | 121.75 | 81.95 | 53.16 | 2.40 | 3.19 | 0.0 | 1.72
32 | DS | 35.32 | 122.04 | 81.39 | 52.95 | 2.42 | 3.20 | 0.01 | 1.96
32 | Proposed | 35.31 | 119.88 | 58.88 | 32.79 | 3.34 | 5.17 | 0.0 | 0.16
36 | FS | 32.77 | 59.44 | 172.98 | 146.36 | 1.00 | 1.00 | – | –
36 | TSS | 32.81 | 60.48 | 90.77 | 62.82 | 1.91 | 2.33 | 0.04 | 1.75
36 | FSS | 32.83 | 60.17 | 83.19 | 54.39 | 2.08 | 2.69 | 0.06 | 1.23
36 | DS | 32.79 | 59.89 | 80.67 | 53.09 | 2.14 | 2.76 | 0.02 | 0.76
36 | Proposed | 32.78 | 59.46 | 58.52 | 32.41 | 2.96 | 4.52 | 0.01 | 0.03

Table 6
Simulation results for Flower, CIF, 30 fps, and 150 frames

QP | BMA | PSNR (dB) | Bit rate (kbits/s) | Total time (s) | ME time (s) | Total speedup | ME speedup | PSNR gain (dB) | Bit rate increase (%)
40 | FS | 24.70 | 206.74 | 122.27 | 108.51 | 1.00 | 1.00 | – | –
40 | TSS | 24.68 | 221.09 | 44.61 | 30.80 | 2.74 | 3.52 | −0.021 | 6.95
40 | FSS | 24.70 | 212.27 | 40.36 | 26.57 | 3.03 | 4.08 | −0.005 | 2.67
40 | DS | 24.72 | 209.33 | 39.16 | 24.71 | 3.12 | 4.39 | 0.012 | 1.26
40 | Proposed | 24.68 | 208.01 | 31.13 | 16.72 | 3.93 | 6.49 | −0.019 | 0.62

2.2.3. Local minimum elimination criteria

The distortion surfaces in Fig. 3 make it clear that a number of local minima are present in addition to the global minimum. A good search algorithm should therefore escape local minima while searching for the global minimum, at low computational cost. The reason for selecting multiple initial points is that it increases the chance of selecting an initial point closer to the global minimum rather than to a local minimum. This is illustrated in Fig. 5, which shows a distortion curve in 1D space: the distortion D on the y-axis is plotted against the displacement x on the x-axis. Two initial search points, 1 and 2, are selected in the first step. Point 1 has the lower distortion, so it is considered closer to the global minimum, and in the later steps the fine search is extended around point 1 to reach the global minimum. For the case of multiple initial starting points, we define the following local minimum elimination criterion (LMEC) to stop the search:

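$$\mathrm{LMEC} = \frac{\left|\mathrm{SAD}(\text{spatial}) - \mathrm{SAD}(\text{temporal})\right|}{\min\left(\mathrm{SAD}(\text{spatial}), \mathrm{SAD}(\text{temporal})\right)} \qquad (12)$$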

If LMEC exceeds a predefined threshold, we assume that the better of the two starting points is the global minimum point and stop the search there. Otherwise we continue searching for the minimum distortion point, starting from the better of the two points.
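Eq. (12) and the stopping rule take only a few lines. This sketch assumes the threshold 0.75 used in Section 2.4.3, prefers the spatial point on a tie, and returns infinity when the better SAD is zero. The tie rule and the zero-SAD handling are our assumptions, since the paper does not specify them.

```python
import math
from typing import Tuple


def lmec(sad_spatial: float, sad_temporal: float) -> float:
    """Eq. (12): |SAD_S - SAD_T| / min(SAD_S, SAD_T)."""
    low = min(sad_spatial, sad_temporal)
    if low == 0:
        return math.inf   # assumption: a perfect match is treated as decisive
    return abs(sad_spatial - sad_temporal) / low


def lmec_decision(sad_spatial: float, sad_temporal: float,
                  threshold: float = 0.75) -> Tuple[str, bool]:
    """Return (better start point, whether the search stops there)."""
    better = "spatial" if sad_spatial <= sad_temporal else "temporal"
    return better, lmec(sad_spatial, sad_temporal) > threshold


print(lmec_decision(100, 200))   # LMEC = 1.0: stop at the spatial point
print(lmec_decision(300, 200))   # LMEC = 0.5: keep refining from the temporal point
```

Normalizing by the smaller SAD makes the criterion scale-free: a difference of 100 is decisive between SADs of 100 and 200, but not between 1000 and 1100.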

2.3. Magnitude of predicted MV/motion content

The magnitude of the predicted MV is used to define the motion content of a block. Based on motion content, blocks are classified as stationary (no motion), small motion (|P(Xn)| ≤ 1), medium motion (1 < |P(Xn)| ≤ 3) and large motion (|P(Xn)| > 3) blocks.
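The classification can be sketched as follows. The paper does not say how the magnitude |P(Xn)| is measured, so this sketch assumes the larger of the absolute x and y components (a Chebyshev norm) and treats a block as stationary when its predicted MV is exactly zero. Both choices are assumptions.

```python
from typing import Tuple


def motion_class(pred: Tuple[float, float]) -> str:
    """Classify a block by the magnitude of its predicted MV (Section 2.3)."""
    m = max(abs(pred[0]), abs(pred[1]))   # assumed magnitude measure
    if m == 0:
        return "stationary"
    if m <= 1:
        return "small"
    if m <= 3:
        return "medium"
    return "large"


print([motion_class(p) for p in [(0, 0), (1, -1), (2, 3), (4, 0)]])
```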

2.4. Search pattern

Table 7
Average performance comparison with respect to full search

BMA | Sequence | Total speedup | ME speedup | PSNR gain (dB) | Bit rate increase (%)
TSS | Mobile | 3.96 | 5.74 | −0.04 | −1.04
TSS | Claire | 1.81 | 2.2 | −0.03 | 0.805
TSS | Car phone | 2.77 | 3.65 | −0.132 | 1.58
TSS | News | 2.77 | 3.65 | −0.077 | 1.47
TSS | Flower | 3.03 | 4.01 | −0.029 | 4.825
TSS | Hall monitor | 2.12 | 2.65 | −0.023 | 2.145
TSS | Average | 2.743 | 3.650 | −0.055 | 1.631
FSS | Mobile | 4.4 | 6.58 | −0.025 | −1.27
FSS | Claire | 1.96 | 2.4 | −0.127 | 0.618
FSS | Car phone | 3.05 | 4.12 | −0.115 | 0.653
FSS | News | 3.07 | 4.12 | −0.072 | 0.958
FSS | Flower | 3.39 | 4.7 | −0.015 | 1.67
FSS | Hall monitor | 2.31 | 2.995 | −0.005 | 1.43
FSS | Average | 3.027 | 4.153 | −0.060 | 0.677
DS | Mobile | 4.67 | 7.241 | −0.016 | −1.03
DS | Claire | 1.995 | 2.53 | −0.015 | 1.68
DS | Car phone | 3.11 | 4.33 | −0.113 | 0.808
DS | News | 3.11 | 4.33 | −0.039 | 0.425
DS | Flower | 3.51 | 5.02 | −0.008 | 0.385
DS | Hall monitor | 2.36 | 3.10 | −0.01 | 1.27
DS | Average | 3.126 | 4.425 | −0.034 | 0.590
Proposed | Mobile | 5.31 | 9.18 | −0.016 | −0.87
Proposed | Claire | 2.76 | 4.51 | −0.038 | −2.57
Proposed | Car phone | 3.825 | 5.92 | −0.091 | 0.323
Proposed | News | 3.83 | 5.92 | −0.02 | −0.06
Proposed | Flower | 4.15 | 6.73 | −0.029 | 0.17
Proposed | Hall monitor | 3.22 | 4.93 | −0.01 | −0.19
Proposed | Average | 3.849 | 6.198 | −0.034 | −0.533

The distribution of the global minimum point in real world video sequences is centered at the position of zero motion, i.e., at the search window center, as exploited by TSS, FSS and NTSS. Most MVs are enclosed in a circular support with a radius of 2–3 pels centered at the position of zero motion. The MV distribution for the Table Tennis sequence, shown in Fig. 6, supports this observation [20]. Given these characteristics, only 1–2 steps of the search pattern are needed to reach the final result: since the refined search center is already close to the global minimum point, a local search with a small compact search pattern should be fairly efficient. Because the first-step search points of a pattern must always be checked, the minimum computational cost of a search pattern is directly related to the number of its first-step points, so these points must be chosen carefully. In our proposed algorithm the search pattern depends on the motion content of the block, derived from the magnitude of the predicted MV, and on whether single or multiple point prediction is used. The search patterns employed in the proposed algorithm are shown in Fig. 7.

2.4.1. Stationary blocks (zero-motion blocks)

For stationary blocks the initial search center is taken to be the actual search center. To capture any residual motion the algorithm takes the following steps:

1. If SAD (search center) < threshold, only one point is searched, and the initial search point (the zero point) is taken as the final MV, as shown in Fig. 8(a). SAD stands for sum of absolute differences.

2. If SAD (search center) ≥ threshold, we search five points, namely the search center and its four neighbouring points on the horizontal and vertical axes at a step size of one, and then stop the search. The step size is defined as the horizontal/vertical distance between two pixels. This is shown in Fig. 8(b).
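A sketch of the stationary-block procedure follows. It assumes a hypothetical callable `cost(mv)` that returns the SAD of the candidate block displaced by `mv` (for instance a wrapper around a block SAD function), and it leaves the zero-motion threshold as a parameter because its value is not given in this section.

```python
from typing import Callable, Dict, Tuple

MV = Tuple[int, int]
# Center first, so that on a tie the zero MV is kept.
SMALL_ROOD: Tuple[MV, ...] = ((0, 0), (1, 0), (-1, 0), (0, 1), (0, -1))


def search_stationary(cost: Callable[[MV], float],
                      threshold: float) -> Tuple[MV, int]:
    """Section 2.4.1: return (final MV, number of points checked)."""
    c0 = cost((0, 0))
    if c0 < threshold:
        return (0, 0), 1                    # step 1: accept the zero MV
    costs: Dict[MV, float] = {(0, 0): c0}
    for mv in SMALL_ROOD[1:]:
        costs[mv] = cost(mv)                # step 2: four unit-step neighbours
    best = min(costs, key=costs.__getitem__)
    return best, len(costs)


def toy_cost(mv: MV) -> float:
    # Synthetic SAD surface with its minimum at (1, 0), for illustration only.
    return 10 + abs(mv[0] - 1) + abs(mv[1])


print(search_stationary(toy_cost, threshold=20))   # → ((0, 0), 1)
print(search_stationary(toy_cost, threshold=5))    # → ((1, 0), 5)
```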

2.4.2. Motion blocks (single point prediction)

In this case we assume that the block's motion follows a regular pattern. The search starts from the initial search point defined by Eqs. (8)–(11). Again there are two cases:

1. For small motion blocks we use a small rood search pattern.

2. For medium and large motion blocks we again examine the SAD of the search center:

• If SAD (search center) < threshold, a small rood pattern is searched around the search center. This is shown in Fig. 9(a).

• If SAD (search center) ≥ threshold, a large rood search pattern is used in the first step, followed by a small rood search pattern. This is shown in Fig. 9(b).
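The single point procedure can be sketched as follows, again assuming a hypothetical `cost(mv)` callable. Two details are not fixed by the text and are assumptions here: the large rood arm length (set to 2 for illustration), and the choice to repeat the small rood pattern until its center is the best point, in the way the ARPS-2 refinement stage is described in the Introduction. Candidates outside the search window would have to be handled by `cost`, for example by returning infinity.

```python
from typing import Callable, Dict, List, Tuple

MV = Tuple[int, int]


def rood(center: MV, arm: int) -> List[MV]:
    """Rood (cross) pattern: center first, then the four arm tips."""
    cx, cy = center
    return [(cx, cy), (cx + arm, cy), (cx - arm, cy), (cx, cy + arm), (cx, cy - arm)]


def search_single(cost: Callable[[MV], float], start: MV, motion: str,
                  threshold: float, large_arm: int = 2,
                  max_iter: int = 16) -> Tuple[MV, int]:
    """Section 2.4.2: return (final MV, number of distinct points checked)."""
    cache: Dict[MV, float] = {}

    def c(mv: MV) -> float:            # evaluate each candidate at most once
        if mv not in cache:
            cache[mv] = cost(mv)
        return cache[mv]

    center = start
    if motion in ("medium", "large") and c(start) >= threshold:
        center = min(rood(start, large_arm), key=c)   # first step: large rood
    for _ in range(max_iter):                         # then small rood steps
        best = min(rood(center, 1), key=c)            # ties keep the center
        if best == center:
            break
        center = best
    return center, len(cache)


def toy_cost(mv: MV) -> float:
    # Synthetic bowl-shaped SAD surface with its minimum at (5, -2).
    return (mv[0] - 5) ** 2 + (mv[1] + 2) ** 2


print(search_single(toy_cost, (3, 0), "large", threshold=1))
print(search_single(toy_cost, (4, -2), "small", threshold=1))
```

Caching the costs means a point shared by two consecutive patterns is counted, and paid for, only once.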

2.4.3. Motion blocks (multiple point prediction)

In the case of multiple point prediction, PS(Xn) and PT(Xn) lie in different quadrants, so there are two starting points for the search, as defined by Eqs. (2) and (3). Multiple point prediction is further divided into three cases on the basis of the distance between the two starting points.

[Fig. 11: PSNR (dB) versus bit rate (kbps) for FS, TSS, FSS, DS and the proposed algorithm; panels: Claire QCIF 30 Hz, Car Phone QCIF 30 Hz, Mobile QCIF 30 Hz, News QCIF 30 Hz, Hall Monitor CIF 30 Hz, Flower CIF 30 Hz.]
Fig. 11. Comparison of rate–distortion curves for different block matching algorithms: (a) Claire, (b) Car Phone, (c) Mobile, (d) News, (e) Hall Monitor, (f) Flower.

Table 8
Simulation results for Hall Monitor, at 10 and 30 fps

QP | BMA | PSNR (dB) | Bit rate (kbits/s) | Total time (s) | ME time (s) | Total speedup | ME speedup | PSNR gain (dB) | Bit rate increase (%)
28 | FS | 37.70 | 84.94 | 254.07 | 222.76 | 1.00 | 1.00 | – | –
28 | Proposed | 37.69 | 84.75 | 59.47 | 33.58 | 4.27 | 6.63 | −0.01 | −0.22
32 | FS | 35.31 | 39.39 | 215.25 | 185.09 | 1.00 | 1.00 | – | –
32 | Proposed | 35.31 | 39.56 | 58.67 | 32.75 | 3.67 | 5.65 | 0.0 | 0.43
36 | FS | 32.77 | 19.81 | 190.48 | 160.96 | 1.00 | 1.00 | – | –
36 | Proposed | 32.78 | 19.98 | 58.31 | 31.65 | 3.27 | 5.08 | 0.01 | 0.86
40 | FS | 30.27 | 11.32 | 161.13 | 134.62 | 1.00 | 1.00 | – | –
40 | Proposed | 30.23 | 11.48 | 57.48 | 32.24 | 2.80 | 4.18 | −0.04 | 1.41

Average | Total speedup | ME speedup | PSNR gain (dB) | Bit rate increase (%)
Proposed algorithm at 10 fps | 3.50 | 5.39 | −0.01 | 0.62
Proposed algorithm at 30 fps (from Tables 5 and 7) | 3.22 | 4.93 | −0.01 | −0.19

Table 9
Simulation results for News, at 10 and 30 fps

QP | BMA | PSNR (dB) | Bit rate (kbits/s) | Total time (s) | ME time (s) | Total speedup | ME speedup | PSNR gain (dB) | Bit rate increase (%)
28 | FS | 36.69 | 25.78 | 18.08 | 16.08 | 1.00 | 1.00 | – | –
28 | Proposed | 36.67 | 25.53 | 4.83 | 2.50 | 3.74 | 6.44 | −0.02 | −0.97
32 | FS | 33.50 | 15.43 | 18.21 | 15.02 | 1.00 | 1.00 | – | –
32 | Proposed | 33.49 | 15.42 | 4.80 | 2.64 | 3.80 | 5.70 | −0.01 | −0.065
36 | FS | 30.70 | 9.31 | 17.27 | 13.79 | 1.00 | 1.00 | – | –
36 | Proposed | 30.67 | 9.32 | 4.77 | 2.77 | 3.62 | 4.98 | −0.03 | 0.107
40 | FS | 28.07 | 5.72 | 16.23 | 12.83 | 1.00 | 1.00 | – | –
40 | Proposed | 28.05 | 5.80 | 4.78 | 2.62 | 3.40 | 4.90 | −0.02 | 1.40

Average | Total speedup | ME speedup | PSNR gain (dB) | Bit rate increase (%)
Proposed algorithm at 10 fps | 3.64 | 5.50 | −0.02 | 0.12
Proposed algorithm at 30 fps (from Tables 4 and 7) | 3.83 | 5.92 | −0.02 | −0.06

[Fig. 12: PSNR (dB) versus bit rate (kbps) for FS and the proposed algorithm at 30 fps and 10 fps; panels: News QCIF, Hall Monitor CIF.]
Fig. 12. Comparison of rate–distortion curves at different frame rates (30 and 10 fps) for full search and proposed algorithm: (a) News, (b) Hall Monitor.

2.4.3.1. Case I: |Xs − Xt| ≤ 1 and |Ys − Yt| ≤ 1

Here Xs and Ys are the x and y components of PS(Xn), and Xt and Yt are the x and y components of PT(Xn). The search procedure, shown in Fig. 10(a), is as follows:

1. Only the two initial points are checked, and the one with minimum error is selected.

2. If LMEC > 0.75, stop the search, since this point is assumed to be the global minimum point.

3. Otherwise, search a small rood pattern; if the minimum is at the center, stop the search.

4. Otherwise, search another small rood pattern.
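Case I can be sketched as follows, assuming the same kind of hypothetical `cost(mv)` callable, the LMEC threshold of 0.75 from step 2, and that the extra small rood pattern in step 4 is centered on the new minimum (our reading of the text). The function returns the final MV together with the number of distinct points checked.

```python
import math
from typing import Callable, Dict, List, Tuple

MV = Tuple[int, int]


def small_rood(center: MV) -> List[MV]:
    cx, cy = center
    return [(cx, cy), (cx + 1, cy), (cx - 1, cy), (cx, cy + 1), (cx, cy - 1)]


def search_case1(cost: Callable[[MV], float], ps: MV, pt: MV,
                 lmec_threshold: float = 0.75) -> Tuple[MV, int]:
    """Case I of Section 2.4.3 (the two predictors are at most 1 pel apart)."""
    cache: Dict[MV, float] = {}

    def c(mv: MV) -> float:
        if mv not in cache:
            cache[mv] = cost(mv)
        return cache[mv]

    s, t = c(ps), c(pt)                                  # step 1: two points only
    center = ps if s <= t else pt
    low = min(s, t)
    lmec = math.inf if low == 0 else abs(s - t) / low    # Eq. (12)
    if lmec > lmec_threshold:
        return center, len(cache)                        # step 2: stop
    best = min(small_rood(center), key=c)                # step 3
    if best == center:
        return center, len(cache)
    best = min(small_rood(best), key=c)                  # step 4: one more small rood
    return best, len(cache)


def toy_cost(mv: MV) -> float:
    # Synthetic SAD surface with its minimum at (2, 1).
    return 5 + abs(mv[0] - 2) + abs(mv[1] - 1)


print(search_case1(toy_cost, (1, 1), (2, 2)))   # → ((2, 1), 8)
```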

2.4.3.2. Case II: |Xs − Xt| ≤ 3 and |Ys − Yt| ≤ 3 (when Case I does not apply)

The search procedure, shown in Fig. 10(b), is as follows:

1. A small rood pattern is checked at each of the two initial points, and the minimum error point is selected.

2. If LMEC > 0.75, stop the search, since this point is assumed to be the global minimum point.

3. Otherwise, search a large rood pattern; if the minimum is at the center, stop the search.


2.4.3.3. Case III: (Xs − Xt) > 3 or (Ys − Yt) > 3

The search procedure, as shown in Fig. 10(c), is as follows:

1. First, two large rood patterns are checked, one at each of the two initial points, and the minimum error point is selected.

2. If LMEC > 0.75 and the minimum location is at the center, stop the search, since the minimum error point is assumed to be the global minimum.

3. Otherwise, follow a small rood pattern; if the minimum location is at the center, stop the search.
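A minimal Python sketch of the Case III test and search, under these assumptions: `cost` returns the block distortion of a candidate MV, `lmec` is the LMEC value computed as defined earlier in the paper, the large rood arm length of 2 is hypothetical, and the case test compares absolute differences between the initial points (Xs, Ys) and (Xt, Yt), which the text writes without absolute values. In step 2, "the center" is read as the initial point whose large rood pattern produced the minimum. The names `is_case_iii` and `case_iii_search` are illustrative only.

```python
from typing import Callable, List, Tuple

MV = Tuple[int, int]  # integer-pel motion vector (dx, dy)

LMEC_THRESHOLD = 0.75  # value used in the experiments
LARGE_ARM = 2          # hypothetical large rood arm length


def rood(center: MV, arm: int) -> List[MV]:
    """Centre point followed by the four rood points at distance `arm`."""
    cx, cy = center
    return [(cx, cy), (cx, cy - arm), (cx - arm, cy), (cx + arm, cy), (cx, cy + arm)]


def is_case_iii(p_s: MV, p_t: MV) -> bool:
    """Case III test; absolute differences are an assumption of this sketch."""
    return abs(p_s[0] - p_t[0]) > 3 or abs(p_s[1] - p_t[1]) > 3


def case_iii_search(p_s: MV, p_t: MV, cost: Callable[[MV], float],
                    lmec: float) -> Tuple[MV, bool]:
    """Return (mv, stopped); stopped is False if the small rood minimum moved off-centre."""
    # Step 1: a large rood pattern at each initial point; remember which centre won.
    pairs = [(pt, start) for start in (p_s, p_t) for pt in rood(start, LARGE_ARM)]
    best, its_center = min(pairs, key=lambda pair: cost(pair[0]))
    # Step 2: stop if LMEC is high and the minimum is the centre of its pattern.
    if lmec > LMEC_THRESHOLD and best == its_center:
        return best, True
    # Step 3: small rood pattern around best; stop if its minimum is the centre.
    srp_best = min(rood(best, 1), key=cost)
    return srp_best, srp_best == best


def bowl(mv: MV) -> float:
    """Toy convex error surface with its minimum at (3, -2)."""
    return (mv[0] - 3) ** 2 + (mv[1] + 2) ** 2


print(is_case_iii((0, 0), (6, -4)))                  # True
print(case_iii_search((0, 0), (6, -4), bowl, 0.9))   # ((2, -1), False)
print(case_iii_search((3, -2), (8, 0), bowl, 0.9))   # ((3, -2), True)
```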


3. Experimental results

The proposed algorithm is implemented in JM-12.2 [23], the H.264/AVC reference software. In the simulations we compare FS, TSS, FSS, DS and the proposed algorithm in terms of computation (search speed, measured by total encoding time and ME time) and performance (PSNR and bit rate). The simulations are carried out at four quantization parameters (QP = 28, 32, 36, 40) to test the algorithm at different bit rates.

For encoding, the Main profile of the JM-12.2 encoder is used. For each test sequence only the first frame is coded as an I frame, and the remaining frames are coded as P frames with a single reference frame. Each pixel in the image sequences is uniformly quantized to 8 bits. The sum of absolute differences (SAD) is used as the block distortion measure (BDM). Motion vectors with integer-pel accuracy are used to evaluate the performance of all BMAs, and the search range is 16. The image formats are QCIF and CIF, and the sequences are tested at 30 fps (frames per second) and 10 fps. The simulation platform is a PC with an Intel Pentium IV 2.66 GHz CPU.
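For reference, the full search (FS) baseline with the SAD distortion measure, integer-pel accuracy and a search range of 16 can be sketched in plain Python. The 16 × 16 block size, the list-of-rows frame representation and the rule of skipping candidates that fall outside the reference frame are assumptions of this sketch; it is not how JM-12.2 implements its search (H.264 also uses variable block sizes, for instance). The function names are illustrative.

```python
import random
from typing import List, Tuple

Frame = List[List[int]]  # 8-bit luma samples indexed as frame[y][x]


def sad(cur: Frame, ref: Frame, bx: int, by: int, dx: int, dy: int, n: int) -> int:
    """SAD between the n x n block at (bx, by) in cur and the block displaced by (dx, dy) in ref."""
    return sum(
        abs(cur[by + j][bx + i] - ref[by + dy + j][bx + dx + i])
        for j in range(n)
        for i in range(n)
    )


def full_search(cur: Frame, ref: Frame, bx: int, by: int,
                n: int = 16, search_range: int = 16) -> Tuple[Tuple[int, int], int]:
    """Exhaustive integer-pel search over [-search_range, search_range] in x and y."""
    height, width = len(ref), len(ref[0])
    best_mv, best_cost = (0, 0), sad(cur, ref, bx, by, 0, 0, n)
    for dy in range(-search_range, search_range + 1):
        for dx in range(-search_range, search_range + 1):
            x, y = bx + dx, by + dy
            if x < 0 or y < 0 or x + n > width or y + n > height:
                continue  # candidate block would leave the reference frame
            cost = sad(cur, ref, bx, by, dx, dy, n)
            if cost < best_cost:
                best_mv, best_cost = (dx, dy), cost
    return best_mv, best_cost


# Synthetic test: the current frame is the reference shifted by (3, -2).
rng = random.Random(0)
ref = [[rng.randrange(256) for _ in range(48)] for _ in range(48)]
cur = [[ref[y - 2][x + 3] if 0 <= y - 2 < 48 and 0 <= x + 3 < 48 else 0
        for x in range(48)] for y in range(48)]
print(full_search(cur, ref, 16, 16))  # ((3, -2), 0)
```

FS evaluates all 33 × 33 = 1089 candidates per block here, which is the cost the speedup figures in Eqs. (13) and (14) are measured against.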

Some of the test sequences were taken from the recommended simulation common conditions [28]; the others used for our experiments and analysis were obtained from the Video Trace Research Group at Arizona State University [29].

The test sequences are Claire, Car Phone, Mobile, News, Hall Monitor and Flower. They were selected to cover different kinds of motion and content: low (Claire) to high amounts of movement (Car Phone), camera zooming and panning (Mobile), content with complicated texture (Mobile) and the highly complex Flower sequence. Hall Monitor represents a "security camera" scenario: the camera shows a corridor with two people entering, walking through it and eventually leaving again. Claire is a typical video-conferencing sequence, with movement restricted to the speaker's face and a fixed background. Mobile has a lot of motion in the background; it is considered a complex sequence because the objects move at different speeds in different directions and the background also moves. Car Phone is also considered complex, with medium movement in the foreground and fast movement in the background.

In the simulations, a fixed threshold of 512 is used. Both � and �1 are set to 0.5, and �2, �3 and �4 are set to 0.35, 0.65 and 0.65, respectively. LMEC is set to 0.75; this relatively large value is chosen so that the search stops early only when the minimum error point is very likely to be the global minimum.
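The abstract mentions a zero-motion threshold for no-motion blocks. Assuming the fixed threshold of 512 is that threshold and is compared against the SAD of the zero motion vector (both assumptions, as is the strict `<` comparison), the early exit could look like the Python sketch below. `search` is a hypothetical callable standing for the pattern search described earlier, and the function name is illustrative.

```python
from typing import Callable, Tuple

MV = Tuple[int, int]  # integer-pel motion vector (dx, dy)

ZERO_MOTION_THRESHOLD = 512  # fixed threshold value quoted in the text


def estimate_with_zero_motion_check(cost: Callable[[MV], int],
                                    search: Callable[[], MV],
                                    threshold: int = ZERO_MOTION_THRESHOLD) -> Tuple[MV, bool]:
    """Return (mv, skipped). A block whose zero-MV SAD is below the threshold is
    treated as a no-motion block, so the pattern search is never run for it."""
    if cost((0, 0)) < threshold:
        return (0, 0), True
    return search(), False


# A static block (zero-MV SAD 300) skips the search; a moving one (SAD 900) does not.
print(estimate_with_zero_motion_check(lambda mv: 300, lambda: (2, 1)))  # ((0, 0), True)
print(estimate_with_zero_motion_check(lambda mv: 900, lambda: (2, 1)))  # ((2, 1), False)
```

Skipping the search entirely for static blocks is where the additional speedup on largely static sequences such as Claire or Hall Monitor would come from.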

Tables 1–6 compare the performance of the proposed algorithm with other BMAs. Computational efficiency is measured in two ways, total speedup and ME speedup, and quality is measured by PSNR gain and bit rate increase. These are computed as follows:

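$$\text{Speedup (Total)} = \frac{\text{Total Time (FS)}}{\text{Total Time (algorithm)}} \tag{13}$$

$$\text{Speedup (ME)} = \frac{\text{ME Time (FS)}}{\text{ME Time (algorithm)}} \tag{14}$$

$$\text{PSNR Gain} = \text{PSNR (algorithm)} - \text{PSNR (FS)} \tag{15}$$

$$\text{Bit rate increase} = \frac{\text{Bit rate (algorithm)} - \text{Bit rate (FS)}}{\text{Bit rate (FS)}} \times 100 \tag{16}$$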
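As a worked check of Eqs. (13)–(16), the Python below applies them to the QP = 36 row of Table 9 (News); the rounded results agree with the tabulated 3.62, 4.98, −0.03 dB and 0.107%. The function names are illustrative only.

```python
def speedup(time_fs: float, time_alg: float) -> float:
    """Eqs. (13) and (14): FS time divided by the algorithm's time."""
    return time_fs / time_alg


def psnr_gain(psnr_alg: float, psnr_fs: float) -> float:
    """Eq. (15), in dB; negative means lower PSNR than FS."""
    return psnr_alg - psnr_fs


def bit_rate_increase(rate_alg: float, rate_fs: float) -> float:
    """Eq. (16), in percent; negative means a lower bit rate than FS."""
    return (rate_alg - rate_fs) / rate_fs * 100.0


# QP = 36 row of Table 9 (News): FS versus the proposed algorithm.
print(round(speedup(17.27, 4.77), 2))           # 3.62  (total)
print(round(speedup(13.79, 2.77), 2))           # 4.98  (ME)
print(round(psnr_gain(30.67, 30.70), 2))        # -0.03
print(round(bit_rate_increase(9.32, 9.31), 3))  # 0.107
```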

A negative PSNR gain means that the PSNR decreased relative to FS, and a positive value means that it increased. Similarly, a negative bit rate increase means that the bit rate actually decreased.

Tables 1–6 show that the proposed algorithm adapts well to sequences of different kinds. All test sequences achieve a high speedup with little degradation in PSNR and a negligible increase in bit rate. Table 7 gives the overall average performance comparison of the fast search algorithms. The proposed algorithm achieves an overall average encoder speedup of 3.85 times and an ME speedup of 6.2 times; its bit rate decreases slightly, by 0.53%, and its loss in encoding quality is a minimal 0.034 dB. DS achieves an overall speedup of 3.13 times and an ME speedup of 4.43 times, with the same minimal quality loss of 0.034 dB and a bit rate increase of 0.59%. Relative to DS, the proposed algorithm is therefore 1.23 times faster overall and 1.4 times faster in ME, with the same PSNR difference and a slightly better bit rate. Fig. 11 shows the rate–distortion comparison of the proposed and other fast BMAs at 30 fps; it is clear that the performance of the proposed algorithm is very close to that of the FS algorithm.

[Fig. 13 plot panels (a)–(f): Claire, Car Phone, Mobile, News, Hall Monitor and Flower, all at QP = 28; PSNR (dB) versus frame number; curves: FS, TSS, FSS, DS and Proposed.]

Fig. 13. PSNR versus frame number comparison for different block matching algorithms at QP = 28: (a) Claire, (b) Car Phone, (c) Mobile, (d) News, (e) Hall Monitor, (f) Flower.
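The relative figures against DS follow from the Table 7 averages. Writing T for encoding time (notation introduced here), both speedups are measured against FS, so dividing them cancels the FS time. Strictly, the cancellation holds for a single run; applied to averaged speedups it gives the approximate values quoted in the text:

```latex
\frac{\mathrm{Speedup}_{\mathrm{Total}}(\text{proposed})}{\mathrm{Speedup}_{\mathrm{Total}}(\text{DS})}
  = \frac{T_{\mathrm{FS}} / T_{\text{proposed}}}{T_{\mathrm{FS}} / T_{\mathrm{DS}}}
  = \frac{T_{\mathrm{DS}}}{T_{\text{proposed}}}
  \approx \frac{3.85}{3.13} \approx 1.23,
\qquad
\frac{\mathrm{Speedup}_{\mathrm{ME}}(\text{proposed})}{\mathrm{Speedup}_{\mathrm{ME}}(\text{DS})}
  \approx \frac{6.2}{4.43} \approx 1.40
```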

Tables 8 and 9 compare the proposed algorithm with FS at 30 and 10 fps. The results show a small difference in bit rate (less than 1% on average), while the PSNR remains essentially the same. Fig. 12 shows the rate–distortion comparison of the proposed and FS algorithms at different frame rates. Because the bit rate depends directly on the frame rate, lowering the frame rate of a given video sequence also lowers its bit rate. When the other coding conditions are unchanged, however, the PSNR normally stays about the same across frame rate settings.

Fig. 13 compares PSNR versus frame number for the first 100 frames of the test sequences, for the proposed and other fast BMAs at QP = 28. The PSNR of the proposed algorithm stays close to that of the FS and DS algorithms. Although all algorithms perform reliably, the proposed algorithm appears to behave somewhat better in terms of visual quality.
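The per-frame curves rest on the standard PSNR definition. For the 8-bit samples used in these experiments (peak value 255), a minimal plain-Python version is sketched below; treating a frame as a list of rows of luma samples and returning infinity for identical frames are choices of this sketch, not details from the text.

```python
import math
from typing import List

Frame = List[List[int]]  # 8-bit luma samples indexed as frame[y][x]


def psnr(original: Frame, reconstructed: Frame, peak: int = 255) -> float:
    """PSNR in dB: 10 * log10(peak^2 / MSE) over all samples of the frame."""
    squared_error, count = 0, 0
    for row_o, row_r in zip(original, reconstructed):
        for a, b in zip(row_o, row_r):
            squared_error += (a - b) ** 2
            count += 1
    mse = squared_error / count
    return math.inf if mse == 0 else 10.0 * math.log10(peak ** 2 / mse)


# Every sample off by 10 gives MSE = 100, so PSNR = 10 * log10(65025 / 100), about 28.13 dB.
frame = [[100] * 16 for _ in range(16)]
decoded = [[110] * 16 for _ in range(16)]
print(round(psnr(frame, decoded), 2))  # 28.13
```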

4. Conclusion

In this paper, a fast motion estimation algorithm has been proposed. The algorithm exploits spatio-temporal correlation and uses multiple initial point prediction to predict the search center accurately and to avoid being trapped in a local minimum. A local minimum elimination criterion has also been defined, which helps to distinguish between local and global minima. The initial search point selected in this manner is therefore closer to the global minimum, which speeds up the matching process. The final search pattern is adaptive and depends on the magnitude and direction of the predicted motion vector and on the motion content of the block. Experimental results show that the proposed algorithm achieves good predicted image quality, measured in terms of PSNR, with lower computational complexity than other fast search algorithms.

Acknowledgment

This work was supported by the Korea Science and Engineering Foundation (KOSEF) grant funded by the Korean government (MOST) (No. R01-2007-000-20227-0).

References

[1] D. LeGall, MPEG: a video compression standard for multimedia, Commun. ACM 34 (4) (1991) 47–58.

[2] ISO/IEC JTC1/SC29/WG11 Moving Picture Experts Group, MPEG-2 test model 4, 1993.

[3] CCITT SG XV, Recommendation H.261: video codec for audiovisual services at p*64 kbits/sec, Technical Report COMXVR37-E, August 1990.

[4] T. Wiegand, G. Sullivan, A. Luthra, Draft ITU-T recommendation and final draft international standard of joint video specification (ITU-T Rec. H.264 | ISO/IEC 14496-10 AVC), 2003.

[5] H. Nisar, T.-S. Choi, Fast motion estimation algorithm based on spatio-temporal correlation and direction of motion vectors, Electron. Lett. 42 (24) (2006) 1384–1385.

[6] J.R. Jain, A.K. Jain, Displacement measurement and its application in interframe image coding, IEEE Trans. Commun. COM-29 (12) (1981) 1799–1808.

[7] R. Li, B. Zeng, M.L. Liou, A new three-step search algorithm for block motion estimation, IEEE Trans. Circuits Syst. Video Technol. 4 (4) (1994) 438–442.

[8] K.H.K. Chow, M.L. Liou, Generic motion search algorithm for video compression, IEEE Trans. Circuits Syst. Video Technol. 3 (1993) 148–157.

[9] L.M. Po, W.C. Ma, A novel four-step search algorithm for fast block motion estimation, IEEE Trans. Circuits Syst. Video Technol. 6 (3) (1996) 313–317.

[10] T. Koga, K. Iinuma, A. Hirano, Y. Iijima, T. Ishiguro, Motion compensated interframe coding for video conferencing, Proc. Nat. Telecommun. Conf. 1 (1981) G5.3.1–G5.3.5.

[11] L.M. Po, W.C. Ma, New center biased search algorithm for block motion estimation, Proc. Int. Conf. Image Process. 1 (1995) 410–413.

[12] K.R. Namuduri, Motion estimation using spatio-temporal contextual information, IEEE Trans. Circuits Syst. Video Technol. 14 (8) (2004) 1111–1115.

[13] S. Zhu, K.K. Ma, A new diamond search algorithm for fast block matching motion estimation, IEEE Trans. Image Process. 9 (2) (2000) 287–290.

[14] MPEG-4 video verification model (Version 14.0), ISO/IEC JTC1/SC29/WG11 N2932, October 1999.

[15] H. Nisar, T.-S. Choi, An adaptive block motion estimation algorithm based on spatio-temporal correlation, Digest of Technical Papers, International Conference on Consumer Electronics, January 2006, 393–394.

[16] Y. Lee, F. Kossentini, M. Smith, R. Ward, Predictive RD constrained motion estimation for very low bit rate video coding, IEEE J. Sel. Areas Commun. 15 (1997) 1752–1763.

[17] D. Turaga, T. Chen, Estimation and mode decision for spatially correlated motion sequences, IEEE Trans. Circuits Syst. Video Technol. 11 (10) (2001) 1098–1107.

[18] J. Lu, M.L. Liou, A simple and efficient search algorithm for block matching motion estimation, IEEE Trans. Circuits Syst. Video Technol. 7 (2) (1997) 429–433.

[19] P.C. Chung, C.-L. Huang, E.L. Chen, A region-based selective optical flow back-projection for genuine motion vector estimation, Pattern Recognition 40 (3) (2007) 1066–1077.

[20] B.-G. Kim, S.-K. Song, P.-S. Mah, Enhanced block motion estimation based on distortion-directional search patterns, Pattern Recognition Lett. 27 (12) (2006) 1325–1335.

[21] K.-L. Chung, L.-C. Chang, A novel two-phase Hilbert-scan-based search algorithm for block motion estimation using CTF data structure, Pattern Recognition 37 (7) (2004) 1451–1458.

[22] K.-K. Ma, G. Qui, An improved adaptive rood pattern search for fast block-matching motion estimation in JVT/H.26L, Proc. IEEE Int. Symp. Circuits Syst. 2 (2003) 708–711.

[23] Joint Video Team reference software, Version 12.2 (JM12.2) 〈http://iphome.hhi.de/suehring/tml/download/〉.

[24] C.-H. Cheung, L.-M. Po, A novel cross-diamond search algorithm for fast block motion estimation, IEEE Trans. Circuits Syst. Video Technol. 12 (12) (2002) 1168–1177.

[25] C.-S. Yu, S.-C. Tai, Adaptive double-layered initial search pattern for fast motion estimation, IEEE Trans. Multimedia 8 (6) (2006) 1109–1116.

[26] M. Gallant, G. Cote, F. Kossentini, An efficient computation-constrained block-based motion estimation algorithm for low bit rate video coding, IEEE Trans. Image Process. 8 (12) (1999) 1816–1823.

[27] J.-B. Xu, L.-M. Po, C.-K. Cheung, Adaptive motion tracking block matching algorithms for video coding, IEEE Trans. Circuits Syst. Video Technol. 9 (7) (1999) 1025–1029.

[28] G. Sullivan, Recommended simulation common conditions for H.26L coding efficiency experiments on low-resolution progressive-scan source material, ITU SG16 Doc. VCEG-N81, 2001.

[29] Video Trace Research Group 〈http://trace.eas.asu.edu/yuv/index.html〉.

About the Author—HUMAIRA NISAR received the B.E. (Honours) in Electrical Engineering from the University of Engineering and Technology, Lahore, Pakistan, in 1993. She received an M.S. in Nuclear Engineering from Quaid-e-Azam University, Islamabad, Pakistan, in 1995, and an M.S. in Mechatronics from Gwangju Institute of Science and Technology, Gwangju, Korea, in 2000. She is currently pursuing a Ph.D. in the Signal and Image Processing Laboratory, School of Information and Mechatronics, Gwangju Institute of Science and Technology, Republic of Korea. Her research interests include image processing, motion estimation, video compression and statistical signal processing.

About the Author—TAE-SUN CHOI (S′88–M′93–SM′99) received the B.S. degree in Electrical Engineering from Seoul National University, Seoul, Korea, in 1976, the M.S. degree in Electrical Engineering from the Korea Advanced Institute of Science and Technology, Seoul, Korea, in 1979, and the Ph.D. degree in Electrical Engineering from the State University of New York at Stony Brook in 1993. He is currently a Professor in the Department of Mechatronics at Gwangju Institute of Science and Technology, Gwangju, Korea. His research interests include image processing, machine/robot vision, and visual communications.
