Dynamic control of motion estimation search parameters for low complex H.264/AVC video coding


IEEE Transactions on Consumer Electronics, Vol. 52, No. 1, FEBRUARY 2006

Manuscript received January 12, 2006 0098 3063/06/$20.00 © 2006 IEEE

232

Dynamic Control of Motion Estimation Search Parameters for Low Complex H.264 Video Coding

S. Saponara, M. Casula, F. Rovati, D. Alfonso, Member, IEEE and L. Fanucci, Member, IEEE

Abstract: This paper presents a novel technique to reduce the motion estimation (ME) complexity in H.264/AVC video coding. A low-complexity context-aware controller is added to a basic search engine: at coding time the controller extracts information on the input signal statistics from the partial results of the search engine and uses it to dynamically configure the ME search parameters, such as the number of reference frames, the valid block modes and the search area. Unnecessary computations and memory accesses can thus be avoided, decreasing ME complexity while leaving coding efficiency unaltered for a wide range of applications: bit-rates from tens of kbits/s to tens of Mbits/s and video formats from QCIF to CCIR. The context-aware control can be used with any ME search engine; in this paper it is applied to Full Search (FS) and to fast ME engines such as EPZS and UMHS in the JM10 software model of H.264/AVC.¹

Index Terms — H.264/AVC, motion estimation, video coding, low complexity algorithms, consumer multimedia.

I. INTRODUCTION

H.264/AVC is the new video coding standard jointly developed by ISO/IEC and ITU-T. It doubles the coding efficiency of its H.26x/MPEG-x predecessors [1-3] and is the enabling technology for high-quality video communication over wireline (e.g. xDSL) and wireless (e.g. 3G mobile phones, WLAN, digital video broadcasting) networks. Like previous schemes, H.264/AVC is based on a hybrid motion-compensated and transform coding model. Additional features, particularly in the ME task, such as multiple reference frames and variable block sizes (the 16x16-pixel block and its 6 subpartitions down to 4x4 blocks), improve coding efficiency at the expense of an encoder implementation cost up to one order of magnitude higher [1, 3]. This makes H.264/AVC unsuitable for low-power electronic devices and/or real-time coding applications. In previous standards only one reference frame and two block modes (16x16 and 8x8) were allowed. Hence, the design of a low-complexity ME supporting variable block sizes and multiple reference frames is mandatory for the success of H.264/AVC-based products in a consumer scenario.

Several low-complexity ME engines have been proposed in the literature [4-15] to avoid the massive, redundant computations of Full Search (FS) by reducing the number of candidate matching points within a given search area. Among them, EPZS (Enhanced Predictive Zonal Search) [5, 6] and UMHS (Unsymmetrical cross Multi Hexagon Search) [4] have been officially adopted as reference fast ME algorithms in the publicly available JM standard C model [16]. For each block under estimation, they essentially perform a predictive search using the motion vector (MV) predictors of neighbouring blocks, followed by a refinement phase on suitable patterns (e.g. square, diamond, hexagonal) with early-termination criteria. Their rate-distortion performance is quite close to that of FS and better than that of other known fast ME engines. However, as in most known ME techniques, the basic search is simply repeated when multiple reference frames and/or variable block sizes are enabled. Since the number of ME operations grows with the number of blocks and reference frames, unnecessary redundancy is introduced in computations and memory accesses.

To meet the requirements of real-time applications and low-power devices, the proposed technique adds a low-complexity context-aware controller to a basic ME engine. At coding time the controller extracts information on the input signal statistics from the partial results of the search engine and uses it to dynamically configure three ME search parameters for each 16x16 block and its subpartitions: the number of reference frames, the valid block modes and the search area. In this way unnecessary computations and memory accesses are avoided, decreasing ME complexity while leaving coding efficiency unaltered for a wide range of applications: bit-rates from tens of kbits/s to tens of Mbits/s and video formats from QCIF to CCIR. The proposed context-aware control can be combined with any ME search engine; in this paper it is applied to FS and to fast engines such as EPZS and UMHS in the framework of H.264/AVC.

The rest of the paper is organized as follows. Section II presents the H.264/AVC-compliant test environment used to develop the proposed control system. After a brief review of the state of the art, Section III describes the Reference Frames Control, Block Modes Control and Search Area Control algorithms and their performance, in terms of coding efficiency and complexity, when applied to FS, EPZS and UMHS. Section IV shows performance and complexity results when the three control algorithms are combined and analyses the robustness of the proposed technique against variations of the quantization parameter (QP). Section V deals with the tuning of the threshold set used to configure the search parameters. Conclusions are drawn in Section VI.

¹ This work was supported by the NEWCOM and PRIMO projects. S. Saponara, M. Casula and L. Fanucci are with the Dipartimento Ingegneria della Informazione, University of Pisa, via Caruso 16, 56122, Pisa, Italy; e-mail: {sergio.saponara, michele.casula, luca.fanucci}@iet.unipi.it. F. Rovati and D. Alfonso are with AST lab, STMicroelectronics, Centro Direzionale Colleoni-Palazzo La Dialettica, 20041, Agrate, Italy; e-mail: {fabrizio.rovati, daniele.alfonso}@st.com.

II. H.264/AVC TESTBENCH

The software model used to design and test our technique is JM 10. We first carried out some preliminary simulations on an Athlon XP at 2 GHz with 1 GB of RAM, using an encoder configuration with a maximum search displacement of 16, a maximum of 5 reference frames, 7 block modes (16x16, 16x8, 8x16, 8x8, 8x4, 4x8 and 4x4 pixel partitions), CAVLC entropy coding and a 30 Hz frame rate. Three standard ME techniques were used: FS, EPZS and UMHS.

TABLE I ABSOLUTE CODING PERFORMANCE USING EPZS, UMHS AND FS ME

| Sequence | Parameter | EPZS | UMHS | FS |
|---|---|---|---|---|
| Stefan SIF | MET (s) | 1.03 | 1.15 | 7.4 |
| | Bit-rate (kbits/s) | 1122.5 | 1126.8 | 1137.7 |
| | PSNR (dB) | 34.94 | 34.92 | 34.95 |
| Mobile CCIR | MET (s) | 4.41 | 4.69 | 29.65 |
| | Bit-rate (kbits/s) | 5600.2 | 5604.4 | 5608.2 |
| | PSNR (dB) | 33.67 | 33.66 | 33.68 |
| Tennis CCIR | MET (s) | 4.75 | 5.5 | 29.65 |
| | Bit-rate (kbits/s) | 4690.2 | 4704.4 | 4711.2 |
| | PSNR (dB) | 33.73 | 33.72 | 33.73 |
| Dancers QCIF | MET (s) | 0.35 | 0.31 | 2.28 |
| | Bit-rate (kbits/s) | 147.7 | 148.1 | 148 |
| | PSNR (dB) | 39.47 | 39.43 | 39.54 |
| Foreman CIF | MET (s) | 1.3 | 1.17 | 8.7 |
| | Bit-rate (kbits/s) | 398.8 | 400.9 | 399.8 |
| | PSNR (dB) | 36.5 | 36.5 | 36.51 |
| Akiyo QCIF | MET (s) | 0.17 | 0.11 | 2.28 |
| | Bit-rate (kbits/s) | 32.5 | 32.3 | 32.4 |
| | PSNR (dB) | 38.13 | 38.13 | 38.13 |

This phase allowed us to highlight the limits of known FS and fast ME engines and to extract correlation rules between the statistics of the input sequence and the results of the ME task. These rules have been used to design our adaptive ME technique: within the H.264/AVC JM model the ME is the only part of the code that was modified, where a context-aware controller was added and applied to the basic FS, EPZS and UMHS engines. The simulation results with basic EPZS, UMHS and FS are reported in Table I in terms of absolute values of Peak Signal to Noise Ratio (PSNR), bit-rate (BR) as a measure of coding efficiency, and average ME time per frame (MET) as a measure of implementation complexity. More than 20 well-known test sequences with different characteristics, in terms of image format and dynamism, were adopted as input stimuli; for the sake of clarity this paper refers to a representative subset of 6 videos. The reported data have been obtained with a QP of 28, which was also used for the tests in Section III; performance for other QP values is discussed in Section IV. The simulation results in Table I are the reference for those obtained with the proposed low-complexity technique. In Sections III and IV we apply the context-aware controller to the ME engines and compare the coding efficiency and implementation complexity against those of the original engines. We evaluate the bit-rate deviation for a fixed PSNR as in (1) and the ME time saving as in (2), where BR_CE and MET_CE are the bit-rate and ME processing time when the context-aware technique is applied, whereas BR_CD and MET_CD are the bit-rate and ME time of the basic coding. The fixed PSNR value for each test video is reported in Table I, depending on the basic ME engine used: EPZS, UMHS or FS.

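$$\Delta BR\% = \left( \frac{BR_{CE}}{BR_{CD}} - 1 \right) \cdot 100 \qquad (1)$$

$$\Delta MET\% = \left( \frac{MET_{CE}}{MET_{CD}} - 1 \right) \cdot 100 \qquad (2)$$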
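Equations (1) and (2) share the same relative-deviation form. The minimal Python sketch below expresses it; the function name `pct_deviation` is our own, and the sample values in the usage lines are hypothetical rather than taken from Table I.

```python
def pct_deviation(value_ce: float, value_cd: float) -> float:
    """Relative deviation in percent, (value_ce / value_cd - 1) * 100.

    value_ce: bit-rate (BR_CE) or ME time (MET_CE) with the context-aware control.
    value_cd: the same quantity (BR_CD or MET_CD) with basic coding.
    A negative result is a reduction, a positive result an increase.
    """
    if value_cd <= 0:
        raise ValueError("the basic-coding value must be positive")
    return (value_ce / value_cd - 1.0) * 100.0


# Hypothetical measurements, not values from Table I.
print(pct_deviation(1130.0, 1122.5))  # bit-rate deviation, as in (1)
print(pct_deviation(0.30, 1.03))      # ME time deviation, as in (2)
```

A negative ΔMET% is read as a time saving, which is how the % MET values in the following tables and figures are reported.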

III. LOW COMPLEX ME IN H.264/AVC

The proposed global context-aware controller combines three algorithms acting on the ME search parameters: Reference Frames Control, Block Modes Control and Search Area Control. After a brief review of known algorithms for configuring ME search parameters, this section describes the behavior of the three proposed controllers and shows their performance when each is applied alone, i.e. with the other two controls disabled, to the basic EPZS, UMHS and FS engines. The three controls can also be used in pairs or all together, as reported in Section IV, thus achieving different trade-offs between coding efficiency and ME speed-up.

A. State-of-the-Art Review

Algorithms for controlling ME search parameters have been discussed in the literature [13-15], [17-21], but they have several limitations that the proposed technique overcomes. Known works focus on a single search parameter, e.g. the search area in [13-15], the reference frames in [17-19] and the block modes in [20, 21]. Their reciprocal influence is not investigated, and when they are combined a large set of thresholds has to be tuned and their control complexities add up. In contrast, the proposed technique optimizes all ME search parameters within the same control environment, taking their reciprocal influence into account and using the same set of thresholds and the same algorithmic structure, thus ensuring robustness against threshold variations and negligible control complexity. Moreover, most known control algorithms are specifically tailored to a given ME engine, mainly FS, and are optimized for QCIF and CIF video formats. The proposed technique overcomes such limits: it can be used in combination with any ME engine, and in this paper it is applied to FS and to fast engines such as EPZS and UMHS, using video formats from QCIF to CCIR.

B. Reference Frame Control

This control decides the maximum number of reference frames to be used for the ME of each 16x16 macroblock (MB) and its enabled subpartitions. The best reference frame depends on the dynamism of the input video, so it can change from one sequence to another and from one block to another. For the generic i-th MB of frame N, we want to determine how many reference frames must be searched, starting from the first. We propose an iterative strategy that uses the data of the previously encoded 16x16 MB partition (the minimum Sum of Absolute Differences cost, SADmin, and the corresponding optimal reference frame, R) to decide how many reference frames are effective for its enabled subpartitions in the current frame and for the same 16x16 MB partition in the next frame. As shown in Fig. 1, the SADmin cost of the generic 16x16 MB is compared to a set of thresholds (THR3, THR4, THR5) to decide how many frames beyond R must be searched for its subpartitions. For example, if SADmin < THR3 then R reference frames are searched, while if THR4 < SADmin < THR5 then R + 2 reference frames are searched. The same number of reference frames is used for all 6 subpartitions.

Fig. 1. Reference frames control: thresholds determine how many reference frames must be used for all subpartitions in the current frame N and for the 16x16 partition in the next frame N+1

The data obtained for the 16x16 partition of the i-th MB in frame N are also used to derive the maximum number of reference frames for the 16x16 partition of the i-th MB in frame N+1 (see Fig. 1). Again, the SADmin cost is compared to two thresholds (THR1, THR2), set up to detect whether the block belongs to a static background: if so (SADmin < THR1) only the first reference frame is searched; if THR1 < SADmin < THR2 the first three reference frames are searched; otherwise all 5 reference frames are searched. Since THR1 < THR2 < ... < THR5, recognizing a static MB means choosing just one reference frame for each partition. The adopted threshold values and the analysis of the algorithm's robustness against variations of the threshold set are reported in Section V. Table II shows the average number of reference frames per MB (AvgF) when applying the above control rules to EPZS, UMHS and FS, and the complexity reduction achieved with respect to the original ME techniques, which always use 5 reference frames for all 7 block-size partitions.
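The two Reference Frame Control rules can be sketched in Python as follows, using the QP = 28 thresholds given in Section V. The text states only some of the SADmin ranges for the subpartition rule, so the mapping for THR3 ≤ SADmin < THR4 (R + 1 frames), the choice of all frames for SADmin ≥ THR5, the treatment of boundary values and the clipping to the 5-frame maximum are our assumptions standing in for Fig. 1; the function names are hypothetical.

```python
MAX_REF_FRAMES = 5  # maximum number of reference frames in the test configuration

# THR1..THR5 for QP = 28 (Section V), indexed by threshold number.
THR_QP28 = {1: 450, 2: 650, 3: 800, 4: 1500, 5: 3200}


def refs_next_frame_16x16(sad_min, thr):
    """Reference frames to search for the 16x16 partition of the same MB in frame N+1."""
    if sad_min < thr[1]:        # static background: first reference frame only
        return 1
    if sad_min < thr[2]:        # THR1 <= SADmin < THR2: first three frames
        return 3
    return MAX_REF_FRAMES       # otherwise all 5 frames


def refs_subpartitions(sad_min, best_ref, thr):
    """Reference frames to search for all subpartitions of the current MB.

    best_ref is R, the 1-based index of the optimal reference frame found
    by the 16x16 search of the current MB.
    """
    if sad_min < thr[3]:
        extra = 0               # stated: SADmin < THR3 -> R frames
    elif sad_min < thr[4]:
        extra = 1               # ASSUMPTION: THR3 <= SADmin < THR4 -> R + 1
    elif sad_min < thr[5]:
        extra = 2               # stated: THR4 < SADmin < THR5 -> R + 2
    else:
        return MAX_REF_FRAMES   # ASSUMPTION: SADmin >= THR5 -> all frames
    return min(best_ref + extra, MAX_REF_FRAMES)  # ASSUMPTION: never exceed 5


print(refs_next_frame_16x16(300, THR_QP28), refs_subpartitions(2000, 2, THR_QP28))
```

Because both rules only compare one SADmin value against fixed thresholds, the control cost per MB is a handful of comparisons, which is consistent with the negligible control complexity claimed above.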

TABLE II PERFORMANCE AND COMPLEXITY WHEN APPLYING THE REFERENCE FRAME CONTROL TO EPZS, UMHS AND FS

| Sequence | AvgF | % C | EPZS % BR | FS % BR | UMHS % BR |
|---|---|---|---|---|---|
| Stefan SIF | 2.6 | -48 | 2.2 | 1.81 | 1.08 |
| Mobile CCIR | 3.37 | -32.6 | 1.85 | 1.77 | 2 |
| Tennis CCIR | 2.69 | -46.2 | 1.04 | 0.64 | 1.39 |
| Dancers QCIF | 1.55 | -69 | -0.96 | -0.31 | -0.25 |
| Foreman CIF | 2.01 | -59.8 | 0.48 | 0.7 | -0.06 |
| Akiyo QCIF | 1.28 | -74.4 | 0.3 | -0.43 | -0.73 |

Complexity reduction is calculated as

$$\Delta C\% = \left( \frac{AvgF}{5} - 1 \right) \cdot 100$$

and is the same when EPZS, UMHS or FS is used as the basic engine. Indeed, their rate-distortion performance is so close that their searches lead to similar SADmin costs and optimal reference frames, hence the Reference Frame Control algorithm makes the same choices. The achievable complexity saving in Table II ranges from 32% to 74% depending on the input video. Fast-moving sequences, e.g. Mobile, show a smaller reduction in reference frames than static videos such as Akiyo, since static sequences often use only the first reference frame whereas fast-moving videos have optimal reference frames beyond the first. Similar complexity reductions have been obtained in terms of ME time saving (% MET). The rate-distortion performance variation with respect to the original ME engines is measured as in (1): for EPZS, UMHS and FS it never exceeds 2.2%, and it is below 1% when averaged over all the considered test sequences.
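The %C values in Tables II and III follow directly from this ratio (against 5 reference frames and 7 block modes respectively). The %C column of Table IV is also consistent with the same ratio taken against the default maximum displacement of 16, as the Python sketch below checks for Stefan SIF; the helper name is hypothetical.

```python
def complexity_reduction(avg_used: float, full: float) -> float:
    """Percentage change in searched items: (avg_used / full - 1) * 100."""
    return (avg_used / full - 1.0) * 100.0


# Stefan SIF: AvgF vs 5 frames (Table II), AvgP vs 7 modes (Table III),
# AvgD vs displacement 16 (Table IV).
for avg, full in [(2.6, 5), (4.96, 7), (11.6, 16)]:
    print(round(complexity_reduction(avg, full), 1))
```

This prints -48.0, -29.1 and -27.5, matching the Stefan SIF %C entries of Tables II, III and IV.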

C. Block Modes Control

H.264/AVC can encode a MB using a 16x16 pixel partition or four 8x8 partitions. The former can be further split into two 16x8 or two 8x16 partitions, whereas each 8x8 partition can be further split into smaller block sizes (8x4, 4x8 and 4x4), for a total of 7 possible block modes. Simulation data, collected as described in Section II, show that the lower the SAD cost of the 16x16 partition, the lower the probability of encoding the MB with blocks smaller than 8x8. As a consequence, if the SAD cost is below a certain threshold, the encoder is unlikely to find a satisfactory encoding mode with small partitions and will use partitions of 8x8 or larger. The control strategy proposed in this paper always enables the 16x16, 16x8, 8x16 and 8x8 partitions and enables the others (8x4, 4x8, 4x4) only when suitable thresholds identify them as useful. As for the Reference Frame Control, we start from the 16x16 ME data and compare the SADmin cost to the progressive thresholds THR3, THR4 and THR5, which classify the range of the cost and the corresponding chosen modes (see Fig. 2). The number of block types simultaneously enabled ranges from 4 to 7.
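A Python sketch of the Block Modes Control follows, assuming the QP = 28 thresholds of Section V. Fig. 2 is not reproduced in this text, so both the one-extra-mode-per-threshold mapping and the order in which 8x4, 4x8 and 4x4 are added are assumptions; only the always-enabled set and the 4-to-7 range come from the description above. The function name is hypothetical.

```python
# THR3..THR5 for QP = 28 (Section V); THR1 and THR2 are not used by this control.
THR_QP28 = {3: 800, 4: 1500, 5: 3200}

ALWAYS_ENABLED = ["16x16", "16x8", "8x16", "8x8"]
# ASSUMPTION: order in which the small modes are enabled (Fig. 2 not available).
OPTIONAL_IN_ORDER = ["8x4", "4x8", "4x4"]


def enabled_block_modes(sad_min_16x16, thr):
    """Return the block modes to search for one MB (between 4 and 7 modes)."""
    # ASSUMPTION: each progressive threshold reached enables one more small mode.
    n_extra = sum(sad_min_16x16 >= thr[k] for k in (3, 4, 5))
    return ALWAYS_ENABLED + OPTIONAL_IN_ORDER[:n_extra]


print(enabled_block_modes(1000, THR_QP28))
```

Reusing THR3 to THR5 from the Reference Frame Control keeps a single threshold set for both controls, in line with the shared algorithmic structure described in the state-of-the-art review.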


Fig. 2. Selection of additional block modes to basic 16x16 partition

Table III reports the average number of enabled partitions per MB (AvgP) when applying the above control rule to EPZS, UMHS and FS, and the complexity reduction achieved with respect to the original ME techniques, which always use 5 reference frames for all 7 block-size partitions. Complexity reduction is calculated as

$$\Delta C\% = \left( \frac{AvgP}{7} - 1 \right) \cdot 100$$

and is the same for EPZS, UMHS and FS. Table III shows that the achievable complexity saving ranges from 27% to 42%, whereas the bit-rate deviation for a fixed PSNR value is negligible, varying from -1.6% to 0.83%. Note that the complexity reduction in Table III refers to the percentage reduction of enabled partitions for which a matching cost has to be evaluated. For EPZS and UMHS, similar reductions are obtained in terms of ME processing time (% MET). This is not the case for FS, since its JM implementation exploits partition data reuse: it calculates the distortion of the smallest block type and then adds up the sub-block distortions to generate the larger block types. Hence, for FS the Block Modes Control yields ME time savings of only 6% to 11%, well below the % C values.
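The partition data reuse exploited by the JM Full Search can be illustrated with the NumPy sketch below. It is not the JM C code: it computes the sixteen 4x4 SADs of one candidate position once and derives all larger partitions by summation, which shows why disabling the small modes removes little FS work. The function name and the random test blocks are hypothetical.

```python
import numpy as np


def sad_all_partitions(cur: np.ndarray, ref: np.ndarray) -> dict:
    """SADs of all 7 H.264/AVC partition shapes for one 16x16 candidate.

    cur, ref: 16x16 pixel arrays (current MB and reference block at one
    candidate displacement). Shapes are written width x height, so "8x4" is
    8 pixels wide and 4 tall. Only the sixteen 4x4 SADs touch pixel data;
    every larger partition is built by adding smaller ones.
    """
    diff = np.abs(cur.astype(np.int32) - ref.astype(np.int32))
    # sad4[r, c]: SAD of the 4x4 block in block-row r, block-column c.
    sad4 = diff.reshape(4, 4, 4, 4).sum(axis=(1, 3))
    sad8x4 = sad4[:, 0::2] + sad4[:, 1::2]      # merge horizontal pairs -> (4, 2)
    sad4x8 = sad4[0::2, :] + sad4[1::2, :]      # merge vertical pairs   -> (2, 4)
    sad8x8 = sad8x4[0::2, :] + sad8x4[1::2, :]  # merge 8x4 pairs        -> (2, 2)
    sad16x8 = sad8x8[:, 0] + sad8x8[:, 1]       # top and bottom halves  -> (2,)
    sad8x16 = sad8x8[0, :] + sad8x8[1, :]       # left and right halves  -> (2,)
    sad16x16 = sad8x8.sum()
    return {"4x4": sad4, "8x4": sad8x4, "4x8": sad4x8, "8x8": sad8x8,
            "16x8": sad16x8, "8x16": sad8x16, "16x16": sad16x16}


rng = np.random.default_rng(0)
cur = rng.integers(0, 256, size=(16, 16))
ref = rng.integers(0, 256, size=(16, 16))
sads = sad_all_partitions(cur, ref)
print(int(sads["16x16"]), sads["8x8"])
```

With this structure the per-candidate pixel work is fixed by the 4x4 stage, so skipping the 8x4, 4x8 and 4x4 modes only saves the mode-specific comparisons and bookkeeping.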

TABLE III PERFORMANCE AND COMPLEXITY WHEN APPLYING THE BLOCK MODES CONTROL TO EPZS, UMHS AND FS

| Sequence | AvgP | % C | EPZS % BR | FS % BR | UMHS % BR |
|---|---|---|---|---|---|
| Stefan SIF | 4.96 | -29.1 | -0.06 | -0.03 | 0.26 |
| Mobile CCIR | 4.63 | -33.8 | 0.21 | 0.45 | 0.36 |
| Tennis CCIR | 5.1 | -27.1 | 0.1 | 0.18 | 0.06 |
| Dancers QCIF | 4.36 | -37.7 | 0.05 | 0.7 | 0.83 |
| Foreman CIF | 4.35 | -37.8 | 0.66 | 0.41 | 0.01 |
| Akiyo QCIF | 4.03 | -42.4 | -1.6 | 0.27 | 0.68 |

D. Search Area Control

Algorithms that reconfigure the search area according to the input signal statistics have already been proposed by the authors for FS in [13]: the search displacement is configured on the basis of the MV field of neighbouring blocks and is clipped between a maximum and a minimum value. We ported the technique of [13] to EPZS and UMHS by raising the minimum allowed search size. Since the results for FS are already discussed in [13], Table IV shows only the results achieved by applying the search area control to EPZS and UMHS: the bit-rate deviation and the complexity reduction based on the average assigned displacement (AvgD) with respect to the default maximum of 16. The obtained complexity saving, from about 24% to 65%, is lower than that obtained for FS in [13], because EPZS and UMHS are fast search engines that evaluate far fewer matching points per search area than FS, so the margins for reducing search redundancy are tighter. The bit-rate increase for a fixed PSNR value is negligible, varying from 0 to 0.96%.
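The exact rule of [13] is not detailed here, so the Python sketch below is only a hypothetical stand-in for a neighbour-MV-driven search range: the displacement is taken as the largest neighbouring MV component plus a margin, then clipped to [d_min, d_max]. The margin of 2 pixels, the fallback to d_max when no neighbours are available and the function name are all assumptions.

```python
def search_range(neighbour_mvs, d_min, d_max, margin=2):
    """Adaptive search displacement for one block (hypothetical rule).

    neighbour_mvs: list of (mvx, mvy) integer-pixel MVs of already-coded
    neighbouring blocks. d_min, d_max: clipping bounds; for EPZS and UMHS
    the text above raises d_min with respect to the FS version.
    """
    if not neighbour_mvs:
        return d_max  # ASSUMPTION: no context available, use the full range
    # ASSUMPTION: the largest neighbour MV component plus a margin.
    largest = max(max(abs(x), abs(y)) for x, y in neighbour_mvs)
    return max(d_min, min(d_max, largest + margin))


print(search_range([(1, 0), (0, -3)], d_min=4, d_max=16))
```

Clipping from below keeps a minimum search window even in static areas, which matters more for predictive engines like EPZS and UMHS, whose patterns already visit few points.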

TABLE IV PERFORMANCE AND COMPLEXITY WHEN APPLYING THE SEARCH AREA CONTROL TO EPZS AND UMHS

| Sequence | AvgD | % C | EPZS % BR | UMHS % BR |
|---|---|---|---|---|
| Stefan SIF | 11.6 | -27.5 | 0.46 | 0.79 |
| Mobile CCIR | 9.7 | -39.4 | 0.41 | 0.45 |
| Tennis CCIR | 8.5 | -46.9 | 0.63 | 0.59 |
| Dancers QCIF | 12.1 | -24.4 | 0 | 0.96 |
| Foreman CIF | 10.6 | -33.8 | 0.49 | 0.3 |
| Akiyo QCIF | 5.6 | -65 | 0.27 | 0.77 |

IV. GLOBAL ME PARAMETERS CONTROL

When the three controls are used together, the encoding process follows the ME flow in Fig. 3.

Fig. 3. Global ME parameters control flow

At the beginning of the encoding of each sequence, the thresholds are set up following the law detailed in Section V (step 1 in Fig. 3). The first frame is coded in INTRA mode and ME is not used; the second frame is the first one coded in INTER mode, and the ME of all its MBs uses the basic search engine without dynamic control. From the third frame onward, the control selects the search area for all seven partitions and the number of reference frames for the 16x16 partition according to the data produced by the ME task in the previously processed frame (step 2). The search engine then performs the 16x16 ME and returns its cost and optimal reference frame (step 3). At this point the control system can derive the valid block modes and their reference frames (step 4). The search engine completes the ME task for the current MB (step 5) and then moves on to the next MB, iterating steps 2 to 5.

With all three controls applied simultaneously to UMHS, FS and EPZS, Figs. 4 and 5 show the complexity reduction, evaluated as saved ME time (% MET), and the bit-rate deviation for a fixed PSNR value (% BR). For all three ME engines the bit-rate increase remains limited and the overall complexity reduction is very high, even for the most complex sequences: the ME time saving ranges from 70% to 97% for FS and from 60% to 91% for UMHS and EPZS. This corresponds to a speed-up factor from about 3 to 33 for FS and from roughly 2.5 to 11 for EPZS and UMHS.
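The ordering of operations in the flow of Fig. 3 can be summarized by the Python sketch below. It only records a trace of the steps per frame and per MB instead of running a real search engine; the function name and the action labels are our own.

```python
def global_control_trace(num_frames, mbs_per_frame):
    """Return the (frame, mb, action) sequence of the global control flow.

    Frames are numbered from 0; mb is None for per-sequence or per-frame actions.
    """
    trace = [(None, None, "step 1: set thresholds from QP")]
    for f in range(num_frames):
        if f == 0:
            trace.append((f, None, "INTRA coding, no ME"))
            continue
        for mb in range(mbs_per_frame):
            if f == 1:
                # First INTER frame: basic search engine, no dynamic control.
                trace.append((f, mb, "basic ME"))
                continue
            trace.append((f, mb, "step 2: search area and 16x16 reference frames from previous frame"))
            trace.append((f, mb, "step 3: 16x16 ME returns SADmin and R"))
            trace.append((f, mb, "step 4: select block modes and their reference frames"))
            trace.append((f, mb, "step 5: complete ME for the enabled partitions"))
    return trace


for entry in global_control_trace(num_frames=3, mbs_per_frame=1):
    print(entry)
```

Step 2 depends on the previous frame while step 4 depends on step 3 of the current MB, so the 16x16 search always runs first and feeds the decisions for the smaller partitions.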

Fig. 4. Saved ME time for global system control on EPZS, UMHS and FS [plot: % saved ME time, 0-100, for Stefan SIF, Mobile CCIR, Tennis CCIR, Dancers QCIF, Foreman CIF and Akiyo QCIF]

Fig. 5. Bit-rate increase for global system control on EPZS, UMHS, FS [plot: % bit-rate increase, -1 to 4, per test sequence]

The control strategy described above and the data reported in Figs. 4 and 5 use SAD as the distortion cost function. In H.264/AVC coding, a rate-distortion (RD) optimization option can be enabled, whose cost function accounts for both the distortion and the rate of the motion prediction [3, 6]. When applying the proposed global control system to FS, UMHS and EPZS with RD optimization enabled, we obtained results similar to those in Figs. 4 and 5, which were obtained without RD optimization. For example, Table V shows the complexity reduction (% MET) and the bit-rate deviation for a fixed PSNR value (% BR) when applying the three controls to UMHS with RD optimization. In Table V, similarly to Figs. 4 and 5, the ME time saving ranges from 61% to 91%, whereas the bit-rate increase is always below 2%.
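For illustration, a cost combining distortion and motion rate is commonly written as a Lagrangian. This paper does not give the exact distortion measure or Lagrange multiplier used by the JM encoder, so the expression below is a generic form rather than the configuration used for Table V.

$$J(\mathbf{m}) = D(\mathbf{m}) + \lambda \, R(\mathbf{m} - \mathbf{p})$$

Here m is the candidate MV, p its predicted MV, D the block distortion, R the number of bits needed to code the MV difference and λ ≥ 0 the Lagrange multiplier; with λ = 0 the cost reduces to the pure SAD used for Figs. 4 and 5.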

TABLE V PERFORMANCE AND COMPLEXITY WHEN APPLYING THE GLOBAL CONTROL TO UMHS WITH RD OPTIMIZATION ENABLED

| | Stefan SIF | Mobile CCIR | Tennis CCIR | Dancers QCIF | Foreman CIF | Akiyo QCIF |
|---|---|---|---|---|---|---|
| % MET | -61.2 | -62.66 | -62.13 | -75.85 | -72.03 | -91.14 |
| % BR | 1.99 | 1.67 | 0.74 | 1.36 | 0.84 | 0.44 |

As shown in Figs. 6 and 7, combining the global control system with a fast ME engine such as UMHS yields a complexity reduction of up to two orders of magnitude with respect to basic FS when the maximum search displacement is 16, with an average bit-rate deviation below 1%. With a maximum search displacement of 32, the complexity reduction reaches up to three orders of magnitude, while the bit-rate increase remains negligible. Similar results have been obtained by repeating the tests with EPZS vs. FS.
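The conversion between a percentage ME time saving s and the speed-up factors quoted in this section is 1 / (1 - s/100): for instance, 70% gives about 3.3x, 97% about 33x, and a two-orders-of-magnitude (100x) speed-up corresponds to a 99% saving. A small Python helper, with a hypothetical name:

```python
def speedup_from_saving(saving_pct: float) -> float:
    """Speed-up factor corresponding to a percentage ME time saving."""
    if not 0.0 <= saving_pct < 100.0:
        raise ValueError("saving must be in [0, 100)")
    # Remaining time fraction is (1 - s); speed-up is its reciprocal.
    return 1.0 / (1.0 - saving_pct / 100.0)


for s in (60, 70, 91, 97, 99):
    print(s, round(speedup_from_saving(s), 1))
```

The reciprocal relation explains why the last few percentage points of saving dominate the speed-up factor: going from 90% to 99% saving multiplies the speed-up by ten.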

Fig. 6. Speed-up factor of UMHS with global system control vs. basic FS [plot: speed-up factor, 0-250, per test sequence]

Fig. 7. Bit-rate increase of UMHS with global control vs. basic FS [plot: % bit-rate increase, -1 to 3, per test sequence]

Data in Figs. 4 to 7 and Tables I to V have been obtained with a fixed QP of 28. To assess the robustness of the proposed technique against QP variations, Figs. 8 to 11 show, for different QP values, the ME time saving and the bit-rate increase at fixed PSNR obtained by applying all three controls to UMHS (Figs. 8 and 9) and EPZS (Figs. 10 and 11). The complexity reduction in terms of saved ME time is always high, averaging about 75% for each QP value. Bit-rate performance is only slightly affected by QP, with deviations ranging from a 1.5% reduction to a 4% increase. Similar results have been obtained for FS.


Fig. 8. ME time saving vs. QP, UMHS with global control [plot: % saved ME time, 0-100, per test sequence for QP = 7, 20, 28, 36, 49]

Fig. 9. Bit-rate deviation vs. QP, UMHS with global control [plot: % bit-rate increase, -2 to 5, per test sequence for QP = 7, 20, 28, 36, 49]

Fig. 10. ME time saving vs. QP, EPZS with global control [plot: % saved ME time, 0-100, per test sequence for QP = 7, 20, 28, 36, 49]

Fig. 11. Bit-rate deviation vs. QP, EPZS with global control [plot: % bit-rate increase, -2 to 5, per test sequence for QP = 7, 20, 28, 36, 49]

Examples of complete rate-distortion curves are shown in Fig. 12 for Dancers QCIF, over a bandwidth range from tens of kbits/s to 1 Mbits/s, and in Fig. 13 for Tennis CCIR, over a range from hundreds of kbits/s to 55 Mbits/s. In Figs. 12 and 13 the basic performance of UMHS is plotted as a gray line, whereas a black curve shows the performance when the adaptive control system is enabled. The curves are very close, with a negligible difference, showing that the proposed control system leaves coding efficiency unaltered for a wide range of applications: bit-rates from tens of kbits/s to tens of Mbits/s and formats from QCIF to CCIR. Again, similar conclusions have been drawn by repeating the tests with other video sequences and with FS and EPZS ME.

Fig. 12. Rate-distortion curve for Dancers QCIF

Fig. 13. Rate-distortion curve for Tennis CCIR

V. THRESHOLD SET ANALYSIS

The selected set of thresholds was derived from computer simulations as a trade-off between algorithmic complexity and coding efficiency. The thresholds depend on the QP value used by the H.264/AVC coder according to the law reported in Fig. 14. Indeed, the higher the QP value, the larger the prediction errors and the resulting SADmin values: for a given predicted MV field, a high QP leads to a large difference between the original frame and the one reconstructed in the coding loop. This is why the threshold values are linked to the quantization parameter QP. These dependencies were extracted by observing how the SADmin cost varies with QP for several test sequences; a similar approach can also be found in the literature, e.g. [12, 13]. For QP = 28, the value used in the previous sections, the set of thresholds is THR1 = 450, THR2 = 650, THR3 = 800, THR4 = 1500 and THR5 = 3200.
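As the discussion above indicates, the thresholds act on SADmin values, so an ordered set such as THR1 to THR5 partitions the SADmin axis into intervals. The sketch below is a hypothetical helper, not the paper's controller: the `sad_band` name, the band numbering and the choice of inclusive lower bounds are assumptions, and the search-parameter restriction applied in each band is defined by the control technique itself and is not modelled here.

```python
from bisect import bisect_right

# Threshold set reported for QP = 28 (THR1..THR5); it must be strictly increasing.
THRESHOLDS_QP28 = (450, 650, 800, 1500, 3200)


def sad_band(sad_min: int, thresholds=THRESHOLDS_QP28) -> int:
    """Return the index (0..5) of the interval containing sad_min.

    Band 0: sad_min < THR1
    Band i (1..4): THR_i <= sad_min < THR_(i+1)   (inclusive lower bound assumed)
    Band 5: sad_min >= THR5
    """
    if any(a >= b for a, b in zip(thresholds, thresholds[1:])):
        raise ValueError("thresholds must be strictly increasing")
    # bisect_right counts how many thresholds are <= sad_min.
    return bisect_right(thresholds, sad_min)


if __name__ == "__main__":
    for sad in (300, 700, 2000, 5000):
        print(sad, sad_band(sad))
```

Because the thresholds grow with QP, the same SADmin value would fall into a lower band at a higher QP, which is consistent with the motivation given above for making them QP-dependent.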

Fig. 14. Dependency of thresholds on QP (thresholds T1 to T5, axis 0 to 10000, plotted against QP from 7 to 49)

To assess the robustness of the proposed technique vs. threshold setting, Figs. 15 and 16 show the algorithm performance in terms of saved ME time (% MET) and bit-rate deviation (% BR) when the threshold values are scaled by a factor K. Fig. 15 refers to the three control techniques applied simultaneously to UMHS, while Fig. 16 refers to the same control applied to FS. Since the thresholds must preserve their ordering (THR1 < THR2 < ... < THR5), they are all scaled by the same multiplicative factor K, ranging from 0.5 to 2.

Fig. 15. Trade-offs obtained by scaling the thresholds by a factor K, UMHS with global system control at QP = 28

In Figs. 15 and 16, K takes the values 0.5, 0.8, 1, 1.1, 1.25, 1.5 and 2. K = 1 corresponds to the current threshold setting, i.e. the one used for the results shown in Sections III and IV. This range of K corresponds to threshold changes of up to 100%. Comparing the two figures, the behaviour of UMHS is clearly similar to that of FS.
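The robustness experiment scales the whole QP = 28 threshold set by a single factor K. A short sketch (constant and function names are illustrative) shows that a common positive factor keeps the required ordering THR1 < ... < THR5, and that K = 2 raises every threshold by 100% while K = 0.5 lowers it by 50%:

```python
# Threshold set for QP = 28 and the K values used in Figs. 15 and 16.
THRESHOLDS_QP28 = (450, 650, 800, 1500, 3200)
K_VALUES = (0.5, 0.8, 1.0, 1.1, 1.25, 1.5, 2.0)


def scale_thresholds(thresholds, k):
    """Scale every threshold by the same factor k > 0.

    Multiplying a strictly increasing sequence by one positive factor keeps
    it strictly increasing, so THR1 < THR2 < ... < THR5 still holds.
    """
    if k <= 0:
        raise ValueError("k must be positive")
    if any(a >= b for a, b in zip(thresholds, thresholds[1:])):
        raise ValueError("thresholds must be strictly increasing")
    return tuple(k * t for t in thresholds)


if __name__ == "__main__":
    for k in K_VALUES:
        print(k, scale_thresholds(THRESHOLDS_QP28, k))
```

Scaling with one factor, rather than tuning each threshold independently, reduces the robustness study to a one-dimensional sweep.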

Similar results were also obtained for EPZS. The results in Figs. 15 and 16 demonstrate the robustness of the proposed control technique vs. threshold setting, and show that changing the set of thresholds amounts to adjusting the trade-off between coding efficiency and ME complexity.

Fig. 16. Trade-offs obtained by scaling the thresholds by a factor K, FS with global system control at QP = 28

In particular, the simpler the video to be coded, the lower the impact of threshold variation: e.g. in Fig. 15, when K ranges from 0.5 to 2, the BR and MET of Stefan SIF change from 2% to 6% and from 35% to 75%, respectively, while for Akiyo QCIF the range of variation is smaller, from -1% to 2% for the BR and from 88% to 96% for the MET. For K = 1, i.e. the thresholds adopted in this paper, most curves operate at their knee. Indeed, for K > 1 Figs. 15 and 16 show a large increase of % BR with only a small % MET gain, whereas for K < 1 the % BR stays almost stable while the % MET saving drops considerably. This behaviour holds for the sequences that exploit all thresholds from THR1 to THR5, whereas the impact of threshold setting is minimal for low-complexity sequences (e.g. Akiyo), whose ME mainly exploits the lower thresholds.
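Percentages of saved ME time compress large speed-ups near 100%, so converting the ranges quoted above into speed-up factors makes them easier to compare. Saving a fraction s of the ME time leaves (1 - s) of it, which gives a speed-up of 1/(1 - s). A minimal sketch applied to the Stefan SIF and Akiyo QCIF ranges from Fig. 15 (the function name is illustrative):

```python
def speedup_from_saving(percent_saved: float) -> float:
    """Convert a percentage of saved ME time into a speed-up factor.

    If a fraction s of the baseline time is saved, (1 - s) remains,
    so the speed-up is 1 / (1 - s).
    """
    if not 0.0 <= percent_saved < 100.0:
        raise ValueError("percent_saved must be in [0, 100)")
    return 1.0 / (1.0 - percent_saved / 100.0)


if __name__ == "__main__":
    # % MET ranges quoted in the text for K from 0.5 to 2 (Fig. 15, UMHS).
    for name, lo, hi in (("Stefan SIF", 35.0, 75.0), ("Akiyo QCIF", 88.0, 96.0)):
        print(f"{name}: {speedup_from_saving(lo):.2f}x to {speedup_from_saving(hi):.2f}x")
    # Prints:
    # Stefan SIF: 1.54x to 4.00x
    # Akiyo QCIF: 8.33x to 25.00x
```

For Akiyo QCIF, an 8-point change in % MET corresponds to a threefold change in speed-up, while Stefan SIF goes from about 1.5x to 4x over a 40-point range. By the same formula, a 90% saving is exactly a one-order-of-magnitude (10x) reduction.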

VI. CONCLUSION

H.264/AVC is the new video coding standard by ISO/IEC and ITU-T, doubling the coding efficiency of its H.26x/MPEGx ancestors. This paper proposes a novel technique that substantially reduces the complexity of its innovative ME system by configuring, at coding time, the number of reference frames, the valid block modes and the search area used to perform the ME task for a given macroblock. The three control types can be used individually, in pairs or all together, achieving different trade-offs between coding efficiency and ME speed-up. The proposed control system can be applied to any ME search engine; in this paper it is successfully applied to FS and to the fast engines EPZS and UMHS, within the JM10 model of H.264/AVC. Unnecessary computations and memory accesses are avoided, decreasing ME complexity by up to one order of magnitude. Combining the proposed controller with a fast ME engine yields a complexity saving of up to three orders of magnitude vs. basic FS. The high compression efficiency of the H.264/AVC video coder is preserved over a wide bit-rate range, from tens of kbits/s to tens of Mbits/s, for both slow-motion and highly dynamic video sequences. The robustness of the proposed technique vs. threshold setting and QP variation is also demonstrated.

ACKNOWLEDGMENT

The contribution of Eng. Luca Celetto, STMicroelectronics, AST Agrate (Italy), is gratefully acknowledged.

REFERENCES

[1] J. Ostermann, J. Bormans, P. List, D. Marpe, M. Narroschke, F. Pereira, T. Stockhammer, T. Wedi, “Video coding with H.264/AVC: tools, performance and complexity”, IEEE Circuits and Systems Magazine, vol. 4, no. 1, pp. 7-28, 2004

[2] T. Wiegand, G. Sullivan, G. Bjøntegaard, A. Luthra, “Overview of the H.264/AVC video coding standard”, IEEE Trans. Circ. and Syst. for Video Tech., vol. 13, no. 7, pp. 560-576, 2003

[3] S. Saponara, K. Denolf, G. Lafruit, C. Blanch, J. Bormans, “Performance and complexity co-evaluation of the Advanced Video Coding standard for cost effective multimedia communications”, Journal Applied Signal Processing, vol. 2004, no. 2, pp. 220-235, 2004

[4] Z. Chen, J. Xu, Y. He, “Efficient fast ME predictions and early-termination strategy based on H.264 statistical characters”, Proc. ICICS-PCM 2003, pp. 213-218, Singapore, Dec. 2003

[5] H.Y. Cheong, A. M. Tourapis, P. Topiwala, “Fast ME in the JM reference software”, 16th meeting of JVT of ISO/IEC MPEG & ITU-T VCEG, document JVT-P026, Poznan, July 2005

[6] H.Y. Cheong, A. Tourapis, “Fast motion estimation within the H.264 codec”, Proc. IEEE ICME'03, pp. 517-520, Baltimore, July 2003

[7] A. Tourapis, O. Au, M. Liou, “Highly efficient predictive zonal algorithms for fast block-matching motion estimation”, IEEE Trans. Circ. and Syst. for Video Tech., vol. 12, no. 10, pp. 934-947, Oct. 2002

[8] P. Kuhn, Algorithms, complexity analysis and VLSI architectures for MPEG-4 motion estimation, Kluwer Academic Publishers, 1999

[9] D. Alfonso, F. Rovati, D. Pau, L. Celetto, “An innovative, programmable architecture for ultra-low power motion estimation in reduced memory MPEG-4 encoder”, IEEE Trans. Consumer Electronics, vol. 48, no. 3, pp. 702-708, Aug. 2002

[10] D. Alfonso, D. Bagni, L. Pezzoni, E. Piccinelli, “Fast motion estimation with detection of scene changes and interlaced/progressive content for H.264/AVC encoding”, Proc. MICV 2005, Madrid, Mar. 2005

[11] P. De Pascalis, L. Pezzoni, G. Mian, D. Bagni, “Fast motion estimation with size based predictors selection hexagon search in H.264/AVC encoding”, Proc. EUSIPCO 2004, Wien, Sept. 2004

[12] C. De Vleeschouwer, T. Nilsson, K. Denolf, J. Bormans, “Algorithmic and architectural co-design of a motion-estimation engine for low-power video devices”, IEEE Trans. Circ. and Syst. for Video Tech., vol. 12, no. 12, pp. 1093-1105, Dec. 2002

[13] S. Saponara, M. Melani, L. Fanucci, P. Terreni, “Adaptive algorithm for fast motion estimation in H.264/MPEG-4 AVC”, Proc. EUSIPCO 2004, pp. 569-572, Wien, Sept. 2004

[14] S. Saponara, L. Fanucci, “Data-adaptive ME algorithm and VLSI architecture design for low-power video systems”, IEE Proceedings - Computers and Digital Techniques, vol. 151, no. 1, pp. 51-59, 2004

[15] W.S. Song, M.-C. Hong, “Adaptive search range decision algorithm for fast motion estimation”, Proc. SPIE Visual Communications and Image Processing 2004, vol. 5308, pp. 1073-1081, 2004

[16] http://iphome.hhi.de/suehring/tml/index.htm

[17] Y.-W. Huang et al., “Analysis and reduction of reference frames for motion estimation in MPEG-4 AVC/JVT/H.264”, Proc. IEEE ICASSP03, pp. 809-812, Apr. 2003

[18] C.-W. Ting et al., “Center-biased frame selection algorithms for fast multi-frame motion estimation in H.264”, IEEE Int. Conf. Neural Networks & Signal Proc., vol. 2, pp. 1258-1261, Nanjing, Dec. 2003

[19] A. Chang et al., “A novel approach to fast multi-frame selection for H.264 video coding”, Proc. IEEE ICASSP03, pp. 105-108, Apr. 2003

[20] X. Jing, L.-P. Chau, “Fast approach for H.264 inter mode decision”, IEE Electronics Letters, vol. 40, no. 17, pp. 1050-1052, Aug. 2004

[21] P. Yin, H.-Y. C. Tourapis, A. M. Tourapis, J. Boyce, “Fast mode decision and motion estimation for JVT/H.264”, Proc. IEEE ICIP03, pp. 853-856, Barcelona, Sept. 2003

Sergio Saponara was born in Bari in 1975. He received the electronic engineering degree and the Ph.D. in information engineering, both from Pisa University, in 1999 and 2003, respectively. Since 2001 he has collaborated with Consorzio Pisa Ricerche, Italy, and in 2002 he was with IMEC, Belgium, as a Marie Curie research fellow. Currently he is a Research Scientist and Assistant Professor at the University of Pisa. His research and teaching interests are in electronic circuits and systems for multimedia, telecom and automation. He has co-authored more than 40 publications, including journal and conference papers and patents.

Michele Casula was born in Oristano (Italy) in 1979. He received the Bachelor’s degree and the Master of Science degree in Electronic Engineering from the University of Pisa, where he is currently pursuing a Ph.D. in Information Engineering. His research interests include video signal processing, computer graphics and System-on-Chip VLSI design.

Daniele Alfonso (M’04) was born in Alghero, Italy, in 1972. In 1998 he graduated in Electrical Engineering at the Turin Polytechnic, and then joined STMicroelectronics’ Advanced System Technology labs, working on image compression algorithms jointly with the Italian National Research Council. Later, he focused on moving picture encoding and transcoding (H.263, MPEG-2, MPEG-4, H.264/AVC), low-power motion estimation, de-interlacing and frame-rate conversion. His main interests are algorithms and architectures for digital video applications, and he holds several patents granted in Europe and the U.S.

Fabrizio Rovati received the Electronic Engineering degree from Politecnico of Milan in 1996. Since 1995 he has been with STMicroelectronics (until 1998 at Bristol, UK; currently at Agrate, Italy), working on digital video processing algorithms and architectures for Systems-on-Chip. He is currently with the AST R&D group, leading a project on multimedia streaming over packet-based networks. He has authored 15 British, European and U.S. granted patents and tens of journal/conference papers. He has been a contract professor at the University of Pavia.

Luca Fanucci (S’95-A’96-M’04) is Associate Professor of Microelectronics at the University of Pisa. He was born in Montecatini, Italy, in 1965. He received the Doctor Engineer degree and the Ph.D. in electronic engineering from the University of Pisa in 1992 and 1996, respectively. From 1992 to 1996, he was with the European Space Agency's Research and Technology Center, Noordwijk, The Netherlands, and from 1996 to 2004 he was a Research Scientist of the Italian National Research Council in Pisa. His research interests include design technologies for integrated circuits and systems, with emphasis on system-level design, hardware/software co-design and low power. He has co-authored more than 100 journal/conference papers and holds more than 10 patents.