Motion and disparity field estimation using rate-distortion optimization


Dimitrios Tzovaras and Michael G. Strintzis, Senior Member, IEEE

Electrical and Computer Engineering Department, Information Processing Laboratory, Aristotle University of Thessaloniki

Thessaloniki 54006, Greece; phone: (+30-31) 996-359, fax: (+30-31) 996-398


Abstract

A rate-distortion framework is used to define a displacement vector field estimation technique for use in video coding. This technique achieves maximum reconstructed image quality under the constraint of a target bitrate for the coding of the vector sequence. The technique may be adapted so as to limit its smoothing effect to homogeneous areas and avoid highly textured areas and edges. Use of this technique is evaluated for two application areas in which the need for high compression of displacement vector fields is particularly acute. The first is motion field coding for very low bit rate image sequence transmission, as in videophone applications. The second application area is coding for the transmission of dense disparity fields. This is needed for the generation at the receiver of intermediate viewpoints through spatial interpolation. It is also needed in a number of other applications requiring accurate depth knowledge, including 3D medical data transmission and transmission of scenes to be postprocessed using depth-keyed segmentation. Experimental results illustrating the performance of the proposed technique in these application areas are presented and evaluated.

Subject terms: Very low bit-rate coding; rate-distortion theory; vector field coding; depth map coding.

This work was supported by the EU CEC Project PANORAMA (ACTS project 092) and the COST211ter project.


List of Figures

1 Average MSE versus average bitrate (in bits/vector) for the coding of the first 50 frames of "Claire" using motion compensation.

2 Average MSE versus average bitrate (in bits/vector) for the coding of the first 25 frames of "Tunnel" using disparity compensation.

3 MSE versus bitrate (in bits/vector) for the block-based coding of the fifth frame of "Miss America".

4 MSE versus bitrate (in bits/vector) for the block-based coding of the third frame of "Claire".

5 MSE versus bitrate (in bits/vector) for the block-based coding of the second frame of "Sergio".

6 MSE versus bitrate (in bits/vector) for the coding of the dense disparity field corresponding to the second frame of "Tunnel".

7 Adaptive versus non-adaptive versions of the proposed RDOVFE algorithm in terms of MSE versus bitrate (in bits/vector) for the block-based coding of the fifth frame of "Miss America".

8 (a) Original frame 1 of "Miss America". (b) Original frame 5 of "Miss America". (c) Original motion vector field estimated with the block matching algorithm. (d) Motion vector field estimated with the rate-distortion algorithm at 1.58 bits/vector. (e) The output of the edge extractor (homogeneous areas marked white, edges are marked grey and highly textured areas marked black). (f) Computed motion vector using the adaptive smoothing algorithm.

9 (a) Original frame 1 of "Claire". (b) Original frame 3 of "Claire". (c) Original motion vector field estimated with the block matching algorithm. (d) Reconstructed frame 3 of "Claire" using the computed vector field at 1.1 bits/vector. (e) The output of the edge extractor. (f) Computed motion vector field using the adaptive smoothing algorithm.

10 (a) Original left channel image "Sergio" (frame 2). (b) Original right channel image "Sergio" (frame 2). (c) Block-based estimate of disparity. (d) The output of the edge extractor. (e) Pixel-based estimate of depth. (f) Intermediate image generated using the computed depth.

11 (a) Original left channel image "Tunnel" (frame 2). (b) Original right channel image "Tunnel" (frame 2). (c) Block-based estimate of disparity. (d) The output of the edge extractor. (e) Pixel-based estimate of depth. (f) Intermediate image generated using the computed depth.

I INTRODUCTION

The transmission of full motion video through limited capacity channels is critically dependent on the ability of the compression schemes to achieve target bit rates while still maintaining acceptable visual quality [1]. To achieve this, motion estimation and motion compensated prediction are frequently used to reduce temporal redundancies in image sequences [2]. Similarly, in the coding of stereo and multiview images, prediction may be based on disparity compensation [3] or on the best of motion and disparity compensation [4].

While much attention has been devoted to the coding of the intraframe and prediction error images, the displacement vector fields are usually coded losslessly using DPCM/Huffman coding, resulting in limited compression. The reason is that digital video coding systems for many applications have at their disposal rates ranging from 1 Mbit/s to 25 Mbit/s. At such rates, only a minor part of the global rate is devoted to the transmission of the vector fields, hence the bitrate overhead produced by lossless encoding of the vector fields is negligible. In many emerging application areas, however, lossy compression of the vector fields is highly desirable, and sometimes unavoidable. For example, mobile videophone or multimedia transmission channels are often limited to capacities of 4.8 to 64 kbit/s. In such cases, it is clearly desirable to reduce as much as possible the bitrate needed to transmit the motion vector fields, provided that this reduction does not produce intolerable distortion in the reconstructed image. It is also desirable to allocate the bitrate devoted to the coding of motion fields adaptively, depending on the complexity of the sequence and also on the overall bitrate availability when the latter varies with time.

High compression of vector fields is also needed in many emerging applications requiring the transmission of disparity fields for stereo image communication. Sparse disparity fields are used to predict one image of a stereoscopic pair from the other, within a coding scheme using disparity compensated prediction [3, 4] or joint motion and disparity compensated prediction [5]. The need for compression is particularly acute when dense depth or disparity fields must be transmitted in order to permit multiview image generation at the receiver through spatial interpolation [4]. With a multiview display, this allows the observer to watch the scene from varying optical angles. In other applications, the generation of intermediate images is needed even with simple monoscopic displays at the receiver. For example, simulated eye contact is known to enhance the "telepresence" that is desirable in advanced videoconferencing schemes [1]. Further, dense disparity estimation and transmission is necessary in many other applications, permitting, for example, distance-to-the-camera keyed segmentation for background/foreground mixing, and quality control with depth models.

Lossy or lossless techniques for the transmission of such dense disparity fields were investigated in [6, 7]. Lossless transmission was estimated to require bitrates as high as 1 Mbit/s for 720 × 576 pixel image sequences [6]. Such requirements exceed the capacity of many practical communication channels.

Reduction of the bit rate needed for the coding of either motion or disparity vector fields may be achieved by optimal and adaptive bit allocation to already determined vector fields [11, 12, 13] or by appropriate smoothing [8]. Such smoothing is often necessary quite independently of data compression purposes. For example, in disparity estimation for object-based coding of stereo sequences [3, 4, 5, 9, 10], improved results are obtained by a disparity estimation procedure based on dynamic programming with a smoothing constraint. In general, however, care must be taken that smoothing of vector fields for compression purposes does not impair the efficiency of the resulting displacement compensated image prediction. Thus, a compromise must be found between minimizing the entropy of the displacement vectors and minimizing the displaced frame difference between temporally or spatially adjacent frames. An elegant framework for the definition of such a strategy is provided by the classical rate-distortion constrained minimization procedure. This has recently been used in many coding applications, including bit allocation for vector quantization [14], wavelet packet image coding [15] and quadtree still image coding [16].

The present paper, extending the preliminary results reported in [10], investigates the use of this methodology for block-based motion/disparity field estimation under the constraint of a target bitrate for the coding of the vector information. The entropy of the displacement vectors is used as a measure of the bit rate needed for their lossless transmission. An adaptive variant of the vector field estimation is also applied, which limits the effects of the smoothing algorithm to homogeneous areas only, avoiding highly textured areas and edges.

Experimental results are given for the coding of the typical videophone QCIF image sequences "Miss America" and "Claire" and the stereoscopic image sequences "Sergio" and "Tunnel".

The paper is organized as follows. In Section II the general vector field estimation problem is defined in terms of a rate-distortion minimization approach. Section III describes the application of the algorithm for the purposes of motion, disparity and depth field estimation. Section IV describes an adaptive version of the general estimation algorithm which avoids smoothing of the vector fields in areas such as edges or object boundaries. Experimental results given in Section V illustrate the performance of the proposed methods. Finally, conclusions are drawn in Section VI.

II THE GENERAL VECTOR FIELD ESTIMATION METHOD

A vector field estimation technique that is optimal in the rate-distortion sense is developed in this section. For the two specific applications examined in this paper, the estimator will be used for motion estimation between two temporally consecutive frames and for disparity estimation between the left and right camera images, respectively.

Let $v_i = (v_x^{(i)}, v_y^{(i)}) \in V$ be the vector corresponding to block $i$ of the image, where $V$ is the set of all possible displacement vectors determined by the search area $S$ of the block matching algorithm. The general vector field estimation algorithm aims to minimize the distortion $D$ of the reconstructed image sequence under a constraint $R_{budget}$ on the rate for the transmission of the vector field information. This corresponds to the following constrained optimization problem:

$$\min_{v_i \in V} \sum_{i=1}^{N_b} D_i(v_i) \tag{1}$$

subject to

$$\sum_{i=1}^{N_b} \Delta R_i(v_i) \le R_{budget} \tag{2}$$

where $N_b$ is the total number of blocks in the image, $D_i(v_i)$ is the contribution of $v_i$ to the distortion function and $\Delta R_i(v_i)$ is the contribution of the vector $v_i$ to the total rate, or cost, of the transmission of the motion vectors.

The methodology in [17] permits the transformation of the above into an unconstrained optimization problem. In fact, as shown in [17] (the proof is also contained in [14]), the solution $\{v_i^*(\lambda), i = 1, \ldots, N_b\}$ of the problem of unconstrained minimization of

$$J(\lambda) = \sum_{i=1}^{N_b} J_i(v_i(\lambda)) = \sum_{i=1}^{N_b} D_i(v_i(\lambda)) + \lambda \sum_{i=1}^{N_b} \Delta R_i(v_i(\lambda)) \tag{3}$$

is also a solution of (1) if

$$R_{budget} = \sum_{i=1}^{N_b} \Delta R_i(v_i^*(\lambda)) \tag{4}$$
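To make the decoupling in (3) concrete, the following minimal Python sketch assumes, as in the bit-allocation setting of [14], that each block's rate contribution $\Delta R_i$ depends only on its own vector. The entropy-based rate of Section III couples the blocks, so this is a simplification. The candidate tables (`candidates`, with fields `D` and `R`) are hypothetical toy numbers, not data from this paper. For a fixed $\lambda$, each block independently picks the vector minimizing $D_i + \lambda \Delta R_i$.

```python
# Minimal sketch: Lagrangian relaxation of (1)-(2) for a FIXED lambda.
# Assumption (hypothetical toy setting): each block has a finite list of
# candidate vectors, each with a precomputed distortion "D" and an independent
# rate contribution "R". Real entropy-based rates couple the blocks.

def select_vectors(candidates, lam):
    """For each block, pick the candidate minimizing D + lam * R (eq. 3 per block)."""
    return [min(block, key=lambda c: c["D"] + lam * c["R"]) for block in candidates]


def totals(chosen):
    """Total distortion and total rate of a selection."""
    return sum(c["D"] for c in chosen), sum(c["R"] for c in chosen)


# Two blocks, three candidate vectors each.
candidates = [
    [{"v": (0, 0), "D": 120.0, "R": 1.0},
     {"v": (1, 0), "D": 60.0, "R": 3.0},
     {"v": (2, 1), "D": 40.0, "R": 6.0}],
    [{"v": (0, 0), "D": 80.0, "R": 1.0},
     {"v": (0, 1), "D": 50.0, "R": 2.0},
     {"v": (3, 2), "D": 45.0, "R": 7.0}],
]

for lam in (0.0, 5.0, 40.0):
    D, R = totals(select_vectors(candidates, lam))
    print(f"lambda={lam:5.1f}  total D={D:6.1f}  total R={R:4.1f}")
```

In this toy example, raising $\lambda$ from 0 to 5 to 40 lowers the total rate from 13 to 8 to 2 while the total distortion grows from 85 to 90 to 200, which is the monotone behaviour a search over $\lambda$ relies on.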

The problem, therefore, reduces to ensuring that (4) has a solution for $v_i(\lambda)$ and determining this solution. This was investigated from a general viewpoint in [14], where it was shown that $\Delta R_i(v(\lambda))$ and $D_i(v(\lambda))$ are monotonic functions of the Lagrange multiplier $\lambda$, which may be interpreted as a quality index, with values ranging from 0 (highest rate, lowest distortion) to $\infty$ (lowest rate, highest distortion). Further investigation in [15] proved that the solution of (4) may be obtained using any fast convex search algorithm, such as the bisection algorithm [18]. One such algorithm, which gave very good results in both [15] and [16], is also adopted in the present paper. This algorithm proceeds as follows.

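First, two values $\lambda_l < \lambda_u$ of $\lambda$ are found such that

$$\sum_{i=1}^{N_b} \Delta R_i(v_i(\lambda_u)) \le R_{budget} \le \sum_{i=1}^{N_b} \Delta R_i(v_i(\lambda_l)) \tag{5}$$

For the coding of a sequence of frames, these values are chosen to be $\lambda_l = 0$, $\lambda_u = \infty$ for the initial frame and $\lambda_l = 0.8\lambda^*$, $\lambda_u = 1.2\lambda^*$ for subsequent frames, where $\lambda^*$ is the solution of (4) for the previous frame. The bracketing interval is then successively decreased in size by the following procedure:

• Step 1: For each block $i$, $i = 1, \ldots, N_b$, compute $D_i(v_i(\lambda_l))$ and $D_i(v_i(\lambda_u))$ and the corresponding $\Delta R_i(v_i(\lambda_l))$ and $\Delta R_i(v_i(\lambda_u))$.

• Step 2: Set

$$\lambda_{new} = \left| \frac{\sum_{i=1}^{N_b} \left[ D_i(v_i(\lambda_l)) - D_i(v_i(\lambda_u)) \right]}{\sum_{i=1}^{N_b} \left[ \Delta R_i(v_i(\lambda_l)) - \Delta R_i(v_i(\lambda_u)) \right]} \right| + \epsilon \tag{6}$$

where $\epsilon$ is a vanishingly small positive number.

• Step 3: For each $i$, determine iteratively the displacement vectors $v_i(\lambda_{new})$ so as to minimize, for $i = 1, 2, \ldots, N_b$,

$$D_i(v_i(\lambda_{new})) + \lambda_{new} \Delta R_i(v_i(\lambda_{new})) \tag{7}$$

Then compute the corresponding $\{\Delta R_i(v_i(\lambda_{new}))\}_i$ and $\{D_i(v_i(\lambda_{new}))\}_i$.

• Step 4: If $\sum_{i=1}^{N_b} \Delta R_i(v_i(\lambda_{new})) = \sum_{i=1}^{N_b} \Delta R_i(v_i(\lambda_u))$, then stop and set $\lambda = \lambda_u$. Else, if $\sum_{i=1}^{N_b} \Delta R_i(v_i(\lambda_{new})) > R_{budget}$, set $\lambda_l \leftarrow \lambda_{new}$ and go to Step 2. Else, set $\lambda_u \leftarrow \lambda_{new}$ and go to Step 2.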

Therefore, if the above algorithm converges, a solution of (4) will have been obtained, and the equivalent constrained minimization problem (1)-(2) will have been solved.
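The bracketing procedure of Steps 1-4 can be sketched in Python as follows, again assuming independent per-block rates and using hypothetical toy candidate tables. Consistent with the remark that distortions need to be computed only once, the distortions are tabulated up front and only the per-block selection (7) is repeated. A large finite value stands in for $\lambda_u = \infty$, and a `max_iter` guard (not part of the original procedure) bounds the loop.

```python
# Sketch of Steps 1-4 (eqs. 5-7) under a simplifying assumption: each block has
# a hypothetical list of candidate vectors with precomputed distortion "D" and an
# independent rate "R". Distortions are tabulated once; only selection repeats.

candidates = [
    [{"v": (0, 0), "D": 120.0, "R": 1.0},
     {"v": (1, 0), "D": 60.0, "R": 3.0},
     {"v": (2, 1), "D": 40.0, "R": 6.0}],
    [{"v": (0, 0), "D": 80.0, "R": 1.0},
     {"v": (0, 1), "D": 50.0, "R": 2.0},
     {"v": (3, 2), "D": 45.0, "R": 7.0}],
]


def select(cands, lam):
    """Step 3 / eq. (7): per-block minimization of D + lam * R."""
    return [min(block, key=lambda c: c["D"] + lam * c["R"]) for block in cands]


def totals(chosen):
    """Total distortion and total rate of a selection."""
    return sum(c["D"] for c in chosen), sum(c["R"] for c in chosen)


def search_lambda(cands, r_budget, lam_l=0.0, lam_u=1e9, eps=1e-9, max_iter=100):
    """Shrink [lam_l, lam_u] as in Steps 1-4; return (lam, chosen).

    lam_u=1e9 is a large finite stand-in for lambda_u = infinity.
    The returned selection has total rate <= r_budget.
    """
    d_l, r_l = totals(select(cands, lam_l))   # Step 1
    d_u, r_u = totals(select(cands, lam_u))
    if not (r_u <= r_budget <= r_l):
        raise ValueError("initial bracket does not satisfy (5)")
    if r_l == r_budget:
        return lam_l, select(cands, lam_l)
    for _ in range(max_iter):
        # Step 2 / eq. (6): slope of the chord joining the two bracketing points.
        lam_new = abs((d_l - d_u) / (r_l - r_u)) + eps
        d_new, r_new = totals(select(cands, lam_new))   # Step 3
        if r_new == r_u:                                  # Step 4: no better point
            break
        if r_new > r_budget:
            lam_l, d_l, r_l = lam_new, d_new, r_new
        else:
            lam_u, d_u, r_u = lam_new, d_new, r_new
    return lam_u, select(cands, lam_u)


lam, chosen = search_lambda(candidates, 8.0)
print(lam, totals(chosen), [c["v"] for c in chosen])
```

With a budget of 8 the sketch stops at $\lambda \approx 3.125$, selecting the operating point with total rate 8 and total distortion 90; with a budget of 5 it returns the point with rate 5 and distortion 110. The $+\epsilon$ in (6) breaks ties between the two chord endpoints in favour of the lower-rate one.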

Note that the distortion corresponding to each motion vector in a specific search area is computed only once, at the first iteration of the algorithm. Thus the computational load of the algorithm consists of updating the entropy of the vector field and finding the minimum of $J(\lambda)$.

III APPLICATION OF THE VECTOR FIELD ESTIMATION ALGORITHM TO MOTION, DISPARITY AND DEPTH ESTIMATION

The specific way in which the vector field affects the quality of the reconstructed image determines the distortion index $D_i(v_i(\lambda))$. A number of such distortion measures have been proposed in the literature. In the case of block-based motion estimation, the simplest and most commonly used is the temporally displaced frame difference

$$D_i(v_i) = \sum_{k=0}^{b_x-1} \sum_{l=0}^{b_y-1} \left| im_t(m+k, n+l) - im_{t-1}(m+k+v_x^{(i)}, n+l+v_y^{(i)}) \right| \tag{8}$$

where $(m, n)$ are the upper left-hand corner coordinates of block $i$, $im_t(\cdot)$ and $im_{t-1}(\cdot)$ are the images at time instants $t$ and $t-1$, respectively, and $b_x, b_y$ are the dimensions of the block.
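A minimal pure-Python sketch of the distortion index (8) and of the exhaustive block matching used to obtain the initial vector field. Images are assumed to be lists of rows, so pixel $(x, y)$ is `img[y][x]`, and the first coordinate $m$ of (8) is taken as horizontal; the function names are illustrative only. Passing the left and right images instead of $im_t$ and $im_{t-1}$ gives (9).

```python
def block_sad(cur, prev, m, n, vx, vy, bx, by):
    """Eq. (8): sum over k < bx, l < by of |cur(m+k, n+l) - prev(m+k+vx, n+l+vy)|.

    Images are lists of rows, so pixel (x, y) is img[y][x].
    """
    total = 0
    for k in range(bx):
        for l in range(by):
            total += abs(cur[n + l][m + k] - prev[n + l + vy][m + k + vx])
    return total


def full_search(cur, prev, m, n, bx, by, s):
    """Exhaustive block matching over vx, vy in [-s, s].

    Returns ((vx, vy), sad) for the best vector; candidates that would read
    outside the reference frame are skipped. Ties keep the first vector found.
    """
    height, width = len(prev), len(prev[0])
    best_v, best_sad = None, None
    for vy in range(-s, s + 1):
        for vx in range(-s, s + 1):
            inside = (0 <= m + vx and m + vx + bx <= width
                      and 0 <= n + vy and n + vy + by <= height)
            if not inside:
                continue
            sad = block_sad(cur, prev, m, n, vx, vy, bx, by)
            if best_sad is None or sad < best_sad:
                best_v, best_sad = (vx, vy), sad
    return best_v, best_sad


# Toy frames: cur(x, y) = prev(x + 2, y + 1), so the true displacement is (2, 1).
def f(x, y):
    return x * x + 5 * y * y


prev = [[f(x, y) for x in range(16)] for y in range(16)]
cur = [[f(x + 2, y + 1) for x in range(16)] for y in range(16)]
print(full_search(cur, prev, m=4, n=4, bx=4, by=4, s=3))  # ((2, 1), 0)
```

The quadratic test pattern has no periodicity, so the true displacement (2, 1) is the only vector in the search area with zero distortion.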

The corresponding distortion index for block-based disparity estimation is the spatially displaced frame difference

$$D_i(v_i) = \sum_{k=0}^{b_x-1} \sum_{l=0}^{b_y-1} \left| im_l(m+k, n+l) - im_r(m+k+v_x^{(i)}, n+l+v_y^{(i)}) \right| \tag{9}$$

where again $(m, n)$ are the upper left-hand corner coordinates of block $i$, $im_l(\cdot)$ and $im_r(\cdot)$ are the left and right images, respectively, and $b_x, b_y$ are the dimensions of the block.

If depth is evaluated from disparity using methods such as those given in [5], or from motion using the method in [23], a pixel-wise distortion index may be used, such as

$$D_i(v_i) = \sum_{k=-b_x}^{b_x} \sum_{l=-b_y}^{b_y} \left| im_l(m+k, n+l) - im_r(m+k+v_x^{(i)}, n+l+v_y^{(i)}) \right| \tag{10}$$

where now $(m, n)$ are the coordinates of the working pixel, $b_x, b_y$ define a $(2b_x+1) \times (2b_y+1)$ rectangular window centered at $(m, n)$, and $v_x^{(i)}, v_y^{(i)}$ are the components of the disparity vector computed from depth information.

Alternatively, if the depth field is modeled by a wireframe [5], only the node information is transmitted, hence an appropriate distortion measure would be

$$D_i(v_i) = \sum_{(m,n) \in \text{block } i} \left( \hat{z}(m,n) - z(m,n) \right)^2 \tag{11}$$

where $\hat{z}(m,n)$ is the wireframe-modeled depth and $z(m,n)$ is the depth computed from the disparity field $v_i$.

The transmission cost $R(v_i(\lambda))$ also depends on the specific method used for the coding of the vector fields. Assume first that entropy coding (e.g., Huffman or arithmetic coding) is used, with an adaptive probability model. In this case, the transmission cost, or rate, is the entropy

$$R = -\sum_{x=-m_1}^{m_1} \sum_{y=-m_2}^{m_2} p_{xy} \log_2(p_{xy}) \tag{12}$$

(with the convention $0 \log_2 0 = 0$), where $m_1, m_2$ are the maximum allowed x- and y-components, respectively, of the displacement vectors, and

$$p_{xy} = \frac{1}{N_b} \sum_{k=1}^{N_b} d_{xy}(v_k) \tag{13}$$

where

$$d_{xy}(v_k) = \begin{cases} 1 & \text{if } v_x^{(k)} = x \text{ and } v_y^{(k)} = y \\ 0 & \text{otherwise} \end{cases} \tag{14}$$
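The entropy (12)-(14) of a vector field can be computed from a histogram of the vectors. This minimal sketch assumes the field is given as a list of integer `(vx, vy)` tuples; `Counter` accumulates $\sum_k d_{xy}(v_k)$, and vectors that never occur contribute nothing, which matches the convention $0 \log_2 0 = 0$.

```python
from collections import Counter
from math import log2


def vector_field_entropy(vectors):
    """Eqs. (12)-(14): empirical entropy, in bits/vector, of a list of (vx, vy).

    Counter gives sum_k d_xy(v_k) for every (x, y) that occurs; values that
    never occur are skipped, i.e. 0 * log2(0) is taken as 0.
    """
    n_b = len(vectors)
    probabilities = (count / n_b for count in Counter(vectors).values())
    return sum(-p * log2(p) for p in probabilities)


print(vector_field_entropy([(0, 0), (0, 0), (1, 0), (0, -1)]))  # 1.5
```

For this field the probabilities are 1/2, 1/4 and 1/4, giving 1.5 bits/vector; a field in which every block has the same vector has zero entropy.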

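The above rate $R$ may be rewritten in the additive form used in (3),

$$R = \sum_{i=1}^{N_b} \Delta R_i(v_i) \tag{15}$$

with the definition, for $i > 1$,

$$\Delta R_i(v_i) = -\sum_{x=-m_1}^{m_1} \sum_{y=-m_2}^{m_2} p_{xy}^{(i)} \log_2\left(p_{xy}^{(i)}\right) + \sum_{x=-m_1}^{m_1} \sum_{y=-m_2}^{m_2} p_{xy}^{(i-1)} \log_2\left(p_{xy}^{(i-1)}\right) \tag{16}$$

and

$$\Delta R_1(v_1) = -\sum_{x=-m_1}^{m_1} \sum_{y=-m_2}^{m_2} p_{xy}^{(1)} \log_2\left(p_{xy}^{(1)}\right) \tag{17}$$

with

$$p_{xy}^{(i)} = \frac{1}{i} \sum_{k=1}^{i} d_{xy}(v_k) \tag{18}$$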
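The decomposition (15)-(18) is a telescoping sum: each $\Delta R_i$ is the entropy of the first $i$ vectors minus that of the first $i-1$, so the increments add up to $R$. The sketch below checks this numerically (the function names are hypothetical). It also shows that an individual $\Delta R_i$ may be negative, since appending a vector that repeats a frequent value can lower the entropy.

```python
from collections import Counter
from math import log2


def prefix_entropy(vectors, i):
    """R*(v_i) of eq. (21): entropy of p^(i) from eq. (18), the first i vectors."""
    return sum(-(c / i) * log2(c / i) for c in Counter(vectors[:i]).values())


def rate_increments(vectors):
    """Delta R_i of eqs. (16)-(17): entropy after block i minus entropy after block i - 1."""
    increments, previous = [], 0.0      # Delta R_1 = R*(v_1), eq. (17)
    for i in range(1, len(vectors) + 1):
        current = prefix_entropy(vectors, i)
        increments.append(current - previous)
        previous = current
    return increments


field = [(0, 0), (0, 0), (1, 0), (0, -1)]
print(sum(rate_increments(field)), prefix_entropy(field, len(field)))
print(rate_increments([(0, 0), (1, 0), (0, 0), (0, 0)]))  # third increment < 0
```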

The algorithm in Section II may be applied using (16) for the determination of the contribution of $v_i$ to the rate increase. Alternatively, since the second term of the right-hand side of (16) is independent of $v_i$, Step 3 of the algorithm may be equivalently carried out by minimizing, for each $i$, the sum

$$D^*(v_i(\lambda_{new})) + \lambda_{new} R^*(v_i(\lambda_{new})) \tag{19}$$

where $D^*(v_i)$ is the total distortion of the first $i$ blocks of the image, given by

$$D^*(v_i) = \sum_{k=1}^{i} D_k(v_k(\lambda_{new})) \tag{20}$$

and $R^*(v_i)$ is equal to the first term in (16), that is, the total entropy of the first $i$ blocks of the image:

$$R^*(v_i) = -\sum_{x=-m_1}^{m_1} \sum_{y=-m_2}^{m_2} p_{xy}^{(i)} \log_2\left(p_{xy}^{(i)}\right) \tag{21}$$

where $p_{xy}^{(i)}$ is given by (18).

Note also that (18) is equivalent to the following efficient formula for the incremental computation of $p_{xy}(v_i(\lambda))$:

$$p_{xy}^{(i+1)} = \frac{i}{i+1} p_{xy}^{(i)} + \frac{1}{i+1} d_{xy}(v_{i+1}(\lambda)) \tag{22}$$
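One possible reading of Step 3 with (19)-(22) is a single sequential pass over the blocks, in which block $i$ keeps the candidate minimizing $D_i(v) + \lambda R^*(v)$, with $R^*$ the entropy of the vectors chosen so far plus the candidate; the distortions of earlier blocks are constant in (19) and can be dropped from the comparison. This Python sketch makes that greedy interpretation explicit and updates $p_{xy}$ with (22). The per-block distortion tables `dist_tables` are hypothetical, and the paper does not specify the visiting order or whether further passes are made.

```python
from math import log2


def incremental_update(p, i, v):
    """Eq. (22): p^(i+1) = i/(i+1) * p^(i) + 1/(i+1) * d(v_{i+1})."""
    new_p = {key: i / (i + 1) * q for key, q in p.items()}
    new_p[v] = new_p.get(v, 0.0) + 1 / (i + 1)
    return new_p


def greedy_select(dist_tables, lam):
    """Sequential (greedy) reading of Step 3 with (19)-(22).

    dist_tables[i] maps each candidate vector of block i to a hypothetical
    precomputed distortion D_i(v). Earlier blocks' distortions are constant in
    (19), so block i compares D_i(v) + lam * R*(v) only.
    """
    p, chosen = {}, []
    for i, table in enumerate(dist_tables):      # i blocks already chosen
        best = None
        for v, d in table.items():
            p_trial = incremental_update(p, i, v)
            r_star = sum(-q * log2(q) for q in p_trial.values() if q > 0)  # eq. (21)
            cost = d + lam * r_star
            if best is None or cost < best[0]:
                best = (cost, v, p_trial)
        _, v_best, p = best
        chosen.append(v_best)
    return chosen


tables = [{(0, 0): 10.0, (1, 0): 9.0},
          {(0, 0): 12.0, (1, 0): 5.0},
          {(1, 0): 20.0, (2, 0): 6.0}]
print(greedy_select(tables, 0.0))    # [(1, 0), (1, 0), (2, 0)]
print(greedy_select(tables, 100.0))  # [(1, 0), (1, 0), (1, 0)]
```

With $\lambda = 0$ each block takes its minimum-distortion vector; with $\lambda = 100$ the third block is pulled to the vector already used by its predecessors, which is the smoothing effect discussed in Section IV.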

Alternatively, DPCM coding of the vector field may be assumed, followed by appropriate entropy coding. In this case the coding rate is expressed by

$$R = -\sum_{x=-2m_1}^{2m_1} \sum_{y=-2m_2}^{2m_2} q_{xy} \log_2(q_{xy}) \tag{23}$$

where $q_{xy}$ is the probability that the difference $dv = v_i - v_{i-1}$ between the vector minimizing the index $J(\lambda)$ and the vector corresponding to the previous block satisfies $dv_x = x$ and $dv_y = y$. The probability $q_{xy}$ is computed using (18) or (22) with $v_k$ replaced by $dv_k$.
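For the DPCM variant (23), the same histogram estimate is applied to the differences $dv_i = v_i - v_{i-1}$. The predictor for the first block is not specified in the text; the sketch assumes a hypothetical `start=(0, 0)`.

```python
from collections import Counter
from math import log2


def dpcm_entropy(vectors, start=(0, 0)):
    """Eq. (23): entropy, in bits/vector, of dv_i = v_i - v_{i-1}.

    'start' is a hypothetical predictor for the first block; the text does
    not specify one.
    """
    diffs, previous = [], start
    for vx, vy in vectors:
        diffs.append((vx - previous[0], vy - previous[1]))
        previous = (vx, vy)
    n = len(diffs)
    return sum(-(c / n) * log2(c / n) for c in Counter(diffs).values())


print(dpcm_entropy([(0, 0), (1, 0), (2, 0), (3, 0)]))  # about 0.811
```

For this linearly varying field the direct entropy (12) is 2 bits/vector, whereas the differences (0, 0), (1, 0), (1, 0), (1, 0) have an entropy of about 0.811 bits/vector, which illustrates why DPCM helps on smooth fields.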

A more computationally efficient approach, which does not involve incremental computation of the probability density of the vector field or of the first-order vector field differences, is to assume a model for this probability density function. Specifically, the assumption of a Gauss-Markov random field describing both motion [19] and disparity [20] vector differences could be used to accelerate the rate-distortion minimization procedure.

IV ADAPTIVE VECTOR FIELD ESTIMATION

In the general procedure outlined in the preceding section, the minimization of the rate constraint index is based on limiting the entropy, which, as expected and as shown by the experimental results, is attained by smoothing the displacement vector fields. Uniform smoothing of motion or disparity vector fields may have very undesirable effects and, in fact, may result in severe loss of motion or depth information, especially in the neighborhood of edges, and thus in an increased reconstruction error. An adaptive application of the proposed algorithm is therefore necessary, so as to limit the smoothing to homogeneous areas only and avoid highly textured areas and edges. In this way, the discontinuities of both motion and disparity [21] are preserved, and the reliable vector fields corresponding to highly textured areas are excluded from smoothing. Furthermore, the use of edge information reduces the well-known "corona effect", which is one of the drawbacks of block-based matching methods.

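Adaptive vector field estimation may be accomplished by the minimization of the following index:

$$J(\lambda) = \sum_{i=1}^{N_b} D_i(v_i(\lambda)) + \lambda \sum_{i=1}^{N_b} E(v_i(\lambda)) \Delta R_i(v_i(\lambda)) \tag{24}$$

where $E(v_i(\lambda))$ is a bitmap defined as:

$$E(v_i(\lambda)) = \begin{cases} 0 & \text{if } v_i \text{ is reliable} \\ 1 & \text{otherwise} \end{cases} \tag{25}$$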
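Under a simplifying assumption of independent per-block rates, the bitmap (25) simply removes the rate term for reliable blocks, so they keep their minimum-distortion vector whatever the value of $\lambda$. The toy candidate tables and the `reliable` flags in this Python sketch are hypothetical.

```python
# Hypothetical toy candidates: two blocks, each with (vector, distortion, rate).
candidates = [
    [{"v": (0, 0), "D": 120.0, "R": 1.0},
     {"v": (1, 0), "D": 60.0, "R": 3.0},
     {"v": (2, 1), "D": 40.0, "R": 6.0}],
    [{"v": (0, 0), "D": 80.0, "R": 1.0},
     {"v": (0, 1), "D": 50.0, "R": 2.0},
     {"v": (3, 2), "D": 45.0, "R": 7.0}],
]


def adaptive_select(cands, reliable, lam):
    """Per-block minimization of (24): D_i(v) + lam * E_i * R_i(v),
    with E_i = 0 for reliable blocks and 1 otherwise (eq. 25)."""
    chosen = []
    for block, is_reliable in zip(cands, reliable):
        e = 0 if is_reliable else 1
        chosen.append(min(block, key=lambda c: c["D"] + lam * e * c["R"]))
    return chosen


print([c["v"] for c in adaptive_select(candidates, [False, False], 40.0)])  # [(0, 0), (0, 0)]
print([c["v"] for c in adaptive_select(candidates, [True, False], 40.0)])   # [(2, 1), (0, 0)]
```

At $\lambda = 40$ the non-adaptive choice is (0, 0) for both blocks; marking the first block as reliable keeps its minimum-distortion vector (2, 1) while the second block is still smoothed.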

The motion vector $v_i$ corresponding to block $i$ may be considered reliable whenever the block contains edges or highly textured areas. For the detection of edges and textured areas, a variant of the technique in [22] was used, based on the observation that highly textured areas exhibit a high local intensity variance in all directions, while on edges the intensity variance is higher across the direction of the edge.

Accordingly, for each pixel in the image, two parameters were computed. The first parameter, $\sigma^2$, is the unbiased estimate of the local variance. If the variance exceeds a threshold $\sigma_c^2$, the pixel is considered to be part of either an edge or a highly textured area. The second parameter was used to differentiate highly textured areas from edges. For this purpose the local variances $\sigma_i^2$ in eight directions (multiples of $\pi/4$) were computed, and the second parameter $p$ was defined by

$$p = \frac{\max_i \sigma_i^2}{\min_i \sigma_i^2} \tag{26}$$

Since $p$ is close to 1 in areas with uniform texture, and much higher in edge areas, pixels were assigned to edges whenever $p$ exceeded a threshold $p_c$. Finally, blocks containing edges, or consisting in large part (over 50%) of highly textured area, were assumed to produce reliable $v_i$.
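The pixel and block classification can be sketched as below. The text fixes only the thresholds and the decision rules; the 3 × 3 window for the local variance, the ray of `radius + 1` samples used for each directional variance, and the small `eps` guarding (26) against a zero denominator are assumptions of this sketch, not details of the variant of [22] actually used.

```python
# Eight directions, multiples of pi/4, as (dx, dy) unit steps.
DIRECTIONS = [(1, 0), (1, 1), (0, 1), (-1, 1), (-1, 0), (-1, -1), (0, -1), (1, -1)]


def unbiased_var(values):
    """Unbiased sample variance (divides by n - 1)."""
    n = len(values)
    mean = sum(values) / n
    return sum((x - mean) ** 2 for x in values) / (n - 1)


def classify_pixel(img, x, y, var_thr=40.0, p_thr=10.0, radius=2, eps=1e-6):
    """Label interior pixel (x, y) as 'homogeneous', 'edge' or 'texture'.

    Assumed details: 3x3 window for the local variance; each directional
    variance uses the radius + 1 samples on a ray starting at (x, y).
    """
    window = [img[y + j][x + i] for j in (-1, 0, 1) for i in (-1, 0, 1)]
    if unbiased_var(window) <= var_thr:
        return "homogeneous"
    dir_vars = [unbiased_var([img[y + t * dy][x + t * dx] for t in range(radius + 1)])
                for dx, dy in DIRECTIONS]
    p = max(dir_vars) / (min(dir_vars) + eps)   # eq. (26); eps avoids 0 division
    return "edge" if p > p_thr else "texture"


def block_is_reliable(labels):
    """Reliable if the block has any edge pixel or over 50% texture pixels."""
    pixels = [label for row in labels for label in row]
    return "edge" in pixels or pixels.count("texture") > 0.5 * len(pixels)


flat_img = [[50] * 8 for _ in range(8)]
edge_img = [[0 if x < 4 else 100 for x in range(8)] for y in range(8)]
tile = [[0, 100], [80, 180]]
texture_img = [[tile[y % 2][x % 2] for x in range(8)] for y in range(8)]
print([classify_pixel(im, 4, 4) for im in (flat_img, edge_img, texture_img)])
# ['homogeneous', 'edge', 'texture']
```

The default thresholds are the values $\sigma_c^2 = 40$ and $p_c = 10$ reported in Section V. On the vertical step edge the variance along the edge is zero, so $p$ is very large; on the periodic 2 × 2 pattern all eight directional variances are nonzero and $p \approx 5$, so the pixel is labelled texture.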

However, with the adaptive version of the algorithm, there is an increased probability that no solution of the constrained optimization problem (3)-(4) will exist. In this case it will be impossible to determine an initial pair $\lambda_l$, $\lambda_u$ satisfying (5), and the algorithm (5)-(7) will be inapplicable. If this happens, the restriction on smoothing may be dropped for the highly textured areas and retained only for blocks containing edge information. If this still fails to produce a rate equal to $R_{budget}$ according to (4), all restrictions on smoothing are lifted and the adaptive version of the algorithm reverts to the non-adaptive one.

Both the non-adaptive (3) and the adaptive (24) versions of the vector field estimation technique were experimentally applied to the estimation of motion, disparity and depth fields, using appropriate distortion measures to evaluate the quality of the reconstructed images.

V EXPERIMENTAL RESULTS

In order to evaluate the performance of the proposed approach for the low-bitrate coding of videophone and videoconferencing image sequences, the proposed Rate-Distortion Optimization Vector Field Estimation (RDOVFE) algorithm was first applied to the typical QCIF sequences "Claire" and "Miss America". Further, the disparity estimation algorithm was tested on the stereo image sequences "Sergio", of dimension 256 × 256, and "Tunnel"¹, of dimension 360 × 288.

First, the RDOVFE algorithm was sequentially applied to the coding of the first 50 frames of "Claire" using motion compensation. On an R4400 Silicon Graphics machine it required an execution time of about 20 seconds, compared to 4 seconds for the exhaustive block-matching search algorithm. The average bitrate versus the average mean square error (MSE) is shown in Figure 1.

As seen, the proposed technique may achieve very considerable bitrate savings with very modest corresponding increases in the mean square error. For example, Figure 1 indicates that coding of the first 50 frames of the "Claire" sequence may be achieved with 1.1 bits/vector (0.01718 bits/pixel) and an average mean square error of 2.64. This contrasts with the 3.23 bits/vector (0.05046 bits/pixel) required for lossless coding, with a mean square error of 2.53. That is, a bitrate decrease by a factor of 2.93 is achieved at the cost of a mean square error increase of the order of 4%, with no visible image deterioration. Entropy coding, without DPCM, was used to compress the vector fields.
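The figures quoted above can be cross-checked with a few lines of arithmetic. The sketch assumes one vector per 8 × 8 block, i.e. 64 pixels per vector (the block size stated for the subsequent experiments); under that assumption the bits/pixel values, the rate reduction factor and the relative MSE increase all agree with the text up to truncation.

```python
# Figures quoted for the first 50 frames of "Claire".
# Assumption: one motion vector per 8x8 block, i.e. 64 pixels per vector.
PIXELS_PER_VECTOR = 8 * 8

lossy_bpv, lossless_bpv = 1.1, 3.23   # bits/vector
lossy_mse, lossless_mse = 2.64, 2.53  # average MSE

lossy_bpp = lossy_bpv / PIXELS_PER_VECTOR         # about 0.0172 (quoted: 0.01718)
lossless_bpp = lossless_bpv / PIXELS_PER_VECTOR   # about 0.0505 (quoted: 0.05046)
rate_factor = lossless_bpv / lossy_bpv            # about 2.936 (quoted: 2.93)
mse_increase = (lossy_mse - lossless_mse) / lossless_mse  # about 0.043 (quoted: ~4%)

print(f"{lossy_bpp:.5f} {lossless_bpp:.5f} {rate_factor:.3f} {mse_increase:.1%}")
```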

Next, full search block matching motion estimation was performed between the first and fifth frames of "Miss America" (see Figures 8a and 8b) and the first and third frames of "Claire" (see Figures 9a and 9b), in order to evaluate the quality of the computed vector fields. A block size of 8 × 8 pixels and a search area of −8, ..., 8 pixels for both the x- and y-components of the motion vector field were used. The rate-distortion minimization algorithm was run for various values of $R_{budget}$ and was seen to converge in fewer than 6 iterations. Figures 3 and 4 show the MSE versus the bitrate (in bits/vector) needed for the coding of the motion vectors corresponding to the fifth frame of "Miss America" and the third frame of "Claire", respectively. The value of $\lambda$ corresponding to each of the operating points of the algorithm is also shown.

The original vector field computed using an exhaustive search block matching algorithm is shown in Figures 8c and 9c for the two sequences, respectively. The vector field for "Miss America" coded at 1.58 bits/vector, computed using the proposed algorithm, is shown in Figure 8d. The motion compensated estimate of the third frame of "Claire" corresponding to the motion field coded at 1.1 bits/vector is shown in Figure 9d.

¹ The sequence "Tunnel", an MPEG-4 stereo test sequence, was prepared by the Centre d'Études de Télédiffusion et Télécommunications (CCETT) for use in the DISTIMA RACE and the PANORAMA ACTS projects.

Edge extraction was subsequently performed using the technique proposed in Section IV with $\sigma_c^2 = 40$ and $p_c = 10$, with results shown in Figures 8e and 9e. In these figures, homogeneous areas are shown in white, edges in grey and highly textured areas in black. The adaptive version of the algorithm described in Section IV was then applied to the two test sequences and produced the vector fields shown in Figures 8f and 9f. In this case, the adaptive smoothing technique further reduced the bitrate (to 1.202 bits/vector) by eliminating the scattered motion vectors corresponding to the background.

The performances of the adaptive and non-adaptive versions of the RDOVFE algorithm were also compared. Figure 7 shows the rate-distortion curves for the adaptive and non-adaptive versions applied to the coding of "Miss America". In curve RDOVFE-(T+E) of Figure 7, the no-smoothing constraint of the adaptive version of the proposed algorithm is applied in both highly textured areas and blocks containing sufficient edge information. In curve RDOVFE-T of Figure 7, the no-smoothing constraint is relaxed only in the highly textured areas in order to obtain lower bit rates. If, again, the $R_{budget}$ constraint is not satisfied, the no-smoothing constraint is relaxed for all blocks in the image, and the adaptive version of the algorithm reduces to the non-adaptive one (curve RDOVFE in Figure 7).

The displacement vector field estimation algorithm was then applied to the coding of the first 25 frames of "Tunnel" using pixel-based disparity compensation. Figure 2 shows the average MSE versus the average bitrate for the coding of depth information of the first 25 frames of the sequence "Tunnel". Accurate depth information (with negligible increase in MSE) can be extracted at 2.61 bits/pixel.

The original left and right second frames of "Sergio" and "Tunnel" are shown in Figures 10a, 10b and 11a, 11b. Block-based disparity estimation with a block size of 8 × 8 was performed first. The x-component of the block-based disparity field estimate is shown in Figures 10c and 11c. Figure 5 shows the MSE versus the bitrate (in bits/vector) needed for the coding of the block-based disparity field corresponding to the second frame of "Sergio".

The dense disparity field was estimated using a window of 9 × 9 pixels and a search area of −16, ..., 16 and −1, ..., 1 pixels for the x- and y-components of the disparity field, respectively. Figure 6 shows the MSE versus the bitrate needed for the coding of the depth map corresponding to the second frame of "Tunnel".

The output of the edge extraction procedure applied to the second right-channel frames of "Sergio" and "Tunnel" is shown in Figures 10d and 11d, respectively. The depth maps computed using the adaptive algorithm are illustrated in Figures 10e and 11e. Each depth map is shown quantized into 256 levels and has the same resolution as the original image (since it is computed from a dense disparity field). Brighter areas represent those closer to the cameras. As seen, the smoothing properties of the depth estimation method result in realistic depth-map estimates.

Spatial interpolation was also performed using the technique in [6] to generate the intermediate images of "Tunnel" and "Sergio" from the left camera image and the computed dense disparity field. The resulting intermediate images are shown in Figures 10f and 11f.

VI CONCLUSIONS

A rate-distortion framework was used to define a vector field estimation technique which achieves maximum reconstructed image quality under the constraint of a target bitrate for the coding of the vector sequence. An adaptive version of this technique limits the smoothing effect to homogeneous areas and avoids highly textured areas and edges. Applications of this technique were investigated for the estimation of motion vectors in very low bitrate image sequence coding. In this case, the proposed algorithm can be combined with an appropriate rate control strategy to optimize the coding of the motion vectors corresponding to all frames of an image sequence. The technique was also evaluated in application to the estimation and coding of dense disparity vector fields. The latter are needed to enable a multiview receiver to generate intermediate images by use of spatial interpolation, and also in other applications requiring accurate depth knowledge, such as scenes to be postprocessed using depth keying. Experimental results were presented and evaluated for both of the above application areas.


References

[1] H. Li, A. Lundmark, and R. Forchheimer, \Image Sequence Coding at Very Low Bi-

trates - A Review," IEEE Trans. on Image Processing, vol. 3, pp. 589{609, Sep. 1995.

[2] H. G. Musmann, P. Pirsch, and H. J. Grallert, \Advances in Picture Coding," Proc.

IEEE, vol. 73, pp. 523{548, Apr. 1985.

[3] D. Tzovaras, M. G. Strintzis, and H. Sahinoglou, \Evaluation of Multiresolution

Techniques for Motion and Disparity Estimation," Signal Processing : Image Com-

munication, vol. 6, pp. 59{67, Mar. 1994.

[4] M. Ziegler, \Digital Stereoscopic Imaging and Application, A Way Towards New

Dimensions, The RACE II project DISTIMA," in IEE Colloq. on Stereoscopic Tele-

vision, (London), 1992.

[5] D. Tzovaras, N. Grammalidis, and M. G. Strintzis, \Disparity Field and Depth Map

Coding for Multiview 3D Image Generation," Signal Processing (Image Communi-

cation), accepted for publication.

[6] M. G. Strintzis, D. Tzovaras, and N. Grammalidis, \Depth Map and Disparity Field

Coding for the Communication of Multiview Images," in Proc. 35th Int'l Conf. on

Digital Signal Processing '95, (Limassol, Cyprus), June 1995.

[7] D. Tzovaras, N. Grammalidis, and M. G. Strintzis, \Object-Based Coding of Stereo

Image Sequences using Joint 3-D Motion/Disparity Compensation," IEEE Trans.

on Circuits and Systems for Video Technology, to appear 1996.

[8] G. Adiv, \Determining Three-Dimensional Motion and Structure from Optical Flow

Generated by Several Moving Objects," IEEE Trans. on Pattern Analysis and Ma-

chine Intelligence, vol. 7, pp. 384{401, Jul. 1985.

[9] N. Grammalidis, S. Malassiotis, D. Tzovaras, and M. G. Strintzis, \Stereo image se-

quence coding based on 3D motion estimation and compensation," Signal Processing

: Image Communication, vol. 7, No. 2, pp. 129{145, Aug. 1995.

[10] D. Tzovaras and M. G. Strintzis, "Motion Estimation Using Rate Distortion Theory for Very Low Bit Rate Image Sequence Coding," in Proc. Int'l Conf. Telecommunications '96, (Istanbul, Turkey), Apr. 1996.

[11] J. Ribas-Corbera and D. L. Neuhoff, "Optimal Bit Allocations for Lossless Video Coders: Motion Vectors vs. Difference Frames," in ICIP '95, pp. 180–183, Sep. 1995.


[12] J. Ribas-Corbera and D. L. Neuhoff, "Optimal Motion Vector Accuracy for Block-based Motion-Compensated Video Coders," SPIE Electronic Imaging 1996, Digital Video Compression: Algorithms and Technologies 1996, pp. 302–314, Jan. 1996.

[13] J. Ribas-Corbera and D. L. Neuhoff, "Reducing Rate/Complexity in Video Coding by Motion Estimation with Block Adaptive Accuracy," VCIP '96, pp. 615–624, Mar. 1996.

[14] Y. Shoham and A. Gersho, "Efficient Bit Allocation for an Arbitrary Set of Quantizers," IEEE Trans. on Acoust., Speech, Signal Processing, vol. 36, pp. 1445–1453, Sep. 1988.

[15] K. Ramchandran and M. Vetterli, "Best Wavelet Packet Bases in a Rate-Distortion Sense," IEEE Trans. on Image Processing, vol. 2, pp. 160–175, Apr. 1993.

[16] G. J. Sullivan and R. Baker, "Efficient Quadtree Coding of Images and Video," IEEE Trans. on Image Processing, vol. 3, pp. 327–331, May 1994.

[17] H. Everett, "Generalized Lagrange Multiplier Method for Solving Problems of Optimum Allocation of Resources," Operations Res., vol. 11, pp. 399–417, 1963.

[18] W. H. Press, B. P. Flannery, S. A. Teukolsky, and W. T. Vetterling, Numerical Recipes in C: The Art of Scientific Computing, Cambridge, U.K.: Cambridge Univ. Press, 1988.

[19] J. Konrad and E. Dubois, "Bayesian Estimation of Motion Vector Fields," IEEE Trans. on Pattern Analysis and Machine Intelligence, vol. 14, pp. 910–927, Sep. 1992.

[20] S. Malassiotis and M. G. Strintzis, "Joint Motion/Disparity MAP Estimation for Stereo Image Sequences," IEE Proceedings: Vision, Image & Signal Processing, vol. 143, no. 2, pp. 101–108, Apr. 1996.

[21] L. Falkenhagen, "3D Object-based Depth Estimation From Stereoscopic Image Sequences," in Proc. Int'l Workshop on Stereoscopic and 3D Imaging '95, (Santorini, Greece), pp. 81–86, Sep. 1995.

[22] O. Egger, W. Li, and M. Kunt, "High Compression Image Coding Using an Adaptive Morphological Subband Decomposition," Proc. IEEE, vol. 83, pp. 272–287, Feb. 1995.

[23] J. Weng, T. Huang, and N. Ahuja, "Motion and Structure from Two Perspective Views: Algorithms, Error Analysis and Error Estimation," IEEE Trans. on Pattern Analysis and Machine Intelligence, vol. 11, pp. 451–476, May 1989.


Figure 1: Average MSE versus average bitrate (in bits/vector) for the coding of the first 50 frames of "Claire" using motion compensation.

Figure 2: Average MSE versus average bitrate (in bits/vector) for the coding of the first 25 frames of "Tunnel" using disparity compensation.

[Plot: MSE versus bitrate, "Miss America"; operating points labeled λ = 0, 2029, 3092, 6228, 20776, 50000.]

Figure 3: MSE versus bitrate (in bits/vector) for the block-based coding of the fifth frame of "Miss America".

[Plot: MSE versus bitrate, "Claire"; operating points labeled λ = 0, 2064, 3732, 7181, 12502, 30598.]

Figure 4: MSE versus bitrate (in bits/vector) for the block-based coding of the third frame of "Claire".

[Plot: MSE versus bitrate, "Sergio"; operating points labeled λ = 0, 22859, 32474, 60831, 85125, 200000.]

Figure 5: MSE versus bitrate (in bits/vector) for the block-based coding of the second frame of "Sergio".

[Plot: MSE versus bitrate, "Tunnel"; operating points labeled λ = 0, 5163, 14354, 29862, 44365, 328618, 600000.]

Figure 6: MSE versus bitrate (in bits/vector) for the coding of the dense disparity field corresponding to the second frame of "Tunnel".

[Plot: MSE versus bitrate; curves "RDOVFE", "RDOVFE+T", "RDOVFE+T+E".]

Figure 7: Adaptive versus non-adaptive versions of the proposed RDOVFE algorithm in terms of MSE versus bitrate (in bits/vector) for the block-based coding of the fifth frame of "Miss America".

Figure 8: (a) Original frame 1 of "Miss America". (b) Original frame 5 of "Miss America". (c) Original motion vector field estimated with the block matching algorithm. (d) Motion vector field estimated with the rate-distortion algorithm at 1.58 bits/vector. (e) The output of the edge extractor (homogeneous areas are marked white, edges grey, and highly textured areas black). (f) Computed motion vector field using the adaptive smoothing algorithm.

Figure 9: (a) Original frame 1 of "Claire". (b) Original frame 3 of "Claire". (c) Original motion vector field estimated with the block matching algorithm. (d) Reconstructed frame 3 of "Claire" using the computed vector field at 1.1 bits/vector. (e) The output of the edge extractor. (f) Computed motion vector field using the adaptive smoothing algorithm.

Figure 10: (a) Original left-channel image of "Sergio" (frame 2). (b) Original right-channel image of "Sergio" (frame 2). (c) Block-based estimate of disparity. (d) The output of the edge extractor. (e) Pixel-based estimate of depth. (f) Intermediate image generated using the computed depth.

Figure 11: (a) Original left-channel image of "Tunnel" (frame 2). (b) Original right-channel image of "Tunnel" (frame 2). (c) Block-based estimate of disparity. (d) The output of the edge extractor. (e) Pixel-based estimate of depth. (f) Intermediate image generated using the computed depth.
