Motion and Disparity Field Estimation using
Rate-Distortion Optimization *
Dimitrios Tzovaras and Michael G. Strintzis, Senior Member, IEEE
Electrical and Computer Engineering Department
Information Processing Laboratory
Aristotle University of Thessaloniki
Thessaloniki 54006, Greece
phone: (+30-31) 996-359, fax: (+30-31) 996-398
e-mail : [email protected]
Abstract
A rate-distortion framework is used to define a displacement vector-field estimation
technique for use in video coding. This technique achieves maximum reconstructed image
quality under the constraint of a target bitrate for the coding of the vector sequence.
The technique may be adapted so as to limit its smoothing effect to homogeneous areas
and avoid highly textured areas and edges. Use of this technique is evaluated for two
application areas in which the need for high compression of displacement vector fields
is particularly acute. The first is motion-field coding for very low bit rate image
sequence transmission as in videophone applications. The second application area is
coding for the transmission of dense disparity fields. This is needed for the generation
at the receiver of intermediate viewpoints through spatial interpolation. It is also
needed in a number of other applications requiring accurate depth knowledge, including
3D medical data transmission and transmission of scenes to be postprocessed using
depth-keyed segmentation. Experimental results illustrating the performance of the
proposed technique in these application areas are presented and evaluated.
Subject terms: Very low bit-rate coding; rate-distortion theory; vector field coding; depth map coding.
* This work was supported by the EU CEC Project PANORAMA (ACTS project 092) and the COST211ter project.
List of Figures
1 Average MSE versus average bitrate (in bits/vector) for the coding of the first 50 frames of "Claire" using motion compensation.
2 Average MSE versus average bitrate (in bits/vector) for the coding of the first 25 frames of "Tunnel" using disparity compensation.
3 MSE versus bitrate (in bits/vector) for the block-based coding of the fifth frame of "Miss America".
4 MSE versus bitrate (in bits/vector) for the block-based coding of the third frame of "Claire".
5 MSE versus bitrate (in bits/vector) for the block-based coding of the second frame of "Sergio".
6 MSE versus bitrate (in bits/vector) for the coding of the dense disparity field corresponding to the second frame of "Tunnel".
7 Adaptive versus non-adaptive versions of the proposed RDOVFE algorithm in terms of MSE versus bitrate (in bits/vector) for the block-based coding of the fifth frame of "Miss America".
8 (a) Original frame 1 of "Miss America". (b) Original frame 5 of "Miss America". (c) Original motion vector field estimated with the block matching algorithm. (d) Motion vector field estimated with the rate-distortion algorithm at 1.58 bits/vector. (e) The output of the edge extractor (homogeneous areas marked white, edges marked grey and highly textured areas marked black). (f) Computed motion vector field using the adaptive smoothing algorithm.
9 (a) Original frame 1 of "Claire". (b) Original frame 3 of "Claire". (c) Original motion vector field estimated with the block matching algorithm. (d) Reconstructed frame 3 of "Claire" using the computed vector field at 1.1 bits/vector. (e) The output of the edge extractor. (f) Computed motion vector field using the adaptive smoothing algorithm.
10 (a) Original left channel image "Sergio" (frame 2). (b) Original right channel image "Sergio" (frame 2). (c) Block-based estimate of disparity. (d) The output of the edge extractor. (e) Pixel-based estimate of depth. (f) Intermediate image generated using the computed depth.
11 (a) Original left channel image "Tunnel" (frame 2). (b) Original right channel image "Tunnel" (frame 2). (c) Block-based estimate of disparity. (d) The output of the edge extractor. (e) Pixel-based estimate of depth. (f) Intermediate image generated using the computed depth.
I INTRODUCTION
The transmission of full motion video through limited capacity channels is critically de-
pendent on the ability of the compression schemes to achieve target bit rates while still
maintaining acceptable visual quality [1]. In order to achieve this, motion estimation
and motion compensated prediction are frequently used, so as to reduce temporal redun-
dancies in image sequences [2]. Similarly in the coding of stereo and multiview images,
prediction may be based on disparity compensation [3] or the best of motion and disparity
compensation [4].
While much attention has been devoted to the coding of the intraframe and prediction
error images, the displacement vector fields are usually coded losslessly using
DPCM/Huffman coding, resulting in limited compression. The reason for this is that
digital video coding systems for many applications have at their disposal rates ranging
from 1 Mbit/sec to 25 Mbits/sec. At such rates, only a minor part of the global rate is
devoted to the transmission of the vector fields, hence the bitrate overhead produced by
lossless encoding of the vector fields is negligible. In many emerging application areas,
however, lossy compression of the vector fields is often highly desirable, and sometimes
unavoidable. For example, mobile videophone or multimedia transmission channels are
often limited to capacities of 4.8 - 64 kbps. In such cases, it is clearly desirable to
reduce as much as possible the bitrate needed to transmit the motion vector fields,
provided that this reduction does not produce intolerable distortion in the reconstructed
image. It is also desirable to allocate the bitrate devoted to the coding of motion fields
adaptively, depending on the complexity of the sequence and also on the overall bitrate
availability when the latter varies with time.
High compression of vector fields is also needed in many emerging applications requiring
the transmission of disparity fields for stereo image communication. Sparse disparity
fields are used to predict one image of a stereoscopic pair from another, within a coding
scheme using disparity compensated prediction [3, 4] or joint motion and disparity
compensated prediction [5]. The need for compression is particularly acute when dense
depth or disparity fields must be transmitted in order to permit multiview image
generation at the receiver, through spatial interpolation [4]. With a multiview display,
this allows the observer to watch the scene from varying optical angles. In other
applications, the generation of intermediate images is needed even with simple monoscopic
displays at the receiver. For example, simulated eye-contact is known to enhance the
"telepresence" which is desirable in advanced videoconferencing schemes [1]. Further,
dense disparity estimation and transmission is necessary in many other applications,
permitting for example distance-to-the-camera keyed segmentation for
background/foreground mixing, and quality control with depth models.
Lossy or lossless techniques for the transmission of such dense disparity fields were
investigated in [6, 7]. Lossless transmission was estimated to require bitrates as high
as 1 Mbps for 720 × 576 pixel sized image sequences [6]. Such requirements exceed the
capacity of many practical communication channels.
Reduction of the bit rate needed for the coding of either motion or disparity vector
fields may be achieved by optimal and adaptive bit allocation to already determined
vector fields [11, 12, 13] or by appropriate smoothing [8]. Such smoothing is often
necessary quite independently of data compression purposes. For example, in disparity
estimation for object-based coding of stereo sequences [3, 4, 5, 9, 10] improved results
are obtained by a disparity estimation procedure based on dynamic programming with a
smoothing constraint. In general, however, care must be taken that smoothing of vector
fields for compression purposes does not impair the efficiency of the resulting
displacement compensated image prediction. Thus, a compromise must be found between
minimizing the entropy of the displacement vectors and minimizing the displaced frame
difference between temporally or spatially adjacent frames. An elegant framework for the
definition of such a strategy is provided by the classical rate-distortion constrained
minimization procedure. This has been recently used in many coding applications including
bit allocation for vector quantization [14], wavelet packet image coding [15] and
quadtree still image coding [16].
The present paper, extending the preliminary results reported in [10], investigates the
use of this methodology for block-based motion/disparity field estimation under the
constraint of a target bitrate for the coding of the vector information. The entropy of
the displacement vectors is used as a measure of the bit rate needed for their lossless
transmission. An adaptive variant of the vector field estimation is specifically applied,
which limits the effects of the smoothing algorithm to the homogeneous areas only,
avoiding highly textured areas and edges.
Experimental results are given for the coding of the typical videophone QCIF image
sequences "Miss America" and "Claire" and the stereoscopic image sequences "Sergio"
and "Tunnel".
The paper is organized as follows. In Section II the general vector field estimation
problem is defined in terms of a rate-distortion minimization approach. Section III
describes the application of the algorithm for the purposes of motion, disparity and
depth field estimation. Section IV describes an adaptive version of the general
estimation algorithm which avoids smoothing of the vector fields in areas such as edges
or object boundaries. Experimental results given in Section V illustrate the performance
of the proposed methods. Finally, conclusions are drawn in Section VI.
II THE GENERAL VECTOR FIELD ESTIMATION METHOD
A vector field estimation technique that is optimal in the rate-distortion sense is
developed in this section. For the two specific applications examined in this paper, the
vector field estimator is used for motion estimation between two temporally successive
frames or for disparity estimation between the left and right camera images, respectively.
Let $v_i = (v_x^{(i)}, v_y^{(i)}) \in V$ be the vector corresponding to block $i$ of the
image, where $V$ is the set of all possible displacement vectors determined by the search
area $S$ of the block matching algorithm. The general vector field estimation algorithm
aims to minimize the distortion $D$ of the reconstructed image sequence, under a
constraint $R_{\mathrm{budget}}$ on the rate for the transmission of the vector field
information. This corresponds to the following constrained optimization problem:

$$\min_{v_i \in V} \sum_{i=1}^{N_b} D_i(v_i), \tag{1}$$

subject to

$$\sum_{i=1}^{N_b} \Delta R_i(v_i) \le R_{\mathrm{budget}}, \tag{2}$$

where $N_b$ is the total number of blocks in the image, $D_i(v_i)$ is the contribution of
$v_i$ to the distortion function and $\Delta R_i(v_i)$ is the contribution of the vector
$v_i$ to the total rate or cost of the transmission of the motion vectors.
The methodology in [17] permits the transformation of the above into an unconstrained
optimization problem. In fact, as shown in [17] (the proof is also contained in [14]),
the solution $\{v_i^{\star}(\lambda),\ i = 1, \ldots, N_b\}$ of the problem of
unconstrained minimization of

$$J(\lambda) = \sum_{i=1}^{N_b} J_i(v_i(\lambda)) = \sum_{i=1}^{N_b} D_i(v_i(\lambda)) + \lambda \sum_{i=1}^{N_b} \Delta R_i(v_i(\lambda)), \tag{3}$$

is also a solution of (1) if

$$R_{\mathrm{budget}} = \sum_{i=1}^{N_b} \Delta R_i(v_i^{\star}(\lambda)). \tag{4}$$

The problem therefore reduces to ensuring that (4) has a solution for $v_i(\lambda)$ and
determining this solution. This was investigated from a general viewpoint in [14], where
it was shown that $\Delta R_i(v(\lambda))$ and $D_i(v(\lambda))$ are monotonic functions
of the Lagrange multiplier $\lambda$, which may be interpreted as a quality index, with
values ranging from $0$ (highest rate, lowest distortion) to $\infty$ (lowest rate,
highest distortion). Further investigation in [15] proved that the solution of (4) may be
obtained using any fast convex algorithm such as the bisection algorithm [18]. One such
algorithm, which gave very good results in both [15] and [16], is also adopted in the
present paper. This algorithm proceeds as follows.
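First, two values $\lambda_l < \lambda_u$ of $\lambda$ are found so that

$$\sum_{i=1}^{N_b} \Delta R_i(v_i(\lambda_u)) \le R_{\mathrm{budget}} \le \sum_{i=1}^{N_b} \Delta R_i(v_i(\lambda_l)). \tag{5}$$

For the coding of a sequence of frames, these values are chosen to be $\lambda_l = 0$,
$\lambda_u = \infty$ for the initial frame and $\lambda_l = 0.8\lambda^{*}$,
$\lambda_u = 1.2\lambda^{*}$ for subsequent frames, where $\lambda^{*}$ is the solution of
(4) for the previous frame. The bracketing interval is then successively decreased in
size by the following procedure:

• Step 1 For each block $i$, $i = 1, \ldots, N_b$, compute $D_i(v_i(\lambda_l))$ and
$D_i(v_i(\lambda_u))$ and the corresponding $\Delta R_i(v_i(\lambda_l))$ and
$\Delta R_i(v_i(\lambda_u))$.

• Step 2 Set

$$\lambda_{new} = \left| \frac{\sum_{i=1}^{N_b} \left[ D_i(v_i(\lambda_l)) - D_i(v_i(\lambda_u)) \right]}{\sum_{i=1}^{N_b} \left[ \Delta R_i(v_i(\lambda_l)) - \Delta R_i(v_i(\lambda_u)) \right]} \right| + \epsilon, \tag{6}$$

where $\epsilon$ is a vanishingly small positive number.

• Step 3 For each $i$, determine iteratively the displacement vectors
$v_i(\lambda_{new})$ so as to minimize for $i = 1, 2, \ldots, N_b$

$$D_i(v_i(\lambda_{new})) + \lambda_{new} \Delta R_i(v_i(\lambda_{new})). \tag{7}$$

Then compute the corresponding $\{\Delta R_i(v_i(\lambda_{new}))\}_i$ and
$\{D_i(v_i(\lambda_{new}))\}_i$.

• Step 4 If $\sum_{i=1}^{N_b} \Delta R_i(v_i(\lambda_{new})) = \sum_{i=1}^{N_b} \Delta R_i(v_i(\lambda_u))$,
then stop, $\lambda = \lambda_u$.
Else if $\sum_{i=1}^{N_b} \Delta R_i(v_i(\lambda_{new})) > R_{\mathrm{budget}}$,
$\lambda_l \leftarrow \lambda_{new}$. Go to step 2.
Else $\lambda_u \leftarrow \lambda_{new}$. Go to step 2.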
Obviously therefore, if the above algorithm converges, a solution of (4) will have been
obtained and the equivalent problem of the constrained minimization of (1) will have
been solved.
Note that the distortion corresponding to each motion vector in a specific search area
is computed only once, at the first iteration of the algorithm. Thus the computational
load of the algorithm consists of updating the entropy of the vector field and finding
the minimum $J(\lambda)$.
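The bracketing procedure of this section can be illustrated by the following minimal sketch. It rests on a simplifying assumption that is not part of the method above: each block has a fixed table of candidate vectors with precomputed distortion `D[i, k]` and rate `R[i, k]`, so that the rate contribution of a block does not depend on the vectors chosen for the other blocks. The names `select` and `rd_search` and the iteration cap `max_iter` are hypothetical.

```python
import numpy as np

def select(D, R, lam):
    """Per block, pick the candidate minimising D + lam * R (R alone when lam is infinite)."""
    cost = R if np.isinf(lam) else D + lam * R
    k = np.argmin(cost, axis=1)
    rows = np.arange(D.shape[0])
    return k, D[rows, k].sum(), R[rows, k].sum()

def rd_search(D, R, budget, lam_lo=0.0, lam_hi=np.inf, eps=1e-9, max_iter=50):
    """Shrink the bracket [lam_lo, lam_hi] until the rate budget is met (Steps 1 to 4)."""
    # Step 1: operating points at both ends of the bracket.
    k_lo, D_lo, R_lo = select(D, R, lam_lo)
    k_hi, D_hi, R_hi = select(D, R, lam_hi)
    if not (R_hi <= budget <= R_lo):
        raise ValueError("initial bracket does not satisfy condition (5)")
    for _ in range(max_iter):
        if R_lo == R_hi:
            break
        # Step 2: slope of the chord between the two operating points, eq. (6).
        lam_new = abs((D_lo - D_hi) / (R_lo - R_hi)) + eps
        # Step 3: minimise D + lam_new * R for every block, eq. (7).
        k_new, D_new, R_new = select(D, R, lam_new)
        # Step 4: stop, or replace one end of the bracket.
        if R_new == R_hi:
            break
        if R_new > budget:
            lam_lo, k_lo, D_lo, R_lo = lam_new, k_new, D_new, R_new
        else:
            lam_hi, k_hi, D_hi, R_hi = lam_new, k_new, D_new, R_new
    return lam_hi, k_hi, D_hi, R_hi

if __name__ == "__main__":
    D = np.array([[1.0, 4.0, 9.0], [2.0, 3.0, 10.0]])   # distortion per candidate
    R = np.array([[3.0, 2.0, 1.0], [3.0, 2.0, 1.0]])    # rate per candidate
    print(rd_search(D, R, budget=4.0))
```

The upper end of the bracket is returned because it is the end whose total rate does not exceed the budget.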
III APPLICATION OF THE VECTOR FIELD ESTIMATION ALGORITHM TO MOTION, DISPARITY AND DEPTH ESTIMATION
The specific way the vector field affects the quality of the reconstructed image will
determine the distortion index $D_i(v_i(\lambda))$. A number of such distortion measures
have been proposed in the literature. In the case of block-based motion estimation, the
simplest and most commonly used is the temporally displaced frame difference

$$D_i(v_i) = \sum_{k=0}^{b_x} \sum_{l=0}^{b_y} \left| im_t(m+k, n+l) - im_{t-1}(m+k+v_x^{(i)}, n+l+v_y^{(i)}) \right|, \tag{8}$$

where $(m, n)$ are the upper left hand corner coordinates of block $i$, $im_t(\cdot)$ and
$im_{t-1}(\cdot)$ are the images at time instants $t$ and $t-1$, respectively, and
$b_x, b_y$ are the dimensions of the block.
The corresponding distortion index for block-based disparity estimation is the spatially
displaced frame difference

$$D_i(v_i) = \sum_{k=0}^{b_x} \sum_{l=0}^{b_y} \left| im_l(m+k, n+l) - im_r(m+k+v_x^{(i)}, n+l+v_y^{(i)}) \right|, \tag{9}$$

where again $(m, n)$ are the upper left hand corner coordinates of block $i$, $im_l$ and
$im_r$ are the left and right images at time instant $t$, respectively, and $b_x, b_y$
are the dimensions of the block.
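As an illustration of the displaced frame difference used as distortion index, the following sketch tabulates the sum of absolute differences of one block for every candidate vector in a square search area. It assumes images stored as 2-D NumPy arrays indexed in the same order as the coordinates $(m, n)$ above, blocks of `bx` by `by` pixels, and a search range of `-s..s` in both components; the function names are hypothetical. Candidates that would fall outside the reference image are simply skipped.

```python
import numpy as np

def block_distortion(cur, ref, m, n, bx, by, vx, vy):
    """Sum of absolute differences between a block of `cur` and its displaced match in `ref`."""
    a = cur[m:m + bx, n:n + by].astype(np.int64)
    b = ref[m + vx:m + vx + bx, n + vy:n + vy + by].astype(np.int64)
    return int(np.abs(a - b).sum())

def distortion_table(cur, ref, m, n, bx, by, s):
    """Distortion of the block at (m, n) for every in-frame vector (vx, vy) with |vx|, |vy| <= s."""
    table = {}
    for vx in range(-s, s + 1):
        for vy in range(-s, s + 1):
            inside = (0 <= m + vx and m + vx + bx <= ref.shape[0]
                      and 0 <= n + vy and n + vy + by <= ref.shape[1])
            if inside:
                table[(vx, vy)] = block_distortion(cur, ref, m, n, bx, by, vx, vy)
    return table

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    ref = rng.integers(0, 256, size=(32, 32))
    # Build `cur` so that cur[i, j] == ref[i + 2, j - 1] away from the borders.
    cur = np.roll(ref, shift=(-2, 1), axis=(0, 1))
    table = distortion_table(cur, ref, m=8, n=8, bx=4, by=4, s=3)
    print(min(table, key=table.get))
```

Because the table depends only on the two images, it can be computed once per block and reused for every value of the multiplier.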
If depth is evaluated from disparity using methods such as given in [5] or from motion
using the method in [23], a pixel-wise distortion index may be used such as

$$D_i(v_i) = \sum_{k=-b_x}^{b_x} \sum_{l=-b_y}^{b_y} \left| im_l(m+k, n+l) - im_r(m+k+v_x^{(i)}, n+l+v_y^{(i)}) \right|, \tag{10}$$

where now $(m, n)$ are the coordinates of the working pixel, $b_x, b_y$ are the
dimensions of a rectangular window centered at $(m, n)$ and $v_x^{(i)}, v_y^{(i)}$ are
the components of the disparity vector computed from depth information.
Alternately, if the depth field is modeled by a wireframe [5], only the node information
is transmitted, hence an appropriate distortion measure would be

$$D_i(v_i) = \sum_{(m,n) \in \text{block } i} \left( \hat{z}(m, n) - z(m, n) \right)^2, \tag{11}$$

where $\hat{z}(m, n)$ is the wireframe modeled depth and $z(m, n)$ is the depth computed
from the disparity field $v_i$.
Also the transmission cost $R(v_i(\lambda))$ will depend on the specific method used for
the coding of the vector fields. Assume first that entropy coding (e.g. Huffman or
arithmetic coding) is used, with an adaptive probability model. In this case, the
transmission cost or rate is the entropy

$$R = -\sum_{x=-m_1}^{m_1} \sum_{y=-m_2}^{m_2} p_{xy} \log_2(p_{xy}), \tag{12}$$

where $m_1, m_2$ are the maximum allowed x- and y-components, respectively, of the
displacement vectors, and

$$p_{xy} = \frac{1}{N_b} \sum_{k=1}^{N_b} d_{xy}(v_k), \tag{13}$$

where

$$d_{xy}(v_k) = \begin{cases} 1 & \text{if } v_x^{(k)} = x \text{ and } v_y^{(k)} = y \\ 0 & \text{otherwise.} \end{cases} \tag{14}$$
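A minimal sketch of the entropy rate defined above is given below, assuming the vector field is available as a list of `(vx, vy)` tuples (the function name `field_entropy` is hypothetical). Vectors that never occur have zero probability and contribute nothing to the sum, so only the observed vectors need to be counted.

```python
from collections import Counter
from math import log2

def field_entropy(vectors):
    """First-order entropy, in bits per vector, of a list of (vx, vy) displacement vectors."""
    counts = Counter(vectors)          # number of blocks using each distinct vector
    n = len(vectors)
    return -sum((c / n) * log2(c / n) for c in counts.values())

if __name__ == "__main__":
    smooth = [(0, 0)] * 4
    mixed = [(0, 0), (0, 0), (1, 0), (0, 1)]
    print(field_entropy(smooth), field_entropy(mixed))
```

A perfectly uniform field costs nothing to code under this measure, which is why lowering the rate tends to smooth the field.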
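The above rate $R$ may be rewritten in the form required by (3),

$$R = \sum_{i=1}^{N_b} \Delta R_i(v_i), \tag{15}$$

with the definition for $i > 1$

$$\Delta R_i(v_i) = -\sum_{x=-m_1}^{m_1} \sum_{y=-m_2}^{m_2} p_{xy}^{(i)} \log_2(p_{xy}^{(i)}) + \sum_{x=-m_1}^{m_1} \sum_{y=-m_2}^{m_2} p_{xy}^{(i-1)} \log_2(p_{xy}^{(i-1)}), \tag{16}$$

and

$$\Delta R_1(v_1) = -\sum_{x=-m_1}^{m_1} \sum_{y=-m_2}^{m_2} p_{xy}^{(1)} \log_2(p_{xy}^{(1)}), \tag{17}$$

with

$$p_{xy}^{(i)} = \frac{1}{i} \sum_{k=1}^{i} d_{xy}(v_k). \tag{18}$$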
The algorithm in Section II may be applied using (16) for the determination of the
contribution of $v_i$ to the rate increase. Alternately, since the second term of the
right part of (16) is independent of $v_i$, step 3 of the algorithm may be equivalently
carried out by minimizing for each $i$ the sum

$$D^{\star}(v_i(\lambda_{new})) + \lambda_{new} R^{\star}(v_i(\lambda_{new})), \tag{19}$$

where $D^{\star}(v_i)$ is the total distortion of the first $i$ blocks of the image and is
given by

$$D^{\star}(v_i) = \sum_{k=1}^{i} D_k(v_k(\lambda_{new})), \tag{20}$$

and $R^{\star}(v_i)$ is equal to the first term in (16), thus to the total entropy of the
first $i$ blocks of the image:

$$R^{\star}(v_i) = -\sum_{x=-m_1}^{m_1} \sum_{y=-m_2}^{m_2} p_{xy}^{(i)} \log_2(p_{xy}^{(i)}), \tag{21}$$

where $p_{xy}^{(i)}$ is given by (18).
Note also that (18) is equivalent to the following efficient formula for the incremental
computation of $p_{xy}(v_i(\lambda))$:

$$p_{xy}^{(i+1)} = \frac{i}{i+1}\, p_{xy}^{(i)} + \frac{1}{i+1}\, d_{xy}(v_{i+1}(\lambda)). \tag{22}$$
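The incremental computation can be sketched as follows, weighting the indicator of the new vector by $1/(i+1)$ so that the running estimate stays equal to the batch average of (18). The array layout (entry `[vx + m1, vy + m2]` holds the probability of vector `(vx, vy)`) and the function names are assumptions made for illustration. The sketch also evaluates the per-block rate increment of (16) and (17) as a difference of entropies.

```python
import numpy as np

def update_pmf(p, i, v, m1, m2):
    """Return p^(i+1) given p^(i) (estimated from i vectors) and the new vector v."""
    q = p * (i / (i + 1))
    q[v[0] + m1, v[1] + m2] += 1.0 / (i + 1)
    return q

def entropy(p):
    """Entropy in bits of a probability table; zero entries contribute nothing."""
    nz = p[p > 0]
    return float(-(nz * np.log2(nz)).sum())

def rate_increment(p_prev, i_prev, v, m1, m2):
    """Entropy after adding v minus entropy before, together with the updated table."""
    p_new = update_pmf(p_prev, i_prev, v, m1, m2)
    return entropy(p_new) - entropy(p_prev), p_new

if __name__ == "__main__":
    m1 = m2 = 2
    p = np.zeros((2 * m1 + 1, 2 * m2 + 1))
    total = 0.0
    for i, v in enumerate([(0, 0), (1, -1), (0, 0), (2, 1)]):
        delta, p = rate_increment(p, i, v, m1, m2)
        total += delta
    print(total, entropy(p))   # the increments add up to the entropy of the whole field
```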
Alternately, DPCM coding of the vector field may be assumed, followed by appropriate
entropy coding. In this case the coding rate is expressed by

$$R = -\sum_{x=-2m_1}^{2m_1} \sum_{y=-2m_2}^{2m_2} q_{xy} \log_2(q_{xy}), \tag{23}$$

where $q_{xy}$ is the probability that the difference between the vector minimizing the
index $J(\lambda)$ and the vector corresponding to the previous block,
$dv = v_i - v_{i-1}$, satisfies $dv_x = x$ and $dv_y = y$. The probability $q_{xy}$ is
computed using equation (18) or (22) with $v_k$ replaced by $dv_k$.
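The DPCM variant can be sketched in the same way by taking the entropy of the differences between consecutive vectors. The sketch assumes that the vectors are given in block scan order and that the first block is predicted from the zero vector; both are illustrative choices, as are the function names.

```python
from collections import Counter
from math import log2

def entropy(symbols):
    """First-order entropy, in bits per symbol, of a list of hashable symbols."""
    counts = Counter(symbols)
    n = len(symbols)
    return -sum((c / n) * log2(c / n) for c in counts.values())

def dpcm_entropy(vectors):
    """Entropy of the differences dv = v_i - v_(i-1), with the first block predicted from (0, 0)."""
    previous = [(0, 0)] + list(vectors[:-1])
    diffs = [(vx - px, vy - py) for (px, py), (vx, vy) in zip(previous, vectors)]
    return entropy(diffs)

if __name__ == "__main__":
    ramp = [(0, 0), (1, 0), (2, 0), (3, 0)]   # slowly varying field
    print(entropy(ramp), dpcm_entropy(ramp))
```

For a slowly varying field such as the ramp in the example, the differences repeat more often than the vectors themselves, so the DPCM rate is the lower of the two.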
A more computationally efficient approach, which does not involve incremental computation
of the probability density of the vector field or the first order vector field
differences, is to assume a model for this probability density function. Specifically,
the assumption of a Gauss-Markov Random Field to describe both motion [19] and disparity
[20] vector differences could be used so as to accelerate the rate-distortion
minimization procedure.
IV ADAPTIVE VECTOR FIELD ESTIMATION
In the general procedure outlined in the preceding section, the mechanism for the
minimization of the rate constraint index is based on limiting the entropy, which, as
expected and as shown by the experimental results, is attained by smoothing of the
displacement vector fields. Uniform smoothing of motion or disparity vector fields may
have very undesirable effects and may, in fact, result in severe loss of motion or depth
information, especially in the neighborhood of edges, and thus in an increased
reconstruction error. An adaptive application of the proposed algorithm is therefore
necessary so as to limit the smoothing to the homogeneous areas and avoid highly textured
areas or edges. In this way, the discontinuities of both motion and disparity [21] are
preserved and the reliable vector fields corresponding to highly textured areas are
excluded from smoothing. Furthermore, the use of edge information reduces the well known
"corona effect", which is one of the drawbacks of block-based matching methods.
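Adaptive vector field estimation may be accomplished by the minimization of the
following index:

$$J(\lambda) = \sum_{i=1}^{N_b} D_i(v_i(\lambda)) + \lambda \sum_{i=1}^{N_b} E(v_i(\lambda)) \, \Delta R_i(v_i(\lambda)), \tag{24}$$

where $E(v_i(\lambda))$ is a bitmap defined as:

$$E(v_i(\lambda)) = \begin{cases} 0 & \text{if } v_i \text{ is reliable} \\ 1 & \text{otherwise.} \end{cases} \tag{25}$$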
The motion vector $v_i$ corresponding to block $i$ may be considered reliable whenever the
block contains edges or highly textured areas. For the detection of edges and textured
areas a variant of the technique in [22] was used, based on the observation that highly
textured areas exhibit a high local intensity variance in all directions while on edges
the intensity variance is higher across the direction of the edge.
Accordingly, for each pixel in the image, two parameters were computed. The first
parameter, $\sigma^2$, is the unbiased estimate of the local variance. If the variance
exceeds a threshold $\sigma_c^2$, the pixel is considered to be a part of either an edge
or a highly textured area. The second parameter was used to differentiate highly textured
areas from edges. For this purpose the local variances in eight directions (multiples of
$\pi/4$) were computed and the second parameter $p$ was defined by

$$p = \frac{\max_i \sigma_i^2}{\min_i \sigma_i^2}. \tag{26}$$

Since $p$ is close to 1 in areas with uniform texture, and much higher in edge areas,
pixels were assigned to edges whenever $p$ exceeded a threshold $p_c$. Finally, blocks
containing edges or consisting in large part (over 50%) of highly textured area were
assumed to produce reliable $v_i$.
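A rough sketch of this two-parameter classification follows. Several details are assumptions, since the text does not fix them: the local variance is taken over a square window of half-width `half`, each directional variance is taken over the pixels along a short ray starting at the pixel, and a small constant `tiny` guards the ratio against a zero denominator. The function names and default thresholds are illustrative only, and the caller must keep the pixel at least `half` samples away from the image border.

```python
import numpy as np

# Eight directions at multiples of pi/4, as (row step, column step).
DIRS = [(0, 1), (-1, 1), (-1, 0), (-1, -1), (0, -1), (1, -1), (1, 0), (1, 1)]

def classify_pixel(img, r, c, var_thr=40.0, ratio_thr=10.0, half=2, tiny=1e-6):
    """Label pixel (r, c) as 'homogeneous', 'edge' or 'texture'."""
    win = img[r - half:r + half + 1, c - half:c + half + 1].astype(float)
    if win.var(ddof=1) <= var_thr:             # unbiased local variance against the threshold
        return "homogeneous"
    dir_vars = []
    for dr, dc in DIRS:
        ray = [float(img[r + k * dr, c + k * dc]) for k in range(half + 1)]
        dir_vars.append(np.var(ray, ddof=1))   # variance along one direction
    p = max(dir_vars) / (min(dir_vars) + tiny)
    return "edge" if p > ratio_thr else "texture"

def block_is_reliable(labels):
    """A block is reliable if it contains an edge pixel or is more than half texture."""
    labels = list(labels)
    return "edge" in labels or labels.count("texture") > 0.5 * len(labels)

if __name__ == "__main__":
    flat = np.zeros((16, 16))
    step = np.zeros((16, 16))
    step[:, 8:] = 100.0                        # vertical step edge
    rr, cc = np.indices((16, 16))
    tex = 100.0 * (((rr + 2 * cc) % 4) < 2)    # periodic pattern varying in all directions
    print(classify_pixel(flat, 8, 8), classify_pixel(step, 8, 7), classify_pixel(tex, 8, 8))
```

Along a straight edge, the rays running parallel to it see almost no variation, which is what drives the ratio far above 1.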
However, with the adaptive version of the algorithm, there is an increased probability
that no solution of the constrained optimization problem (3), (4) will exist. In this
case it will be impossible to determine an initial $\lambda_l$ satisfying (5) and the
algorithm (5)-(7) will be inapplicable. If this happens, the restriction on smoothing may
be dropped for the highly textured areas and retained only for blocks containing edge
information. If this still fails to produce a rate equal to $R_{\mathrm{budget}}$
according to (4), all restrictions on smoothing are lifted and the adaptive version of
the algorithm reverts to the non-adaptive one.
Both the non-adaptive (3) and the adaptive (24) versions of the vector field estimation
techniques were experimentally applied for the estimation of motion, disparity and depth
fields, using appropriate distortion measures to evaluate the quality of the
reconstructed images.
V EXPERIMENTAL RESULTS
In order to evaluate the performance of the proposed approach for the low-bitrate coding
of videophone and videoconferencing image sequences, the proposed Rate Distortion
Optimization Vector Field Estimation (RDOVFE) algorithm was applied first to the
typical QCIF sequences "Claire" and "Miss America". Further, the disparity estimation
algorithm was tested on the stereo image sequences "Sergio" of dimension 256 × 256 and
"Tunnel"¹ of dimension 360 × 288.
First, the RDOVFE algorithm was sequentially applied for the coding of the first 50
frames of "Claire" using motion compensation. On an R4400 Silicon Graphics machine it
required an execution time of about 20 seconds, compared to 4 seconds for the exhaustive
block-matching search algorithm. The average bitrate versus the average mean square
error (MSE) is shown in Figure 1.
As seen, the proposed technique may achieve very considerable bitrate savings with
very modest corresponding increases in the mean-square error. For example, Figure 1
indicates that coding of the first 50 frames of the "Claire" sequence may be achieved
with 1.1 bits/vector (0.01718 bits/pixel) and an average mean square error of 2.64. This
contrasts with 3.23 bits/vector (0.05046 bits/pixel) required for lossless coding with a
mean square error of 2.53. That is, a bitrate decrease by a factor of 2.93 is achieved at
the cost of a mean-square error increase of the order of 4% with no visible image
deterioration. Entropy coding, without DPCM, was used to compress the vector fields.
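The quoted figures can be cross-checked with a few lines of arithmetic. The sketch assumes one vector per block of 8 × 8 pixels, which is the block size implied by the ratio between the bits/vector and bits/pixel values above; the constant and function names are hypothetical.

```python
PIXELS_PER_VECTOR = 8 * 8   # assumption: one displacement vector per 8x8 block

def bits_per_pixel(bits_per_vector, pixels_per_vector=PIXELS_PER_VECTOR):
    """Convert a vector-field rate from bits/vector to bits/pixel."""
    return bits_per_vector / pixels_per_vector

lossy_bpp = bits_per_pixel(1.1)          # rate-constrained vector field
lossless_bpp = bits_per_pixel(3.23)      # losslessly coded vector field
rate_factor = 3.23 / 1.1                 # bitrate reduction factor
mse_increase = (2.64 - 2.53) / 2.53      # relative increase in mean square error

if __name__ == "__main__":
    print(lossy_bpp, lossless_bpp, rate_factor, mse_increase)
```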
Next, full search block matching motion estimation was performed between the first
and fifth frame of "Miss America" (see Figures 8a and 8b) and the first and the third
frame of "Claire" (see Figures 9a and 9b), in order to evaluate the quality of the
computed vector fields. A block size of 8 × 8 pixels and a search area of
$-8, \ldots, 8$ pixels for both the x- and y-components of the motion vector field were
used. The rate distortion minimization algorithm was run for various values of
$R_{\mathrm{budget}}$ and was seen to converge in fewer than 6 iterations. Figures 3 and
4 show the MSE versus the bitrate (in bits/vector) needed for the coding of the motion
vectors corresponding to the fifth frame of "Miss America" and the third frame of
"Claire", respectively. The value of $\lambda$ corresponding to each of the operating
points of the algorithm is also shown.
The original vector field computed using an exhaustive search block matching algorithm
is shown in Figures 8c and 9c for the two sequences, respectively. The vector field
coded at 1.58 bits/vector for "Miss America", computed using the proposed algorithm,
is shown in Figure 8d. The motion compensated estimate of the third frame of "Claire"
corresponding to the motion field coded at 1.1 bits/vector is shown in Figure 9d.
¹ The sequence "Tunnel", which is an MPEG-4 stereo test sequence, was prepared by the Centre d'Études de Télédiffusion et Télécommunications (CCETT) for use in the DISTIMA RACE and the PANORAMA ACTS projects.
Edge extraction was subsequently performed using the technique proposed in Section
IV with $\sigma_c^2 = 40$ and $p_c = 10$, with results shown in Figures 8e and 9e. In
these figures homogeneous areas are shown in white, edges in grey and highly textured
areas in black. The adaptive version of the algorithm described in Section IV was then
applied to the two test sequences and produced the vector fields shown in Figures 8f and
9f. In this case, the adaptive smoothing technique further reduced the bitrate (to 1.202
bits/vector) by eliminating the scattered motion vectors corresponding to the background.
The performances of the adaptive and non-adaptive versions of the RDOVFE algorithm
were also compared. Figure 7 shows the rate-distortion curves for the adaptive and
non-adaptive versions applied to the coding of "Miss America". In curve RDOVFE-(T+E) of
Figure 7 the no-smoothing constraint of the adaptive version of the proposed algorithm is
used in both highly textured areas and blocks containing sufficient edge information. In
curve RDOVFE-T of Figure 7 the no-smoothing constraint is relaxed only in the highly
textured areas in order to obtain lower bit rates. If, again, the $R_{\mathrm{budget}}$
constraint is not satisfied, the no-smoothing constraint is relaxed for all blocks in
the image, and the adaptive version of the algorithm reduces to the non-adaptive one
(curve RDOVFE in Figure 7).
The displacement vector field estimation algorithm was then applied for the coding
of the first 25 frames of "Tunnel" using pixel-based disparity compensation. Figure 2
shows the average MSE versus the average bitrate for the coding of depth information of
the first 25 frames of the sequence "Tunnel". Accurate depth information (with negligible
increase in MSE) can be extracted at 2.61 bits/pixel.
The original left and right second frames of "Sergio" and "Tunnel" are shown in
Figures 10a and 10b and 11a and 11b. Block-based disparity estimation with a block size
of 8 × 8 was performed first. The computed x-component of the block-based estimated
disparity field is shown in Figures 10c and 11c. Figure 5 shows the MSE versus the bitrate
(in bits/vector) needed for the coding of the block-based disparity field corresponding to
the second frame of "Sergio".
The dense disparity field was estimated using a window of 9 × 9 pixels and a search
area of $-16, \ldots, 16$ and $-1, \ldots, 1$ pixels for the x- and y-components of the
disparity field, respectively. Figure 6 shows the MSE versus the bitrate needed for the
coding of the depth map corresponding to the second frame of "Tunnel".
The output of the edge extraction procedure applied to the right channel second
frames of "Sergio" and "Tunnel" is shown in Figures 10d and 11d, respectively. The
computed depth maps using the adaptive algorithm are then illustrated in Figures 10e
and 11e. The depth map is shown quantized into 256 levels and has the same resolution
as the original image (since it is computed from a dense disparity field). Brighter areas
represent those closer to the cameras. As seen, the smoothing properties of the depth
estimation method result in realistic depth-map estimates.
Spatial interpolation was also performed using the technique in [6] for the generation
of the intermediate images of "Tunnel" and "Sergio" using the information of the left
camera image and the computed dense disparity field. The resulting intermediate images
are shown in Figures 10f and 11f.
VI CONCLUSIONS
A rate-distortion framework was used to define a vector field estimation technique which
achieves maximum reconstructed image quality under the constraint of a target bitrate
for the coding of the vector sequence. An adaptive version of this technique limits the
smoothing effect to homogeneous areas and avoids highly textured areas and edges.
Applications of this technique were investigated for the estimation of motion vectors in
very low bitrate image sequence coding. In this case, the proposed algorithm can be
combined with an appropriate rate control strategy to optimize the coding of the motion
vectors corresponding to all frames of an image sequence. The technique was also
evaluated for the estimation and coding of dense disparity vector fields. The latter are
needed to enable a multiview receiver to generate intermediate images by use of spatial
interpolation and also in other applications requiring accurate depth knowledge, such as
scenes to be postprocessed using depth keying. Experimental results were presented and
evaluated for both of the above application areas.
References
[1] H. Li, A. Lundmark, and R. Forchheimer, "Image Sequence Coding at Very Low Bitrates - A Review," IEEE Trans. on Image Processing, vol. 3, pp. 589–609, Sep. 1994.
[2] H. G. Musmann, P. Pirsch, and H. J. Grallert, "Advances in Picture Coding," Proc. IEEE, vol. 73, pp. 523–548, Apr. 1985.
[3] D. Tzovaras, M. G. Strintzis, and H. Sahinoglou, "Evaluation of Multiresolution Techniques for Motion and Disparity Estimation," Signal Processing: Image Communication, vol. 6, pp. 59–67, Mar. 1994.
[4] M. Ziegler, "Digital Stereoscopic Imaging and Application, A Way Towards New Dimensions, The RACE II project DISTIMA," in IEE Colloq. on Stereoscopic Television, (London), 1992.
[5] D. Tzovaras, N. Grammalidis, and M. G. Strintzis, "Disparity Field and Depth Map Coding for Multiview 3D Image Generation," Signal Processing: Image Communication, accepted for publication.
[6] M. G. Strintzis, D. Tzovaras, and N. Grammalidis, "Depth Map and Disparity Field Coding for the Communication of Multiview Images," in Proc. 35th Int'l Conf. on Digital Signal Processing '95, (Limassol, Cyprus), June 1995.
[7] D. Tzovaras, N. Grammalidis, and M. G. Strintzis, "Object-Based Coding of Stereo Image Sequences using Joint 3-D Motion/Disparity Compensation," IEEE Trans. on Circuits and Systems for Video Technology, to appear 1996.
[8] G. Adiv, "Determining Three-Dimensional Motion and Structure from Optical Flow Generated by Several Moving Objects," IEEE Trans. on Pattern Analysis and Machine Intelligence, vol. 7, pp. 384–401, Jul. 1985.
[9] N. Grammalidis, S. Malassiotis, D. Tzovaras, and M. G. Strintzis, "Stereo image sequence coding based on 3D motion estimation and compensation," Signal Processing: Image Communication, vol. 7, no. 2, pp. 129–145, Aug. 1995.
[10] D. Tzovaras and M. G. Strintzis, "Motion Estimation Using Rate Distortion Theory for Very Low Bit Rate Image Sequence Coding," in Proc. Int'l Conf. Telecommunications '96, (Istanbul, Turkey), Apr. 1996.
[11] J. Ribas-Corbera and D. L. Neuhoff, "Optimal Bit Allocations for Lossless Video Coders: Motion Vectors vs. Difference Frames," in ICIP '95, pp. 180–183, Sep. 1995.
[12] J. Ribas-Corbera and D. L. Neuho�, \Optimal Motion Vector Accuracy for Block-
based Motion-Compensated Video Coders", SPIE Electronic Imaging 1996, Digital
Video Compression : Algorithms and Technologies 1996, pp. 302-314, Jan. 1996.
[13] J. Ribas-Corbera and D. L. Neuhoff, "Reducing Rate/Complexity in Video Coding
by Motion Estimation with Block Adaptive Accuracy," VCIP'96, pp. 615–624,
Mar. 1996.
[14] Y. Shoham and A. Gersho, "Efficient Bit Allocation for an Arbitrary Set of
Quantizers," IEEE Trans. on Acoust., Speech, Signal Processing, vol. 36, pp. 1445–1453,
Sep. 1988.
[15] K. Ramchandran and M. Vetterli, "Best Wavelet Packet Bases in a Rate-Distortion
Sense," IEEE Trans. on Image Processing, vol. 2, pp. 160–175, Apr. 1993.
[16] G. J. Sullivan and R. Baker, "Efficient Quadtree Coding of Images and Video,"
IEEE Trans. on Image Processing, vol. 3, pp. 327–331, May 1994.
[17] H. Everett, "Generalized Lagrange Multiplier Method for Solving Problems of
Optimum Allocation of Resources," Operations Res., vol. 11, pp. 399–417, 1963.
[18] W. H. Press, B. P. Flannery, S. A. Teukolsky, and W. T. Vetterling, Numerical
Recipes in C: The Art of Scientific Computing, Cambridge, U.K.: Cambridge Univ.
Press, 1988.
[19] J. Konrad and E. Dubois, "Bayesian Estimation of Motion Vector Fields," IEEE
Trans. on Pattern Analysis and Machine Intelligence, vol. 14, pp. 910–927,
Sep. 1992.
[20] S. Malassiotis and M. G. Strintzis, "Joint Motion/Disparity MAP Estimation for
Stereo Image Sequences," in IEE Proceedings: Vision, Image & Signal Processing,
vol. 143, no. 2, pp. 101–108, Apr. 1996.
[21] L. Falkenhagen, "3D Object-based Depth Estimation From Stereoscopic Image
Sequences," in Proc. Int'l Workshop on Stereoscopic and 3D Imaging '95, (Santorini,
Greece), pp. 81–86, Sep. 1995.
[22] O. Egger, W. Li, and M. Kunt, "High Compression Image Coding Using an Adaptive
Morphological Subband Decomposition," Proc. IEEE, vol. 83, pp. 272–287, Feb.
1995.
[23] J. Weng, T. Huang, and N. Ahuja, "Motion and Structure from Two Perspective
Views: Algorithms, Error Analysis and Error Estimation," IEEE Trans. on Pattern
Analysis and Machine Intelligence, vol. 11, pp. 451–476, May 1989.
[Plot, "Claire": MSE (vertical axis, 2.5 to 4.5) versus bitrate (horizontal axis, 0.5 to 3).]
Figure 1: Average MSE versus average bitrate (in bits/vector) for the coding of the first 50 frames of "Claire" using motion compensation.
[Plot, "Tunnel": MSE (vertical axis, 10 to 45) versus bitrate (horizontal axis, 1 to 6).]
Figure 2: Average MSE versus average bitrate (in bits/vector) for the coding of the first 25 frames of "Tunnel" using disparity compensation.
[Plot, "Miss America": MSE (vertical axis, 3 to 6.5) versus bitrate (horizontal axis, 0.5 to 5), with marked points for λ = 0, 2029, 3092, 6228, 20776 and 50000.]
Figure 3: MSE versus bitrate (in bits/vector) for the block-based coding of the fifth frame of "Miss America".
[Plot, "Claire": MSE (vertical axis, 3 to 6.5) versus bitrate (horizontal axis, 0.5 to 3), with marked points for λ = 0, 2064, 3732, 7181, 12502 and 30598.]
Figure 4: MSE versus bitrate (in bits/vector) for the block-based coding of the third frame of "Claire".
[Plot, "Sergio": MSE (vertical axis, 8 to 15) versus bitrate (horizontal axis, 0.5 to 5.5), with marked points for λ = 0, 22859, 32474, 60831, 85125 and 200000.]
Figure 5: MSE versus bitrate (in bits/vector) for the block-based coding of the second frame of "Sergio".
[Plot, "Tunnel": MSE (vertical axis, 10 to 45) versus bitrate (horizontal axis, 1 to 6), with marked points for λ = 0, 5163, 14354, 29862, 44365, 328618 and 600000.]
Figure 6: MSE versus bitrate (in bits/vector) for the coding of the dense disparity field corresponding to the second frame of "Tunnel".
[Plot: MSE (vertical axis, 3 to 7.5) versus bitrate (horizontal axis, 1 to 5), with curves labelled "RDOVFE", "RDOVFE+T" and "RDOVFE+T+E".]
Figure 7: Adaptive versus non-adaptive versions of the proposed RDOVFE algorithm in terms of MSE versus bitrate (in bits/vector) for the block-based coding of the fifth frame of "Miss America".
Figure 8: (a) Original frame 1 of "Miss America". (b) Original frame 5 of "Miss America". (c) Original motion vector field estimated with the block matching algorithm. (d) Motion vector field estimated with the rate-distortion algorithm at 1.58 bits/vector. (e) The output of the edge extractor (homogeneous areas are marked white, edges grey, and highly textured areas black). (f) Computed motion vector field using the adaptive smoothing algorithm.
Figure 9: (a) Original frame 1 of "Claire". (b) Original frame 3 of "Claire". (c) Original motion vector field estimated with the block matching algorithm. (d) Reconstructed frame 3 of "Claire" using the computed vector field at 1.1 bits/vector. (e) The output of the edge extractor. (f) Computed motion vector field using the adaptive smoothing algorithm.
Figure 10: (a) Original left channel image "Sergio" (frame 2). (b) Original right channel image "Sergio" (frame 2). (c) Block-based estimate of disparity. (d) The output of the edge extractor. (e) Pixel-based estimate of depth. (f) Intermediate image generated using the computed depth.
Figure 11: (a) Original left channel image "Tunnel" (frame 2). (b) Original right channel image "Tunnel" (frame 2). (c) Block-based estimate of disparity. (d) The output of the edge extractor. (e) Pixel-based estimate of depth. (f) Intermediate image generated using the computed depth.