Disparity Estimation with Modeling of Occlusion and Object Orientation



André Redert, Chun-Jen Tsai+, Emile Hendriks, Aggelos K. Katsaggelos+

Information Theory Group, Department of Electrical Engineering, Delft University of Technology, Delft, The Netherlands
+ Department of Electrical and Computer Engineering, Northwestern University, Evanston, IL, USA

1 ABSTRACT

Stereo matching is fundamental to applications such as 3-D visual communications and depth measurements. There are several different approaches towards this objective, including feature-based methods [1,2], block-based methods [3,4], and pixel-based methods [5]. Most approaches use regularization to obtain reliable fields. Generally speaking, when smoothing is applied to the estimated depth field, it results in a bias towards surfaces that are parallel to the image plane. This is called the fronto-parallel bias [4]. Recent pixel-based approaches [5] claim that no disparity smoothing is necessary. In their approach, occlusions and objects are explicitly modeled, but these models interfere with each other in the case of slanted objects and result in a fragmented disparity field. In this paper we propose a disparity estimation algorithm with explicit modeling of object orientation and occlusion. The algorithm incorporates adjustable resolution and accuracy, and smoothing can be applied without introducing the fronto-parallel bias. The experiments show that the algorithm is very promising.

2 INTRODUCTION

The estimation of disparity fields is a difficult problem. The disparity field between a pair of images depends directly on two things: first, the distance of the objects to the cameras, and second, the geometry of the stereo cameras. Although there are a number of techniques for disparity/depth estimation, many of them require camera calibration. For applications like multiview image generation [6], stereo image sequence coding [7], etc., a 2-D image registration method without camera calibration is preferred.
To reduce the complexity of disparity estimation, many researchers assume that the cameras are arranged in a parallel-axis configuration, so that pixels on one scanline in the left image are matched to pixels on the same scanline in the right image. In this case, the stereo matching problem is simplified to an intra-scanline pixel matching problem.

One popular approach for solving this intra-scanline matching is the disparity space technique [1,5,8]. This technique involves the computation of a matching error image (the disparity space image), the definition of a matching path model, and an algorithm for a minimal error path search. Previous approaches use a simple path model to reduce the size of the search space [8]. In these methods, different constraints are used to resolve the problem of multiple possible paths. For example, in Ref. [1], inter-scanline matching is used along with intra-scanline

matching. In Ref. [8], reliable feature points (Ground Control Points) are used to confine the matching paths. Ref. [5] requires the matching path to contain a minimal number of discontinuities. However, all these methods fail to recognize that many ambiguities in the path search come from the simplified path model, which does not distinguish between occlusions and slanted objects. In this paper we propose a disparity estimation algorithm with explicit modeling of object orientation and occlusion. The experiments show that this new approach handles occlusions and slanted objects very well.

The paper is organized as follows. In Section 3, a disparity space is introduced and the proposed algorithm for disparity estimation in this space is formulated. In Section 4, the implementation details are described, and experiments with both a real and a synthetic image sequence are presented. Finally, Section 5 concludes the paper.

3 PROPOSED DISPARITY ESTIMATION ALGORITHM

The proposed disparity estimation algorithm is based on a disparity space technique. A disparity space is a two-dimensional matching space computed from a pair of corresponding scanlines in the left and the right images. There are different ways to generate this space. In this paper, disparity spaces are computed by transforming the disparity space coordinates (x_L, x_R), first used by Ohta and Kanade [1], through the following transformation:

\begin{pmatrix} x \\ d \end{pmatrix} = \begin{pmatrix} \frac{1}{2} & \frac{1}{2} \\[2pt] \frac{1}{2} & -\frac{1}{2} \end{pmatrix} \begin{pmatrix} x_L \\ x_R \end{pmatrix},   (1)

where the x and d axes span the disparity spaces defined in this paper, while the x_L and x_R axes span the disparity spaces described in Ref. [1]. In equation (1), 0 ≤ x, x_L, x_R ≤ W − 1 and |d| ≤ W − 1 if the image width is W. Also note that the unit for x and d differs from the unit for x_L and x_R because the transformation is not norm-preserving.

step type         δx_L, δx_R    δx, δd
left occlusion    1, 0          1/2, 1/2
match             1, 1          1, 0
right occlusion   0, 1          1/2, −1/2

Table 1: Increments for elementary steps in a path.
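As a sanity check, the transformation in equation (1) and the step increments of Table 1 can be reproduced with a few lines of code. The sketch below is purely illustrative; the function and variable names are ours, not from the paper:

```python
# Transformation (1): map Ohta-Kanade coordinates (xL, xR) to (x, d).
def to_xd(xL, xR):
    return (0.5 * xL + 0.5 * xR, 0.5 * xL - 0.5 * xR)

# Applying it to the (delta_xL, delta_xR) increments of Table 1 yields
# the corresponding (delta_x, delta_d) increments:
steps = {
    "left occlusion":  (1, 0),   # -> (1/2,  1/2)
    "match":           (1, 1),   # -> (1,    0)
    "right occlusion": (0, 1),   # -> (1/2, -1/2)
}
for name, (dxL, dxR) in steps.items():
    dx, dd = to_xd(dxL, dxR)
    print(f"{name:15s}: delta_x={dx:+.2f}, delta_d={dd:+.2f}")
```

In all three cases the step slope δd/δx stays between −1 and 1, which is exactly the finite-slope property of the new disparity space.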
The relation between the two disparity spaces is illustrated in Figure 1.

A path in a disparity space defines a mapping that associates left scanline pixels with corresponding right scanline pixels. In the x_L–x_R space, a path starts at (0, 0) and ends at (W, W), while in the x–d space, a path starts at (0, 0) and ends at (W, 0). Each path is composed of many elementary steps. Most researchers consider only three types of elementary steps: the match step, the left occlusion step, and the right occlusion step. The increments of each step for the x_L–x_R and x–d spaces, denoted respectively by (δx_L, δx_R) and (δx, δd), are listed in Table 1. From this table we can see that the slope δd/δx of an elementary step in the new disparity space is confined to finite values between −1 and 1. This is an advantage of the new disparity space.

Different costs are assigned to the elementary steps so that a low-cost path defines a good corresponding mapping between left scanline pixels and right scanline pixels. A dynamic programming (DP) technique is typically used to find the minimal cost path. Previous works [5,8] assign constant penalty costs to occlusion steps and use intensity difference measures as costs for matching steps. Since each cluster of matching steps then represents a planar object parallel to the camera image planes, slanted planar objects or curved objects, which cause a contracting/stretching mapping, will be modeled by many small fronto-parallel objects with occlusion gaps in between if we allow only three kinds of elementary steps. Obviously, this effect is not desirable for most natural scenes. To avoid fragmenting a scene, one should consider other elementary matching step types with


Figure 1: An example of a path. Note that the units for the x_L–x_R space and for the x–d space are different.

0 < |δd| < 1/2. This is equivalent to explicitly modeling objects which are not parallel to the image planes.

To model the new elementary steps, the disparity space is first resampled at Δx and Δd intervals, with N = Δx/Δd an integer (Figure 1). The derivatives (slopes) of a disparity path now take 2N + 1 different values between 1 and −1, corresponding to 2N + 1 elementary match steps. There are also 2N + 1 elementary occlusion steps. The resolution of the estimated disparity field is determined by Δx, while the accuracy is limited by Δd. At each node in Figure 1, there are 4N + 2 possible incoming paths (2N + 1 object paths and 2N + 1 occlusion paths). There are also 4N + 2 possible outgoing paths; just like the incoming paths, each outgoing path takes one of the 2N + 1 possible slopes and one of the following two types: object or occlusion.

To fit the above approach into the framework of dynamic programming, we must define a recursive cost function for a path. Let us use an integer index k to label a node and a non-integer index k + 1/2 to label the path that stems from this node (see Figure 2). Let us also denote the slope of the incoming path by d'_{k−1/2} and its type by o_{k−1/2}. The slope and type of the outgoing path are denoted respectively by d'_{k+1/2} and o_{k+1/2}. We want to choose the slope and the type of the path so that the cost is minimized. With this notation the recursive cost function for a path is defined by:


Figure 2: a) Possible outgoing steps (dotted lines) at node k. In this example, N = 2. b) Steps (dotted lines) compared by the DP algorithm in equation (2).

C(x_k, d_k, d'_{k+1/2}, o_{k+1/2}) = \min_{d'_{k-1/2},\, o_{k-1/2}} \Big[ C(x_{k-1}, d_{k-1}, d'_{k-1/2}, o_{k-1/2}) + C_N(x_k, d_k) + C_{LP}(x_k, d_k, x_{k-1}, d_{k-1}, o_{k-1/2}) + C_T(d'_{k-1/2}, o_{k-1/2}, d'_{k+1/2}, o_{k+1/2}) \Big].   (2)

The transition from one node to the next is computed by:

x_k = x_{k-1} + \Delta x, \qquad d_k = d_{k-1} + d'_{k-1/2} \cdot \Delta x.   (3)

The C_N term sets a limit on the possible disparity range. For example,

C_N(x_k, d_k) = \begin{cases} 0, & \text{if } |d_k| < \text{maximal allowable disparity} \\ \infty, & \text{otherwise.} \end{cases}   (4)

The C_LP term is the matching cost for a local path (a single step) that extends the global path from node k − 1 to node k. For an object step, the matching cost is computed by first transforming the single step back to x_L–x_R space and then computing the average absolute luminance difference between the corresponding pixels in the left and right scanlines. To get more accurate matching costs, resampling of the scanline pixels is required. In the case of an occlusion step, a constant cost is used. In both cases an extra term for disparity field smoothing can be added.

The C_T term constrains the possible transitions from incoming to outgoing path. If an object-to-object transition takes place, C_T can impose a smoothness constraint on the second-order disparity field derivatives. If an occlusion-to-occlusion transition occurs, a cost may be added here to penalize a change in the direction of the occlusion. Finally, attaching costs to object-to-occlusion and occlusion-to-object transitions keeps the algorithm from breaking the scene into many small occlusions and objects. This is similar to the cost that Cox [5] used in the MLMH algorithm to reduce the number of horizontal discontinuities. A difference, however, is that in our case the cost is only assigned for real occlusion-segmentation transitions, whereas in Cox [5] it is assigned also for slanted or curved objects.

In addition to the sampling of the x and d axes, we resample the y axis at Δy intervals. Algorithm (2) is applied once per Δy scanlines.
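The recursion (2)–(4) can be sketched as a small dynamic program. The sketch below is heavily simplified: it assumes integer disparities (N = 1), advances every state by one full node per step, folds C_T into a flat object/occlusion switch penalty, and reads the object matching cost from a precomputed table instead of resampling luminance values. All names (dp_scanline, match_cost, occ_cost, trans_cost) are our own illustrative choices, not the paper's implementation:

```python
import math

def dp_scanline(match_cost, d_max, occ_cost=5.0, trans_cost=5.0):
    """match_cost[x][d]: cost of an object (match) step arriving at node x
    with disparity d. Returns a minimum-cost disparity path d[0..W-1]."""
    W = len(match_cost)
    ds = range(-d_max, d_max + 1)
    types = ("object", "occlusion")
    # C[(d, t)] = (accumulated cost, backpointer), cf. equation (2);
    # paths start at (0, 0) in the x-d space, so only d = 0 is free.
    C = {(d, t): (0.0 if d == 0 else math.inf, None)
         for d in ds for t in types}
    layers = []
    for x in range(1, W):
        newC = {}
        for d in ds:
            for t in types:
                best = (math.inf, None)
                for d_prev in ds:
                    if abs(d - d_prev) > 1:   # slope |d'| <= 1 (N = 1)
                        continue
                    for t_prev in types:
                        # C_LP: match cost for object steps, constant otherwise
                        c_lp = match_cost[x][d] if t == "object" else occ_cost
                        # C_T simplified: penalize object/occlusion switches
                        c_t = trans_cost if t != t_prev else 0.0
                        c = C[(d_prev, t_prev)][0] + c_lp + c_t
                        if c < best[0]:
                            best = (c, (d_prev, t_prev))
                newC[(d, t)] = best
        layers.append(newC)
        C = newC
    # trace the cheapest final state back through the stored layers
    state = min(C, key=lambda s: C[s][0])
    path = [state]
    for layer in reversed(layers):
        prev = layer[state][1]
        if prev is None:
            break
        state = prev
        path.append(state)
    return [d for d, t in reversed(path)]
```

For example, a cost table that favors zero disparity everywhere yields an all-zero path, since any occlusion or disparity change only adds cost.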
The computational load to obtain a disparity field for a stereo image pair is of the order

O\!\left( \frac{N_y}{\Delta y} \cdot \frac{N_x}{\Delta x} \cdot \frac{N_d}{\Delta d} \cdot (4N+2)^2 \right) \approx O\!\left( \frac{16\, N_y N_x N_d N^3}{\Delta y\, \Delta x^2} \right),   (5)

where N_y is the total number of scanlines in each image, N_x is the number of pixels on one scanline, N_d is the size of the allowed disparity range, and N = Δx/Δd as defined before.

4 EXPERIMENTS

Experiments on both real and synthetic stereo image pairs were conducted, and the results are shown in this section. In these experiments, C_LP and C_T are defined as follows:

C_{LP} = \begin{cases} (\text{luminance matching cost}) + A_2 \cdot |d'|, & \text{for an object path} \\ (\text{constant occlusion cost } A_0) + A_1 \cdot (1 - |d'|), & \text{for an occlusion path} \end{cases}   (6)
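To get a feel for equation (5), the left-hand operation count can be evaluated directly. The image dimensions below are arbitrary example values, not taken from the paper:

```python
# Operation count from the left-hand side of equation (5),
# with N = Delta_x / Delta_d as defined in the text.
def op_count(Ny, Nx, Nd, dx, dy, dd):
    N = dx / dd
    return (Ny / dy) * (Nx / dx) * (Nd / dd) * (4 * N + 2) ** 2

# Example: a 360x288 pair with a 32-pixel disparity range.
coarse = op_count(288, 360, 32, dx=1.0, dy=2.0, dd=1.0)   # N = 1
fine = op_count(288, 360, 32, dx=0.5, dy=1.0, dd=0.25)    # N = 2
print(f"coarse: {coarse:.2e}, fine: {fine:.2e}")
```

The count grows roughly as N^3/(Δy Δx^2) for large N, matching the right-hand side of (5); the fine setting above already costs more than forty times the coarse one.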

Figure 3: The left image from the MAN sequence. Figure 4: The right image from the MAN sequence.

C_T = \begin{cases} 0, & \text{for an occlusion-to-occlusion transition} \\ A_3, & \text{for an occlusion-to-object transition} \\ A_4 \cdot |d''|, & \text{for an object-to-object transition} \end{cases}   (7)

So next to Δx, Δy, and Δd (N = Δx/Δd) we have five additional parameters:

- A_0: occlusion penalty, as in Cox's algorithm.
- A_1: occlusion direction bias towards |d'| = 1 (single-image occlusion).
- A_2: disparity path bias towards |d'| = 0 (fronto-parallel bias).
- A_3: object/occlusion segmentation penalty.
- A_4: smoothing of the second horizontal derivative of the disparity.

With A_1 = A_2 = ∞ and A_3 = A_4 = 0 we obtain the best approximation to Cox's ML algorithm [5]. Due to the elementary differences in path modeling (shown in Table 1), there are no choices of Δx, Δd, and Δy that result in an exact match with the ML algorithm. Two choices give the best approximation. With Δx = Δd = Δy = 1 our algorithm has an elementary occlusion stepsize twice as large (lower occlusion resolution). With Δx = Δd = 0.5 and Δy = 1, the elementary matching step is half the size of that in the ML algorithm (higher object resolution).

Figures 3 and 4 show the stereo image pair from the MAN image sequence of the European PANORAMA project [9]. The parameters used in the algorithm are: Δx = 1, Δy = 2, Δd = 1, A_0 = 5, A_1 = A_2 = A_4 = 0, and A_3 = 5. The estimated disparity map is shown in Figure 5, and the 3-D mesh surface plot of this map in Figure 6. Note that the absence of the fronto-parallel bias allows the uniform background to have a noisy shape, while the face is still recognizable.

To quantify the performance of the proposed algorithm, we created a synthetic pair of images for testing (see Figures 7 and 8). The true disparity map is shown in Figure 9. The results are shown in Table 2 and in Figures 11–16. In Table 2, the column o → o shows the percentage of correct detection of occlusions, the column d →
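The cost definitions (6) and (7) are simple enough to write down directly. In this hedged sketch the luminance matching cost is passed in as a plain number, and the parameter defaults are arbitrary rather than the paper's settings:

```python
def c_lp(path_type, d_prime, lum_cost=0.0, A0=5.0, A1=0.0, A2=0.0):
    """Local path cost, equation (6)."""
    if path_type == "object":
        return lum_cost + A2 * abs(d_prime)      # A2: fronto-parallel bias
    return A0 + A1 * (1.0 - abs(d_prime))        # A0/A1: occlusion costs

def c_t(type_in, type_out, d_second=0.0, A3=0.0, A4=0.0):
    """Transition cost, equation (7)."""
    if type_in == "occlusion" and type_out == "occlusion":
        return 0.0
    if type_in != type_out:                      # A3: segmentation penalty
        return A3
    return A4 * abs(d_second)                    # A4: 2nd-derivative smoothing
```

With A_1 = A_2 = A_4 = 0 only the occlusion penalty A_0 and the segmentation penalty A_3 remain, which is the parameter setting used for the MAN pair above.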
d lists the percentage of correct detection of disparities, and the column PSNR (= 10 log_{10}(255^2/MSE)) is computed from the MSE of all d → d disparities. The algorithm has no fewer than eight parameters, and the computational load can be very high in certain parameter regions, so it is difficult to cover the whole parameter space with a small number of experiments.

The experiments in Table 2 give a fairly good overview of the behavior of the algorithm. They lead us to the following observations:

Figure 5: The estimated disparity map of MAN. In this map, pixels with high intensity correspond to pixels with large disparity. Occlusions are marked as white pixels.


Figure 6: The 3-D plot of the estimated disparity of MAN.

- Comparing experiments 1, 2, and 3 with the rest, the results improve for smaller Δx, Δy, and Δd, as expected.
- From experiments 2, 3, and also 6, 19, the fronto-parallel bias appears to work better than the smoothing of the second derivatives.
- From experiments 7, 11, and 12 we can see that the occlusion direction parameter A_1 does not have any significant effect. The luminance data alone therefore constrains the occlusion path to the condition |d'| = 1.
- From experiments 4 and 6, we see that it is possible to model occlusions and object orientation separately. In experiment 4, the high A_2 forces object paths to |d'| = 0, so objects are segmented into small objects/occlusions, with the fronto-parallel bias. In experiment 6, the low A_2 ensures that slanted or curved objects can exist without segmentation into occlusions. For a fair comparison we set A_2 to 5 (equal to A_0 in experiment 4), keeping the fronto-parallel bias equal to that of experiment 4.
- From experiments 6 and 7, the segmentation cost reduces the d → o error significantly (the o → d error is raised, but it should be kept in mind that there are many more disparities than occlusions in the map, so in an absolute sense the number of pixels that receives the correct object/occlusion segmentation increases).
- From experiments 21, 22, and 23, a fronto-parallel bias equal to the bias in the normal DP algorithm (where A_2 = A_0) seems to give the best results. In experiment 23 the PSNR is highest, but compared to experiment 21 the number of correctly detected disparities is lower (see Figure 16).
- Experiment 9 took nine hours on an SGI Octane machine. The computational load is very high for small Δx, Δy, and Δd.

5 CONCLUSIONS

We have presented a new disparity estimation algorithm, based on deterministic dynamic programming. The main contribution of this paper is the separation of the modeling of occlusions and slanted objects.
The algorithm offers adjustable resolution, adjustable accuracy, and an adjustable degree of scene segmentation. It also includes a second-order smoothing penalty term.

Figure 7: The left image from the synthetic sequence. Figure 8: The right image from the synthetic sequence.

Figure 9: The true disparity map.


Figure 10: The 3-D plot of the true disparity map.

Exp  Δx   Δy  Δd    A0  A1    A2    A3  A4  o→o    d→d    PSNR  Figure
1    1    2   1     5   1000  1000  0   0   93.8%  96.3%  49.4
2    1    2   1     5   1000  5     0   0   93.7%  97.8%  49.4
3    1    2   1     5   1000  0     0   5   82.6%  97.8%  42.9
4    0.5  1   0.5   5   1000  1000  0   0   95.4%  96.7%  53.1  11
5    0.5  1   0.5   10  1000  1000  0   0   94.9%  96.9%  52.2
6    0.5  1   0.5   5   1000  5     0   0   95.1%  98.3%  52.9
7    0.5  1   0.5   5   1000  5     5   0   93.8%  99.7%  52.7  12,17
8    0.5  1   0.25  5   1000  5     5   0   92.8%  99.8%  50.8
9    0.5  1   0.17  5   1000  2     5   5   86.3%  99.8%  51.0
10   0.5  1   0.5   5   1000  0     5   5   83.3%  99.7%  47.7
11   0.5  1   0.5   5   5     5     5   0   93.8%  99.7%  52.7
12   0.5  1   0.5   5   0     5     5   0   93.9%  99.7%  52.4
13   0.5  1   0.5   5   1000  0     5   10  83.6%  99.3%  43.7
14   0.5  1   0.5   5   1000  1     5   0   84.8%  99.8%  50.3
15   0.5  1   0.5   5   1000  1     5   5   85.9%  99.7%  50.1
16   0.5  1   0.5   5   1000  0     0   0   78.9%  99.7%  42.2  13
17   0.5  1   0.5   5   1000  1     0   0   82.1%  99.6%  50.3
18   0.5  1   0.5   5   1000  1     0   5   83.3%  98.1%  50.7
19   0.5  1   0.5   5   1000  0     0   5   80.2%  98.9%  45.6
20   0.5  1   0.5   5   1000  5     0   0   95.1%  98.3%  52.9  14
21   0.5  1   0.5   3   1000  5     7   0   94.8%  99.4%  54.0  15
22   0.5  1   0.5   3   1000  3     7   0   92.6%  99.6%  52.6
23   0.5  1   0.5   3   1000  7     7   0   95.6%  99.0%  55.8  16

Table 2: Experimental results of the synthetic image pair.
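The PSNR column in Table 2 is computed as 10 log10(255^2/MSE) over the correctly detected (d → d) disparities. A minimal sketch of that figure of merit, assuming the selection of matched pixels has already happened:

```python
import math

def psnr(estimated, true):
    """PSNR over matched disparity values, as used in Table 2."""
    diffs = [(a - b) ** 2 for a, b in zip(estimated, true)]
    mse = sum(diffs) / len(diffs)
    return math.inf if mse == 0.0 else 10.0 * math.log10(255.0 ** 2 / mse)
```

For instance, a uniform error of one disparity level gives MSE = 1 and hence a PSNR of about 48.1 dB, which is in the range of the values listed in Table 2.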

Figure 11: Experiment 4. This experiment resembles Cox's ML algorithm. The path model allows only fronto-parallel objects. One can see false segmentation of objects and occlusions in the center of the map. Figure 12: Experiment 7. The path model allows slanted objects and a bias toward fronto-parallel objects. A segmentation penalty is also added to the model.

Figure 13: Experiment 16. The path model allows slanted objects and has no fronto-parallel bias. Figure 14: Experiment 20. The path model allows slanted objects and a bias toward fronto-parallel objects. No segmentation penalty is added.

Figure 15: Experiment 21. This experiment is the same as experiment 7 but with a higher segmentation penalty. Figure 16: Experiment 23. The fronto-parallel bias in this experiment is higher than in conventional algorithms.


Figure 17: The 3-D plot of the estimated disparity field from experiment 7.

Good results have been obtained with both real and synthetic stereo image pairs. Disparity maps were obtained with a physically meaningful segmentation into objects and occlusions. The algorithm allows an adjustable fronto-parallel bias for objects and an adjustable bias for the occlusion direction. In the experiments we found that the presence of the fronto-parallel bias enhances the results, while the occlusion direction bias did not have any significant influence.

To build on these promising results, our future work will focus on new definitions for the cost functions in the algorithm. Currently we are investigating how to translate assumptions about the 3-D world into cost functions in the disparity space.

6 ACKNOWLEDGMENT

This work has been funded by the European ACTS project PANORAMA and a NATO Collaborative Research Grant.

7 REFERENCES

[1] Y. Ohta and T. Kanade, "Stereo by Intra- and Inter-Scanline Search Using Dynamic Programming," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. PAMI-7, no. 2, pp. 139-154, March 1985.
[2] J. Liu and R. Skerjanc, "Stereo and motion correspondence in a sequence of stereo images," Signal Processing: Image Communication 5, pp. 305-318, 1993.

[3] E.A. Hendriks and G. Marosi, "Recursive disparity estimation algorithm for real time stereoscopic video applications," Proceedings of the International Conference on Image Processing, pp. 891-894, 1996.
[4] T. Kanade and M. Okutomi, "A stereo matching algorithm with an adaptive window: theory and experiment," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 16, no. 9, pp. 920-932, 1994.
[5] I.J. Cox, S.L. Hingorani, and S.B. Rao, "A maximum likelihood stereo algorithm," Computer Vision and Image Understanding, vol. 63, no. 3, pp. 542-567, April 1996.
[6] P.A. Redert, E.A. Hendriks, and J. Biemond, "Synthesis of multi viewpoint images at non-intermediate positions," Proceedings of ICASSP, München, Germany, pp. 2749-2752, 1997.
[7] D. Tzovaras, N. Grammalidis, and M.G. Strintzis, "Depth map coding for stereo and multiview image sequence transmission," Proceedings of the International Workshop on Stereoscopic and Three Dimensional Imaging (IWS3DI), Santorini, Greece, pp. 75-80, 1995.
[8] S.S. Intille and A.F. Bobick, "Disparity-Space Images and Large Occlusion Stereo," MIT Media Lab Perceptual Computing Group Technical Report No. 220, 1993.
[9] European ACTS PANORAMA project, http://www.tnt.uni-hannover.de/project/eu/panorama