Reconstructing 3D Independent Motions Using Non-Accidentalness

K. E. Ozden (a), K. Cornelis (a), L. Van Eycken (a), L. Van Gool (a,b)

(a) ESAT/PSI, K.U. Leuven, Leuven, Belgium
(b) Computer Vision Lab., ETH Zurich, Switzerland

Abstract

Reconstructing 3D scenes with independently moving objects from uncalibrated monocular image sequences still poses serious challenges. One important problem is to find the relative scales between these different reconstructed objects. The perspective reconstruction of a single object can only be known up to a certain scale, which results in a one-parameter family of relative scales and trajectories per moving object. The paper formulates this ambiguity and proposes solutions in the vein of "non-accidentalness". Two instantiations of its use are analyzed: planar motion and the 'heading constraint'.

1 Introduction

Structure-from-Motion (SfM) has been a popular research topic in computer vision for at least two decades now. Most of this work is restricted to static scenes [1]. A typical scene, however, naturally contains independently moving objects.

There has recently been a surge of papers on the detection, analysis, and reconstruction of dynamic scenes. With their multibody factorization method, Costeira and Kanade [2] have extended the static scene factorization method of Tomasi and Kanade [3] to scenes with multiple, independently moving rigid objects. Fitzgibbon and Zisserman [4] demonstrated the usefulness of multiple, independently moving rigid objects in self-calibration. Machline et al. [5] used linear subspace constraints to segment a dynamic scene into its independently moving, but not necessarily rigid, components. Wolf and Shashua [16] introduced the two-body algebraic structure called the segmentation matrix, which was extended by Vidal et al. [6] to multiple moving objects. Brand [7] and Bregler et al. [8] showed that 3D reconstruction is possible even when the object is moving non-rigidly. There is also interesting research where the moving object is supposed to move in a constrained way. Avidan and Shashua [9] investigated the case where a point is moving on a line or conic. Han and Kanade [10] demonstrated the calibration of a single camera when objects are moving on a line with constant speed. Sturm [11] analyzed the case of a stereo camera observing points that are moving on a plane. Wolf and Shashua [12] extended the classical notion of three- to two-dimensional image projection to n-dimensional space for non-rigid scenes.

In this paper, we follow a path similar to Avidan and Shashua [9], where a single point is moving on a line or a conic. However, our constraints are geometrically less restrictive, at the expense of requiring full projection matrices for the object reconstructions. We will assume that segmentation has already been performed (several techniques have been proposed to achieve this [2, 5, 6, 13, 14]). Rather than imposing strict geometric assumptions, we propose a non-accidentalness principle: among all possible relative scales, choose the one for which the reconstruction exhibits some properties which are unlikely to be encountered accidentally. The non-accidentalness principle has been introduced and successfully exploited first in the area of grouping and object recognition [18].

3D reconstruction from uncalibrated images has an inherent scale ambiguity [1]. When there are multiple moving objects in a scene, the relative scale ambiguity between their reconstructions will result in an unrealistic total reconstruction. A one-parameter family of ambiguity exists for every object with respect to the static background. Consider a video of a moving car. Without giving high-level knowledge about the world, a computer algorithm cannot distinguish between a small toy car flying in front of the camera and a real car further away on the road. Since all these reconstructions are equally correct in terms of geometry, the only way to solve this problem is to introduce further assumptions on the scales and/or the trajectories. In this paper, we propose two such constraints, as examples of the much wider principle of non-accidentalness.

The paper is organized as follows. Section 2 discusses the relative scale problem in more detail. In this section we derive the basic equation on which the non-accidentalness principle is based. Section 3 discusses this principle and analyzes two such cases: planar motion and the 'heading constraint'. The viability of these ideas is corroborated through real-life experiments. Section 4 summarizes the main ideas of the paper and discusses future work.

2 The Relative Scale Problem

As mentioned before, the relative scale ambiguity is caused by the unknown scale factor in the reconstruction of a single object. To analyze this phenomenon, consider an image sequence of a scene which is static, except for one rigid, independently moving object. This restriction on the number of moving objects is introduced to simplify the discussion. The case of several independently moving objects can be dealt with similarly. For every frame $i$ of the sequence we can compute, through uncalibrated perspective SfM techniques, the camera's orientation and position relative to the static part of the scene -- 'the background' -- and with respect to the moving object. The $3 \times 3$ rotation matrices $R^i_c$ and $R^i_x$ represent these two orientations respectively. The $3 \times 1$ translation vectors $t^i_c$ and $t^i_x$ represent these positions. What we would like to find is the rotation $R^i_o$ and the translation $t^i_o$ which represent the motion of the object in the scene for every frame $i$. These transformations and their corresponding notation are illustrated in Fig. 1. The relation among them can be written as follows:

$$\begin{bmatrix} R_x^T & -R_x^T t_x \\ 0 & 1 \end{bmatrix} = \begin{bmatrix} R_c^T & -R_c^T t_c \\ 0 & 1 \end{bmatrix} \begin{bmatrix} R_o & t_o \\ 0 & 1 \end{bmatrix} \qquad (1)$$

where we have dropped the frame indices for compact notation. $T$ stands for the matrix transpose, $I$ is a $3 \times 3$ identity matrix and $0$ is a $1 \times 3$ vector of zeroes. The left hand side states that, to directly transform a point from the object coordinate system to the current camera coordinate system, apply the relative transformation defined by $R_x^T$ and $-R_x^T t_x$. The right hand side of this expression tells us that, alternatively, to transform a point in the object local coordinate system to the current camera coordinate system, first rotate it around the object coordinate system with $R_o$, perform the actual translation $t_o$, and as a last step transform it to the current camera coordinate system using $R_c^T$ and $-R_c^T t_c$. The rotation and translation parts of Eq. (1) yield

$$R_x^T = R_c^T R_o \qquad (2)$$

$$-R_x^T t_x = R_c^T t_o - R_c^T t_c \qquad (3)$$

Figure 1: Transformations between the static and the dynamic parts of the scene.

If we fix the coordinate system of the reconstructions to an arbitrary location, the rotations $R_x^T$ and $R_c^T$ are unambiguous, so we can find exact $R_o$ matrices using Eq. (2). However, due to the nature of uncalibrated perspective SfM, we cannot extract the translation of the camera with respect to the background and the object at an absolute scale [17]. Since only the relative scale is important for us, we can fix the scale for the background. However, there still remains the relative scale of the moving objects. Given the fact that $t_x$ is only known up to a relative scale $s$, Eq. (3) will give a different $t_o$ for every scale:

$$s(-R_x^T t_x) = R_c^T t_{os} - R_c^T t_c \qquad (4)$$

where $t_{os}$ is the object trajectory at scale $s$. Suppose $t_o$ represents the true object trajectory. Merging equations (3) and (4), and multiplying both sides with $R_c$, yields the following relation between the true trajectory of the object $t_o$ and the computed trajectory $t_{os}$ with an incorrect relative scale factor $s \neq 1$:

$$t_{os} = s\,t_o + (1 - s)\,t_c \qquad (5)$$

Hence, the object translation $t_{os}$ which one finds at the relative scale $s$ is a linear interpolation of the true object translation $t_o$ and the camera translation $t_c$. This leads to a one-parameter family of object trajectories $t_{os}$ for different relative scale values $s$, all equally compatible with the image data. When $s = 1$, $t_{os}$ equals the true $t_o$, and when $s$ gets closer to zero, $t_{os}$ evolves towards the camera path. For values of $s$ other than $1$, $t_{os}$ will always undergo the influence of the camera motion $t_c$.
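The one-parameter family of Eq. (5) is easy to illustrate; a minimal sketch with a hypothetical camera path and object path (all values invented for illustration):

```python
import numpy as np

# Invented example: camera moving along x, object moving along y at depth 5.
t_c = np.array([[0.0, 0, 0], [1, 0, 0], [2, 0, 0], [3, 0, 0]])
t_o = np.array([[0.0, 0, 5], [0, 1, 5], [0, 2, 5], [0, 3, 5]])

def trajectory_at_scale(s):
    # Eq. (5): the object trajectory recovered at relative scale s.
    return s * t_o + (1 - s) * t_c

assert np.allclose(trajectory_at_scale(1.0), t_o)  # correct scale: true path
assert np.allclose(trajectory_at_scale(0.0), t_c)  # s -> 0: the camera path
print(trajectory_at_scale(0.5))  # any other s blends in the camera motion
```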

A further analysis shows that such a coupling exists not only for the translation components of the object motion (the path of the origin of the object coordinate system) but also for any point attached to the object coordinate system. Assume $p^0$ is the position of a point on the object in the first frame. Its position $p$ in frame $i$ can be written as:

$$p = R_o p^0 + t_o \qquad (6)$$

in which the frame index $i$ is dropped again to simplify the notation. Similarly, its scaled version $p_s$ moves according to the following transformation:

$$p_s = R_o p_s^0 + t_{os} \qquad (7)$$


Without loss of generality, we can assume the fixed world coordinate system to be attached to the initial camera pose (this is also assumed for the rest of the paper). Then $p_s^0$ is equal to $s p^0$. Introducing this fact and Eq. (5) into the above equation yields:

$$p_s = s(R_o p^0 + t_o) + (1 - s)\,t_c \qquad (8)$$

and by incorporating Eq. (6), it can be written as

$$p_s = s\,p + (1 - s)\,t_c \qquad (9)$$

which is similar to Eq. (5). However, the input coming from the uncalibrated SfM software is not $t_o$ or $p$ but some $t_{os}$ or $p_s$ (as defined by Eqs. (5) and (9)). So one should reverse the procedure to find the true object trajectory and corresponding relative scale. If one brings $p$ to the left side of Eq. (9), it becomes:

$$p = m\,p_s + (1 - m)\,t_c \qquad (10)$$

where $m = 1/s$. A similar expression can also be written for Eq. (5). The correct $m$ would remove the additive components of $t_c$ in $p_s$ or $t_{os}$. To find such an $m$ is the topic of the next section.
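As a sanity check of Eq. (10), a tiny numpy sketch (all values invented) confirms that applying $m = 1/s$ exactly undoes the coupling of Eq. (9):

```python
import numpy as np

rng = np.random.default_rng(0)
p = rng.normal(size=3)        # true point position in some frame (invented)
t_c = rng.normal(size=3)      # camera translation in that frame
s = 0.36                      # the (unknown) relative scale

p_s = s * p + (1 - s) * t_c   # Eq. (9): what uncalibrated SfM hands us
m = 1 / s
p_rec = m * p_s + (1 - m) * t_c   # Eq. (10): undoes the coupling exactly
assert np.allclose(p_rec, p)
```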

3 The non-accidentalness principle

It is clear from Eq. (5) that, while computing the object trajectories, there is a systematic addition of the camera translation at wrong relative scales. Consequently, we expect the computed object motion to lose special properties, such as planarity, periodicity, piecewise linearity, constant speed, etc., at wrong scales. This is the basis of our non-accidentalness principle. To demonstrate it, consider the robot arm in Fig. 2. Its trajectory consists of three straight line segments which are perpendicular to each other. Fig. 3 shows the reconstructed ball trajectory for different relative scales, including the correct one. As we get further away from the correct scale, the trajectory loses its simple shape. In this paper, only two of the many possible non-accidental properties will be explored: the 'planarity constraint' and the 'heading constraint'.

3.1 The planarity constraint

In this section, the existence of a planar trajectory among the one-parameter family of relative scales is taken as an indication of non-accidentalness.

The usefulness of the planarity constraint depends on the coupled nature of the object and the camera trajectory. A degenerate case occurs when the camera moves parallel to the plane of the object motion or when there exists an affine

Figure 2: Several frames from the robot sequence. The robot translates the ball through the static scene. The hand-held camera moves around while taking the images.

Figure 3: The ball trajectory as it is found from uncalibrated SfM for different relative scales, between which it cannot decide. The applied relative scale factors $m$ are 0.20, 0.25, 0.30, 0.36, 0.40, 0.45 from left to right; 0.36 is the correct relative scale (middle right).


transformation between concurrent camera and object positions. A more extensive discussion of these degenerate cases is given elsewhere [19].

There are several techniques to impose the planarity criterion among the one-parameter family of possible object trajectories. Principal Component Analysis (PCA) is one way to achieve it. From equations (2), (3) and (6) we find the trajectory of a point on the object as:

$$p^i = t_c^i + R_o^i m p_s^0 - R_o^i m t_{xs}^i \qquad (11)$$

where $p_s^0 = s p^0$, $t_{xs} = s t_x$, and $m = 1/s$. Such substitutions are necessary since we only know some scaled version of these vectors. We are looking for the scale value which makes the left hand side the most planar. As a planarity criterion, we look for the minimum of the determinant of the corresponding 'covariance matrix' computed from the 3 components of the object trajectory $t_o$. The determinant of this matrix indicates the volume of the space spanned by the principal components, which should be zero in the case of a planar trajectory. However, due to the effect of noise, this determinant will never be exactly zero, so we prefer the minimization of this determinant over finding its roots directly.

First, we translate both sides of Eq. (11) to their mean. We get an expression of the form $m A_i + B_i$, in which $A_i$ represents the mean-shifted $R_o^i p_s^0 - R_o^i t_{xs}^i$ and $B_i$ represents the mean-shifted $t_c^i$. The expression we want to minimize is:

$$V = \det\left(\sum_{i=0}^{\#\text{points}} (m A_i + B_i)(m A_i + B_i)^T\right) \qquad (12)$$

This is a polynomial of 6th degree in $m$. To minimize it, one can take the derivative and find the roots with numerical methods. Among the roots, the one which gives the minimal value is chosen.
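The criterion can be sketched as follows. This illustrative version (synthetic data, all names our own) evaluates the determinant of Eq. (12) directly on the candidate trajectory of Eq. (10); a dense grid search stands in for the polynomial root-finding the text describes:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 50
# Synthetic planar object trajectory (z near 2 everywhere, small noise)...
p = np.c_[np.cumsum(rng.normal(size=n)),
          np.cumsum(rng.normal(size=n)),
          np.full(n, 2.0) + 1e-3 * rng.normal(size=n)]
# ...and a non-planar (helical) camera path.
u = np.linspace(0, 6, n)
t_c = np.c_[np.linspace(0, 3, n), np.sin(u), 0.5 * np.cos(u)]

s_true = 0.36
p_s = s_true * p + (1 - s_true) * t_c        # Eq. (9): what SfM delivers

def planarity_cost(m):
    # Eq. (12)-style criterion: determinant of the scatter matrix of the
    # candidate trajectory m*p_s + (1-m)*t_c (Eq. (10)).
    traj = m * p_s + (1 - m) * t_c
    X = traj - traj.mean(axis=0)
    return np.linalg.det(X.T @ X)

# Eq. (12) is degree 6 in m; a dense grid stands in for root-finding here.
grid = np.linspace(0.1, 10, 2000)
m_hat = grid[np.argmin([planarity_cost(m) for m in grid])]
print(m_hat)   # should land near 1/s_true = 2.78
```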

If the object only translates on a plane and rotates around an axis orthogonal to that plane, i.e. planar motion [1], the above solution always works, since every point on the object moves on a plane. But we are interested in more general cases where only a part of the object is moving planarly. As an example, take a ball that is rolling on a planar surface. None of its observable points is performing a purely planar motion, although the global trajectory of the ball is planar: only the centroid is moving planarly. The average of the observable points will not coincide with the real centroid. To find the real centroid and the relative scale, we propose an iterative approach where the point cloud centroid and the PCA solution for $m$ are used as initial values. Starting from these data, a consecutive refinement step tries to simultaneously come up with an enhanced relative scale and a point with maximally planar motion for that scale. Hence, we have to solve for four unknowns, namely the three coordinates of $p_s^0$ -- the centroid in this case -- and $m$. We used Levenberg-Marquardt (LM) to minimize the ratio of the third to the second eigenvalue of the covariance matrix in order to find the point that yields the maximal planarity.

To demonstrate the usefulness of this approach, we experimented with video data of a ball rolling on a ground plane. Fig. 5 demonstrates the results. Two possible trajectories of the ball and their reconstructions are shown in front of the reconstructed background. The ball at the wrong scale is indicated by a square around it, and the correctly scaled one is shown with a triangle. As can be seen from the bottom image of Fig. 5, the latter trajectory is close to planar and parallel to the ground plane. The other trajectory is not planar at all and the ball is found flying in the air.

Figure 4: Several frames from the original ball sequence, which is 400 frames long.

3.2 The heading constraint

Many types of moving objects, such as humans, cars, bikes, etc., have a natural frontal side and therefore a heading direction. These heading directions or vectors are usually parallel to the tangent of the object trajectory. If not, the objects would undergo strange motions, like cars going into a skid; this does not usually happen. The mathematical equation describing this 'heading constraint' is:

$$l^{ij} R_o^{ij} v_o^i = v_o^j \qquad (13)$$

where $R_o^{ij}$ is the rotation of the object from frame $i$ to frame $j$, and $l^{ij}$ is a scale factor. $v_o^i$ is the tangent to the object's trajectory at frame $i$, which can be approximated by:

$$v_o^i = g_o^{i+1} - g_o^{i-1} \qquad (14)$$


Figure 5: The reconstructed scene for the ball sequence from different views. The rectangle denotes the unscaled ball reconstruction and the triangle denotes the scaled version of the ball reconstruction.

where $g_o^i$ is the position of the centroid of the object at the $i$th frame. This is a valid approximation since we generally use video sequences with their relatively high frame rates. A similar expression is also used for the approximation of the camera velocities.

Eq. (5) states that the object trajectory will contain components from the camera translation for wrong relative scales. Just as they may destroy planarity, they would also tend to lead to the violation of the heading constraint. However, also here, there are degenerate cases where this constraint cannot help us. In such cases, Eq. (13) will hold for every relative scale. To analyze such cases, let us apply equations (14) and (13) together with a relative scale $s$. By changing all subscripts $o$ to $os$, the following equation results if the heading constraint is to hold for other, incorrect scales $s$:

$$k R_o^{ij} \left(g_{os}^{i+1} - g_{os}^{i-1}\right) = g_{os}^{j+1} - g_{os}^{j-1} \qquad (15)$$

where $k$ is a scale factor. Then we introduce Eq. (9) to come up with the following equation:

$$k\,p = q \qquad (16)$$

where

$$p = R_o^{ij}\left(s\left(g_o^{i+1} - g_o^{i-1}\right) + (1 - s)\left(t_c^{i+1} - t_c^{i-1}\right)\right)$$

$$q = s\left(g_o^{j+1} - g_o^{j-1}\right) + (1 - s)\left(t_c^{j+1} - t_c^{j-1}\right)$$

By introducing $v_o^i$, $v_c^i$, $v_o^j$, $v_c^j$, Eq. (16) turns into:

$$k\left(s R_o^{ij} v_o^i + (1 - s) R_o^{ij} v_c^i\right) = s\,v_o^j + (1 - s)\,v_c^j \qquad (17)$$

If the above equation holds for values of $s$ other than 1, one has a degenerate case. If we insert Eq. (13) into the above equation and leave the term $R_o^{ij} v_o^i$ on the left side, we come up with

$$\frac{s}{1 - s}(k - l)\,R_o^{ij} v_o^i = v_c^j - k R_o^{ij} v_c^i \qquad (18)$$

which should still hold for different choices of $i$ and $j$. The left hand side spans a line passing through the origin for different values of $s$. This results in a constraint on the right hand side: $v_c^j$ must be a constant multiple of $R_o^{ij} v_c^i$. For a degenerate case to occur, this constraint should hold for every frame pair, which means that all the camera translations should be determined by the object rotation. Fortunately, this is really hard to find in real life except for some simple motion cases. One such example is the case where both the camera and the object move on a line and the directions of these lines are in keeping with Eq. (18).

Let us return to the actual use of the constraint. Given two frames, finding the relative scale amounts to solving a polynomial equation, which is formulated next. Merging Eq. (10) and Eq. (14) yields:

$$v_o^i = m\,v_{os}^i + (1 - m)\,v_c^i \qquad (19)$$

The expression we want to maximize comes from the heading constraint in Eq. (13). For the corresponding parallelism of velocity vectors to hold, we can maximize the cosine between the vectors:

$$a^T b = \cos\!\left(R_o^{ij} v_o^i,\; v_o^j\right) \qquad (20)$$

where

$$a = \frac{R_o^{ij}\left(m v_{os}^i + (1 - m) v_c^i\right)}{\sqrt{\left(m v_{os}^i + (1 - m) v_c^i\right)^T \left(m v_{os}^i + (1 - m) v_c^i\right)}} \qquad (21)$$

$$b = \frac{m v_{os}^j + (1 - m) v_c^j}{\sqrt{\left(m v_{os}^j + (1 - m) v_c^j\right)^T \left(m v_{os}^j + (1 - m) v_c^j\right)}} \qquad (22)$$


This is the scalar product of two normalized vectors and it has the form of a rational polynomial. One could maximize the square of the cosine expression in Eq. (20) to handle image sequences where the object suddenly goes backwards somewhere in the sequence; we discarded such rare cases to simplify the solution.
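For a single frame pair, the maximization over $m$ can be sketched with a grid search (synthetic velocities and rotation, all invented; the paper's actual solver works on the rational polynomial directly):

```python
import numpy as np

def rot_between(a, b):
    # Rotation matrix sending the direction of a to the direction of b.
    a, b = a / np.linalg.norm(a), b / np.linalg.norm(b)
    v, c = np.cross(a, b), a @ b
    K = np.array([[0, -v[2], v[1]], [v[2], 0, -v[0]], [-v[1], v[0], 0]])
    return np.eye(3) + K + K @ K / (1 + c)

# Invented ground truth obeying the heading constraint, Eq. (13).
v_o_i = np.array([1.0, 0.2, 0.0])            # object velocity at frame i
v_o_j = 0.8 * np.array([0.5, 1.0, 0.1])      # at frame j (different speed)
R_ij = rot_between(v_o_i, v_o_j)             # object rotation follows heading
v_c_i = np.array([0.3, -0.4, 1.0])           # camera velocities
v_c_j = np.array([-0.2, 0.7, 0.9])

s_true = 0.36                                # unknown relative scale
v_os_i = s_true * v_o_i + (1 - s_true) * v_c_i   # Eq. (9), differenced
v_os_j = s_true * v_o_j + (1 - s_true) * v_c_j

def heading_cos(m):
    # Cosine of Eq. (20), with a and b as in Eqs. (21)-(22).
    vi = m * v_os_i + (1 - m) * v_c_i
    vj = m * v_os_j + (1 - m) * v_c_j
    return (R_ij @ vi) @ vj / (np.linalg.norm(vi) * np.linalg.norm(vj))

grid = np.linspace(0.2, 10, 2000)
m_hat = grid[np.argmax([heading_cos(m) for m in grid])]
print(m_hat)   # expected near 1/s_true = 2.78
```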

Solving for $m$ with different frames $i$ and $j$ results in different $m$'s. One reason is the fact that an object may not always follow its heading perfectly; for example, a person may twist his torso for a few frames. Such cases should be treated as outliers, and we can use Eq. (20) in a RANSAC [1] scheme for a robust estimation of $m$. Therefore, several random choices of $i$ and $j$ were made, and the $m$ with maximal support was chosen, i.e. depending on how many other $i,j$ pairs have a super-threshold value for that $m$ according to Eq. (20).

Another problem we saw in our experiments is that 'objects' like humans obey the heading constraint globally but not instantaneously, e.g. during a single step: the center of gravity of the torso oscillates between left and right at this level of granularity. To avoid this, while calculating velocities, especially for human gait, we suggest using the formula

$$v^i = t^{i+n} - t^{i-n} \qquad (23)$$

where $n$ depends on the speed of the person and the sampling rate of the video. We can estimate a good value for $n$ during RANSAC random sampling, as an additional parameter to be estimated.
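Eq. (23) can be illustrated on a toy 1-D path (values invented) with a per-step oscillation of period 8; when the stride $n$ matches the step period, the oscillation cancels and only the global drift remains:

```python
import numpy as np

def velocities(traj, n):
    # Eq. (23): stride-n central differences, v^i = t^{i+n} - t^{i-n};
    # smooths out intra-step oscillation (e.g. torso sway while walking).
    traj = np.asarray(traj, float)
    return traj[2 * n:] - traj[:-2 * n]

# Toy 1-D path: steady forward drift plus a period-8 per-step oscillation.
i = np.arange(100)
path = 0.1 * i + 0.05 * np.sin(2 * np.pi * i / 8)

v1 = velocities(path, 1)   # frame-to-frame: the oscillation dominates
v8 = velocities(path, 8)   # stride matches the step period: it cancels
print(v1.std(), v8.std())  # v8 is (numerically) constant
```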

At the end, an additional refinement step is included on the $m$ value supplied by RANSAC. The selection of the optimal $m$ is based on the value, around the initial estimate of $m$, for which the inliers minimize an error functional of the form

$$\sum^{\#\text{inliers}} \text{angle}^2\!\left(R_o^{ij} v_o^i,\; v_o^j\right) \qquad (24)$$

The reason for using angles at this stage rather than cosines as in Eq. (20) is the fact that angles are geometrically more meaningful.

Figure 6: Three frames from the original walking sequence, consisting of 61 images.

To demonstrate the usefulness of our algorithm, we recorded the video of a person walking in front of a static

Figure 7: Reconstruction of the walking sequence in Fig. 6 from different virtual camera positions. The right and the left column depict, respectively, the reconstruction applying an incorrect and the correct relative scale factor, with the latter found using the heading constraint. The ellipses in the right column encircle the tiny reconstructed person, obviously at the wrong scale.

scene. Several frames from the original 61-frame sequence are shown in Fig. 6. We reconstructed the background and the person separately using our sequential perspective SfM software. We were only able to reconstruct the upper part of the body, since the lower parts are non-rigid and do not have enough features. The optimal $n$ we found is 8, which is approximately the duration of a single step of the person. The results are shown in Fig. 7. Every row corresponds to a different virtual camera position. They depict the reconstructed 3D world at the time that the middle frame of Fig. 6 was shot. The right column corresponds to the reconstruction at an incorrect relative scale; the circle surrounds the tiny human at the wrong scale. The left column shows the result after we applied the relative scale we found using the heading constraint. Note that for the camera positions corresponding to the original images, both left and right images will look the same. As can be seen, the relative scale we found with the heading constraint gives quite realistic results. To further corroborate the usefulness of the non-accidentalness assumption, we applied the planarity constraint to the same sequence and came up with a close solution: they differed less than 3 percent in terms of scale [19].

4 Conclusion

Reconstructing scenes containing independently moving objects remains a big challenge for the computer vision community. One problem which did not receive much attention is the relative scale ambiguity between the reconstructions of different, independently moving parts of the scene. In this paper, we gave a simple formulation of the problem and demonstrated that at wrong relative scales, object trajectories would contain components of the camera translation. If the object motion has some special property, such as planarity, periodicity, etc., we expect those properties to be lost at wrong relative scales due to the effect of camera translation. This observation forms the basis of our "non-accidentalness" principle, which states that among several mathematically correct solutions, if one of them results in an object motion with special properties, this is not accidental and therefore it is very likely that we are at the correct scale. We demonstrated the use of two such properties. One of them is the assumption of planar motion, since many objects move on a plane. In this approach, we simultaneously search for a point on the object and the scale which makes this point's motion planar. As a second property, we investigated the heading constraint, which is based on the fact that objects generally move in their heading direction. We demonstrated the applicability of both techniques with real-life experiments.

Future work will focus on the automatic selection and possible combination of multiple non-accidental properties. Even if a single non-accidentalness criterion may not work, e.g. when a camera travels on a line parallel to that of the action, a combination of several, like the two proposed, might. So, in the end the system would need to explore a battery of special properties and arbitrate.

Finally, it is worthwhile mentioning that Eq. (5) gives rise to a second scale selection criterion, the 'independence criterion', as an alternative to non-accidentalness. More on this can again be found in [19].

Acknowledgement: The authors gratefully acknowledge support by the Flemish Fund for Scientific Research and the K.U.Leuven Research Council GOA project 'MARVEL'.

References

[1] R. Hartley and A. Zisserman. Multiple View Geometry. Cambridge University Press, 2000.
[2] J. Costeira and T. Kanade. A multi-body factorization method for motion analysis. ICCV, pages 1071-1076, 1995.
[3] C. Tomasi and T. Kanade. Shape and motion from image streams under orthography: a factorization method. IJCV, 9(2):137-154, November 1992.
[4] A. W. Fitzgibbon and A. Zisserman. Multibody structure and motion: 3-D reconstruction of independently moving objects. ECCV, pages 891-906, 2000.
[5] M. Machline, L. Zelnik-Manor and M. Irani. Multi-body segmentation: Revisiting motion consistency. ECCV Workshop on Vision and Modeling of Dynamic Scenes, Copenhagen, 2002.
[6] R. Vidal, S. Soatto, Y. Ma, and S. Sastry. Segmentation of dynamic scenes from the multibody fundamental matrix. ECCV Workshop on Vision and Modeling of Dynamic Scenes, Copenhagen, 2002.
[7] M. Brand. Morphable 3D models from video. CVPR, 2001.
[8] C. Bregler, A. Hertzmann, and H. Biermann. Recovering non-rigid 3D shape from image streams. CVPR, 2000.
[9] S. Avidan and A. Shashua. Trajectory triangulation: 3D reconstruction of moving points from a monocular image sequence. PAMI, 22(4):348-357, 2000.
[10] M. Han and T. Kanade. Multiple motion scene reconstruction from uncalibrated views. ICCV, pages 163-170, 2001.
[11] P. Sturm. Structure and motion for dynamic scenes: The case of points moving in planes. ECCV, Copenhagen, 2002.
[12] L. Wolf and A. Shashua. On projection matrices P^k -> P^2, k = 3,...,6, and their applications in computer vision. ICCV, Vancouver, pages 412-419, July 2001.
[13] P. H. S. Torr. Geometric motion segmentation and model selection. Phil. Trans. Royal Society of London A, 356(1740):1321-1340, 1998.
[14] J. Shi and J. Malik. Motion segmentation and tracking using normalized cuts. ICCV, 1998.
[15] T. M. Cover and J. A. Thomas. Elements of Information Theory. John Wiley & Sons, 1991.
[16] L. Wolf and A. Shashua. Two-body segmentation from two perspective views. ICCV, pages 263-270, 2001.
[17] R. Vidal, S. Soatto, Y. Ma, and S. Sastry. A factorization method for 3D multi-body motion estimation and segmentation, 2002.
[18] D. Lowe. 3D object recognition from single 2D images. Artificial Intelligence, 31:355-395, 1987.
[19] K. E. Ozden, K. Cornelis, L. Van Eycken, and L. Van Gool. Reconstructing 3D trajectories of independently moving objects using generic constraints. CVIU special issue on Model-based and Image-based 3D Scene Representation for Interactive Visualization, 2004.
