Matching segments in stereoscopic vision

Humberto Loaïza, Jean Triboulet, Sylvie Lelandais, and Christian Barat


We have built a stereoscopic sensor and implemented algorithms that obtain the 3D coordinates of objects from images. This system aids the localization and navigation of a mobile robot. The sensor uses two small cameras oriented vertically to form a stereoscopic vision system for the mobile robot. We show that it is possible to match segments from the top and bottom images without using an epipolar constraint.

Stereoscopic Sensor and Image Processing

The stereo sensor head possesses two small cameras whose characteristics aren't known. These low-cost cameras give degraded-quality images. The two cameras are rigged vertically (see Fig. 1). The stereo head mounts on top of a mobile robot. The vision field of the bottom camera points toward the floor; the top camera points horizontally. The two vision fields partially overlap (see Fig. 2).

Calibration characterizes the cameras by estimating the intrinsic and extrinsic parameters of each one. For that, two types of models are considered. The first model is a pinhole camera model that neglects all optical distortion [2]. The second model accounts for radial distortion [9]. The calibration operation requires a set of image points whose real coordinates are known [5]. For the calibration, we use the two patterned planes seen in Fig. 2.

Equations (1) and (2) express the relationships between the camera coordinates and the image coordinates of a point for the linear and nonlinear pinhole models, respectively:

$$u = u_0 + k_u \frac{x_c}{z_c}, \qquad v = v_0 + k_v \frac{y_c}{z_c}, \tag{1}$$



$$u = u_0 + k_u \frac{x_c}{z_c}\left(1 + k_d r^2\right), \tag{2a}$$

and

$$v = v_0 + k_v \frac{y_c}{z_c}\left(1 + k_d r^2\right), \tag{2b}$$

where $r^2 = (x_c^2 + y_c^2)/z_c^2$; $(x_c, y_c, z_c)$ are the coordinates of a point in the camera system; $u, v$ are the pixel coordinates in the retinal plane; $u_0, v_0$ are the image-center coordinates; $k_u, k_v$ are the row and column focal lengths; and $k_d$ is a radial distortion parameter.
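As a concrete illustration of (1) and (2), the following minimal Python sketch projects a point expressed in camera coordinates into pixel coordinates; the function name and the treatment of (1) as the special case $k_d = 0$ are our own choices, not from the paper.

```python
def project(point_cam, u0, v0, ku, kv, kd=0.0):
    """Pinhole projection of (1); the radial term of (2) is applied when kd != 0.
    point_cam: (xc, yc, zc), coordinates of the point in the camera frame."""
    xc, yc, zc = point_cam
    x, y = xc / zc, yc / zc          # normalized camera coordinates
    r2 = x * x + y * y               # r^2 = (xc^2 + yc^2) / zc^2
    u = u0 + ku * x * (1.0 + kd * r2)
    v = v0 + kv * y * (1.0 + kd * r2)
    return u, v
```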

The image processing corrects the cameras' distortions and extracts segments from the images. We have retained straight-line segments as primitives, because they are easily detected. We can compute many descriptors associated with this kind of primitive and obtain a function that is very sensitive to variations in the descriptors. The different steps of image processing are [10]:

◗ Image enhancement—Correct the radial distortion by using a bidimensional lookup table and give a high contrast to the image using histogram specification.

◗ Optimal filtering—The gradient method obtains edge points. The gradient components are computed by separable filters, as Deriche defined them.

◗ Local maximum—Obtain thin edges by extracting edge points whose gradient magnitude is greater than that of their neighbors.

◗ Threshold computation—Implement an algorithm that obtains two threshold values.

◗ Hysteresis segmentation—Keep the remaining edge points and build the segments from them (a sketch of this step is given below).

The cameras gave the two indoor images in Fig. 3(a) and (b). Fig. 3 also shows the segments, (c) and (d), computed from them after all the processing. The number of segments is sufficient to compute most 3D coordinates of real objects.
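The hysteresis step mentioned above can be sketched as follows, assuming NumPy and SciPy are available; the connected-component formulation is our own illustrative choice, not necessarily the paper's exact algorithm.

```python
import numpy as np
from scipy import ndimage

def hysteresis(mag, t_low, t_high):
    """Keep weak edge points only if they connect to a strong one.
    mag: gradient-magnitude image after local-maximum (thin-edge) extraction."""
    strong = mag >= t_high
    weak = mag >= t_low                      # strong pixels are also weak
    labels, n = ndimage.label(weak)          # connected components of weak edges
    keep = np.zeros(n + 1, dtype=bool)
    keep[np.unique(labels[strong])] = True   # components touched by a strong pixel
    keep[0] = False                          # never keep the background label
    return keep[labels]
```

The surviving edge points are then chained into straight-line segments.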

Characteristics of Segments

After we obtain the segments, we calculate the values of the descriptors on each one. We chose four kinds of descriptors: geometric, gray-scale, textural, and neighborhood features. During image processing and segment building, we compute both the geometric parameters and the mean gradient value. Fig. 4 shows the pixel areas to the right and left of the segments that are defined for the other gray-scale features and for the textural features.

We compute the mean gray scales of these areas, the internal contrast, and the gray-scale differences in four directions (0°, 45°, 90°, and 135°) that represent the local variation. These textural parameters come from a simplified co-occurrence matrix that obtains a result quickly [6]. To compute the neighborhood features, we use the "buckets" method with sliding windows to optimize the number of neighbors for one segment. Consequently, each segment is described by 16 parameters [1].
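A minimal sketch of such area descriptors follows; the paper does not spell out its exact contrast and directional-difference formulas, so the definitions used here (max-min contrast, mean absolute neighbor differences) are illustrative assumptions.

```python
import numpy as np

def area_features(patch):
    """Illustrative gray-scale and textural descriptors for one side area of a segment.
    patch: 2D array of gray levels covering the left or right area."""
    patch = np.asarray(patch, dtype=float)
    mean_gray = patch.mean()
    contrast = patch.max() - patch.min()                    # internal contrast
    # Mean absolute gray-level difference between neighbors in four directions.
    d0   = np.abs(np.diff(patch, axis=1)).mean()            # 0 degrees (horizontal)
    d90  = np.abs(np.diff(patch, axis=0)).mean()            # 90 degrees (vertical)
    d45  = np.abs(patch[1:, 1:] - patch[:-1, :-1]).mean()   # 45 degrees (diagonal)
    d135 = np.abs(patch[1:, :-1] - patch[:-1, 1:]).mean()   # 135 degrees (anti-diagonal)
    return mean_gray, contrast, d0, d45, d90, d135
```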

Next, we compare a couple of segments, one from the top image and the other from the bottom image, and decide whether they represent the same part of the visual scene. We create two classes to do that: their members are pairs of segments that are correctly or incorrectly matched, respectively. A 16-component vector is associated with each member of one class. These 16 variables $(x_i)$ are computed as the differences between the parameters of each segment of the pair:

$$\begin{bmatrix} x_1 \\ x_2 \\ \vdots \\ x_{16} \end{bmatrix} =
\begin{bmatrix} x_1^{\mathrm{upp}} \\ x_2^{\mathrm{upp}} \\ \vdots \\ x_{16}^{\mathrm{upp}} \end{bmatrix} -
\begin{bmatrix} x_1^{\mathrm{low}} \\ x_2^{\mathrm{low}} \\ \vdots \\ x_{16}^{\mathrm{low}} \end{bmatrix}, \tag{3}$$

where $x_i^{\mathrm{upp}}$ and $x_i^{\mathrm{low}}$ are the features of the segments from the upper and the lower images, respectively. Table 1 shows these variables.
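Building the difference vector of (3) is straightforward; the short sketch below assumes each segment's 16 descriptors are already stored in an array (the function name is ours).

```python
import numpy as np

def pair_vector(upper_features, lower_features):
    """Component-wise difference of (3) between the 16 descriptors of the
    upper-image segment and those of the lower-image segment."""
    upper = np.asarray(upper_features, dtype=float)
    lower = np.asarray(lower_features, dtype=float)
    assert upper.shape == lower.shape == (16,)
    return upper - lower
```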


Fig. 1. The stereoscopic sensor with the two cameras.

Fig. 2. The mobile robot and the two views (upper and lower) of the calibration pattern.


To test the quality of these variables, we have built four databases, each one of 660 members. Bases 1 and 2 use segments from very textured images. Bases 3 and 4 use segments from poorly textured images. Elements of bases 1 and 3 are from class 1, and elements of bases 2 and 4 are from class 2.

The Pattern Recognition Problem

Now we search for the best classifier. Several methods might solve the classification problem. The choice depends on the application constraints and on the a priori knowledge of both the input data and the physical phenomena. Generally, the best classifier for a given task can be found by comparing several different criteria: classification error, computational complexity, and hardware-implementation efficiency [7]. Four well-known methods have been compared: the Bayesian method, linear discriminant analysis (LDA), quadratic discriminant analysis (QDA), and the multilayer perceptron (MLP) [3].

Table 2 shows the classification performances of these four methods. The Bayes and MLP methods give the best mean results, but the Bayes method is better for class 1 and the MLP method works best for class 2. We combine these two classifiers to improve the results in both classes.

Combination of Two Classifiers

Before looking for a combination rule, we must reduce the number of variables. This step is necessary because 16 variables are too numerous for a real-time application; it is also possible that the variables are correlated. We tested three approaches and found that eight variables are preferable [10]. The eight components of the vector are $x_1$, $x_2$, $x_4$, $x_5$, $x_6$, $x_8$, $x_9$, and $x_{14}$ (see Table 1). Note that these variables come from three kinds of features: geometric, gray-scale, and textural.
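Selecting those eight components from the 16-component vector of (3) amounts to a simple indexing step; the sketch below uses the 1-based indices quoted above (the constant and function names are ours).

```python
import numpy as np

# 1-based indices of the eight retained components (see Table 1).
SELECTED = (1, 2, 4, 5, 6, 8, 9, 14)

def reduce_vector(x16):
    """Keep only the eight components retained by the variable-selection step."""
    x16 = np.asarray(x16, dtype=float)
    return x16[[i - 1 for i in SELECTED]]    # shift to 0-based indexing
```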

The sum rule is the method that combines the classifiers [8]. We assumed that the representations are conditionally statistically independent:

Assign $Z_k \to w_j$ if

$$P_B(w_j \mid \mathbf{x}_k) + P_N(w_j \mid \mathbf{x}_k) =
\max_{i = 1, \ldots, n_{\mathrm{candidates}}} \left[ P_B(w_j \mid \mathbf{x}_i) + P_N(w_j \mid \mathbf{x}_i) \right], \tag{4}$$

where $P_B(w_j \mid \mathbf{x}_k)$ and $P_N(w_j \mid \mathbf{x}_k)$ are the a posteriori probabilities that the couple $Z_k$, described by the eight-component vector $\mathbf{x}_k$, belongs to class $w_j$. They derive from the Bayesian and neural classifiers, respectively. Table 3 presents the results obtained on the databases using this combination scheme.

Using the two methods gives better mean results than using only one: an answer given by one method can be confirmed by the same answer from the other. We thus increase the percentage of good matching in the two classes.
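Applied to matching, rules (4) and (5) reduce to a small amount of code; the sketch below assumes the two posteriors are already available for each candidate pair (the function name is ours).

```python
import numpy as np

def best_candidate(p_bayes, p_neural):
    """Sum rule of (4): among the candidate pairs, keep the one with the
    largest combined (Bayesian + neural) posterior of being a good match.
    p_bayes, p_neural: per-candidate posteriors P(good match | x_i)."""
    scores = np.asarray(p_bayes, dtype=float) + np.asarray(p_neural, dtype=float)
    i = int(np.argmax(scores))
    return i, scores[i]
```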


Fig. 4. Definition of the areas on the left and on the right of a segment.

Fig. 3. Corrected images and detected segments: (a) and (c) are the top images; (b) and (d) are the bottom images.

Matching Algorithm

Classical matching algorithms use an epipolar constraint to find candidates for building a couple of primitives [4]. Our new algorithm is built on two simple but robust ideas. First, it uses a decision function arising from the fusion of two efficient discrimination methods. This function determines whether a couple of segments forms a correct match. Second, the neighborhood information used by the algorithm significantly reduces the search space for possible matches: the neighboring segments of a matched couple form a reduced, and probable, space in which to look for a new match.

The initialization problem is not trivial. Taking into account the particular configuration and environment of our sensor, vertical segments offer a good starting point. These vertical segments can be detected and matched reliably because they occupy similar horizontal positions in the high and low images. Moreover, they constitute a reduced subset of the detected segments. Consequently, they form an appropriate subset for the initialization step of the matching program.

When the extraction and characterization program provides the lists of detected segments in both images, two lists of vertical segments are extracted and sorted according to their horizontal image coordinates. Then, the matching algorithm begins.

The rule given by (4) is applied to the vertical segments from the high and low images whose coordinate difference satisfies $|v_{ms} - v_{mi}| < \Delta v$ pixels. If several candidates are present, a second rule applies:

add $Z_k$ as a potential element of class $w_j$ if

$$P_B(w_j \mid \mathbf{x}_k) \ge 0.5 \quad \text{or} \quad P_N(w_j \mid \mathbf{x}_k) \ge 0.5. \tag{5}$$

A list of correctly matched vertical segments is then obtained. During the matching step, we evaluate the decision between the segment from the high image and all the neighbors of the matched segment in the low image. The matches that satisfy rule (5) constitute the list of matching candidates between the high and low images. The final result of the program is a list of matched segments, with the a posteriori probabilities obtained using the Bayesian and neural classifiers.

When no vertical segments are available, it is possible to use the same algorithm with horizontal segments and to search their neighbors. Only in the worst case, with no vertical or horizontal segments, do we use the epipolar constraint to find the potential candidates.
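The initialization step on vertical segments can be sketched as follows; the segment representation, the value of Δv, and the classifier interfaces are illustrative assumptions, not the paper's code.

```python
DELTA_V = 10  # assumed pixel tolerance on the horizontal coordinates

def match_vertical(upper_segments, lower_segments, p_bayes, p_neural):
    """Pair vertical segments from the upper and lower images.
    upper_segments, lower_segments: objects with a .v attribute, the
    horizontal image coordinate of the segment's center.
    p_bayes, p_neural: callables returning P(good match | pair of segments)."""
    matches = []
    for s_up in upper_segments:
        # Candidates within the coordinate tolerance that pass rule (5).
        candidates = [s for s in lower_segments
                      if abs(s_up.v - s.v) < DELTA_V
                      and (p_bayes(s_up, s) >= 0.5 or p_neural(s_up, s) >= 0.5)]
        if not candidates:
            continue
        # Sum rule (4): keep the candidate with the largest combined posterior.
        best = max(candidates, key=lambda s: p_bayes(s_up, s) + p_neural(s_up, s))
        matches.append((s_up, best))
    return matches
```

Each matched couple then seeds the neighborhood search described above.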

Results of Matching

Table 4 shows the matching list obtained by the algorithm on the segments of Fig. 3. Of these 11 matching results, ten are correct (over 90% correct matching), but the couple (22, 17) is a bad pair. We have tested this algorithm on ten other samples, and the correct matching rate is over 95%.

Reconstruction Principle

Fig. 5 illustrates the method that performs the 3D reconstruction. We start with a couple of matched segments: P1 and P2 are the start and end points of the segment in the bottom image.


Table 1. Definition of the 16 Components of the Vector for a Pair of Segments.

Variable   Computed as the difference between:
x1         Vertical coordinates of the segments' centers
x2         Horizontal coordinates of the segments' centers
x3         Gradient intensities along the segments
x4         Segments' orientations (in degrees)
x5         Inside contrast in the left areas
x6         Inside contrast in the right areas
x7         Means of gray scales in the right areas
x8         Means of gray scales in the left areas
x9         Gray-scale variations at 0° in the right areas
x10        Gray-scale variations at 45° in the right areas
x11        Gray-scale variations at 90° in the right areas
x12        Gray-scale variations at 135° in the right areas
x13        Gray-scale variations at 0° in the left areas
x14        Gray-scale variations at 45° in the left areas
x15        Gray-scale variations at 90° in the left areas
x16        Gray-scale variations at 135° in the left areas

Table 2. Classification Results (Dk = success rate for class k; class 1 = good matching, class 2 = bad matching).

          D1      D2      Mean result
LDA       63.39   54.54   58.97
QDA       97.3    82.7    90
MLP       91      93.6    92.3
Bayes     97.3    90      93.65

Table 3. Classification Results with the Sum Rule.

                      Class 1   Class 2   Mean result
Sum (Bayes, neural)   93.75     96.36     95.06

P3 and P4 are the start and end points of the corresponding segment in the top image. For the bottom image, we find the two points P1 and P2 that satisfy the conditions $P_1 \in (O_b p_1)$ and $P_2 \in (O_b p_2)$, where the equations of these straight lines are given by the internal model of the camera.

We do the same thing to find P3 and P4. It is then possible to build two planes, one passing through Ob, P1, and P2 and the other through Ot, P3, and P4. Then, we find the equation of the straight line of intersection (Dinter in Fig. 5) between these two planes. The end points of the reconstructed segment are given by the intersections between this line and $(O_t p_3)$ or $(O_t p_4)$.
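The geometric construction can be sketched with a few cross and dot products; the sketch below assumes the optical centers and the back-projected image points are already expressed in a common world frame, and it ignores degenerate (parallel) configurations. Since the rays through the top image points lie in the second plane, intersecting them with the first plane is equivalent to cutting them with Dinter.

```python
import numpy as np

def reconstruct_segment(Ob, p1, p2, Ot, p3, p4):
    """Intersect the plane through (Ob, p1, p2) with the rays (Ot p3) and (Ot p4).
    The two intersection points lie on Dinter and are the end points of the
    reconstructed 3D segment. All arguments are 3D points (NumPy arrays)."""
    n1 = np.cross(p1 - Ob, p2 - Ob)      # normal of plane 1 (bottom camera)

    def ray_plane(origin, through, normal, point_on_plane):
        # Intersection of the ray origin -> through with the plane
        # normal . (x - point_on_plane) = 0.
        direction = through - origin
        t = np.dot(normal, point_on_plane - origin) / np.dot(normal, direction)
        return origin + t * direction

    P3 = ray_plane(Ot, p3, n1, Ob)       # end point on the ray (Ot p3)
    P4 = ray_plane(Ot, p4, n1, Ob)       # end point on the ray (Ot p4)
    return P3, P4
```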

Accuracy of 3D Reconstruction

To analyze the accuracy in determining the 3D objects' positions, we performed the experiment shown in Fig. 6. We put three objects on a horizontal plane and measured the relative positions between them. Three distances were chosen: d1 = distance(P1, P2), equal to 44 cm; d2 = distance(P3, P4), equal to 29.6 cm; and d3 = distance(P1P2; P3P4), equal to 51 cm. Note that d3 is along the z-axis.

We performed six acquisitions with different positions of the sensor. Fig. 7 presents the errors obtained on these three distances. The errors are less than 5% when the objects are near the sensor's calibration volume (acquisitions 1, 2, and 3). When the sensor is far away from the objects, the errors grow very quickly. The error on the z coordinate is the largest.

The aim of our application is to localize a mobile robot in an indoor environment. An error of about 5 cm, when the object is about 1.8 m from the sensor, is not a problem.

Summary

We have shown that it is possible to build a stereoscopic sensor with low-quality cameras. We developed image processing that is robust and quickly provides the input for the matching algorithm. We computed a large number of features on each segment, and with these features, we built a 16-component vector used in the classification step. After an exhaustive study, we decided to combine two methods, Bayesian and neural, to construct an efficient classifier. The tests on indoor images gave better than 90% good matching. With the matched segment couples, it is possible to compute the 3D coordinates of the objects. Therefore, the mobile robot is able to localize itself and move about in its environment.

References

[1] T. Asano, M. Edahiro, H. Imai, and K. Murota, "Practical use of bucketing techniques in computational geometry," in Computational Geometry, G. Toussaint, Ed. North-Holland, 1985, pp. 153-195.

[2] N. Ayache and O.D. Faugeras, "Maintaining representation of the environment of a mobile robot," in Proc. IEEE Int. Symp. Expo. Robots, Esprit Projet, 1989.

[3] C. Barat, H. Loaiza, E. Colle, and S. Lelandais, "Neural and statistical classifiers—can these approaches be complementary?" accepted to IMTC 2000 Conference, Baltimore, MD, May 2000.

[4] O. Faugeras, Three-Dimensional Computer Vision. Cambridge, MA: MIT Press, 1996.

[5] W.I. Grosky and L.A. Tamburino, "A unified approach to the linear camera calibration problem," IEEE Trans. Pattern Anal. Machine Intell., vol. 12, no. 7, pp. 663-671, July 1990.

[6] R.M. Haralick, K. Shanmugam, and I. Dinstein, "Textural features for image classification," IEEE Trans. Syst. Man Cybernetics, vol. 3, pp. 610-621, Nov. 1973.


Fig. 5. Geometry for 3D reconstruction.

Table 4. Results from Matching Segments from Fig. 3(c) and (d).

Segments matched           A posteriori probabilities
Top image   Bottom image   Bayes   Neural   Bad matching
1           2              0.999   0.945
4           5              0.970   0.992
6           1              0.812   0.982
20          18             0.884   0.948
21          19             0.801   0.985
14          10             0.772   0.985
13          9              0.934   0.970
15          11             0.872   0.986
22          17             0.999   0.963    X
19          14             0.989   0.965
18          13             0.515   0.992

[7] L. Holmström, P. Koistinen, J. Laaksonen, and E. Oja, "Neural and statistical classifiers—taxonomy and two case studies," IEEE Trans. Neural Networks, vol. 8, no. 1, Jan. 1997.

[8] J. Kittler, M. Hatef, R. Duin, and J. Matas, "On combining classifiers," IEEE Trans. Pattern Anal. Machine Intell., vol. 20, no. 3, pp. 226-238, 1998.

[9] R.K. Lenz and R.Y. Tsai, "Techniques for calibration of the scale factor and image center for high accuracy 3-D machine vision metrology," IEEE Trans. Pattern Anal. Machine Intell., vol. 10, no. 5, Sept. 1988.

[10] H. Loaiza, J. Triboulet, S. Lelandais, and C. Barat, "A new method for matching segments in stereoscopic vision," presented at the International Workshop on Virtual and Intelligent Measurement Systems, Annapolis, MD, April 2000.

Humberto Loaïza received his electrical engineering and M.Sc. in automation degrees from the University of Valle, Cali, Colombia, in 1990 and 1993, respectively. In 1999, he received his Ph.D. in robotics from the University of Evry Val d'Essonne, France. He is a Professor at the School of Electrical and Electronic Engineering of the University of Valle. His current research interests include image and signal processing, artificial vision, and mobile robotics.

Jean Triboulet was born in Paris, France, in 1966. He received his Ph.D. degree in robotics from Evry University in 1996. He is currently working at the CEMIF Complex Systems Laboratory, a joint laboratory of Evry University and CEA (French Nuclear Organization). His major fields of interest are sensor calibration, 3D vision, and image processing for telerobotics and mobile robots.

Sylvie Lelandais received her Ph.D. in automatic control and signal processing from the Technological University of Compiègne, France, in 1984. Her thesis was about 3D numerization. From 1985 to 1990, she worked in an image processing laboratory in a school of engineering at Nantes, France. Now, she is with the CEMIF Complex Systems Laboratory at Evry. Her current research interests include image processing, texture analysis, shape from texture, wavelets, and vision for robotics and biomedical image processing.

Christian Barat was born in Nice, France, in 1969. He received his Ph.D. in robotics from the University of Evry Val d'Essonne in 1996. He is working at the CEMIF laboratory as an Assistant Professor. His research interests are intelligent sensors based on video cameras, ultrasonics, and laser range finders. The main application field is mobile robotics.


Fig. 7. Error computation on d1 (red), d2 (green), and d3 (blue). The error value is given in centimeters.

Fig. 6. Accuracy evaluation of 3D reconstruction.