Hopfield Neural Network Based Stereo Matching Algorithm


Journal of Mathematical Imaging and Vision 16: 17–29, 2002
© 2002 Kluwer Academic Publishers. Manufactured in The Netherlands.


KARIM ACHOUR AND LYES MAHIDDINE
Robotics and Artificial Intelligence Laboratory, Advanced Technologies Development Center, 128, Chemin Mohamed Gacem, El-Madania, 16075, Algiers, Algeria
[email protected]
[email protected]

Abstract. In this paper, a neural network based optimization method is described in order to solve the problem of stereo matching for a set of primitives extracted from a stereoscopic pair of images. The neural network used is the 2D Hopfield network. The matching problem amounts to the minimization of an energy function involving specified stereoscopic constraints. This function reaches its minimum when these constraints are satisfied. The network converges to its stable state when the minimum is reached. In the initial step, the primitives to match are extracted from the stereoscopic pair of images. The primitives we use are specific points of interest. The feature extraction technique is the one developed by Moravec, called the interest operator. Its output comprises mostly corners or feature points with high variance. The Hopfield network is represented as an $N_l \times N_r$ matrix of neurons, where $N_l$ is the number of features in the left image and $N_r$ the number of features in the right one. The state of each neuron is updated in order to make the network evolve and then allow it to settle down into a stable state. In the stable state, each neuron represents a possible match between a left candidate and a right one.

Keywords: neural network, Hopfield network, stereo vision, matching, 3-D vision

1. Introduction

The extraction of 3D data (position, color, texture, ...) from an image captured by an artificial system (e.g. an electronic camera) constitutes a general goal of computer vision. Since the beginning of image analysis, researchers have looked for a methodology allowing the decomposition of the vision process into modular tasks. Biological systems offer a vivid proof of the possibility of using vision in order to solve some complex problems of navigation, scene analysis, pattern recognition, etc. It seems logical (although not a priori necessary) to try to imitate existing biological systems in order to inculcate a sense of biological vision into an artificial system. The study of vertebrate binocular vision systems has revealed that the brain constructs the perceived 3D image from the two retinal images [11, 14, 16]. Prompted by this fact, and in order to imitate biological binocular vision systems, various techniques have been developed for recovering 3D information [3, 7, 8, 20, 21]. The use of images from two or more cameras, or of multiple images from a moving camera (passive stereo vision), provides a passive way to acquire 3D information from a scene and is therefore an interesting alternative to direct range measurement, which includes triangulation by structured light and time of flight techniques (pulse time delay, phase shift). Passive stereovision is generally used for close range purposes such as robot manipulation, mobile robot guidance and location, inspection, part acquisition and assembly.

Marr [13] defined vision as a process that creates, starting from a set of pictures, a complete and precise representation of the observed scene. The vision process is consequently oriented in a precise direction. Raw data is converted into a more abstract and complex form, resulting in a representation permitting the semantic understanding of the world. However, we note that artificial vision does not take advantage of such a setting in order to imitate, in some modest aspects, the biological vision systems that constitute the reference for a vision system.


In this paper, we present a method to solve the correspondence problem in computer vision. The stereoscopic vision problem can be worked out by the neural approach. This approach represents an effective way to achieve an optimal matching solution using neural network techniques. It seems to be well adapted because neural networks are systems constructed according to some of the organizational principles used in the human brain. Indeed, some researchers view neural networks from biological and psychological perspectives. They tend to consider neural networks as a means of implementing cognitive mechanisms, operating over a wide range of domains including vision processing, associative memories and models of diverse neurobiological functions. On the other hand, some researchers consider neural networks as physical systems [9]. They tend to consider the neural network as an energy surface in which the states of minimal energy represent solutions to many combinatorial optimization problems [10]. Each such system solves a problem through a process in which the energy state of the network slides down to an energy minimum. The Hopfield network configuration is a typical example of such systems.

Another point in favor of neural networks in stereovision is that they essentially reduce the complexity and the computational cost of the algorithm. Indeed, in the case of the classical techniques (e.g. relaxation), the constraints on the solution cannot all be explicitly included in the algorithm simultaneously. With the neural network technique, by contrast, the matching problem can be formulated as the minimization of a cost function in which all the constraints on the solution are held explicitly.

The present work builds on a previous work of N. Nasrabadi [19] to address the problems described above. We have investigated some improvements to the proposed algorithm and have described a more efficient approach. Thus, in this paper, we first discuss these problems thoroughly and then propose a new neural matching algorithm. Our main contributions are as follows:

1. Making realistic assumptions to show that the use of the gradient magnitude image to extract the primitives, instead of the gray level, improves the matching result.

2. Noise is implicitly taken into account and dealt with in the implementation.

3. In order to eliminate the matching ambiguities, the test on the vertical disparity is optimized.

The rest of this paper is organized as follows. First, we discuss briefly the matching primitives (Section 2), and then we give a detailed analysis of the matching problem. We discuss the available matching constraints effectively used, focusing on the 2D Hopfield neural network configuration. Based on this analysis, our algorithm is described in Section 3. After reporting the experimental results, a performance evaluation will be given (Section 4). Finally, we conclude with some perspectives.

2. Feature Extraction

Research on stereovision has attracted considerable attention in recent years. Although results have been promising, a major problem remains: identifying, in real time, common points within two images from which stereo disparity can be measured.

Matching consists of giving a list of candidate pairs representing the same object (or part of the object) in the real scene. The type of primitives used has a direct effect on the processing. Indeed, how to match often depends on what to match. To solve this problem, one can try to associate either points of equal luminous intensity, while possibly coping with perspective distortion between the two images, or features of the scene (image features) that are, in general, more steady and reliable. These features range from intermediate-level ones such as contours, line segments, curves, corners and regions, to high-level ones such as structured forms. But few of them satisfy practical considerations, that is to say, they should be general, available and matchable. "General" means that the primitives should convey the majority of useful information in any given scene, "available" means that there must be techniques to extract them reliably and, finally, "matchable" means that there must be a way to apply matching constraints effectively.

Generally speaking, low level primitives can be extracted more easily than high level ones, but easily obtained primitives are difficult to match and vice versa. For example, regions would be ideal for surface recovery, but region segmentation itself proves to be a very difficult problem. It might be interesting to describe indoor scenes by line segments, but these are not suitable for scenes containing complex forms as found in outdoor situations. Intensity pixels are widely used in area-based algorithms. However, apart from their sensitivity to perspective distortion, methods operating on intensity often fail at occluding boundaries of surfaces, because such locations cannot be handled correctly unless they are already detected, which again requires a non-trivial prior segmentation. In general, unless the scene contents are well constrained, extensive analysis must be carried out on the total scene, and the primitives, when identified, tend to be sparse and provide a very limited depth description of the scene.

The aim of our work is not to obtain a dense depth map but to show the feasibility of neural matching. Thus, in order to limit the number of candidates to match and to decrease the combinatorial complexity of the algorithm, feature points seem to be the most attractive. These characteristic points are defined as the local maxima of the directional variance minima. They describe pertinent events in a scene and have the property of being clearly distinguishable from their neighbors. They can easily be extracted using the efficient Moravec operator [18].

In general, they are located at surface boundaries and, considering the fact that a great deal of information about a scene is found at such locations, the disparity continuity constraint, which allows the selection of good matches or the suppression of false ones, can be exploited appropriately.

In [18], Moravec used his extraction operator in order to obtain the primitives to match. To establish the correspondence between the images, he introduced a correlation technique based on successive refinement. Yakimowsky and Cunningham [23] developed a correlation technique using a hand-based method to extract the points of interest to match. Barnard and Thompson [5] presented a correspondence algorithm based on Moravec characteristic points. It consists of a relaxation method used to improve an initial labeling. P. Limozin-long [12] suggested matching points of interest by a relaxation technique in the case of indoor scenes. In [1], reference points are extracted from the scene in order to compute geometrical invariants for a 3D-reconstruction problem. In the algorithm presented below, points of interest are used as input data, but this does not mean that they should be the only type of matching primitive. Rather, we can think of an algorithm working on edges, line segments or regions and, furthermore, working with the available matching constraints.

The points of interest are selected as follows: we search, for each point, the minimum of the directional variance among the four directions (horizontal, vertical and the two diagonals), i.e. the sum of the squared differences between the luminance values within a 5 × 5 window in one direction.

$$V(i, j) = \min_{\theta} V_{\theta}(i, j) \quad \text{with} \quad \theta = \frac{\pi}{4}, \frac{\pi}{2}, \frac{3\pi}{4}, \pi \tag{1}$$

This variance value $V(i, j)$, located at the intersection of the $i$th column and the $j$th row of the image, is high if and only if all the directional variances are high. The point $(i, j)$ is retained as a feature or Moravec point $M(i, j)$ if its corresponding variance is a local maximum and if it exceeds a fixed threshold, in order to ensure the quality of the choice, that is:

$$(i, j) = M(i, j) \iff \begin{cases} V(i, j) > V(u, v) & \forall (u, v) \text{ neighbor of } (i, j) \\ V(i, j) > \text{a given threshold} \end{cases} \tag{2}$$

The first condition in (2) is introduced because in regions of high variance a high number of points could be extracted. In order to reduce this number and to have some distinct points, the variance of points of interest must be a local maximum (a window of 20 × 20 centered on the Moravec point is used as a neighborhood).
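The selection rule of Eqs. (1) and (2) can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the function names, the edge-replicating border handling, the (row, column) indexing and the tie handling (a point is kept only if it is the unique largest value in its neighborhood) are assumptions made for illustration.

```python
import numpy as np

def directional_variance(image, window=5):
    """Eq. (1): minimum over four directions of the windowed sum of squared differences."""
    img = np.asarray(image, dtype=float)
    h, w = img.shape
    half = window // 2
    padded = np.pad(img, half + 1, mode="edge")  # border handling is an assumption
    rows, cols = padded.shape
    v_min = np.full((h, w), np.inf)
    # (row step, column step): horizontal, vertical and the two diagonals
    for dr, dc in [(0, 1), (1, 0), (1, 1), (1, -1)]:
        base = padded[0:rows - 1, 1:cols - 1]
        moved = padded[dr:rows - 1 + dr, 1 + dc:cols - 1 + dc]
        sq = (moved - base) ** 2
        # sum the squared differences over the window centered on each pixel
        total = np.zeros((h, w))
        for u in range(window):
            for v in range(window):
                total += sq[1 + u:1 + u + h, v:v + w]
        v_min = np.minimum(v_min, total)
    return v_min

def moravec_points(image, threshold, window=5, neighborhood=20):
    """Eq. (2): keep pixels whose variance exceeds the threshold and is a strict local maximum."""
    v = directional_variance(image, window)
    half = neighborhood // 2
    h, w = v.shape
    points = []
    for r in range(h):
        for c in range(w):
            if v[r, c] <= threshold:
                continue
            block = v[max(0, r - half):r + half + 1, max(0, c - half):c + half + 1]
            # strict maximum: the value must be the unique largest one in the block
            if v[r, c] >= block.max() and np.count_nonzero(block == v[r, c]) == 1:
                points.append((r, c))
    return points
```

On an image made of horizontal stripes the horizontal variance is zero everywhere, so the minimum in Eq. (1) is zero and no point is selected; this is the behavior that makes the operator respond to corners rather than to straight edges.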

Figure 1 shows the results of the application of the Moravec operator on the "Lena" image. The points of interest are marked by a + symbol overlaid on the image. We can immediately note the particular position of these points in the picture.

Figure 1. Points of interest extracted with the Moravec operator.


3. Stereo Matching with a Neural Approach

The recovery of the 3D structure of a scene is one of the fundamental problems in computer vision. The use of stereovision as a passive technique for this purpose has been widely investigated over the past years. While the underlying principle (triangulation) is straightforward, the task of identifying (matching) homologous items in two (or more) images has proved to be very difficult. This problem is commonly referred to as the correspondence problem, and many authors have suggested solutions using techniques of correlation [23], relaxation [5, 12], prediction and verification of hypotheses [4, 17], dynamic programming [15, 22], and multi-scale matching [2].

To carry out the matching, constraints are needed to find a unique solution. In their computational theory, Marr and Poggio [14] proposed two constraints for the correspondence problem: uniqueness and continuity, both based on physical observations. However, theoretical works show neither how the available constraints should be exploited in an algorithm, nor how they should be implemented in a computer program. Rather, the designer is free to make these choices in the design and the implementation. Concretely, one must consider the following problems in the design of a stereo algorithm:

Which primitives are to be used? How should the continuity constraint be used? How to cope with noise in the data? How to choose an adequate compromise among conflicting requirements? Different answers to these questions lead to different algorithms. In this paper, the matching is reduced to an optimization problem in which an energy function, representing the problem constraints, must be minimized using a Hopfield neural network. In [19], Nasrabadi pointed out that a particular (two-dimensional) configuration of this type of neural network allows the matching problem to be solved as an optimization problem. The energy function of the Hopfield network is designed to reflect the matching constraints, and the minimization of this function leads to the optimal feature correspondence. The synaptic interconnection weights between the neurons represent the constraints imposed by the correspondence problem, and the best solution should satisfy all the constraints, such as the uniqueness constraint: each feature in the left image can have only one homologous feature in the right image and vice versa. We then specify the matching constraints to ensure a stable and coherent feature correspondence between the two images. Points of interest are used as input data. This information is used by each neuron to force the network to converge to a stable state in order to make a decision. The energy function thus represents the collective behavior of the network, and when the network is in its stable state, the energy function is at a local minimum.

3.1. Hopfield Network Description

The Hopfield model is rather different from multilayer models. The connections have no dynamics (the weights are fixed) but, in return, the relaxation of the network is dynamic. During the evolution of the network, which tends to converge to its steady state, an energy function decreases toward a local minimum [10, 24]. In our case, a 2D Hopfield network is used in order to find the correspondences between points of interest in the left image and the right one. The network is presented as a matrix of $N_l \times N_r$ neurons, where $N_l$ and $N_r$ are respectively the total numbers of left and right primitives. Figure 2 shows the architecture of the network and the weights of the connections between the neurons.

Figure 2. 2D Hopfield network.

$T_{ikjl}$ represents the weight of the connection between the neurons $n_{ik}$ and $n_{jl}$. This connection is symmetrical, i.e. $T_{ikjl} = T_{jlik}$. $T_{ikik} = 0$ indicates that no neuron has a self-feedback connection. $I_{ik}$ represents the external input of every neuron. The network is binary, since every neuron can be in one of the two states 0 or 1 (inactive or active). $V_{ik}$ designates the state of the neuron. The Lyapunov function for a 2D binary Hopfield network is given by:

$$E = -\frac{1}{2} \sum_{i=1}^{N_l} \sum_{k=1}^{N_r} \sum_{j=1}^{N_l} \sum_{l=1}^{N_r} T_{ikjl} V_{ik} V_{jl} - \sum_{i=1}^{N_l} \sum_{k=1}^{N_r} I_{ik} V_{ik} \tag{3}$$
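As an illustration, the energy of Eq. (3) can be evaluated directly from a weight tensor and a state matrix. The sketch below assumes NumPy arrays with `T` of shape `(N_l, N_r, N_l, N_r)` and `I`, `V` of shape `(N_l, N_r)`; the function name and the array layout are choices made here, not part of the paper.

```python
import numpy as np

def hopfield_energy(T, I, V):
    """Eq. (3): E = -1/2 * sum_ikjl T_ikjl V_ik V_jl - sum_ik I_ik V_ik."""
    pair_term = np.einsum("ikjl,ik,jl->", T, V, V)
    return -0.5 * pair_term - np.sum(I * V)

# Toy 2 x 2 network: the two diagonal neurons support each other.
T = np.zeros((2, 2, 2, 2))
T[0, 0, 1, 1] = T[1, 1, 0, 0] = 1.0   # symmetric weights, T_ikjl = T_jlik
I = 2.0 * np.ones((2, 2))
V = np.eye(2)
energy = hopfield_energy(T, I, V)     # -0.5 * 2 - 4 = -5
```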

3.2. Application to the Correspondence Problem

Many stereo algorithms have been proposed, and new ones are still emerging, because existing algorithms are not robust enough. The reason for this is that the available matching constraints are not effectively used. These constraints govern the interaction between good matches that represent the global consistency. Global constraints and local constraints together constitute the starting point and the basis for the design of any stereo algorithm.

Local constraints include epipolar geometry and similarity. They are intended to limit the number of candidate matches for a given primitive in one image. To reduce the search space, one first restricts the regions where corresponding primitives are looked for (epipolar constraint). Then the properties of these primitives are examined: only primitives with similar properties are retained as candidates (similarity constraints).

At this stage, local constraints are sufficient; nevertheless, the solution is not unique. To approach the solution, candidate matches are submitted to a global consistency verification using global constraints. These constraints are derived from physical properties of the scene, but they are rather general and not ready for use. They may or may not be sufficient. The underlying idea is that while good matches tend to be mutually consistent, false ones are not. Therefore global constraints can be used for two purposes: match selection, and false match detection and elimination. Note that in practice, the local and global constraint checking operations are usually performed separately.

In order to solve the correspondence problem, the neural matching approach considers the problem in terms of the minimization of a cost function in which all the constraints on the solution can be included explicitly. Nasrabadi [19] suggested an energy function to be minimized, including the stereoscopic constraints:

$$E = -\sum_{i=1}^{N_l} \sum_{k=1}^{N_r} \sum_{j=1}^{N_l} \sum_{l=1}^{N_r} C_{ikjl} V_{ik} V_{jl} + \sum_{i=1}^{N_l} \left(1 - \sum_{k=1}^{N_r} V_{ik}\right)^2 + \sum_{k=1}^{N_r} \left(1 - \sum_{i=1}^{N_l} V_{ik}\right)^2 \tag{4}$$

The minimum of this function will be the favorable and steady state of the matching. Therefore, one has to associate with every stereoscopic constraint a term that decreases every time the favorable state of the constraint is approached.

The first term in (4) represents the degree of compatibility of the matching between a pair of points $(i, j)$ in the left image and a pair of points $(k, l)$ in the right one, since it expresses the constraints imposed on the solution. The second and third terms enforce the uniqueness constraint. Indeed, each of these terms vanishes when exactly one neuron is active on each column (respectively each line). Every neuron $n_{ik}$ represents a matching possibility between the $i$th point of the left image and the $k$th point of the right image. If this neuron is active, the matching is valid; otherwise the matching is invalid. $C_{ikjl}$ is considered as a function $y$:

$$y = f(X) = C_{ikjl} = \frac{2}{1 + e^{\lambda (X - \theta)}} - 1 \tag{5}$$

This non-linear function (Fig. 3) grades the compatibility measurement in a continuous manner between +1 and −1. It tends toward 1 when $X$ tends to zero and tends toward −1 when $X$ is greater than $\theta$. So, the first term of (4) is at its minimum when $X$ tends to zero. The parameter $\theta$ sets the position of good compatibility, and $\lambda$ sets the slope of the function. A high value of $\lambda$ gives a high selectivity, whereas a small $\lambda$ allows a larger number of compatible pairs to be taken into account.

Figure 3. Graph of $y = f(X) = C_{ikjl}$.
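The behavior of the compatibility function of Eq. (5) can be checked numerically with a short sketch. The function name and the parameter values used below are illustrative assumptions; the paper does not state the values of $\lambda$ and $\theta$ in this section. The code uses the identity $2/(1 + e^{z}) - 1 = -\tanh(z/2)$, which avoids overflow of the exponential for large $X$.

```python
import math

def compatibility(x, lam, theta):
    """Eq. (5): C = 2 / (1 + exp(lam * (x - theta))) - 1, written as -tanh(z / 2)."""
    return -math.tanh(0.5 * lam * (x - theta))

# Illustrative parameters (assumed, not taken from the paper).
lam, theta = 1.0, 10.0
near_zero = compatibility(0.0, lam, theta)     # close to +1: compatible pair
at_theta = compatibility(theta, lam, theta)    # exactly 0 at X = theta
far_above = compatibility(100.0, lam, theta)   # close to -1: incompatible pair
```

Increasing `lam` sharpens the transition around `theta`, which corresponds to the higher selectivity described above.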

Since one seeks to minimize the energy function (4), it is convenient in our case to include the solution constraints in the $X$ term in such a way that these constraints are at their minimal values for good matches and higher in the case of false ones.

In [19], Nasrabadi proposed three constraints based on physical observations: (i) uniqueness, that is, each item from each image may be assigned at most one disparity value; (ii) a geometric constraint, that is, the difference $\Delta D$ between the distance separating two points in the left image and the distance separating their correspondents in the right one. This difference is small when the feature points are correctly matched. This length invariance constraint is justified since surface patches in a scene, projected into two cameras, form similar image patches. Furthermore, the objects are generally such that the changes in surfaces and perspective distortions are very small compared with their distance from the viewer, except for cases where the perspective distortion produces deformed image patches and eventually dissimilar patches at occluding boundaries. And (iii) a continuity constraint, i.e. disparity varies smoothly almost everywhere over the image.

The disparity $d(x, y)$ measures the displacement between two points, $x$ in the left image and $y$ in the right one, along the epipolar line:

$$d(x, y) = \sqrt{(x_x - y_x)^2 + (x_y - y_y)^2} \tag{6}$$
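As a small worked example of Eq. (6), the sketch below computes the disparity between a left point and a right point given as (x, y) pixel coordinates. The function name is an assumption; the two coordinate pairs are rows 1 and 9 of Table 1, whose listed disparities are 14 and 13.

```python
import math

def disparity(left_pt, right_pt):
    """Eq. (6): Euclidean displacement between a left point and a right point."""
    (xl, yl), (xr, yr) = left_pt, right_pt
    return math.hypot(xl - xr, yl - yr)

d_pair_1 = disparity((110, 11), (124, 11))   # purely horizontal shift of 14 pixels
d_pair_9 = disparity((108, 69), (120, 64))   # 12 pixels horizontally, 5 vertically
```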

The homogeneous disparity constraint can be dealt with by making a priori hypotheses on the regularity of the observed surfaces, which results in some hypotheses on $d$: $d$ is piecewise continuous and piecewise differentiable, and one can also suppose a strong regularity of $d$ in places such as surfaces (such hypotheses are natural, since the most homogeneous parts of the image correspond in general to regular surfaces).

The fact that $d$ is piecewise continuous is bound to an important problem: for a fixed point $y$ in an image, if at point $x$ in the second image $d(x^-, y) \neq d(x^+, y)$, then all the points in the second image situated between $x + d(x^-, y)$ and $x + d(x^+, y)$ do not have correspondents in the first image and belong to a forbidden zone by virtue of the monotonicity principle, which states that the relative positional order of image primitives along an epipolar line in one image is preserved in the other image. The disparity continuity constraint is formulated in the energy function as a difference $\Delta d$ between the disparities of two matched point pairs $(i, k)$ and $(j, l)$. If the two points belong to the same object, this term tends to zero.

In the present work, we introduce another constraint based on local similarity information. It can be an intensity constraint, and one can use it to measure the degree of similarity of the immediate neighborhoods of the two candidates to match. However, the presence of noise in the image argues against relying too heavily on this information, and a strict similarity criterion based on the intensity is meaningless, especially at occluding boundaries. In our algorithm, this similarity constraint is applied to points of interest extracted with the Moravec operator on gradient magnitude images. This alleviates the noise burden, since the gradient magnitude image is largely free of noise provided that we use a filter that performs well with respect to noise-based criteria. The filter we use to obtain the gradient magnitude image is the Canny-Deriche [6] one, which is designed for a good signal-to-noise ratio and good localization. Note also that a very good localization of points of interest, extracted from a filtered image, is obtained, since the points marked as local maxima by the filter are as close as possible to the true edges, where the points of interest are mostly located. This in turn allows a considerably more efficient matching process. Therefore, in the following, we first define the similarity constraint, based on which we propose a new form for the $X$ term in Eq. (5). To measure the degree of similarity of the immediate neighborhoods of two points, $i$ in the left image and $k$ in the right image, we introduce the method below:

Given the feature points detected in each of the two images, a difference measure $M_{ik}$ is computed:

$$M_{ik} = \frac{1}{n} \sum_{(x_l, y_l) \in W_i,\ (x_r, y_r) \in W_k} \left| V_i(x_l, y_l) - V_k(x_r, y_r) \right| \tag{7}$$

With:

$$V_i(x_l, y_l) = I_i(x_l, y_l) - \bar{I}_i \tag{8}$$

And

$$V_k(x_r, y_r) = I_k(x_r, y_r) - \bar{I}_k \tag{9}$$

Where $W_i$ is a window centered on point $i$, $I_i(x_l, y_l)$ the gradient magnitude of point $(x_l, y_l)$ within $W_i$, and $\bar{I}_i$ its mean gradient magnitude. $W_k$ is a window centered on point $k$, $I_k(x_r, y_r)$ the gradient magnitude of point $(x_r, y_r)$ within $W_k$, and $\bar{I}_k$ its mean gradient magnitude. $M_{jl}$ is computed in a similar way.
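A possible reading of Eqs. (7) to (9) is sketched below. Several points are assumptions, since the text does not fix them: the two windows have the same size and are compared pixel by pixel, $n$ is taken to be the number of pixels in the window, points are given as (row, column) indices lying at least `half` pixels from the image border, and the function name is hypothetical.

```python
import numpy as np

def similarity(grad_left, grad_right, pt_i, pt_k, half=2):
    """Mean absolute difference of mean-centered gradient windows (Eqs. 7-9)."""
    def centered_window(img, pt):
        r, c = pt
        win = np.asarray(img, dtype=float)[r - half:r + half + 1, c - half:c + half + 1]
        return win - win.mean()          # Eqs. (8) and (9): subtract the window mean
    v_i = centered_window(grad_left, pt_i)
    v_k = centered_window(grad_right, pt_k)
    return np.abs(v_i - v_k).mean()      # Eq. (7) with n = number of window pixels
```

Because each window is centered on its own mean, two neighborhoods that differ only by a constant offset give a measure of zero.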

The $X$ term includes the three constraints previously quoted. It is given by:

$$X = |\Delta d| + |\Delta D| + (M_{ik} + M_{jl}) \tag{10}$$
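To make Eq. (10) concrete, the following sketch evaluates $X$ for two candidate matches $(i, k)$ and $(j, l)$ from their pixel coordinates. The function names are hypothetical, points are (x, y) tuples, and the similarity measures are passed in as plain numbers (zero by default) so that the example stays self-contained.

```python
import math

def dist(p, q):
    """Euclidean distance between two (x, y) points."""
    return math.hypot(p[0] - q[0], p[1] - q[1])

def x_term(left_i, right_k, left_j, right_l, m_ik=0.0, m_jl=0.0):
    """Eq. (10): X = |delta d| + |delta D| + (M_ik + M_jl)."""
    # difference between the disparities of the two matched pairs
    delta_d = dist(left_i, right_k) - dist(left_j, right_l)
    # difference between the left-image distance and the right-image distance
    delta_D = dist(left_i, left_j) - dist(right_k, right_l)
    return abs(delta_d) + abs(delta_D) + (m_ik + m_jl)

# Pairs 1 and 2 of Table 1: disparities 14 and 11, distances 73 (left) and 70 (right).
x_value = x_term((110, 11), (124, 11), (183, 11), (194, 11))   # 3 + 3 + 0
```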

One can see that this function is well suited to our case, since one seeks to minimize the energy function (4) when the constraints $\Delta d$, $\Delta D$, $M_{ik}$ and $M_{jl}$ are at their minimal values. Equation (4) could be written in the following form:

$$E = -\frac{1}{2} \sum_{i=1}^{N_l} \sum_{k=1}^{N_r} \sum_{j=1}^{N_l} \sum_{l=1}^{N_r} \left(C_{ikjl} - \delta_{ij} - \delta_{kl}\right) V_{ik} V_{jl} - \sum_{i=1}^{N_l} \sum_{k=1}^{N_r} 2 V_{ik} \tag{11}$$

$\delta_{ij}$ and $\delta_{kl}$ express the uniqueness constraint, where $\delta_{ij} = 1$ if $i = j$ and $\delta_{ij} = 0$ if $i \neq j$, and $\delta_{kl} = 1$ if $l = k$ and 0 otherwise. This equation takes the form of the Hopfield energy function (3), where $T_{ikjl} = (C_{ikjl} - \delta_{ij} - \delta_{kl})$ and the external input $I_{ik}$ of every neuron is equal to 2.
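The passage from Eq. (4) to Eq. (11) is not spelled out in the text. One way to check it, offered here as a sketch rather than as the authors' derivation, is to expand the two squared terms of Eq. (4):

$$
\begin{aligned}
\sum_{i=1}^{N_l} \Bigl(1 - \sum_{k=1}^{N_r} V_{ik}\Bigr)^2 &= N_l - 2 \sum_{i=1}^{N_l} \sum_{k=1}^{N_r} V_{ik} + \sum_{i=1}^{N_l} \sum_{k=1}^{N_r} \sum_{j=1}^{N_l} \sum_{l=1}^{N_r} \delta_{ij} V_{ik} V_{jl}, \\
\sum_{k=1}^{N_r} \Bigl(1 - \sum_{i=1}^{N_l} V_{ik}\Bigr)^2 &= N_r - 2 \sum_{i=1}^{N_l} \sum_{k=1}^{N_r} V_{ik} + \sum_{i=1}^{N_l} \sum_{k=1}^{N_r} \sum_{j=1}^{N_l} \sum_{l=1}^{N_r} \delta_{kl} V_{ik} V_{jl}.
\end{aligned}
$$

Substituting into Eq. (4) gives

$$E_{(4)} = -\sum_{i=1}^{N_l} \sum_{k=1}^{N_r} \sum_{j=1}^{N_l} \sum_{l=1}^{N_r} \left(C_{ikjl} - \delta_{ij} - \delta_{kl}\right) V_{ik} V_{jl} - 4 \sum_{i=1}^{N_l} \sum_{k=1}^{N_r} V_{ik} + N_l + N_r = 2\,E_{(11)} + N_l + N_r .$$

Under this reading, Eq. (11) equals Eq. (4) up to a factor of 2 and the additive constant $N_l + N_r$, so both expressions have the same minimizers.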

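A variation $\Delta V_{ik}$ of the neuron $n_{ik}$ leads to a variation $\Delta E_{ik}$ of the global energy:

$$\Delta E_{ik} = -\left[\sum_{j=1}^{N_l} \sum_{l=1}^{N_r} \left(C_{ikjl} - \delta_{ij} - \delta_{kl}\right) V_{jl} + 2\right] \Delta V_{ik} \tag{12}$$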

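This equation describes the dynamics of the network; $\Delta E_{ik}$ must always be negative, which implies the following update conditions (see [19]):

$$
\begin{cases}
V_{ik}: 1 \to 0 & \text{if } \left[\sum_{j=1}^{N_l} \sum_{l=1}^{N_r} \left(C_{ikjl} - \delta_{ij} - \delta_{kl}\right) V_{jl} + 2\right] < 0 \\
V_{ik}: 0 \to 1 & \text{if } \left[\sum_{j=1}^{N_l} \sum_{l=1}^{N_r} \left(C_{ikjl} - \delta_{ij} - \delta_{kl}\right) V_{jl} + 2\right] > 0 \\
V_{ik}: \text{no change} & \text{if } \left[\sum_{j=1}^{N_l} \sum_{l=1}^{N_r} \left(C_{ikjl} - \delta_{ij} - \delta_{kl}\right) V_{jl} + 2\right] = 0
\end{cases}
\tag{13}
$$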

These conditions permit the update of the network. The optimal solution is obtained when the network state does not change. Its energy function is then at its lowest level.
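The update conditions of Eq. (13) can be sketched as an asynchronous loop. This is a simplified illustration under stated assumptions, not the procedure used in the experiments: a neuron $(i, k)$ is drawn uniformly at random at each step instead of using the search windows of Section 4, the compatibility tensor `C` of shape `(N_l, N_r, N_l, N_r)` is assumed to be precomputed, and the names `run_network`, `n_stable` and `max_iter` are hypothetical.

```python
import numpy as np

def run_network(C, n_stable=200, seed=0, max_iter=100_000):
    """Asynchronous updates following Eq. (13) until n_stable steps bring no change."""
    n_l, n_r = C.shape[0], C.shape[1]
    # T_ikjl = C_ikjl - delta_ij - delta_kl, built by broadcasting two identity matrices
    T = C - np.eye(n_l)[:, None, :, None] - np.eye(n_r)[None, :, None, :]
    V = np.zeros((n_l, n_r), dtype=int)          # all neurons start at zero
    rng = np.random.default_rng(seed)
    unchanged = 0
    for _ in range(max_iter):
        i, k = rng.integers(n_l), rng.integers(n_r)
        u = np.sum(T[i, k] * V) + 2.0            # bracketed quantity of Eq. (13)
        new = 1 if u > 0 else 0 if u < 0 else V[i, k]
        unchanged = unchanged + 1 if new == V[i, k] else 0
        V[i, k] = new
        if unchanged >= n_stable:
            break
    return V

# Toy compatibilities: the matches (0, 0) and (1, 1) support each other, all others conflict.
C = -np.ones((2, 2, 2, 2))
for a in range(2):
    for b in range(2):
        C[a, a, b, b] = 1.0
V = run_network(C)
```

For this toy tensor the only stable state reachable from the zero state is the one where the neurons (0, 0) and (1, 1) are active, which satisfies the uniqueness constraint.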

4. Tests and Results

The different stages of the matching algorithm applied to a stereo pair are presented below. After their extraction, the points of interest are labeled from 1 to $N_l$ for the left image and from 1 to $N_r$ for the right image. A 2D Hopfield network of $N_l \times N_r$ neurons is then implemented. The state of a neuron $V_{ik}$ is modified by the update rule given in (13). Initially, all the neurons are set to zero. The update is done by the random choice of one left point $i$. Then, a window 40 pixels wide and 30 pixels high is opened in the right image. This window is centered on point $i$ shifted by a distance equivalent to the maximal disparity estimated in the scene (10 pixels in our case), and the matching candidates are searched for within it. The choice of a narrow window allows the epipolar constraint to be verified, and its width is chosen in order to include the largest disparity in the stereo pair.

Figure 4. Original stereoscopic pair: (a) left image and (b) right image.

Figure 5. Gradient magnitude images, obtained with the Canny-Deriche filter, and the Moravec feature points: (a) left, 50 feature points and (b) right, 56 feature points.

Figure 6. Magnified image: the Moravec points have a very goodlocalization.
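The candidate search in the right image can be sketched as follows. The window size (40 × 30 pixels) and the 10-pixel shift are the values quoted in the text; the function name is hypothetical, and the direction of the shift (toward larger x, as suggested by the coordinates in Table 1) and the inclusive window borders are assumptions.

```python
def candidates_in_window(left_pt, right_pts, max_disparity=10, width=40, height=30):
    """Indices of right points inside the window centered on the shifted left point."""
    x, y = left_pt
    cx, cy = x + max_disparity, y        # shift direction is an assumption
    return [k for k, (xr, yr) in enumerate(right_pts)
            if abs(xr - cx) <= width / 2 and abs(yr - cy) <= height / 2]

# Left point of pair 1 in Table 1 against a few right points.
right_pts = [(124, 11), (194, 11), (36, 11), (120, 30)]
found = candidates_in_window((110, 11), right_pts)   # only the first point qualifies
```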

For every point $k$ found in this window, the corresponding neuron $n_{ik}$ is set to 1. Every point $k$ is then supposed to be a correct match. Another window of size 40 × 20 is opened around point $i$ in the left image. All the points belonging to this window represent the $j$'s of Eq. (12). For every point $j$, a procedure similar to the one applied for the search of the potential correspondents of $i$ is applied. These points represent the index $l$ of Eq. (12). This random updating procedure is iterated until the network reaches its state of stability, so that the states of the neurons no longer change. However, the network could converge without all its cells having been updated. This is due to the random choice in the updating process. Therefore, in order to increase the probability of visiting every cell of the network, the condition for stopping the updating process is that the network remains steady during 200 iterations. After network stabilization, the valid matches correspond to the neurons whose states are active.

Despite the constraints imposed on all the pairs to match, some multiple matches may remain. In order to eliminate these ambiguities, we apply a test on the vertical disparity to all the multiple matches and then proceed symmetrically. This eliminates all the cases of multiplicity. The procedure is as follows:

After stabilization of the network, we search, for every column of the matrix and therefore for every point i of the left image, for the corresponding point in the right image. This yields two classes of matches: singular and multiple. We then calculate the mean vertical disparity over all the singular pairs. This mean is compared to the vertical disparities of the elements belonging to the class of multiple pairs. If the difference between the vertical disparity of one pair and the mean disparity is below a predefined threshold, then this pair joins the class of singular pairs. The mean disparity Y is updated and the process is iterated for every point of the multiple class. This updating procedure improves the search, since it recovers some pairs whose vertical disparity initially failed the difference test although it is very close to Y. After the disparity test, only singular matches remain; the multiple matches are eliminated. To make the process symmetric, we repeat it from right to left, i.e. we check every row of the network matrix and therefore, for every point k of the right image, the corresponding point in the left image.

Table 1. Matched points and their corresponding disparities.

Pair label  Left feature label  Right feature label  Disparity  xleft  yleft  xright  yright
1           3                   4                    14         110    11     124     11
2           4                   5                    11         183    11     194     11
3           1                   2                    3.1        26     11     36      11
4           5                   3                    24         68     12     92      11
5           7                   7                    12         108    29     120     30
6           8                   8                    11.1       78     34     89      32
7           9                   10                   10.1       27     49     37      47
8           11                  11                   11         61     51     72      50
9           15                  14                   13         108    69     120     64
10          20                  20                   12.1       108    104    120     102
11          21                  21                   10         46     106    56      105
12          22                  22                   10         76     107    86      107
13          23                  24                   10         81     123    91      122
14          24                  25                   11         95     126    106     125
15          25                  26                   8.2        120    126    128     128
16          26                  27                   10         67     136    77      137
17          27                  29                   2          11     158    11      156
18          28                  30                   10.2       77     166    87      164
19          29                  31                   10.2       53     168    63      166
20          31                  34                   9          185    180    194     179
21          30                  32                   10.2       123    180    133     178
22          32                  35                   10         103    181    113     181
23          33                  36                   14         11     183    25      184
24          34                  37                   1          244    187    244     188
25          35                  39                   17.1       112    196    129     194
26          37                  40                   10         89     197    99      196
27          38                  42                   14.1       11     203    25      201
28          39                  44                   11         177    203    188     202
29          41                  47                   16.3       11     221    27      218
30          43                  48                   16         113    226    129     225
31          46                  49                   9.2        36     233    45      231
32          47                  52                   3          244    241    244     238
33          50                  56                   9          101    244    110     244
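The vertical disparity test applied to the multiple matches can be sketched as follows. This is only an illustration under stated assumptions: the name `resolve_multiples` is hypothetical, the vertical disparity is taken as yright minus yleft, and, since the text does not say which candidate is kept when several of them pass the test for the same left point, the sketch keeps the one closest to the running mean.

```python
from typing import Dict, List, Tuple

Point = Tuple[int, int]  # (x, y) pixel coordinates


def resolve_multiples(
    V: List[List[int]],
    left_pts: List[Point],
    right_pts: List[Point],
    threshold: float,
) -> Dict[int, int]:
    """Return a left-index -> right-index map containing only singular matches."""
    singular: Dict[int, int] = {}
    multiple: Dict[int, List[int]] = {}
    for i, row in enumerate(V):
        ks = [k for k, state in enumerate(row) if state == 1]
        if len(ks) == 1:
            singular[i] = ks[0]
        elif len(ks) > 1:
            multiple[i] = ks

    def dy(i: int, k: int) -> int:
        return right_pts[k][1] - left_pts[i][1]

    if not singular:
        return singular  # no reference disparity is available

    total = sum(dy(i, k) for i, k in singular.items())
    count = len(singular)
    for i, ks in multiple.items():
        mean_dy = total / count
        # Assumption: keep the candidate closest to the current mean.
        best = min(ks, key=lambda k: abs(dy(i, k) - mean_dy))
        if abs(dy(i, best) - mean_dy) < threshold:
            singular[i] = best
            total += dy(i, best)  # update the mean with the recovered pair
            count += 1
    return singular


left_pts = [(110, 11), (183, 11), (68, 12)]
right_pts = [(124, 11), (194, 11), (92, 11), (92, 40)]
V = [[1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 1, 1]]  # left point 2 has two candidates
matches = resolve_multiples(V, left_pts, right_pts, threshold=3)
print(matches)  # {0: 0, 1: 1, 2: 2}
```

In this toy example the two singular pairs give a mean vertical disparity of 0, so the candidate at (92, 11) is recovered and the one at (92, 40) is discarded. The threshold value of 3 is arbitrary.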

Figure 7. Neural matching result: 23 matched points with θ = 10, (a) left gradient magnitude image, (b) right gradient magnitude image.

The matching of a pair is definitively validated if the match is singular in both processes, left-to-right and right-to-left.
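This two-way validation amounts to a mutual consistency check, which can be sketched in a few lines. The dictionaries `l2r` and `r2l` are hypothetical containers for the singular matches obtained in each direction; they are not notation from the paper.

```python
from typing import Dict, List, Tuple


def mutually_consistent(
    l2r: Dict[int, int],  # left index i -> right index k (left-to-right pass)
    r2l: Dict[int, int],  # right index k -> left index i (right-to-left pass)
) -> List[Tuple[int, int]]:
    """Keep the pairs (i, k) that are singular matches in both directions."""
    return sorted((i, k) for i, k in l2r.items() if r2l.get(k) == i)


l2r = {0: 0, 1: 1, 2: 2}
r2l = {0: 0, 1: 1, 2: 5}  # right point 2 prefers another left point
print(mutually_consistent(l2r, r2l))  # [(0, 0), (1, 1)]
```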

This algorithm was tested on a stereo pair. It converges after 500 to 800 iterations, and the execution time on a SUN SPARC-10 system is estimated at between 2 and 5 seconds. Figure 4(a) and (b) show the left and right images of the initial stereo pair, which represents an indoor scene with geometric objects. Some differences appear between the images (e.g. books on the roofs). This is done in order to test the behavior and robustness of the algorithm in the presence of some artifacts.

Figure 8. Neural matching result with θ = 34, 25 matched points: (a) left image, (b) right image.

Figure 5(a) and (b) show the corresponding gradient magnitude images with the feature points extracted with the Moravec operator. Each feature point appears overlaid on the images as a + symbol. One can note that the extracted points lie at the vertices of the scene objects, which gives a very good localization and improves the disparity computation (see Fig. 6).
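To make the behavior of the interest operator concrete, the following sketch implements a common textbook formulation of the Moravec operator: the response at a pixel is the minimum, over eight one-pixel shifts, of the sum of squared differences between a small window and its shifted copy, and feature points are strict local maxima of that response. The window size, the set of shifts, the threshold and the function names are assumptions made for the example and may differ from the variant used in the paper, where the input would be the gradient magnitude image.

```python
from typing import List, Tuple

Image = List[List[float]]  # indexed as img[y][x]

SHIFTS = [(1, 0), (-1, 0), (0, 1), (0, -1), (1, 1), (-1, -1), (1, -1), (-1, 1)]


def moravec_response(img: Image, half: int = 1) -> Image:
    """Minimum SSD between each window and its eight shifted copies."""
    h, w = len(img), len(img[0])
    resp = [[0.0] * w for _ in range(h)]
    m = half + 1  # margin so that shifted windows stay inside the image
    for y in range(m, h - m):
        for x in range(m, w - m):
            best = float("inf")
            for dx, dy in SHIFTS:
                ssd = 0.0
                for v in range(-half, half + 1):
                    for u in range(-half, half + 1):
                        d = img[y + v + dy][x + u + dx] - img[y + v][x + u]
                        ssd += d * d
                best = min(best, ssd)
            resp[y][x] = best
    return resp


def interest_points(resp: Image, threshold: float) -> List[Tuple[int, int]]:
    """Return (x, y) of strict local maxima of the response above threshold."""
    h, w = len(resp), len(resp[0])
    pts = []
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            r = resp[y][x]
            if r <= threshold:
                continue
            neigh = [resp[y + dy][x + dx] for dx, dy in SHIFTS]
            if all(r > n for n in neigh):
                pts.append((x, y))
    return pts


# Synthetic 8 x 8 image: a bright quadrant whose corner lies at (4, 4).
img = [[1.0 if x >= 4 and y >= 4 else 0.0 for x in range(8)] for y in range(8)]
print(interest_points(moravec_response(img), 0.0))  # [(4, 4)]
```

On this synthetic image the response is zero in flat regions and along the straight edges, because at least one shift leaves the window unchanged, and it peaks at the corner. This is the property that places the extracted points at object vertices.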

Figure 7(a) and (b) give the result of the matching process, where each matched pair is labeled and represented either by one symbol, meaning that the points were matched without a test on the vertical disparity, or by a different symbol, meaning that the points were matched after the vertical disparity test.

Table 2. Matched points and their corresponding disparities.

Pair label  Left feature label  Right feature label  Disparity  xleft  yleft  xright  yright
1           1                   1                    11         66     5      77      5
2           2                   2                    13.2       23     12     36      14
3           5                   8                    13         181    60     192     53
4           7                   14                   18.7       6      88     24      93
5           8                   15                   3.16       41     90     40      93
6           9                   16                   13.9       181    90     193     97
7           10                  13                   19.7       61     94     78      84
8           12                  17                   10         46     107    56      106
9           13                  18                   11         74     109    85      108
10          19                  22                   10.4       181    133    191     130
11          20                  23                   11         46     141    57      140
12          21                  24                   11.2       78     142    89      140
13          22                  25                   11.4       124    150    133     143
14          24                  26                   11.7       181    158    191     152
15          25                  30                   25.5       77     166    102     171
16          26                  29                   10         50     168    60      167
17          27                  33                   8.06       124    178    132     179
18          28                  31                   12.2       181    178    191     171
19          29                  32                   6.4        5      180    10      176
20          36                  38                   9.22       50     203    57      197
21          40                  41                   10.4       15     218    5       215
22          41                  48                   9.06       80     224    81      233
23          45                  46                   9.06       36     233    45      232
24          46                  51                   5.39       109    243    107     238
25          47                  52                   3.61       212    243    210     246

Among the 50 points extracted from the left image and the 56 from the right image, 33 points found unique correspondents. Table 1 lists the matched pairs with the corresponding disparities. One can note that the matching is efficient and the disparity distribution is coherent, but also that the matching is not good for the two pairs 4 and 17. No matching occurs for the artifacts introduced in the images, which shows the robustness of the algorithm.
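The disparity formula is not restated in this section, but most of the values tabulated in Table 1 are consistent with the Euclidean distance between the left and right coordinates of a pair. The short check below rests on that inference, not on a definition given by the text, and the function name `disparity` is hypothetical.

```python
import math


def disparity(x_left: int, y_left: int, x_right: int, y_right: int) -> float:
    """Euclidean distance between the two image positions of a matched pair."""
    return math.hypot(x_right - x_left, y_right - y_left)


# (pair label, xleft, yleft, xright, yright) copied from Table 1.
rows = [(1, 110, 11, 124, 11), (6, 78, 34, 89, 32), (9, 108, 69, 120, 64)]
for label, xl, yl, xr, yr in rows:
    print(label, round(disparity(xl, yl, xr, yr), 2))
# 1 14.0
# 6 11.18
# 9 13.0
```

Table 1 lists 14, 11.1 and 13 for these three pairs, which agrees with the computed values up to the rounding used in the table.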

We can also discuss the stability of the matching algorithm. For a given pair, the matching result is identical over different tests, so the algorithm converges to a stable solution. But this is not true when the algorithm works on gray-level images. In this case, stability is not guaranteed: some correspondences differ from one test to another, and convergence to a stable solution is not verified. We show, in the following, the result of the neural matching algorithm on intensity images. Figure 8(a) and (b) show the corresponding points obtained with the neural algorithm applied to the original scene. One can note the poor localization of the features in the two images and that the matching is not good enough, since some different features are matched (pairs 3, 4, 6, 15, 18). The points that correspond to the artifacts have also found false correspondents (pairs 5 and 7). A summary of results is given in Table 2.

The presented results show that the use of the neural-based method to solve the matching problem is well founded and that the use of a similarity constraint, coupled with feature extraction on the gradient magnitude, improves the matching result.


5. Conclusion

Although the application of artificial neural network techniques to conventional computer vision problems is a relatively new development, it has received increasing attention in recent years. The research presented in this paper contributes to this area by proposing an approach to solve classical matching problems. We chose the 2D Hopfield network, which is configured to establish the corresponding points in two stereo images. It incorporates the matching constraints and can lead to optimal matching solutions for a pair of images. Its energy function is at its minimal value when the imposed stereoscopic constraints are satisfied. The network then converges towards its steady state. The proposed matching technique uses points of interest, extracted by the Moravec operator from the gradient magnitude image, as features to match. Moreover, the structure of the proposed neural network suggests a very fast implementation scheme. The test done on a pair of images representing an indoor scene has given satisfying results and has shown the efficiency of the algorithm in terms of computational cost and quality of the matching result.

References

1. K. Achour, M. Benkhelif, and S. Aouat, “3D Reconstruction using four reference points without camera calibration for a mobile robot,” in IEEE, RSJ, International Conference of Intelligent Robots and Systems, IROS’98, Victoria, Canada, October 1998.

2. K. Achour and L. Mahiddine, “A multiscale stereovision algorithm for a mobile robot,” in 24th Annual Conf. of IEEE Industrial Electronics Society, IECON’98, Aachen, Germany, 1998.

3. G. Adiv, “Determining three-dimensional motion and structure from optical flow generated by several moving objects,” Pattern Anal. Machine Intell., Vol. PAMI-7, No. 4, pp. 384–401, 1985.

4. N. Ayache and B. Faverjon, “Fast stereo matching of edges segments using prediction and verification of hypotheses,” in CVPR, 1985.

5. S.T. Barnard and W.B. Thompson, “Disparity analysis of images,” IEEE Trans. Pattern Anal. Machine Intell., Vol. 2, pp. 333–340, 1981.

6. R. Deriche, “Using Canny’s criteria to derive a recursively implemented optimal edge detector,” International Journal of Computer Vision, Vol. 1, No. 2, pp. 167–187, 1987.

7. W. Hoff and N. Ahuja, “Surfaces from stereo images: An integrated approach,” in First Int. Conf. Comput. Vision, 1987, pp. 284–294.

8. W. Hoff and N. Ahuja, “Surfaces from stereo: Integrating feature matching, disparity estimation and contour detection,” IEEE Trans. Pattern Anal. Machine Intell., Vol. 11, pp. 121–136, 1989.

9. J. Hopfield, “Neural networks and physical systems with emergent collective computational abilities,” Proc. Nat. Acad. Sci., Vol. 79, 1982.

10. J. Hopfield and D.W. Tank, “Neural computations of decisions in optimization problems,” Biol. Cybern., Vol. 52, pp. 141–152, 1985.

11. W.S. Kuffler and G.J. Nicholls, From Neurons to Brain, Sinauer: Boston, MA, 1984.

12. P. Limozin-Long, “Vision Stereoscopique Appliquee a la Robotique,” Doctorate Thesis, Paris, October 7, 1986.

13. D. Marr, Vision, W.H. Freeman: New York, 1982.

14. D. Marr and T. Poggio, “A theory of human stereo vision,” Massachusetts Institute of Technology, Artificial Intelligence Laboratory, A.I. Memo No. 451, October 1977.

15. H. Maitre and Y. Wu, “Improving dynamic programming to solve image registration,” Pattern Recognition, Vol. 20, No. 4, pp. 443–462, 1987.

16. J.E.W. Mayhew and J.P. Frisby, “Psychological and computational studies towards a theory of human stereopsis,” Artif. Intell., Vol. 17, pp. 349–385, 1981.

17. G. Medioni and R. Nevatia, “Segment based stereo matching,” Computer Vision and Image Processing, Vol. 31, pp. 2–18, 1985.

18. H.P. Moravec, “Towards automatic visual avoidance,” in Proc. 5th Int. Joint Conf. Artificial Intell., Cambridge, August 1977, p. 584.

19. N. Nasrabadi and C.Y. Choo, “Hopfield network for stereovision correspondence,” IEEE Trans. On Neural Networks, Vol. 3, No. 1, 1992.

20. J.J. Rodriguez and J.K. Aggarwal, “Matching aerial images to 3D terrain maps,” Pattern Anal. Machine Intell., Vol. 12, No. 12, pp. 1138–1149, 1990.

21. V. Salari and I.K. Sethi, “Feature point correspondence in the presence of occlusion,” Pattern Anal. Machine Intell., Vol. 12, No. 1, pp. 87–91, 1990.

22. Y. Wu and H. Maitre, “A new dynamic programming method for stereovision ignoring epipolar geometry,” in 9th ICPR, November 1988, pp. 146–148.

23. Y. Yakimowski and R. Cunningham, “A system for extracting three dimensional measurements from a stereo pair of TV cameras,” CGIP, Vol. 7, pp. 195–210, 1978.

24. S.S. Yu and W. Tsai, “Relaxation by the Hopfield neural network,” Pattern Recognition, Vol. 25, No. 2, pp. 197–200, 1992.

Karim Achour was born in Tizi-Ouzou (Algeria) in 1958. He received his Ph.D. degree in electrical and computer engineering from Rennes University (France) in 1987, the M.S. degree in electrical engineering from ENSM Nantes (France), and his B.S. at the polytechnic school in Algiers (Algeria). Dr. Achour is the author of numerous publications in conference proceedings and journals. He has been active in research on perception, computer vision and pattern recognition. He has participated in many conferences as an author, a panel member and a session chairman. He has been a reviewer for many conferences. He was head of the robotics and A.I. laboratory in the high research center in Algiers.

Lyes Mahiddine was born in Algiers (Algeria) in August 1969. He received the magister degree in electronics, signals and systems in real time, from the electronic institute of USTHB university, Algiers, in 1997. His domain of research is computer vision, perception and pattern recognition. Today, he is a researcher in the vision team of the Robotics and Artificial Intelligence Laboratory of the Advanced Technologies Development Center (Algiers).