Geometric Hashing



Isidore Rigoutsos* and Haim Wolfson†

October 6, 1997

Abstract

In this paper we describe the Geometric Hashing paradigm for matching a set of geometric features against a database of such feature sets. Specific examples are model-based object recognition in computer vision, for which this technique was originally developed; matching of volumetric data obtained from CT or MRI images of different persons; matching of an individual fingerprint against a database; matching the molecular surface of a receptor molecule against a database of drugs; and so on.

The features considered can be points, segments, infinite lines, corners and any other geometric entities. The matching is performed under any non-elastic geometric transformation, such as the rigid, similarity, affine and projective transformations, in any dimension. Moreover, the problem addressed is the much more difficult partial matching problem, where one tries to detect (previously unknown) large subsets of the feature set which are compatible with subsets of the database feature sets.

The technique is highly efficient and is of low polynomial complexity in the size of the feature set. The power of the polynomial depends on the class of transformations that the feature sets are allowed to undergo. The efficiency is achieved by an indexing scheme which preserves the geometric rigidity constraints of a shape, thus distinguishing this approach from standard indexing techniques, which exploit only local transformation invariants.

The technique is first described in the simple case of a 2-D similarity transformation. Then its extensions are discussed and a weighted voting scheme is presented. This scheme allows the technique to be reformulated in the Bayesian framework.

KEYWORDS: geometric hashing, indexing, matching, model-based recognition, invariants, Bayes law.

1 Introduction

Object recognition is the ultimate goal of the greater part of Computer Vision research.
An ideal object recognition system should be able to recognize objects in an image, where the objects might be partially occluded and have undergone geometric transformations which are a function of the object pose and of the imaging device. Usually, we assume that the system has a large database of models with which it is familiar. This is the so-called model-based recognition.

Assume that one wants to endow a robot with the ability to recognize all the objects and tools on a factory floor. Assuming that this number is limited to a few hundred, it seems plausible that one could design a database of these objects and store it in the robot's memory. Now, when the robot gets a

* IBM T.J. Watson Research Center, PO Box 704, Yorktown Heights, NY 10594, USA.
† Computer Science Dept., School of Mathematical Sciences, Tel Aviv U., Ramat Aviv, Tel Aviv 69978, Israel.

sensory image(1) of its environment, it should be able to quickly retrieve those objects which appear in the image. This task, which is quite natural for the human visual system, requires the solution of several complicated sub-problems. First, the objects in the acquired scene appear rotated and translated relative to their initial (database) position, and the whole scene undergoes a sensor-dependent transformation, such as the projective transformation of a video camera. Second, the objects in the scene may partially occlude each other, and additional objects, not included in the database, may be present. Finally, it would be undesirable to retrieve each individual object from the database and compare it against the observed scene in search of a match. For example, if the scene contains only round objects, it does not make much sense to retrieve a rectangular table to match against it. We need a method which allows direct access only to the relevant information. Indexing-based approaches seem to be a natural choice. For example, if one is looking for words in long strings of text, one could prepare tables, accessed by indices, which are functions of individual words. The information stored is the strings where the word appears, and the location of the word in each such string. Then, when faced with a relevant word, it would be easy to retrieve all its appearances from the table. An indexing approach of the same flavor for geometric object recognition has to use indices which are based on local geometric features and should be invariant to the object transformation. The features must be local to handle partial occlusion. Their indexing function has to be invariant to the relevant transformation, since, unlike words in a text, geometric features have both location and orientation.

For over a decade now, indexing-based approaches to model-based recognition have been gaining ground as the method of choice for building working recognition systems that can operate with large model databases.
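The word-table analogy above can be made concrete with a few lines of code. The sketch below is purely illustrative (the strings, the `build_index` name and the tuple layout are invented for this example): it maps each word to every (string id, position) pair where it occurs, so all appearances of a word are retrieved with one table lookup.

```python
# A minimal inverted index over text strings, sketching the word-table
# analogy from the text; all names and data here are illustrative.
from collections import defaultdict

def build_index(strings):
    """Map each word to the (string id, position) pairs where it occurs."""
    index = defaultdict(list)
    for sid, text in enumerate(strings):
        for pos, word in enumerate(text.split()):
            index[word].append((sid, pos))
    return index

strings = ["the quick brown fox", "the lazy dog", "quick as a fox"]
index = build_index(strings)
print(index["fox"])   # [(0, 3), (2, 3)]
```

The geometric analogue developed in Section 2 replaces the word by a transformation-invariant function of local feature coordinates.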
This discussion is mostly limited to treatments that led directly to the geometric hashing method or used the geometric hashing terminology.

The idea of geometric hashing, in its modern incarnation, has its origins in the work of Schwartz and Sharir [7, 8]. These first efforts concentrated on the recognition of rotated, translated and partially occluded 2D objects from their silhouettes by boundary curve matching techniques. It is important to note that our previous analogy with text strings is incomplete. Shape information has more to it than just the location of local features. Two shapes may have the same local features, yet be entirely different in their appearance. The key observation is that if the rigidity of a shape is conserved, then not only the local features are important, but also their relative spatial configuration.

In order to exploit the above-mentioned geometric consistency, and also to tackle model-based object recognition not just for curves but in a general 2D or 3D setting, Schwartz, Wolfson, and Lamdan developed a new geometric hashing technique which was applicable to arbitrary point sets (constellations) under various geometric transformations. Efficient algorithms were developed for the recognition of flat rigid objects represented either by point sets or by curves under the affine approximation of the perspective transformation [12, 11, 13]. The technique was extended to the recognition of point sets under arbitrary transformations and to the recognition of rigid 3D objects from single 2D images [9]. This technique is the basis of our overview, and Section 2 contains a thorough description of it.

Geometric hashing systems have since been built and explored by many research groups. It is fair to say that most implementations of geometric hashing systems seem to work as well as classical model-based vision systems, while delivering on the promise of greater efficiency.

One of the big advantages of the geometric hashing method is that it is inherently parallel.
In fact, the underlying data structure can easily be decomposed and shared among a number of cooperating processors

(1) The sensor might be a video camera or a range sensor.

with minimal communication and maintenance costs. Parallel implementations of geometric hashing for a Connection Machine are described in [1, 14, 15].

Soon after the conception of the geometric hashing technique, researchers realized that one of the characteristics of the method is the non-uniform distribution of indices over the space of invariants [2, 19, 16, 5, 20]. This non-uniformity is not specific to the geometric hashing technique, but appears to be endemic to all indexing schemes [25, 27].

From a practical angle, the non-uniform distribution over the space of invariants results in different lengths for the hash entry lists. Since the length of the longest such list dominates the time needed to carry out the histogramming phase of the algorithm, a non-uniform distribution will adversely affect the method's performance. On the other hand, a uniform distribution not only reduces execution time but can also result in a much more efficient storage of the hash table data structure. Additionally, in a parallel implementation of the method, a more or less constant co-occupancy of all the hash bins results in improved load balancing among the processors [15]. Knowledge of the expressions for the index distributions over the space of invariants greatly facilitates the equalization of the hash bin occupancy.

We show how one can incorporate additive Gaussian noise into the geometric hashing framework and analytically determine its effect on the invariants that are computed for the case where the models are allowed to undergo similarity or affine transformations. This formulation provides a detailed description of the method's behavior in the presence of noise.

With the augmented framework at hand, one can proceed to develop a Bayesian formulation for object recognition with geometric hashing.
This formulation is described at great length elsewhere [26]. Augmentation of the traditional geometric hashing algorithm with an error-model layer and a Bayesian layer allows the creation of working systems that can operate with real-world photographs and large model databases [24].

This paper is organized as follows. Section 2 reviews the underlying ideas of the Geometric Hashing method. Section 3 discusses the distribution of the (transformation-invariant) indices under the assumption that feature points are generated by a Gaussian random process. A rehashing function, which achieves a uniformly distributed hash table, is suggested. In Section 4 a Gaussian noise model for the accuracy of the invariant indices is derived. In Section 5 a Bayesian formulation of the geometric hashing approach is presented. Finally, in Section 6 we discuss the various implications and applications of the presented method.

2 Review of Geometric Hashing

The geometric hashing/indexing methods represent an efficient and highly parallelizable approach to performing partially occluded object recognition. They are especially attractive for the model-based scheme, but also have significant advantages in pairwise object-scene comparison.

The desire in Geometric Hashing is to be able to recognize objects appearing in a scene "at a glance," so to speak. However, one cannot know in advance which database objects will appear and in which pose. Thus the model information is encoded in a preprocessing step and stored in a large memory. This memory (hash table) is scene independent and thus can be computed off-line without affecting the recognition time. Access to the memory is based on geometric information which is invariant of the object's pose and is computed directly from the scene.

During the recognition phase, when presented with a scene from which features are extracted, the hash table data structure is used to index geometric properties of the scene features to candidate matching

models. A search over the scene features is still required. However, the geometric hashing scheme obviates a search over the models and the model features.

The scene features are typically points, linear and curvilinear segments, corners, etc., and are accumulated during the feature extraction stage. Any such collection of features can be represented by a set of dots: each dot represents the feature's location; associated with each dot is a list of one or more attributes (the feature's attribute list) which depends on the corresponding feature's type.

In the discussion that follows, and without any loss of generality, we will examine the simple case where we concentrate on the problem of recognition of clusters of point features (dot patterns).

Figure 1: Determining the hash table entries when points 4 and 1 are used to define a basis. The models are allowed to undergo rotation, translation and scaling. On the left of the figure, a model M1 comprising five points is shown.

Suppose that we wish to perform recognition of patterns of point features that may be translated, rotated and scaled (similarity transformations).(2) Figure 1 (left) shows a model (M1) consisting of five dots, with position vectors p1, p2, p3, p4 and p5 respectively.

We want to encode this dot information appropriately and store it in a table. This way, if one detects this collection of dots in a scene, one can conclude that they belong to the model M1. An ideal coding scheme would be "color indexing." If we knew that all the dots (in all the models and in any scene) have distinct colors, we could build a table where the color of a dot serves as its index and the information encoded and stored is the model number. In the recognition stage, one could just scan the dots, access the table using the colors found, and count how many times each model's name was accessed. This can be done in linear time (as a function of the scene size), if the features have distinctive enough attributes.

What happens, though, in the least informative case, where dots belonging to a model have no attributes except for their geometric configuration? Is there a distinctive geometric color? A natural "geometric color" of a dot is the set of its coordinates. However, coordinates depend on a reference frame. The question then becomes whether there is a natural reference frame for a model which will remain visible under partial

(2) Other transformations such as rigid and affine can also be tackled by the geometric hashing method; we concentrate on the similarity transformation since it is of moderate difficulty and effectively showcases the methodology.

occlusion. If nothing better is available, one can choose a pair of dots belonging to the model and define an unambiguous reference frame which will remain unchanged if the model undergoes rotation, translation and scaling. Let us take the pair of dots p4, p1 as an ordered basis for such a reference frame. We scale the model M1 so that the magnitude of the vector p4p1 in the Oxy system is equal to 1. Suppose now that we place the midpoint between dots "4" and "1" at the origin of a coordinate system Oxy in such a way that the vector p4p1 has the direction of the positive x-axis. The remaining three points of M1 will land in three locations. Let us record in a quantized hash table, in each of the three bins where the remaining points land, the fact that model M1 with basis (4, 1) yields an entry in this bin. This is shown graphically in Figure 1.
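The construction just described can be sketched in a few lines of Python. This is only an illustration, not the authors' implementation: the point coordinates, the quantization step `quantum`, and the function names are invented for the sketch. For an ordered basis (a, b) with midpoint m and basis vector b' = b − a, every other point p is decomposed as p − m = u·b' + v·Rot90(b'), so the pair (u, v) is invariant under rotation, translation and scaling.

```python
# Preprocessing sketch for the similarity case: store (model, basis)
# entries under the quantized invariant coordinates (u, v).
from collections import defaultdict

def invariant_coords(a, b, p):
    """Similarity-invariant (u, v) of point p in the frame of ordered basis (a, b)."""
    mx, my = (a[0] + b[0]) / 2, (a[1] + b[1]) / 2   # basis midpoint
    bx, by = b[0] - a[0], b[1] - a[1]               # basis vector b'
    n2 = bx * bx + by * by                          # |b'|^2
    dx, dy = p[0] - mx, p[1] - my
    u = (dx * bx + dy * by) / n2                    # component along b'
    v = (-dx * by + dy * bx) / n2                   # component along Rot90(b')
    return u, v

def add_model(table, model_id, points, quantum=0.25):
    """Insert (model, basis) entries for every ordered basis pair of the model."""
    for i, a in enumerate(points):
        for j, b in enumerate(points):
            if i == j:
                continue
            for k, p in enumerate(points):
                if k not in (i, j):
                    u, v = invariant_coords(a, b, p)
                    key = (round(u / quantum), round(v / quantum))
                    table[key].append((model_id, (i, j)))

table = defaultdict(list)
add_model(table, "M1", [(0, 0), (2, 0), (1, 1), (0, 2), (2, 2)])
print(sum(len(e) for e in table.values()))   # 60 entries: 20 ordered bases x 3 points
```

Applying `invariant_coords` to any rotated, translated and scaled copy of the same three points returns the same (u, v), which is precisely what makes the table scene-independent.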


Figure 2: The locations of the hash table entries for model M1. Each entry is labeled with the information "model M1" and the basis pair (i, j) that was used to generate the entry. The models are allowed to undergo rotation, translation and scaling.

Since our goal is to perform recognition under partial occlusion, we are not guaranteed that both basis points p1, p4 will appear in each scene where model M1 is present. Consequently, we encode the model's dot information using all possible ordered basis pairs. Namely, the hash table contains three entries of the form (M1, (4, 2)), three entries of the form (M1, (4, 3)), etc. Each triplet of entries is generated by first scaling the model M1 so that the corresponding basis has unit length in the Oxy coordinate system, and then by placing the midpoint of the basis at the origin of the hash table in such a way that the basis vector has the direction of the positive x-axis. The same process is repeated for each model in the database. Of course, some hash table bins may receive more than one entry. As a result, the final hash table data structure will contain a list of entries of the form (model, basis) in each hash table bin. Figure 2 shows the locations of all the hash table entries for model M1.

What we have in essence done here is define orthonormal bases in turn for the coordinate system Oxy, using a pair of vectors p_sx := (p_π2 − p_π1) and p_sy := Rot90(p_sx), where π1 and π2 are distinct points taken from the

model M1. For each choice of a basis, the remaining points p of M1 were represented in this basis using the equation

p − p_s0 = u p_sx + v p_sy   (1)

where p_s0 = (p_π1 + p_π2)/2 is the midpoint between p_π1 and p_π2. Quantization of the scalar quantities u and v, which remain invariant under similarity transformations of M1, allows us to determine an index (u_q, v_q) into a two-dimensional hash table data structure. In the hash bin that is accessed via (u_q, v_q) we enter the information (m, (p_π1, p_π2)).

In the recognition phase, a pair of points (p_μ1, p_μ2) from the image is chosen as a candidate basis. As before, this ordered basis defines a coordinate system Oxy whose center coincides with the midpoint of the pair; the direction of the basis vector p_μ2 − p_μ1 coincides with that of the positive x-axis. The magnitude of the basis vector defines the "unit" length for Oxy. The coordinates of all other points are then calculated in the coordinate system defined by the chosen basis. Each of the remaining image points is mapped to the hash table, and all entries in the corresponding hash table bin receive a vote. In essence, for the selected basis and for each of the remaining points in the scene, Eqn. 1 is used to determine the index (u_q, v_q) of a hash bin to access. Each (model, basis) pair found in the accessed bin gets a vote. Figure 3 shows this graphically.
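The voting stage can be sketched end to end with a small toy example. Everything here is invented for the illustration (the model points, the quantization step, the clutter points); it is not the paper's implementation, only a demonstration that votes concentrate on the correct (model, basis) pair.

```python
# Toy geometric hashing run: build the table for one model, then probe a
# scene in which that model reappears rotated, scaled, translated and
# accompanied by clutter. All data are illustrative.
import math
from collections import defaultdict

def invariant_coords(a, b, p):
    mx, my = (a[0] + b[0]) / 2, (a[1] + b[1]) / 2
    bx, by = b[0] - a[0], b[1] - a[1]
    n2 = bx * bx + by * by
    dx, dy = p[0] - mx, p[1] - my
    return (dx * bx + dy * by) / n2, (-dx * by + dy * bx) / n2

def quantize(u, v, q=0.25):
    return round(u / q), round(v / q)

model = [(0, 0), (2, 0), (1, 1), (0, 2), (3, 2)]
table = defaultdict(list)
for i, a in enumerate(model):
    for j, b in enumerate(model):
        if i != j:
            for k, p in enumerate(model):
                if k not in (i, j):
                    table[quantize(*invariant_coords(a, b, p))].append(("M1", (i, j)))

# Scene: the model rotated by 30 degrees, scaled by 1.5, translated, plus clutter.
c, s = math.cos(math.pi / 6), math.sin(math.pi / 6)
scene = [(1.5 * (c * x - s * y) + 4, 1.5 * (s * x + c * y) - 1) for x, y in model]
scene += [(7.3, 2.1), (-2.0, 5.5)]            # clutter points

# Probe with scene points 0 and 1 as the ordered basis and tally votes.
votes = defaultdict(int)
basis_a, basis_b = scene[0], scene[1]
for idx, p in enumerate(scene):
    if idx in (0, 1):
        continue
    for entry in table.get(quantize(*invariant_coords(basis_a, basis_b, p)), []):
        votes[entry] += 1

best = max(votes, key=votes.get)
print(best, votes[best])   # ("M1", (0, 1)) wins with 3 votes
```

The three unoccluded model points each contribute a vote to ("M1", (0, 1)), while clutter points scatter at most single votes over unrelated entries.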


Figure 3: Determining the hash table bins that are to be notified when two arbitrary image points are selected as a basis. The allowed transformation is similarity.

If we have selected a pair of scene points that corresponds to a basis on one of the models, we expect it to accumulate as its score the votes from all the other unoccluded points belonging to this model. If there are sufficient votes for one or more (model, basis) combinations, then a subsequent stage attempts to verify the presence of a model with the designated basis matching the chosen basis points. In the case where model points are missing from the image because they are obscured, recognition is still possible, as long as a sufficient number of points hash to the correct hash table bins. The list of entries in each hash table bin may be large, but because there are many possible models and basis sets, the likelihood that a single model and single basis set will receive multiple votes is quite small, unless a configuration of transformed points coincides with a model. In general, we do not expect the voting scheme to give only

one candidate solution. The goal of the voting scheme is to act as a sieve and significantly reduce the number of candidate hypotheses for the verification step. For the algorithm to be successful, it is sufficient to select as a basis tuple any set of image points belonging to some model. It is not necessary to hypothesize a correspondence between specific model points and specific scene points, since all models and basis pairs are redundantly stored within the hash table. Classification or perceptual grouping of features can be used to make the search over scene features more efficient, for example, by making use of only special basis tuples.

To recapitulate, the following two stages are at the core of a geometric-hashing system:

Preprocessing Phase

For each model m do:

(1) Extract the model's point features. Assume that n such features are found.

(2) For each ordered pair (i.e. basis) of point features do:

(a) Compute the coordinates (u, v) of the remaining features in the coordinate frame defined by the basis.

(b) After a proper quantization, use the tuple (u_q, v_q) as an index into a two-dimensional hash table data structure, and insert in the corresponding hash table bin the information (m, (basis)), namely, the model number and the basis tuple which was used to determine (u_q, v_q).

For the recognition phase, the method dictates the following steps:

Recognition Phase

When presented with an input image,

(1) Extract the various points of interest; let S be the number of interest points found.

(2) Choose an arbitrary ordered pair (i.e. basis) of interest points in the image.

(3) Compute the coordinates of the remaining interest points in the coordinate system Oxy that the selected basis defines.

(4) Appropriately quantize each such coordinate and access the appropriate hash table bin; for every entry found there, cast a vote for the model and the basis.

(5) Histogram all the hash table entries that received one or more votes during step (4).
Proceed to determine those entries that received more than a certain number (threshold) of votes; each such entry corresponds to a potential match.

(6) For each potential match discovered in step (5), recover the transformation T that results in the best least-squares match between all corresponding feature pairs.

(7) Transform the features of the model according to the recovered transformation T and verify them against the input image features. If the verification fails, go back to step (2) and repeat the procedure using a different image basis pair.

We have seen that two points suffice to define a basis when the models are allowed to undergo a 2D similarity transformation. However, Geometric Hashing represents a unified approach which also applies to many other useful transformations encountered in object recognition problems. The only difference from one application to another is the number of features that have to be used to form a basis for the reference frame. This, of course, affects the complexity of the algorithm in the different cases.

The following list gives a number of examples in which this general paradigm applies. In almost all the cases we will discuss point matching; the use of other features, such as lines, can be understood by analogy. We discuss first the recognition of 2D objects from 2D data.

1) Translation in 2D: the technique is applicable with a one-point basis. This point may be viewed as the origin of the coordinate frame.

2) Translation and rotation in 2D: a two-point basis can be used; however, one point with a direction (obtained, say, from an edge segment) has enough information for a unique definition of a basis.

3) Translation, rotation and scale in 2D: this was discussed above.

4) Affine transformation in 2D: a three-point basis defines an unambiguous reference frame (see [12, 11, 13] for some recognition results).

5) Projective transformation in 2D: a four-point basis is needed to recover a projective transformation between two planes.

When 3D data (such as range or stereo data of the objects) is available, the recognition of 3D objects from 3D images has to be considered. Development of techniques for this case is especially important in an industrial environment, where 3D data can be readily obtained and used.
Geometric Hashing for the 3D rigid transformation (translation and rotation) has been applied in CAD-CAM, Medical Imaging, and protein comparison and docking in Molecular Biology (see also other papers in this issue).

6) Translation in 3D: exactly as in the 2D case, a one-point basis will suffice.

7) Translation and rotation in 3D: this is the interesting case corresponding to rigid motion. A basis consisting of two non-collinear lines suffices. Alternatively, three points can be used to define a basis, with additional triangle side-length information.

8) Translation, rotation and scale in 3D: a basis comprising two non-collinear, non-coplanar lines suffices. Alternatively, a point and a line can be used.

In general, if the database contains m known models, each comprising n features, the scene during recognition contains S features, and c features are needed to form a basis, the time complexity of the

preprocessing phase is O(mn^(c+1)). The recognition-phase complexity is O(S^(c+1) H), where H represents the complexity of accessing a hash-table bin. As in all hashing techniques, H depends on the hash-table occupancy and bin distribution. If the size of the table is of the order of the number of elements it contains and the distribution is uniform, the access complexity H will be O(1). If, on the other hand, the table is small, or all the features hash into only a few bins, the access time can be dominated by the number of elements in the table, which is mn^(c+1). This is, however, an unlikely situation. It is important to note that the hash table and its structure are known in advance, before the recognition phase. Thus, one can evaluate this structure and decide whether it requires the application of rehashing procedures, the splitting of the table into several tables, or a change of the index structure to a higher-dimensional one. One should also note that the bins with high occupancy, which cause significant computational effort, can simply be ignored, since their information content is not salient enough to assist recognition. Thus, one can decide a priori on an upper bound on the size of the hash-table bins that will be processed. An extension of this idea leads to weighted voting, where the weight of the bin information is inversely proportional to the bin's size.

In the previous discussion we considered cases where both the data of the object and the data of the scene are given in the same dimension, either 2D or 3D. However, in the recognition of 3D objects from single 2D images we have the additional problem of the reduced dimension of the image space compared with the model space. A number of methods have been suggested to tackle this problem with the Geometric Hashing technique and can be found in [9].

3 The Index Distributions

One issue of particular importance is that of the index distribution over the space of invariants when the allowed transformation is known and fixed.
The assumption is that all point features are identically and independently distributed following a random process with a known probability density function f(). Recall that the indices used to access the hash table are the (quantized) solution (u, v) of Eqn. 1. Since the properties of the random process generating the point features are known, the joint probability density function f(u, v) of u and v can be computed using the expression

∫_{R^4} f(x(u, v), y(u, v)) f(x_π1, y_π1) f(x_π2, y_π2) |J|^(−1) dx_π1 dx_π2 dy_π1 dy_π2,   (2)

where J is the Jacobian of the transformation (for example, Eqn. 1). Evaluation of this integral yields the distribution of indices over the space of invariants for the transformation under consideration.

For the case of similarity transformations and feature points generated by a Gaussian random process N(0, diag(σ, σ)), evaluation of the integral (2) yields

f(u, v) = (12/π) · 1/(4(u² + v²) + 3)².

The distribution over the space of invariants for synthetically generated indices, as well as several of its contours, are shown in Figure 4, with the occupancies of the hash table encoded as heights in the figure.

When the random process giving rise to the feature points comprising the various database models is not known, it is typically possible to obtain an approximation f*() of the probability density function using numerical techniques; f*() can then be used instead of f() in Eqn. 2. Alternatively, a numerical approximation of f*(u, v) can be used. Details and results for additional transformations can be found in [24, 22, 23].
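As a quick sanity check on this density, one can verify numerically that it integrates to one. The sketch below is pure Python with assumed step sizes; since f depends only on r² = u² + v², it integrates f(r)·2πr over the radius by the midpoint rule (the tail beyond R decays like r^−3 and is negligible).

```python
# Numerical check that f(u, v) = (12/pi) / (4 (u^2 + v^2) + 3)^2
# is a proper probability density over the plane.
import math

def f_radial(r):
    """The density as a function of the radius r = sqrt(u^2 + v^2)."""
    return (12 / math.pi) / (4 * r * r + 3) ** 2

h, R = 0.01, 200.0       # assumed integration step and cutoff radius
total = sum(
    f_radial((i + 0.5) * h) * 2 * math.pi * (i + 0.5) * h * h
    for i in range(int(R / h))
)
print(round(total, 3))   # 1.0
```

The same one-line change of `f_radial` lets one check any candidate radially symmetric index density.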

Figure 4: The distribution over the space of invariants, and several of its contours, for the case of Gaussian-distributed point features. The allowed transformation is similarity.

3.1 From Hashing to Rehashing

If the probability density function for the distribution of indices over the space of invariants is available, it can be used effectively to provide substantial improvements in storage requirements and performance. The outlined methodology is generally applicable to all indexing-based object identification methods.

The non-uniform occupancy of the hash bins results in bins that contain many more entries than the expected average. Since the longest such list will dominate the time spent in the histogramming phase, a uniform distribution of the entries is desirable.

A uniform distribution can be achieved via a method that transforms the coordinates of point locations in such a way that an equispaced quantizer in the space of invariants yields an expected uniform distribution. To this end, knowledge of the expected joint probability density function f(u, v) (or an approximation of it) for the distribution of the untransformed coordinates (i.e. tuples of invariants) is required.

We in essence seek a mapping h : R² → R² that evenly distributes the hash bin entries over a rectangular hash table. Notice that the range of the function h is the space of transformed invariants, and not the space of features extracted from the input.

For the example of the similarity transformation above, it can be shown that one such mapping is given by

h(u, v) = ( 1 − 3/(4(u² + v²) + 3) , atan2(v, u) ).   (3)

In this expression, atan2(·, ·) returns the phase in the interval [−π, π]. It is interesting to note that the computed rehashing function does not include the standard deviation σ of the feature-generating process as a parameter. Figure 5 shows the result of hash table equalization for synthetic data.
The reference spike at the upper left corner of the hash table is reproduced from Figure 4 and provides a measure of the benefits conferred by the rehashing operation. Clearly, the remapping is very efficient. This table has the same number of bins as the one in Figure 4.
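The remapping can also be checked in simulation. The sketch below (illustrative, pure Python) draws triples of i.i.d. Gaussian feature points, computes the similarity invariants of Eqn. 1, and applies the rehashing function of Eqn. 3; the first (radial) component should then be nearly uniform on [0, 1). Note that σ cancels, so unit-variance Gaussians suffice.

```python
# Monte Carlo check of the rehashing function: invariants of Gaussian
# point triples, remapped by h, should fill [0, 1) x [-pi, pi) uniformly.
import math
import random

def rehash(u, v):
    """Eqn. (3): the radial CDF value and the phase atan2(v, u)."""
    return 1 - 3 / (4 * (u * u + v * v) + 3), math.atan2(v, u)

random.seed(0)
radial = []
for _ in range(20000):
    # two basis points and one extra point, all i.i.d. Gaussian
    (x1, y1), (x2, y2), (x3, y3) = [
        (random.gauss(0, 1), random.gauss(0, 1)) for _ in range(3)
    ]
    bx, by = x2 - x1, y2 - y1
    n2 = bx * bx + by * by
    dx, dy = x3 - (x1 + x2) / 2, y3 - (y1 + y2) / 2
    u = (dx * bx + dy * by) / n2
    v = (-dx * by + dy * bx) / n2
    radial.append(rehash(u, v)[0])

mean = sum(radial) / len(radial)
below = sum(r < 0.5 for r in radial) / len(radial)
print(mean, below)   # both should be close to 0.5 for a uniform variable
```

In practice the quantizer is then applied to the remapped pair instead of the raw (u, v), yielding roughly equal expected bin occupancies.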

Figure 5: Hash table equalization for the case of similarity transformations and point features generated by the Gaussian process N(0, diag(σ, σ)). Left: the expected distribution of remapped invariants. Right: several of the distribution's contours.

Analogous results for other combinations of transformations and feature distributions can be found in [24].

3.2 Exploiting Existing Symmetries

In addition to the savings from the use of rehashing functions, further computational and storage savings are possible. Certain symmetries exist in the storage pattern of entries in the hash table; these symmetries are independent of the use of rehashing functions and thus can be used in conjunction with them.

In the cases of rigid and similarity transformations, the symmetries are with respect to a point: for every entry of the form (m, (π1, π2)) at location (u, v) of the hash table there will be an entry (m, (π2, π1)) at location (−u, −v). For the affine transformation, the symmetries are with respect to an axis: every entry of the form (m, (π1, π2)) at location (u, v) of the hash table will have a counterpart (m, (π2, π1)) at location (v, u). The practical importance of this is that we can dispose of half of the hash table at the expense of minimal additional bookkeeping. This will result in entry lists that, on average, will be half as long when spread among the existing set of processors, leading to an expected speedup by a factor of two.

4 Noise Modeling

So far we have implicitly assumed that the feature points in both the preprocessing and recognition phases are noise-free, an assumption that does not hold in practice. We next discuss a suggestion that has been made in this context.

A series of experiments examined the performance of geometric hashing with noisy feature points and

cluttered scenes [24]. Figure 6 shows the method's performance as a function of the amount of noise. Noise at the input leads to positional errors, which in turn translate to errors in the computed invariants. "Small" input errors will give rise to the same computed invariant, and thus the same index, as the noise-free input. The semantics of "small" depend directly on the coarseness of quantization of the space of invariants. Once this coarseness is decided, an associated degree of tolerance is implicitly built into the hash table.

Positional errors typically translate into the computation of "wrong" hash bin indices. But because of the nature of the employed hashing functions, the respective "wrong" bins are in the neighborhood (in a Euclidean sense) of the bins that would have been accessed had the input been noise-free (the "correct" bins). In [10], it was suggested that a rectangular region of the hash table be accessed instead of a single bin. In related work [2], weighted voting was proposed.

The exact shape of the neighborhood to be accessed during the recognition phase is generally complicated [24]. In particular, the size, shape and orientation of the regions that need to be accessed depend directly on the selected basis tuple, as well as on the computed hash locations. Figure 7 shows this dependence for certain point arrangements and the similarity transformation. The variations of the region's shape are much more pronounced when affine is the allowed transformation.

Since the ultimate goal is the creation of working systems that can perform satisfactorily in the presence of noise, the method was enhanced by incorporating a noise model. The derived formulas, in addition to being useful in quantifying the observed behavior, turned out to also be compatible with a Bayesian interpretation of geometric hashing (see also the next section).

First, the sensor noise was modeled by treating it as an additive Gaussian perturbation.
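The qualitative effect of such a perturbation on the similarity invariants can be previewed with a small simulation. Everything below is illustrative (the point coordinates, noise level and trial count are made up, and this is not the experimental setup of [24]): all three points are perturbed with Gaussian noise and the RMS spread of the recomputed (u, v) is measured for two basis choices.

```python
# Simulated additive Gaussian noise on the points entering the
# similarity-invariant computation; illustrative values throughout.
import math
import random

def invariants(a, b, p):
    bx, by = b[0] - a[0], b[1] - a[1]
    n2 = bx * bx + by * by
    dx, dy = p[0] - (a[0] + b[0]) / 2, p[1] - (a[1] + b[1]) / 2
    return (dx * bx + dy * by) / n2, (-dx * by + dy * bx) / n2

def spread(a, b, p, sigma=0.05, trials=5000):
    """RMS deviation of (u, v) when every point gets Gaussian noise."""
    u0, v0 = invariants(a, b, p)
    acc = 0.0
    for _ in range(trials):
        na, nb, pp = [
            (q[0] + random.gauss(0, sigma), q[1] + random.gauss(0, sigma))
            for q in (a, b, p)
        ]
        u, v = invariants(na, nb, pp)
        acc += (u - u0) ** 2 + (v - v0) ** 2
    return math.sqrt(acc / trials)

random.seed(1)
short_basis = spread((0, 0), (1, 0), (2, 1))   # basis length 1
long_basis = spread((0, 0), (4, 0), (2, 1))    # basis length 4
print(short_basis > long_basis)   # wider basis -> smaller spread: True
```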
The perturbations of the various feature points were assumed to be statistically independent and distributed according to a Gaussian distribution of standard deviation σ, centered at the "true" value of the variable. These errors were propagated through the hashing function of choice, and second- and higher-order error terms were dropped; see [24] for the relevant analysis and results.

The derived expressions allowed us to draw the following qualitative conclusions: (a) the larger the separation of the two basis points, the smaller the spread in the space of invariants in the presence of error; (b) for a given basis separation, the distance of the point whose coordinates we compute in the coordinate frame of the basis also affects the spread: the smaller this point's distance from the center of the coordinate frame, the smaller the spread in the space of invariants; and (c) there exists a trade-off between the indexing power of an invariant tuple and its sensitivity to noise: index values corresponding to relatively unpopulated regions of the space of invariants carry more information but are very sensitive to noise; the opposite also holds true.

5 A Bayesian Formulation

In this section, we present the Bayesian extension to the geometric hashing algorithm. The discussion herein over-simplifies and by necessity ignores many of the details of the algorithm; a detailed exposition of the framework can be found in [24].

Recall that, given a scene S containing s points, S = {p_ℓ}, ℓ = 1, ..., s, the geometric hashing method selects two points of S as a basis pair, say B = {p_α, p_β}, and attempts to determine if a model is present in the scene. The knowledge base consists of a database of m models {M_k}, k = 1, ..., m. Occasionally, the verification step will fail to find a model that obtains sufficient support, at which point another basis pair

Figure 6: Percentage of the embedded model's bases receiving k votes when used as probes, for different amounts of Gaussian noise (1/3, 2/3, 4/3 and 8/3 pixels). The models can only undergo similarity transformations. Top: the model points are distributed according to a Gaussian of σ = 1. Bottom: the model points are distributed uniformly over the unit disc. In both cases, the database contained 512 models, each consisting of 16 points.


Figure 7: Regions of the hash table that need to be accessed in the case of Gaussian error in the positions of the point features. The allowed transformation is similarity. The left graph of each pair shows the feature space domain, whereas the right shows the space of invariants. For presentation purposes, the amount of Gaussian error was deliberately large.

is selected and the entire analysis is repeated. The only source of evidence during each round of the analysis is the set S' = S - B, i.e., the scene points with the exception of those forming the basis.

We wish to compute the probability Pr((M_k, i, j, B) | S') that model M_k is present, with points i and j of the model respectively matching p_α and p_β of the basis set B, based on the information from the hash locations computed from the scene points S' relative to the basis set. In particular, we wish to find the maximum of this expression over all possible M_k, i, j and B; this is a maximum likelihood approach to object recognition. The model/basis combination maximizing this expression is the most likely match given the collection of hash values generated from S'. A reasonable assumption is that, in the absence of any match, the probability value of even the maximum winner will not be large. On the other hand, if there is a match, then for some choice of B (and most likely, multiple choices) there will be a large probability value for some (M_k, i, j, B). If there are multiple models present, then several model/basis combinations will share a large probability. It suffices to determine those few combinations that lead to the largest probabilities: indeed, one can always subject these "winners" to a verification phase. It also suffices to determine the relative probabilities, rather than the actual values.

Additional conditional independence assumptions are needed: it is assumed that there is a large expectation of hash values near the points of the table where (M_k, i, j) hash entries occur, and a uniform density (or some fixed density) elsewhere, regardless of what other hash values are known to occur.
The assumptions are reasonable if the features are chosen judiciously.

Using Bayes' theorem, the above formulation is shown to be equivalent to maximizing

\log \Pr((M_k, i, j, B)) + \sum_{p_\ell \in S'} \log \frac{\Pr(p_\ell \mid (M_k, i, j, B))}{\Pr(p_\ell)}    (4)

over all possible model/basis combinations and basis selections.

This maximization captures the essence of the geometric hashing approach within a Bayesian framework: after having defined a basis B using points from a scene, votes are tallied for all model/basis combinations using the information carried by the individual points in the scene and the hash locations computed relative to the basis B. The model/basis combinations that have accumulated many votes (or a large weighted vote) are probable instances of a model: the basis combination from that model will match the chosen basis B. Note that the redundant representation of the known models obviates the need for exhaustive consideration of all model/basis combinations and basis selections before an answer can be reached. Also note that the denominators in each term of the second sum are the expected probability density values attached to a given location in hash space; their calculation was outlined in Section 3.

We can quantify the contribution of a particular hash value based on a scene point p_ℓ to a model/basis combination, thus allowing us to extend the geometric hashing method to a Bayesian maximum-likelihood model-matching system as follows.

During preprocessing, for every model M_k and every basis pair comprising points from M_k, we compute the hash locations of every other point within M_k. At each such hash location (x, y) we make an entry that contains information about the model M_k, the basis pair (i, j), the model point ℓ other than the basis points, and a predicted normalized covariance radius which, for example, in the similarity transformation case is λ = (4(x² + y²) + 3) · σ².
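As an illustration, the preprocessing step just described can be sketched in Python for the similarity case. The particular basis frame used below (origin at the basis midpoint, scale and orientation set by half the basis vector) is one common choice from the geometric hashing literature, and the function and variable names are ours, not the paper's:

```python
from itertools import permutations

def preprocess_model(model_id, points, sigma):
    """Build hash entries (x, y, model_id, i, j, l, lam) for one model.

    points: list of 2-D model points; sigma: assumed standard deviation
    of the positional noise.  Invariants are computed in the similarity
    frame defined by the ordered basis pair (points[i], points[j]).
    """
    entries = []
    for i, j in permutations(range(len(points)), 2):
        (xi, yi), (xj, yj) = points[i], points[j]
        # basis frame: origin at the midpoint, unit = half the basis vector
        ox, oy = (xi + xj) / 2.0, (yi + yj) / 2.0
        bx, by = (xj - xi) / 2.0, (yj - yi) / 2.0
        d2 = bx * bx + by * by
        for l, (px, py) in enumerate(points):
            if l in (i, j):
                continue
            # coordinates of point l in the basis frame (rotation + scale)
            qx, qy = px - ox, py - oy
            x = (qx * bx + qy * by) / d2
            y = (-qx * by + qy * bx) / d2
            # predicted normalized covariance radius from the text
            lam = (4.0 * (x * x + y * y) + 3.0) * sigma ** 2
            entries.append((x, y, model_id, i, j, l, lam))
    return entries
```

A real implementation would then bucket these entries by quantized (x, y) so that all records near a query location (u, v) can be retrieved directly.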
The entries containing this information are organized in such a way that, given a location (u, v), all records having an (x, y) value lying nearby can be easily accessed.

During recognition, feature points are extracted from the scene, and a trial basis is formed using a pair of these points, e.g. p_α and p_β. The coordinates of the remaining points in S are then computed relative to the basis set, and a hash takes place to a location (u, v) in the hash domain. All nearby records of

the form (x, y, M_k, i, j, ℓ, λ) are accessed. For each such nearby record, we record a weighted vote for the model/basis combination (M_k, i, j). For the similarity transformation, the amount of the vote is

z = \log\left( 1 + \frac{\|p_\alpha - p_\beta\|^2}{2\pi s \lambda}\, \exp\left( -\frac{\|(u,v) - (x,y)\|^2}{2\lambda / \|p_\alpha - p_\beta\|^2} \right) \right),    (5)

with λ = (4(u² + v²) + 3) · σ². The neighborhood is defined through z: a record is considered to be in the neighborhood if the value of the respective z exceeds some threshold. Note that the above expression incorporates the value of σ, the expected error in the positioning of the scene points, the number of scene points, s, and the basis-pair separation distance. The value z is only an approximation of the expression in (4), the total contribution of the hash (u, v) to the model/basis (M_k, i, j), obtained by neglecting all terms in f except for the entry at (x, y).

6 Conclusion

In this paper we have outlined the Geometric Hashing method, which finds large matching portions of objects even though the objects have undergone an arbitrary transformation and parts of them might be obscured. The strength of the technique lies in its efficiency, in its ability to tolerate a significant amount of mismatch between the objects (occlusion, obscuration) and still detect the matching portions, and in its applicability to almost any domain where geometric matching is required.

The method has been successfully applied to pattern matching problems in Computer Vision, in CAD/CAM and in Medical Imaging. A few of these applications are presented in this special issue. A somewhat surprising application, introduced by Nussinov and Wolfson [21], is to problems in Structural Molecular Biology ([3, 4]) and Medicinal Chemistry.

The wide applicability of the method is possible since it does not require any domain-specific knowledge. The only requirement is to supply the locations of certain geometric interest features. We have demonstrated the implementation of the method for the least informative feature - a point.
Features carrying more information, such as line segments, arcs, or corners of specified angles, can significantly speed up the algorithm by reducing the number of features required for an unambiguous definition of a basis. Stein and Medioni [16] have used Gray-encodings of groups of consecutive edge segments (supersegments) of varying cardinalities as informative features. Their TOSS system [17] for 3-D object recognition from range sensor data uses characteristic curves and local differential patches. Forsyth et al. [6] present and use descriptors based on pairs of planar curves; the descriptors are invariant under perspective transformations and thus generalize the affine-invariant curve matching approach of Lamdan et al. [11].

The method gains its efficiency from the indexing/hashing of geometric invariants in a large memory. This allows the matching process to be separated into two stages whose complexities add to one another (and do not multiply each other). In this sense there is a rough similarity to the human recognition process. In the preprocessing or learning stage, the relevant information is memorized in the hash table using transformation-invariant access keys. In the recognition stage, one can directly access the relevant locations of the memory, without retrieving superfluous information. An advantage of the technique is that the various successful retrievals (hits) are scored in a way that preserves the overall rigidity constraints of the objects, thus sharpening the "correct" hypothesis.

Recently the method has been extended to efficiently handle the matching of objects with internal degrees of freedom, such as rotational or sliding (prismatic) joints [29]. This is an important extension, since robots,

humans, many manufactured objects, and biological molecules can all be modeled as assemblies of rigid subparts connected by the aforementioned joints. This technique and its derivatives have been applied to the recognition of articulated objects in Computer Vision [18], the docking of flexible receptor-drug molecules [28], and the detection of partially similar molecules in databases of drugs [27]. An interesting point about this extension is that both the Geometric Hashing and the Generalized Hough Transform techniques, applied to rigid object matching, can be viewed as particular cases of this new flexible object matching technique.

Acknowledgments

The research of H.J. Wolfson has been supported in part by the Hermann Minkowski - Minerva Center for Geometry at Tel Aviv University, by grant No. 95-0028 from the U.S.-Israel Binational Science Foundation (BSF), Jerusalem, Israel, by a grant of the Israeli Ministry of Science, by a Basic Research grant from Tel Aviv University, and by the Daat consortium grant, administered by the Israeli Ministry of Industry and Trade.


References

[1] Bourdon, O. and G. Medioni. Object Recognition Using Geometric Hashing on the Connection Machine. In Proceedings of the International Conference on Pattern Recognition, pages 596-600, Atlantic City, New Jersey, June 1990.

[2] Costa, M., R. Haralick and L. Shapiro. Optimal Affine Matching. In Proceedings of the 6th Israeli Conference on Artificial Intelligence and Computer Vision, Tel Aviv, Israel, December 1989.

[3] Fischer, D., R. Nussinov, and H. Wolfson. 3D Substructure Matching in Protein Molecules. In Proceedings of the International Conference on Pattern Matching, Arizona, June 1992.

[4] Fischer, D., S.L. Lin, H.J. Wolfson, and R. Nussinov. A Geometry-based Suite of Molecular Docking Processes. J. of Molecular Biology, 248:459-477, 1995.

[5] Flynn, P. and A. Jain. 3D Object Recognition Using Invariant Feature Indexing of Interpretation Tables. Computer Vision, Graphics, and Image Processing: Image Understanding, 55(2):119-129, 1992.

[6] Forsyth, D., J. Mundy, A. Zisserman, C. Coelho, A. Heller, and C. Rothwell. Invariant Descriptors for 3D Object Recognition and Pose. IEEE Transactions on Pattern Analysis and Machine Intelligence, 13(10):971-991, 1991.

[7] Schwartz, J. and M. Sharir. Identification of Partially Obscured Objects in Two and Three Dimensions by Matching Noisy Characteristic Curves. The International Journal of Robotics Research, 6(2):29-44, 1986.

[8] Kalvin, A., E. Schonberg, J. Schwartz, and M. Sharir. Two-dimensional Model-based Boundary Matching Using Footprints. The International Journal of Robotics Research, 5(4):38-55, 1986.

[9] Lamdan, Y. and H. Wolfson. Geometric Hashing: A General and Efficient Model-based Recognition Scheme. In Proceedings of the International Conference on Computer Vision, pages 238-249, 1988.

[10] Lamdan, Y. and H. Wolfson. On the Error Analysis of Geometric Hashing. In Proceedings of the IEEE Computer Vision and Pattern Recognition Conference, Maui, Hawaii, June 1991.

[11] Lamdan, Y., J. Schwartz and H. Wolfson. Object Recognition by Affine Invariant Matching. In Proceedings of the IEEE Computer Vision and Pattern Recognition Conference, pages 335-344, Ann Arbor, Michigan, June 1988.

[12] Lamdan, Y., J. Schwartz, and H. Wolfson. On Recognition of 3D Objects from 2D Images. In Proceedings of the IEEE International Conference on Robotics and Automation, pages 1407-1413, Philadelphia, PA, April 1988.

[13] Lamdan, Y., J. Schwartz and H. Wolfson. Affine Invariant Model-Based Object Recognition. IEEE Transactions on Robotics and Automation, 6(5):578-589, 1990.

[14] Rigoutsos, I. and R. Hummel. Implementation of Geometric Hashing on the Connection Machine. In Proceedings of the IEEE Workshop on Directions in Automated CAD-Based Vision, Maui, Hawaii, June 1991.

[15] Rigoutsos, I. and R. Hummel. Massively Parallel Model Matching: Geometric Hashing on the Connection Machine. IEEE Computer: Special Issue on Parallel Processing for Computer Vision and Image Understanding, 25(2):33-42, February 1992.

[16] Stein, F. and G. Medioni. Efficient Two-dimensional Object Recognition. In Proceedings of the International Conference on Pattern Recognition, pages 596-600, Atlantic City, New Jersey, June 1990.

[17] Stein, F. and G. Medioni. Structural Hashing: Efficient Three-dimensional Object Recognition. In Proceedings of the IEEE Computer Vision and Pattern Recognition Conference, pages 244-250, Maui, Hawaii, June 1991.

[18] Beinglass, A. and H.J. Wolfson. Articulated Object Recognition, or, How to Generalize the Generalized Hough Transform. In Proceedings of the IEEE Computer Vision and Pattern Recognition Conference, pages 461-466, Maui, Hawaii, June 1991.

[19] Costa, M., R. Haralick and L. Shapiro. Optimal Affine-Invariant Matching: Performance Characterization. In Proceedings of the SPIE Conference on Image Storage and Retrieval, San Jose, CA, January 1992.

[20] Mohan, R., D. Weinshall and R. Sarukkai. 3D Object Recognition by Indexing Structural Invariants from Multiple Views. In Proceedings of the International Conference on Computer Vision, pages 264-268, Berlin, Germany, May 1993.

[21] Nussinov, R. and H.J. Wolfson. Efficient Detection of Three-dimensional Motifs in Biological Macromolecules by Computer Vision Techniques. Proc. Natl. Acad. Sci. USA, 88:10495-10499, 1991.

[22] Rigoutsos, I. Affine-Invariants That Distribute Uniformly and Can Be Tuned to Any Convex Feature Domain; Case I: Two-Dimensional Feature Domains. Technical Report RC20668, IBM T.J. Watson Research Center, December 1996.

[23] Rigoutsos, I. Affine-Invariants That Distribute Uniformly and Can Be Tuned to Any Convex Feature Domain; Case II: Three-Dimensional Feature Domains. Technical Report RC20879, IBM T.J. Watson Research Center, June 1996.

[24] Rigoutsos, I. Massively Parallel Bayesian Object Recognition. PhD thesis, New York University, August 1992.

[25] Rigoutsos, I. and A. Califano. Searching in Parallel for Similar Strings. IEEE Computational Science and Engineering, pages 60-75, Summer 1994.

[26] Rigoutsos, I. and R. Hummel. A Bayesian Approach to Model Matching with Geometric Hashing. Computer Vision and Image Understanding, 61(7), July 1995.

[27] Rigoutsos, I., D. Platt and A. Califano. Flexible 3D-Substructure Matching & Novel Conformer Derivation in Very Large Databases of Molecular Information. Technical Report RC20497, IBM T.J. Watson Research Center, July 1996.

[28] Sandak, B., R. Nussinov, and H.J. Wolfson. An Automated Robotics-Based Technique for Biomolecular Docking and Matching Allowing Hinge-Bending Motion. Computer Applications in the Biosciences (CABIOS), 11:87-99, 1995.

[29] Wolfson, H.J. Generalizing the Generalized Hough Transform. Pattern Recognition Letters, 12(9):565-573, 1991.
