Geometric hashing: An overview



THEME ARTICLE

1070-9924/97/$10.00 © 1997 IEEE. IEEE COMPUTATIONAL SCIENCE & ENGINEERING

Object recognition is the ultimate goal of most computer vision research. An ideal object recognition system should be able to recognize objects in an image that are partially occluded or have undergone geometric transformations. Most systems will use a large database of models and apply model-based recognition.

Say you want to give a robot the ability to recognize all objects and tools on a factory floor. If there are only a few hundred objects, you could design a database of these objects and store it in the robot’s memory. When the robot receives a sensory image of its environment from a video camera or a range sensor, it should be able to quickly retrieve from memory objects that appear in the image. Although quite natural in human vision, this task in a robot requires the solution of several complicated problems:

1. The objects in the acquired scene appear rotated and translated relative to their initial database position, and the whole scene undergoes a sensor-dependent transformation, such as the projective transformation of a video camera.

2. The objects in the scene may partially occlude each other, and the scene may include additional objects not included in the database.

3. It is computationally inefficient to retrieve each individual object from the database and compare it against the observed scene in search of a match. For example, if the scene contains only round objects, it does not make sense to retrieve rectangular objects to match against it.

We need a method that allows direct access to only the relevant information—such as an indexing-based approach. For example, if you are looking for words in long strings of text, you could use a table accessed by indices that are functions of individual words. The table contains the strings where the word appears and the location of the word in the strings. It would be easy then to locate a word by retrieving all of its appearances from the table.

This kind of approach was originally proposed for geometric object recognition, making use of indices based on local geometric features that remained invariant to the object transformation. The features were local to handle partial occlusion, and their indexing function was invariant to the relevant transformation, because unlike words in text, geometric features have both location and orientation. For over a decade now, indexing-based approaches have been gaining ground as the method of choice for building working recognition systems that can

Geometric Hashing: An Overview

HAIM J. WOLFSON

Tel Aviv University

ISIDORE RIGOUTSOS

IBM T.J. Watson Research Center

♦ ♦ ♦

Geometric hashing, a technique originally developed in computer vision for matching geometric features against a database of such features, finds use in a number of other areas. Matching is possible even when the recognizable database objects have undergone transformations or when only partial information is present. The technique is highly efficient and of low polynomial complexity.


OCTOBER–DECEMBER 1997

operate with large model databases.

In its modern incarnation, geometric hashing, a method based on the indexing approach, originated in the work of Schwartz and Sharir.1,2

These first efforts concentrated on the recognition of rotated, translated, and partially occluded two-dimensional objects from their silhouettes using boundary-curve matching techniques. As opposed to the simplified text analogy, the technique in reality is more complicated, requiring shape information rather than just the location of local features. Two shapes may have the same local features yet be entirely different in appearance. If the shape’s rigidity is conserved, then not only the local features but also their relative spatial configuration are important.

In order to exploit geometric consistency and to tackle model-based object recognition in two-dimensional and three-dimensional settings, Schwartz, Wolfson, and Lamdan developed a new geometric hashing technique applicable to arbitrary point sets, or constellations, under various geometric transformations.3-5 They developed efficient algorithms for recognizing flat rigid objects represented either by point sets or by curves under the affine approximation of the perspective transformation, and they extended the technique to recognize point sets under arbitrary transformations and to distinguish rigid 3D objects from single 2D images.6

Since this early work, many research groups have built and used geometric hashing systems. Most implementations have worked as well as classical model-based vision systems, on the whole delivering on the promise of greater efficiency. One of the advantages is that geometric hashing is inherently parallel. In fact, with minimal communication and maintenance costs, the underlying data structure can be easily decomposed and shared among a number of cooperating processors, and the technique has been implemented on the Connection Machine.7-8

Researchers also soon discovered that the distribution of indices over the space of invariants is nonuniform.9-13 (This nonuniformity, however, is not specific to geometric hashing, but appears to be endemic to all indexing schemes.14-15) From a practical angle, nonuniform distribution results in different lengths for the hash entry lists. Since the length of the longest list dominates the time needed to carry out the histogram phase of the algorithm, a nonuniform distribution will adversely affect the method’s performance. A uniform distribution, however, not only reduces execution time but can also result in much more efficient storage of the hash table data structure. Additionally, in parallel implementation, a more or less constant co-occupancy of all the hash bins results in an improved load balancing among the processors.8 Knowledge of the expressions for the index distributions over the space of invariants greatly facilitates the equalization of the hash bin occupancy.

In addition, for the case where models undergo similarity or affine transformations, you can incorporate a noise model into the geometric hashing framework and analytically determine its effect on invariants. This analysis provides a detailed description of the method’s behavior in the presence of noise.

With this augmented framework, you can develop a Bayesian approach to object recognition with geometric hashing.16 Augmentation of the traditional geometric hashing algorithm with an error model layer and a Bayesian layer allows the creation of working systems that can operate with real-world photographs and large model databases.17

Underlying ideas

In recognizing objects, geometric hashing and indexing methods are efficient and can easily be made parallel. These methods are especially attractive in model-based schemes, but they also hold significant advantages in pairwise object-scene comparisons because of their ability to also handle partially occluded objects. However, this is difficult because it is not known which database objects will appear and what their pose will be. The model information is encoded in a preprocessing step and stored in a large memory, in this case a hash table. The contents of the hash table are independent of the scene and can thus be computed offline, not affecting the recognition time. Access to the memory is based on geometric information that is invariant of the object’s pose and computed directly from the scene.

During the recognition phase, and when presented with a scene, the method accesses the previously constructed hash table, indexing geometric properties of features extracted from the scene for matching with candidate models. A search of all scene features is still required, but geometric hashing obviates a search of all models and their features.




Scene features—including such things as points, linear and curvilinear segments, and corners—are accumulated during the feature extraction stage. The collection of features is represented by a set of dots, each dot representing a feature’s location. Associated with each dot is a list of one or more attributes, depending on the feature’s type.

Suppose we wish to perform recognition of patterns of point features in the presence of similarity transformations—that is, the point features may be translated, rotated, or scaled. (Geometric hashing can tackle other transformations, such as rigid and affine transformations, but similarity transformations are of moderate difficulty and effectively showcase the methodology.) The left side of Figure 1 shows model M1, which consists of five dots with position vectors p1, p2, p3, p4, and p5. We want to encode this dot information appropriately and store it into a table. This way, if the system detects this collection of dots in a scene, it could conclude that they belong to the model M1.

If we assume for the moment that each dot has a unique, distinctive color, a potential albeit simplistic indexing scheme would use the color as the dot’s index: an entry in the hash bin would include the identity of the model to which the dot belongs. In the recognition stage, the system would simply scan the dots, access the hash table using each dot’s color, and increase the count of the models appearing in the accessed table bins. Models accumulating high counts have a high probability of being present in the scene. The computational complexity of such a scheme would be linear in the number of scene dots.
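This color-indexing scheme is simple enough to sketch directly. The sketch below is illustrative only (the model names and colors are invented): it builds the color-keyed table during preprocessing and then tallies votes in a single pass over the scene.

```python
from collections import defaultdict

# Hypothetical two-model database: each dot is identified only by its color.
models = {
    "M1": {"red", "green", "blue"},
    "M2": {"blue", "yellow", "cyan"},
}

# Preprocessing: one hash bin per color, listing every model containing it.
table = defaultdict(list)
for name, colors in models.items():
    for color in colors:
        table[color].append(name)

def recognize(scene_colors):
    """Scan the scene dots once, voting for each model found in a bin."""
    votes = defaultdict(int)
    for color in scene_colors:
        for name in table[color]:
            votes[name] += 1
    return votes

votes = recognize({"red", "green", "blue"})  # a scene showing model M1
```

Here M1 collects three votes and M2 one (through the shared color "blue"), and the whole pass is linear in the number of scene dots, as noted above.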

However, what happens in the least informative case, where dots belonging to a model have no attributes except for their geometric configuration? Is there a distinctive geometric “color”? Yes, the natural geometric “color” of a dot is the set of its coordinates, but coordinates depend on a reference frame. The question then becomes one of whether there is a natural reference frame for a model that will remain present under partial occlusion. One straightforward such choice


Figure 1. Determining the hash table entries when points 4 and 1 are used to define a basis. The models are allowed to undergo rotation, translation, and scaling. On the left of the figure, model M1 comprises five points.


Figure 2. The locations of the hash table entries for model M1. Each entry is labeled with the information “model M1” and the basis pair (i, j) used to generate the entry. The models are allowed to undergo rotation, translation, and scaling.


is a pair of dots belonging to the model and defining an unambiguous reference frame that remains unchanged if the model undergoes rotation, translation, and/or scaling.

Let’s take the pair of dots p4, p1 as an ordered basis to such a reference frame. As shown in Figure 1, we scale the model M1 so that the magnitude of the basis vector from p4 to p1 in the Oxy system equals 1. Suppose now that we place the midpoint between dots 4 and 1 at the origin of a coordinate system Oxy in such a way that the vector from p4 to p1 has the direction of the positive x axis. The remaining three points of M1 will land in three locations. Let’s record in a quantized hash table—in each of the three bins where the remaining points land—the fact that model M1 with basis “(4, 1)” yields an entry in this bin.

Since our goal is to perform recognition under partial occlusion, we are not guaranteed that both basis points p1 and p4 will appear in each scene where model M1 will be present. Consequently, we encode the model dot’s information in all possible ordered basis pairs. Namely, the hash table contains three entries of the form (M1, (4, 2)), three entries of the form (M1, (4, 3)), and so on. Each triplet of entries is generated by first scaling the model M1 so that the corresponding basis has unit length in the Oxy coordinate system, and then by placing the midpoint of the basis at the origin of the hash table in such a way that the basis vector has the direction of the positive x axis. The same process is repeated for each model in the database. Of course, some hash table bins may receive more than one entry. As a result, the final hash table data structure will contain a list of entries of the form (model, basis) in each hash table bin. Figure 2 shows the locations of all the hash table entries for model M1.

In essence, what we have done is define in turn orthonormal bases for the coordinate system Oxy using a pair of vectors px^s = pµ2 − pµ1 and py^s = Rot90(px^s); µ1 and µ2 denote distinct points taken from the model M1. For each choice of a basis, the remaining points p of M1 are represented in this basis using the equation

p − p0^s = u·px^s + v·py^s,   (1)

where p0^s = (pµ1 + pµ2)/2 is the midpoint between pµ1 and pµ2. The scalar quantities u and v remain invariant under similarity transformation of M1, and their quantization allows us to determine an index (uq, vq) that will take us into a location of a 2D hash table data structure. In the hash bin that is accessed via (uq, vq) we enter the information (m, (pµ1, pµ2)).
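Equation 1 can be turned into a few lines of code. The sketch below uses arbitrary illustrative points (not those of Figure 1) and checks that (u, v) is unchanged by an arbitrary similarity map:

```python
import numpy as np

def invariants(p_m1, p_m2, p):
    """(u, v) of point p in the frame of the ordered basis (p_m1, p_m2):
    p - p0 = u * px + v * py, with px = p_m2 - p_m1, py = Rot90(px),
    and p0 the midpoint of the basis pair (Equation 1)."""
    px = p_m2 - p_m1
    py = np.array([-px[1], px[0]])      # 90-degree counterclockwise rotation
    p0 = (p_m1 + p_m2) / 2.0
    d = p - p0
    n2 = px @ px                        # squared basis length
    return (d @ px) / n2, (d @ py) / n2

a, b, c = np.array([0.0, 0.0]), np.array([2.0, 0.0]), np.array([1.0, 1.0])
u1, v1 = invariants(a, b, c)

theta, s, t = 0.7, 1.9, np.array([3.0, -2.0])   # an arbitrary similarity map
R = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta),  np.cos(theta)]])
u2, v2 = invariants(*(s * (R @ q) + t for q in (a, b, c)))
```

Up to floating-point error, (u1, v1) and (u2, v2) coincide, which is exactly what makes the quantized pair (uq, vq) usable as a hash index.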

In the recognition phase, a pair of points (pµ1, pµ2) from the image is chosen as a candidate basis. As before, this ordered basis defines a coordinate system Oxy whose center coincides with the midpoint of the pair; the direction of the basis vector pµ2 − pµ1 coincides with that of the positive x axis. The magnitude of the basis vector defines the “unit” length for Oxy. The coordinates of all other points are then calculated in the coordinate system defined by the chosen basis. Each of the remaining image points is mapped to the hash table, and all entries in the corresponding hash table bin receive a vote. In essence, for the selected basis and for each remaining point in the scene, Equation 1 is used to determine the index (uq, vq) of a hash bin to access. As shown in Figure 3, each pair of (model,



Figure 3. Determining the hash table bins that are to be notified when two arbitrary image points are selected as a basis. Similarity transformation is allowed.


basis) found in the accessed bin gets a vote. If we select a pair of scene points that corresponds to a basis on one of the models, we expect it to accumulate as its score the votes from all other unoccluded points belonging to this model. If there are sufficient votes for one or more (model, basis) combinations, then a subsequent stage attempts to verify the presence of a model with the designated basis matching the chosen basis points. In the case where model points are missing from the image because they are obscured, recognition is still possible as long as a sufficient number of points hash to the correct bins. The list of entries in each bin may be large, but because there are many possible models and basis sets, the likelihood that a single model and single basis set will receive multiple votes is quite small, unless a configuration of transformed points coincides with a model or part of it.

In general, we do not expect the voting scheme to give only one candidate solution. The goal of the voting scheme is to act as a sieve and reduce significantly the number of candidate hypotheses for the verification step. For the algorithm to be successful it is sufficient to select as a basis tuple any set of image points belonging to some model. It is not necessary to hypothesize a correspondence between specific model points and specific scene points, since all models and basis pairs are stored redundantly within the hash table. Classification or perceptual grouping of features can be used to make the search over scene features more efficient—for example, by making use of only special basis tuples.

The two stages making up the core of the geometric hashing system are outlined in Figure 4.
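The two stages can also be sketched end to end in a few dozen lines of code. The sketch below is a toy version for similarity transformations, with an invented quantization step (one bin per 0.1 × 0.1 cell of invariant space); it is an illustration, not the authors' implementation.

```python
from collections import defaultdict
import math

def invariants(pa, pb, p):
    # (u, v) of p in the frame of the ordered basis (pa, pb); see Equation 1.
    pxx, pxy = pb[0] - pa[0], pb[1] - pa[1]
    n2 = pxx * pxx + pxy * pxy
    dx, dy = p[0] - (pa[0] + pb[0]) / 2, p[1] - (pa[1] + pb[1]) / 2
    return (dx * pxx + dy * pxy) / n2, (-dx * pxy + dy * pxx) / n2

def quantize(u, v, q=10):
    # Invented quantization: round to a 0.1 grid in invariant space.
    return round(u * q), round(v * q)

def preprocess(models):
    """Hash table mapping a quantized (u, v) to (model, basis) entries."""
    table = defaultdict(list)
    for m, pts in enumerate(models):
        for i in range(len(pts)):
            for j in range(len(pts)):
                for k in range(len(pts)):
                    if len({i, j, k}) == 3:
                        key = quantize(*invariants(pts[i], pts[j], pts[k]))
                        table[key].append((m, (i, j)))
    return table

def vote(table, scene, bi, bj):
    """Recognition for one candidate scene basis: tally hash table votes."""
    votes = defaultdict(int)
    for k in range(len(scene)):
        if k not in (bi, bj):
            key = quantize(*invariants(scene[bi], scene[bj], scene[k]))
            for entry in table.get(key, []):
                votes[entry] += 1
    return votes

# One-model toy database; the scene is the model under a similarity map.
model = [(0.0, 0.0), (2.0, 0.0), (1.0, 1.0), (0.6, 1.4), (2.0, 2.0)]
table = preprocess([model])
th, s, tx, ty = 0.5, 1.7, 3.0, -2.0
scene = [(s * (x * math.cos(th) - y * math.sin(th)) + tx,
          s * (x * math.sin(th) + y * math.cos(th)) + ty) for x, y in model]
votes = vote(table, scene, 0, 1)   # probe with the basis matching (0, 1)
```

Probing with the scene basis that matches model basis (0, 1), the entry (0, (0, 1)) collects one vote from each of the three remaining points; a verification step would then confirm the match.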

We have seen that two points suffice to define a basis when the models are allowed to undergo a 2D similarity transformation. However, geometric hashing represents a unified approach that applies also to many other useful transformations encountered in object recognition problems. The only difference from one application to another is the number of features that have to be used to form a basis for the reference frame. This, of course, affects the computational complexity of the algorithm in the different cases, which still remains polynomial.

The following lists give a number of examples where this general paradigm applies. Almost all cases involve point matching. Use of other features, such as lines, can be understood by analogy. The first list covers recognition of 2D objects from 2D data:

1. Translation in 2D: The technique is applicable using a one-point basis, the point being viewed as the origin of the coordinate frame.

2. Translation and rotation in 2D: A two-point basis can be used, but one point with a direction (obtained, say, from an edge segment) provides enough information for a unique definition of a basis.

3. Translation, rotation, and scaling in 2D: Discussed earlier.

4. Affine transformation in 2D: A three-point basis defines an unambiguous reference frame.3-5

5. Projective transformation in 2D: A four-point basis is needed to recover a projective transformation between two planes.

Preprocessing phase
For each model m do the following:
1. Extract the model’s point features. Assume that n such features are found.
2. For each ordered pair, or basis, of point features do the following:
   (a) Compute the coordinates (u, v) of the remaining features in the coordinate frame defined by the basis.
   (b) After proper quantization, use the tuple (uq, vq) as an index into a 2D hash table data structure and insert in the corresponding hash table bin the information (m, (basis)), namely the model number and the basis tuple used to determine (uq, vq).

Recognition phase
When presented with an input image, do the following:
1. Extract the various points of interest. Assume that S is the set of the interest points found; let S be the cardinality of S.
2. Choose an arbitrary ordered pair, or basis, of interest points in the image.
3. Compute the coordinates of the remaining interest points in the coordinate system Oxy that the selected basis defines.
4. Appropriately quantize each such coordinate and access the appropriate hash table bin; for every entry found there, cast a vote for the model and the basis.
5. Histogram all hash table entries that received one or more votes during step 4. Proceed to determine those entries that received more than a certain number, or threshold, of votes: Each such entry corresponds to a potential match.
6. For each potential match discovered in step 5, recover the transformation T that results in the best least-squares match between all corresponding feature pairs.
7. Transform the features of the model according to the recovered transformation T and verify them against the input image features. If the verification fails, go back to step 2 and repeat the procedure using a different image basis pair.

Figure 4. The two stages of the geometric hashing system.

When 3D data—such as range or stereo data of the objects—is available, the recognition of 3D objects from 3D images must be considered. Development of techniques for this case is especially important in an industrial environment, where 3D data can be readily obtained and used. Geometric hashing for 3D rigid transformation (translation and rotation) has been applied in CAD/CAM, medical imaging, and protein comparison and docking in molecular biology (as described in other articles in this issue). Here are examples of 3D transformations:

1. Translation in 3D: As in the 2D case, a one-point basis will suffice.

2. Translation and rotation in 3D: This is the interesting case corresponding to rigid motion. A basis consisting of two noncollinear lines suffices. Alternatively, three points, with additional triangle-side length information, can be used to define a basis.

3. Translation, rotation, and scaling in 3D: A basis comprising two noncollinear and nonplanar lines will suffice. Alternatively, a point and a line can be used.

In general, if

♦ the database contains M known models, each comprising n features,
♦ the scene during recognition contains S features, and
♦ c features are needed to form a basis,

then the time complexity of the preprocessing phase is O(Mn^(c+1)). The complexity of the recognition phase is O(HS^(c+1)), where H represents the complexity of processing a hash table bin. As in all hashing techniques, H depends on the hash table occupancy and bin distribution. If the size of the table is of the order of the elements it contains and the distribution is uniform, the access complexity H will be equal to O(1). If, on the other hand, the table is small or all of the features hash into only a few bins, the access time may be dominated by the number of elements in the table, which is Mn^(c+1).

This is, however, an unlikely situation. It is important to note that the hash table and its structure are known in advance and before the recognition phase. Thus, you can evaluate this structure and decide whether it requires the application of rehashing procedures, the splitting of the table into several tables, or the changing of the index structure to a higher-dimensional one. It is also important to mention that the bins with high occupancy, which cause significant computational effort, can simply be ignored—their information content is not salient enough to assist recognition. Thus, you can decide a priori on an upper bound to the size of the hash table bins that will be processed. An extension of this idea leads to weighted voting, where the weight of a bin’s information is inversely proportional to the bin’s size.

In the recognition of 3D objects from single 2D images we have the additional problem of different dimensions. A number of methods have been suggested to tackle this problem using geometric hashing; see elsewhere.6

Index distributions

One issue of particular importance is index distribution over the space of invariants when the allowed transformation is known and fixed. The assumption is that all point features are identically and independently distributed following a random process with a known probability density function f(·). Recall that the indices used to access the hash table are the quantized solution (u, v) to Equation 1. Since the properties of the random process generating the point features are known, the joint probability density function f(u, v) of u and v can be computed using the expression

f(u, v) = ∫R^4 f(x(u, v), y(u, v)) f(xµ1, yµ1) f(xµ2, yµ2) |J^−1| dxµ1 dxµ2 dyµ1 dyµ2,   (2)

where J is the Jacobian of the transformation (as in Equation 1). Evaluation of this integral yields the distribution of indices over the space of invariants for the transformation under consideration.

For example, in the case of similarity transformations and feature points generated by the Gaussian random process N((0, 0)^T, diag(σ, σ)), evaluation of the integral in Equation 2 yields

f(u, v) = (12/π) · 1/(4(u^2 + v^2) + 3)^2.
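This closed form is easy to sanity-check numerically. The sketch below is my own check, not the article's experiment: it samples three i.i.d. standard-normal points per trial (the ordered basis pair plus one more point), computes the similarity invariants of Equation 1, and compares the empirical mass inside the unit disc of invariant space with the value the density predicts, which is 1 − 3/7 = 4/7 over that disc.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200_000

# Three i.i.d. Gaussian points per trial: the ordered basis pair and a third point.
p1, p2, p = (rng.standard_normal((n, 2)) for _ in range(3))

px = p2 - p1                                  # basis vector
py = np.stack([-px[:, 1], px[:, 0]], axis=1)  # Rot90(px)
d = p - (p1 + p2) / 2                         # offset from the basis midpoint
n2 = np.sum(px * px, axis=1)
u = np.sum(d * px, axis=1) / n2
v = np.sum(d * py, axis=1) / n2

# f(u, v) = (12/pi) / (4(u^2 + v^2) + 3)^2 puts mass 4/7 inside u^2 + v^2 <= 1.
frac = np.mean(u * u + v * v <= 1.0)
```

frac comes out near 4/7 ≈ 0.571; the 1/r^4 tail of the density is what leaves faraway bins nearly empty while crowding the central ones.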

The distribution over the space of invariants for synthetically generated indices, as well as several of its contours, are shown in Figure 5, with the occupancies of the hash table encoded as heights.

When the random process giving rise to the feature points comprising the various database models is not known, it is typically possible to obtain an approximation f*(·) of the probability density function using numerical techniques: you can use f*(·) instead of f(·) in Equation 2. Alternatively, you can use a numerical approximation of f*(u, v).

From hashing to rehashing

If the probability density function for the distribution of indices over the space of invariants is available, you can effectively use it to provide


substantial improvements in storage requirements and performance. The outlined methodology is generally applicable to all indexing-based object identification methods.

The nonuniform occupancy of the hash bins results in bins that contain a large number of entries. Since the longest such list dominates the time spent in the histogram phase, a uniform distribution of the entries is desirable. A uniform distribution can be achieved by transforming the coordinates of point locations so that the equispaced quantizer in the space of invariants yields an expected uniform distribution. To this end, knowledge of the expected joint probability density function f(u, v) (or an approximation of it) for the distribution of the untransformed coordinates (that is, the tuple of invariants) is required.

In essence we seek a mapping h: R^2 → R^2 that evenly distributes the hash bin entries over a rectangular hash table. Notice that the range of the function h is the space of transformed invariants and not the space of features extracted from the input.

For the similarity transformation example above, one such mapping is as follows:

h(u, v) = (1 − 3/(4(u^2 + v^2) + 3), atan2(v, u)).   (3)

In this expression, atan2(·,·) returns the phase in the interval [−π, π]. It is interesting to note that the computed rehashing function does not include the standard deviation σ of the feature-generating process as a parameter. Figure 6 shows the result of hash table equalization for synthetic data. The reference spike at the upper left corner of the hash table is reproduced from Figure 5 and provides a measure of the benefits incurred by the rehashing operation. Clearly, the remapping is very efficient. This table has the same number of bins as the one in Figure 5.17
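Equation 3 is equally direct to check: its first output coordinate is the radial cumulative distribution of the density above, so under the Gaussian point model it should be uniform on [0, 1], with the angle uniform on [−π, π]. A small sketch (again my own check, assuming the same Gaussian point model):

```python
import numpy as np

def rehash(u, v):
    """Equation 3: remap raw similarity invariants so that the expected
    bin occupancy over a rectangular table is uniform."""
    r2 = u * u + v * v
    return 1.0 - 3.0 / (4.0 * r2 + 3.0), np.arctan2(v, u)

# Sample invariants from three i.i.d. standard-normal points per trial.
rng = np.random.default_rng(1)
p1, p2, p = (rng.standard_normal((100_000, 2)) for _ in range(3))
px = p2 - p1
py = np.stack([-px[:, 1], px[:, 0]], axis=1)
d = p - (p1 + p2) / 2
n2 = np.sum(px * px, axis=1)
h1, h2 = rehash(np.sum(d * px, axis=1) / n2, np.sum(d * py, axis=1) / n2)
```

The remapped coordinates have means near 0.5 and 0 respectively, and a histogram of (h1, h2) is flat, which is the equalization Figure 6 shows; note that σ indeed never enters rehash.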



Figure 6. Hash table equalization for the case of similarity transformations and point features generated by the Gaussian process N((0, 0)^T, diag(σ, σ)): (a) the expected distribution of remapped invariants; (b) several of the distribution’s contours.

Figure 5. The distribution over the space of invariants, and several of its contours, for the case of Gaussian-distributed point features. Similarity transformation is allowed.



Exploiting existing symmetries

In addition to savings from the use of rehashing functions, further computational and storage savings are possible. Symmetries typically exist in the storage pattern of entries in the hash table; these symmetries are independent of the use of rehashing functions and thus can be used in conjunction with the rehashing functions.

For rigid and similarity transformations, the symmetries are with respect to a point: For every entry of the form (m, (µ1, µ2)) at location (u, v) of the hash table, there will be an entry (m, (µ2, µ1)) at location (−u, −v). For the affine transformation, the symmetries are with respect to an axis: Every entry of the form (m, (µ1, µ2)) at location (u, v) of the hash table will have a counterpart (m, (µ2, µ1)) at location (v, u). The practical importance of this is that we can dispose of half of the hash table—at the expense of minimal additional bookkeeping. This will result in entry lists that, on average, will be half as long when spread among the existing set of processors, leading to an expected speedup by a factor of two.
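The point symmetry for the similarity case follows directly from Equation 1: swapping the ordered basis negates both axis vectors while keeping the midpoint, so (u, v) becomes (−u, −v). A short check with arbitrary illustrative points:

```python
def invariants(pa, pb, p):
    # (u, v) of p in the frame of the ordered basis (pa, pb); see Equation 1.
    pxx, pxy = pb[0] - pa[0], pb[1] - pa[1]
    n2 = pxx * pxx + pxy * pxy
    dx, dy = p[0] - (pa[0] + pb[0]) / 2, p[1] - (pa[1] + pb[1]) / 2
    return (dx * pxx + dy * pxy) / n2, (-dx * pxy + dy * pxx) / n2

a, b, c = (0.3, -1.0), (2.2, 0.4), (1.0, 1.5)
u, v = invariants(a, b, c)      # entry stored for the ordered basis (mu1, mu2)
us, vs = invariants(b, a, c)    # the same entry under the swapped basis
```

Here (us, vs) equals (−u, −v), so storing only half of the table and flipping signs on lookup loses nothing.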

Noise modeling

We have so far implicitly assumed that the feature points in both the preprocessing and recognition phases are noise-free, an assumption that does not hold in practice.17 Figure 7 shows the method’s performance as a function of the amount of noise. Noise at input leads to positional errors, which in turn translate to errors in the computed invariants. “Small” input errors will give rise to the same computed invariant and thus the same index as the noise-free input. The semantics of small directly depends on the coarseness of quantization of the space of invariants. Once this coarseness is decided, an associated degree of tolerance is implicitly built into the hash table.

Positional errors typically translate into the computation of “wrong” hash bin indices. But because of the nature of the employed hashing functions, the respective “wrong” bins are in the neighborhood (in a Euclidean sense) of the “correct” bins that would have been accessed had the input been noise-free. Other solutions to this problem involve accessing a rectangular region of the table instead of a single bin,18 or weighted voting.9

The exact shape of the neighborhood to be accessed during the recognition phase is generally complicated.17 In particular, the size, shape, and orientation of the regions that need to be accessed depend directly on the selected basis tuple, as well as on the computed hash locations. Figure 8 shows this dependence for certain point arrangements and the similarity transformation. The variations of the region’s shape are much more pronounced when an affine transformation is allowed.

Since the ultimate goal is the creation of working systems that can perform satisfactorily in the presence of noise, the method was enhanced by incorporating a noise model. The derived formulas, in addition to being useful in quantifying the observed behavior, turned out to also be compatible with a Bayesian interpretation of geometric hashing.

Figure 7. Percentage of the embedded model's bases receiving k votes, when used as probes, for different amounts of Gaussian noise (1/3, 2/3, 4/3, and 8/3 pixels). The models can only undergo similarity transformations. (a) The model points are distributed according to a Gaussian of σ = 1. (b) The model points are distributed uniformly over the unit disc. In both cases, the database contained 512 models, each consisting of 16 points.


First, the sensor noise was modeled by treating it as an additive Gaussian perturbation. The perturbations of the various feature points were assumed to be statistically independent and distributed according to a Gaussian distribution of standard deviation σ, centered at the "true" value of the variable. These errors were propagated through the hashing function of choice, and second- and higher-order error terms were dropped.17
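
The propagated error can be checked numerically. The sketch below assumes a particular frame convention (the basis endpoints map to (−1/2, 0) and (1/2, 0), with σ expressed in units of the basis separation); under that assumption the simulated variance of the hash location matches the covariance radius τ = (4(x² + y²) + 3)σ² quoted later in the preprocessing discussion.

```python
import numpy as np

rng = np.random.default_rng(7)
sigma = 0.01  # noise std. dev., in units of the basis separation
x, y = 1.0, 0.5  # nominal hash location of the third point
N = 20000

# perturb all three points (both basis points and the hashed point)
b1 = np.array([-0.5, 0.0]) + rng.normal(0, sigma, (N, 2))
b2 = np.array([ 0.5, 0.0]) + rng.normal(0, sigma, (N, 2))
p  = np.array([ x,   y  ]) + rng.normal(0, sigma, (N, 2))

# similarity-invariant coordinates, vectorized over the N trials
o = (b1 + b2) / 2.0
e = b2 - b1
ey = np.stack([-e[:, 1], e[:, 0]], axis=1)
d = p - o
s = np.sum(e * e, axis=1)
u = np.sum(d * e, axis=1) / s
v = np.sum(d * ey, axis=1) / s

measured = u.var() + v.var()                     # total spread of the hash
predicted = (4 * (x * x + y * y) + 3) * sigma ** 2  # tau, from the text
assert abs(measured - predicted) / predicted < 0.1  # first-order model holds
```

The agreement confirms that dropping second- and higher-order terms is harmless for small σ.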

The derived expressions allowed us to draw the following qualitative conclusions:

1. The larger the separation of the two basis points, the smaller the spread in the space of invariants in the presence of error.

2. For a given basis separation, the distance of the point whose coordinates we compute in the coordinate frame of the basis also affects the spread: The smaller this point's distance from the center of the coordinate frame, the smaller the spread in the space of invariants.

3. A trade-off exists between the indexing power of an invariant tuple and its sensitivity to noise: Index values corresponding to relatively unpopulated regions of the space of invariants carry more information but are very sensitive to noise; the opposite also holds true.
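
Conclusions 1 and 2 can be reproduced with a small simulation; the geometry, noise level, and trial count below are arbitrary choices made for illustration.

```python
import numpy as np

rng = np.random.default_rng(3)
sigma = 0.5  # positional noise, in pixels

def coords(p, b1, b2):
    # similarity-invariant coordinates relative to the ordered basis (b1, b2)
    o = (b1 + b2) / 2.0
    e = b2 - b1
    ey = np.array([-e[1], e[0]])
    d = p - o
    return np.array([d @ e, d @ ey]) / (e @ e)

def spread(sep, pdist, n=3000):
    # total std of the invariants for a basis pair `sep` pixels apart and a
    # third point `pdist` pixels from the center of the basis frame
    b1 = np.array([0.0, 0.0])
    b2 = np.array([sep, 0.0])
    p = np.array([sep / 2.0 + pdist, 0.0])
    samples = [coords(p + rng.normal(0, sigma, 2),
                      b1 + rng.normal(0, sigma, 2),
                      b2 + rng.normal(0, sigma, 2)) for _ in range(n)]
    return np.std(samples, axis=0).sum()

assert spread(80, 20) < spread(20, 20)  # conclusion 1: wider basis, less spread
assert spread(40, 10) < spread(40, 40)  # conclusion 2: nearer point, less spread
```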

Bayesian formulation

Recall that given a scene S containing S points, S = {pl}, l = 1, …, S, the geometric hashing method selects two points as a basis pair, say B = {pµ, pν}, and attempts to determine if a model is present in the scene. The knowledge base consists of a database of M models {Mk}, for k = 1, …, M. Occasionally the verification step will fail to find a model that obtains sufficient support, at which point another basis pair is selected and the entire analysis is repeated. The only source of evidence during each analysis is the set S′ = S − B of scene points, that is, all scene points except those forming the basis.

We wish to compute the probability Pr((Mk, i, j, B)|S′) that model Mk is present, with points i and j of the model respectively matching pµ and pν of the basis set B, based on the information from the hash locations as computed by the scene points S′ relative to the basis set. In particular, we wish to find the maximum of this expression over all possible Mk, i, j, and B: a maximum-likelihood approach to object recognition. The model/basis combination maximizing this expression is the most likely match given the collection of hash values generated from S′. A reasonable assumption made is that, in the absence of any match, the probability value of even the maximum winner will not be large. On the other hand, if there is a match, then for some choice (or, most likely, multiple choices) of B there will be a large probability value for some (Mk, i, j, B). If there are multiple models present, then several model/basis combinations will share a large probability. It is sufficient to determine those few combinations that lead to the largest probabilities. Indeed, one can always subject these "winners" to a verification phase. It suffices to determine the relative probabilities and not the actual values.

Additional conditional independence assumptions are needed: It is assumed that a large number of hash values is expected near the points of the table where (Mk, i, j) hash entries occur, and a uniform density (or some fixed density) is expected elsewhere, regardless of what other hash values are known to occur. The assumptions are reasonable if the features are chosen judiciously.


Figure 8. Regions of the hash table that need to be accessed when there is Gaussian error in the positions of the point features. Similarity transformation is allowed. The left graph of each pair shows the feature space domain, whereas the right shows the space of invariants. For presentation purposes, the amount of Gaussian error was deliberately made large.


OCTOBER-DECEMBER 1997 19

Using Bayes's theorem, the above formulation can be shown equivalent to maximizing

log Pr((Mk, i, j, B)) + Σp∈S′ log [Pr(ξp | (Mk, i, j, B)) / Pr(ξp)]    (4)

where ξp denotes the hash location generated by scene point p relative to the basis B,

over all possible model/basis combinations and basis selections.

This maximization captures the essence of the geometric hashing approach within a Bayesian framework: After having defined a basis B using points from a scene, votes are tallied for all model/basis combinations using the information carried by the individual points in the scene, and the hash locations are computed relative to the basis B. The model/basis combinations that have accumulated a lot of votes (or a large weighted vote) are probable instances of a model: The basis combination from that model will match the chosen basis B. The redundant representation of the known models obviates the need for exhaustive consideration of all model/basis combinations and basis selections before an answer can be reached. Also, the denominators in each term of the sum are the expected probability density values (whose calculation was outlined earlier) attached to a given location in hash space.

We can quantify the contribution from a particular hash value generated by a scene point px to a model/basis combination, thus allowing us to extend the geometric hashing method to a Bayesian maximum-likelihood model-matching system as follows:

During preprocessing, every model Mk, and every basis pair comprising points from Mk, computes the hash locations of every other point within Mk. At each such hash location (x, y) we make an entry that contains information about the model Mk, the basis pair (i, j), the model point pl (other than the basis points), and a predicted normalized covariance radius, which for example in the similarity transformation case is τ = (4(x² + y²) + 3)σ². The entries containing this information are organized in such a way that given a location (u, v), all such records having an (x, y) value lying nearby can be easily accessed.
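
A sketch of how such entries might be organized; the record layout, the model name, and the bin width are our own choices, while τ follows the formula quoted above.

```python
import math
from collections import defaultdict

SIGMA = 0.5  # assumed standard deviation of the feature-position noise
Q = 0.5      # hash-bin width used to group nearby (x, y) locations

def covariance_radius(x, y, sigma=SIGMA):
    # predicted normalized covariance radius for the similarity case:
    # tau = (4(x^2 + y^2) + 3) sigma^2
    return (4.0 * (x * x + y * y) + 3.0) * sigma * sigma

# each record: (x, y, model, basis pair (i, j), point id, tau)
table = defaultdict(list)

def add_entry(x, y, model, i, j, point):
    record = (x, y, model, (i, j), point, covariance_radius(x, y))
    table[round(x / Q), round(y / Q)].append(record)

add_entry(1.0, -0.5, model="M7", i=0, j=1, point=2)
(rec,) = table[2, -1]
assert math.isclose(rec[-1], 2.0)  # tau = (4 * 1.25 + 3) * 0.25
```

Binning the exact (x, y) locations coarsely is one simple way to make "all records lying nearby" retrievable in constant time.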

During recognition, feature points are extracted from the scene and a trial basis formed using a pair of these points, for example pµ and pν. The coordinates of the remaining points in S are then computed relative to the basis set, and a hash access is made to a location (u, v) in the hash domain. All nearby records of the form (x, y, Mk, i, j, l, τ) are accessed. For each such nearby record, we record a weighted vote for the model/basis combination (Mk, i, j). For the similarity transformation, the amount of the vote is determined as follows:

z = log(1 + (||pµ − pν||² / (2πSτ)) exp(−||pµ − pν||² ||(u, v) − (x, y)||² / (2τ)))    (5)

The neighborhood is defined as an expression involving z: Something is considered to be in the neighborhood if the value of the respective z is greater than some threshold. Note that the above expression incorporates the value of σ, the expected error in the positioning of the scene points; the number of scene points, S; and the basis-pair separation distance. The value z is only an approximation of the expression in Equation 4, which gives the total contribution of the hash (u, v) to the model/basis combination (Mk, i, j); it is obtained by neglecting all terms except for the entry at (x, y).
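
The voting rule can be sketched as a log-likelihood-ratio weight of the general form log(1 + c·exp(−d²/2τ)). The prefactor c, which bundles the terms of Equation 5 that depend on σ, S, and the basis-pair separation, is left here as a free parameter, and all numeric values are illustrative.

```python
import math

def covariance_radius(x, y, sigma):
    # tau = (4(x^2 + y^2) + 3) sigma^2, the similarity-case radius from the text
    return (4.0 * (x * x + y * y) + 3.0) * sigma * sigma

def weighted_vote(u, v, x, y, c, sigma):
    """Vote cast by a hash at (u, v) for a record stored at (x, y): large near
    the record, decaying smoothly to ~0 far away. The prefactor c stands in
    for the sigma-, S-, and basis-separation-dependent terms of Equation 5."""
    tau = covariance_radius(x, y, sigma)
    d2 = (u - x) ** 2 + (v - y) ** 2
    return math.log(1.0 + c * math.exp(-d2 / (2.0 * tau)))

exact  = weighted_vote(1.0, -0.5, 1.0, -0.5, c=20.0, sigma=0.5)
nearby = weighted_vote(1.1, -0.4, 1.0, -0.5, c=20.0, sigma=0.5)
far    = weighted_vote(8.0,  6.0, 1.0, -0.5, c=20.0, sigma=0.5)
assert exact > nearby > far  # votes decay smoothly with hash distance
```

Thresholding z, as the text describes, then implicitly defines the neighborhood of records worth visiting.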

The geometric hashing method allows systems to recognize objects even when the objects have undergone an arbitrary transformation and when parts of them might be occluded. The strength of the technique is in its efficiency, in its ability to operate in the presence of only partial information, and in its applicability to almost any domain where geometric matching is required; it does not require any domain-specific knowledge, only the location of certain geometric interest features. The method has been successfully applied to pattern-matching problems in computer vision, CAD/CAM, and medical imaging, some covered in this special issue. A somewhat unexpected application, introduced by Nussinov and Wolfson,19 was to problems in structural molecular biology20-21 and medicinal chemistry.

We have demonstrated the implementation of the method for the least informative feature: a point. Features carrying more information, such as line segments, arcs, and corners of specified angles, significantly speed up the algorithm by reducing the number of features required for an unambiguous definition of a basis. Stein and Medioni have used gray encodings of groups of consecutive edge segments (supersegments) of varying cardinalities as informative features.11

Their TOSS system for 3D object recognition from range sensor data uses characteristic curves and local differential patches.22 Forsyth et al.23 use descriptors based on pairs of planar curves; the descriptors are invariant under perspective transformations and thus generalize the affine-invariant curve-matching approach of Lamdan et al.4

Geometric hashing gains its efficiency from indexing, or hashing, of geometric invariants in a large memory. The matching process can be separated into two stages, with complexities adding one to another, not multiplying each other; the preprocessing stage can occur off-line. In this stage, the system stores the relevant information in the hash table using transformation-invariant access keys. In the recognition stage, the system directly accesses the relevant locations of the memory, without retrieving superfluous information. An advantage of the technique is that the various successful retrievals, or hits, are scored in a way that preserves the overall rigidity constraints of the objects, thus sharpening the "correct" hypothesis.
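
The two-stage structure can be illustrated end to end with a minimal sketch for the similarity case; the toy data, single trial basis, single-bin access, and omission of the verification step are all simplifications of ours.

```python
import numpy as np
from collections import defaultdict
from itertools import permutations

Q = 0.25  # hash-bin width in invariant units

def coords(p, b1, b2):
    # similarity-invariant coordinates relative to the ordered basis (b1, b2)
    o = (b1 + b2) / 2.0
    e = (b2 - b1) / 2.0
    ey = np.array([-e[1], e[0]])
    d = p - o
    return np.array([d @ e, d @ ey]) / (e @ e)

def preprocess(models):
    # off-line stage: one table entry per (model, ordered basis pair, point)
    table = defaultdict(list)
    for name, pts in models.items():
        for i, j in permutations(range(len(pts)), 2):
            for k in range(len(pts)):
                if k in (i, j):
                    continue
                u, v = coords(pts[k], pts[i], pts[j])
                table[int(round(u / Q)), int(round(v / Q))].append((name, i, j))
    return table

def recognize(table, scene):
    # on-line stage: pick one trial basis, tally votes per (model, basis pair)
    votes = defaultdict(int)
    b1, b2 = scene[0], scene[1]
    for p in scene[2:]:
        u, v = coords(p, b1, b2)
        for record in table[int(round(u / Q)), int(round(v / Q))]:
            votes[record] += 1
    return max(votes, key=votes.get)

rng = np.random.default_rng(1)
models = {"A": rng.uniform(0, 10, (6, 2)), "B": rng.uniform(0, 10, (6, 2))}
table = preprocess(models)  # built once, off-line

# scene: model "A" rotated, scaled, translated, with its last point occluded
angle = np.deg2rad(30)
R = 2.0 * np.array([[np.cos(angle), -np.sin(angle)],
                    [np.sin(angle),  np.cos(angle)]])
scene = models["A"][:5] @ R.T + np.array([50.0, -20.0])

best = recognize(table, scene)
assert best[0] == "A"  # the correct model wins the vote
```

A full system would iterate over trial bases, access a bin neighborhood rather than a single bin, and verify the winning hypothesis, as described earlier.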

The geometric hashing method was recently extended to efficiently handle the matching of objects with internal degrees of freedom, such as rotational or sliding (prismatic) joints.24 This is an important extension, since robots, humans, many manufactured objects, and biological molecules can be modeled as assemblies of rigid subparts connected by such joints. This technique and its derivatives have been applied to the recognition of articulated objects in computer vision,25 to the docking of flexible receptor-drug molecules,26 and to the detection of partially similar molecules in databases of drugs.15 An interesting point about this extension is that both geometric hashing and the generalized Hough transform techniques applied to rigid object matching can be viewed as particular cases of this new flexible object-matching technique. ♦

Acknowledgments

Haim Wolfson's research has been supported in part by the Hermann Minkowski-Minerva Center for Geometry at Tel Aviv University, by grant no. 95-0028 from the U.S.-Israel Binational Science Foundation, by a grant of the Israeli Ministry of Science, by a basic research grant from Tel Aviv University, and by the Daat consortium grant, administered by the Israeli Ministry of Industry and Trade.

References

1. J. Schwartz and M. Sharir, "Identification of Partially Obscured Objects in Two and Three Dimensions by Matching Noisy Characteristic Curves," Int'l J. Robotics Research, Vol. 6, No. 2, 1986, pp. 29–44.
2. A. Kalvin et al., "Two-Dimensional Model-Based Boundary Matching Using Footprints," Int'l J. Robotics Research, Vol. 5, No. 4, 1986, pp. 38–55.
3. Y. Lamdan, J. Schwartz, and H. Wolfson, "On Recognition of 3D Objects from 2D Images," Proc. IEEE Int'l Conf. Robotics and Automation, IEEE Computer Soc., Los Alamitos, Calif., 1988, pp. 1407–1413.
4. Y. Lamdan, J. Schwartz, and H. Wolfson, "Object Recognition by Affine Invariant Matching," Proc. IEEE Computer Vision and Pattern Recognition Conf., IEEE Computer Soc., 1988, pp. 335–344.
5. Y. Lamdan, J. Schwartz, and H. Wolfson, "Affine Invariant Model-Based Object Recognition," IEEE Trans. Robotics and Automation, Vol. 6, No. 5, 1990, pp. 578–589.
6. Y. Lamdan and H. Wolfson, "Geometric Hashing: A General and Efficient Model-Based Recognition Scheme," Proc. Int'l Conf. Computer Vision, IEEE Computer Soc., 1988, pp. 238–249.
7. O. Bourdon and G. Medioni, "Object Recognition Using Geometric Hashing on the Connection Machine," Proc. Int'l Conf. Pattern Recognition, IEEE Computer Soc., 1990, pp. 596–600.
8. I. Rigoutsos and R. Hummel, "Massively Parallel Model Matching: Geometric Hashing on the Connection Machine," Computer, Vol. 25, No. 2, Feb. 1992, pp. 33–42.
9. M. Costa, R. Haralick, and L. Shapiro, "Optimal Affine Matching," Proc. Sixth Israeli Conf. Artificial Intelligence and Computer Vision, Israeli Information Soc. Press, Ramat Gan, Israel, 1989, pp. 35–61.
10. M. Costa, R. Haralick, and L. Shapiro, "Optimal Affine-Invariant Matching: Performance Characterization," Proc. SPIE Conf. Image Storage and Retrieval, SPIE, Bellingham, Wash., 1992.
11. F. Stein and G. Medioni, "Efficient Two-Dimensional Object Recognition," Proc. Int'l Conf. Pattern Recognition, IEEE Computer Soc., 1990, pp. 596–600.
12. P. Flynn and A. Jain, "3D Object Recognition Using Invariant Feature Indexing of Interpretation Tables," Computer Vision, Graphics, and Image Processing: Image Understanding, Vol. 55, No. 2, 1992, pp. 119–129.
13. R. Mohan, D. Weinshall, and R. Sarukkai, "3D Object Recognition by Indexing Structural Invariants from Multiple Views," Proc. Int'l Conf. Computer Vision, IEEE Computer Soc., 1993, pp. 264–268.
14. I. Rigoutsos and A. Califano, "Searching in Parallel for Similar Strings," IEEE Computational Science & Engineering, Vol. 1, No. 2, Summer 1994, pp. 60–75.
15. I. Rigoutsos, D. Platt, and A. Califano, Flexible 3D-Substructure Matching and Novel Conformer Derivation in Very Large Databases of Molecular Information, Tech. Report RC20497, IBM T.J. Watson Research Center, Yorktown Heights, N.Y., 1996.
16. I. Rigoutsos and R. Hummel, "A Bayesian Approach to Model Matching with Geometric Hashing," Computer Vision and Image Understanding, Vol. 61, No. 7, July 1995, pp. 11–26.
17. I. Rigoutsos, Massively Parallel Bayesian Object Recognition, doctoral dissertation, Computer Science Dept., New York Univ., New York, 1992.
18. Y. Lamdan and H. Wolfson, "On the Error Analysis of Geometric Hashing," Proc. IEEE Computer Vision and Pattern Recognition Conf., IEEE Computer Soc., 1991, pp. 22–27.
19. R. Nussinov and H.J. Wolfson, "Efficient Detection of Three-Dimensional Motifs in Biological Macromolecules by Computer Vision Techniques," Proc. Nat'l Academy of Sciences USA, Vol. 88, Nat'l Academy of Science, Washington, D.C., 1991, pp. 10495–10499.
20. D. Fischer, R. Nussinov, and H. Wolfson, "3D Substructure Matching in Protein Molecules," Proc. 3rd Int'l Symp. Combinatorial Pattern Matching, Lecture Notes in Computer Science, No. 644, Springer-Verlag, New York, 1992, pp. 136–150.
21. D. Fischer et al., "A Geometry-Based Suite of Molecular Docking Processes," J. Molecular Biology, 1995, pp. 459–477.
22. F. Stein and G. Medioni, "Structural Hashing: Efficient Three-Dimensional Object Recognition," Proc. IEEE Computer Vision and Pattern Recognition Conf., IEEE Computer Soc., 1991, pp. 244–250.
23. D. Forsyth et al., "Invariant Descriptors for 3D Object Recognition and Pose," IEEE Trans. Pattern Analysis and Machine Intelligence, Vol. 13, No. 10, 1991, pp. 971–991.
24. H.J. Wolfson, "Generalizing the Generalized Hough Transform," Pattern Recognition Letters, Vol. 12, No. 9, 1991, pp. 565–573.
25. A. Beinglass and H.J. Wolfson, "Articulated Object Recognition, or How to Generalize the Generalized Hough Transform," Proc. IEEE Computer Vision and Pattern Recognition Conf., IEEE Computer Soc., 1991, pp. 461–466.
26. B. Sandak, R. Nussinov, and H.J. Wolfson, "An Automated Robotics-Based Technique for Biomolecular Docking and Matching Allowing Hinge-Bending Motion," Computer Applications in the Biosciences, Vol. 11, 1995, pp. 87–99.

Haim J. Wolfson received his BSc, MSc, and PhD in mathematics from Tel Aviv University. From 1985 to 1989 he was with the Robotics Laboratory of the Courant Institute of Mathematical Sciences, New York University, and in 1989 he returned to Tel Aviv University, where he is currently an associate professor of computer science. His main research interests focus on efficient algorithms for object recognition in computer vision and for protein-structure comparison and biomolecular recognition in structural molecular biology. Together with Ruth Nussinov from Tel Aviv University's Faculty of Medicine, he is leading an interdisciplinary research team in computational structural biology.

Isidore Rigoutsos received his BSc in physics from the University of Athens, Greece, and his PhD in computer science from the Courant Institute of Mathematical Sciences of New York University. Since 1992, he has been with IBM's T.J. Watson Research Center, where he is currently leading the effort in bioinformatics and pattern discovery in biological sequences. His research activities focus on combinatorial approaches to computational biology and bioinformatics, the design of efficient invariant schemes for knowledge representation, and applied mathematics. Rigoutsos is a member of the AAAS, the IEEE, and the IEEE Computer Society.

The authors can be reached in care of Rigoutsos at IBM T.J. Watson Research Center, PO Box 704, Yorktown Heights, NY 10598; e-mail, [email protected].
