Image registration using BP-SIFT

Accepted Manuscript Image Registration Using BP-SIFT Yingxuan Zhu, Samuel Cheng, Vladimir Stanković, Lina Stanković PII: S1047-3203(13)00025-4 DOI: http://dx.doi.org/10.1016/j.jvcir.2013.02.005 Reference: YJVCI 1162 To appear in: J. Vis. Commun. Image R. Received Date: 13 February 2011 Accepted Date: 11 February 2013 Please cite this article as: Y. Zhu, S. Cheng, V. Stanković, L. Stanković, Image Registration Using BP-SIFT, J. Vis. Commun. Image R. (2013), doi: http://dx.doi.org/10.1016/j.jvcir.2013.02.005 This is a PDF file of an unedited manuscript that has been accepted for publication. As a service to our customers we are providing this early version of the manuscript. The manuscript will undergo copyediting, typesetting, and review of the resulting proof before it is published in its final form. Please note that during the production process errors may be discovered which could affect the content, and all legal disclaimers that apply to the journal pertain.


Image Registration Using BP-SIFT

Yingxuan Zhu^a, Samuel Cheng^b, Vladimir Stanković^c, Lina Stanković^c

^a Image Analytics Lab, GE Global Research, Niskayuna, NY, 12309 USA
^b School of Electrical and Computer Engineering, University of Oklahoma, Tulsa, OK, 74135 USA
^c Department of Electronics and Electrical Engineering, University of Strathclyde, Glasgow G1 1XW UK

Abstract

Scale Invariant Feature Transform (SIFT) is a powerful technique for image registration. Although SIFT descriptors accurately extract invariant image characteristics around keypoints, the commonly used matching approaches of registration loosely represent the geometric information among descriptors. In this paper, we propose an image registration algorithm named BP-SIFT, where we formulate keypoint matching of SIFT descriptors as a global optimization problem and provide a suboptimum solution using belief propagation (BP). Experimental results show significant improvement over conventional SIFT-based matching with reasonable computation complexity.

Keywords: Image registration, belief propagation, SIFT

1. Introduction

Image registration is widely used in numerous applications including computer vision, medical imaging, panoramic imaging, and hyperspectral imaging. In these applications, image registration is a key step to integrate images for further analysis. Image registration transforms a set of images, which can be acquired at different times, in different locations, from different angles, or at different scales, into the same coordinate system according to the information shared among these images.

Image registration algorithms have been introduced and summarized in [1, 2]. Local descriptors are effective in representing the features of images in registration [3, 4, 5, 6]. The Scale Invariant Feature Transform (SIFT) algorithm [7, 8] generates a popular type of descriptor and is applied to matching the invariant features between images [9, 10]. SIFT locates the distinctive image keypoints, which are invariant to image scaling or rotation, and generates descriptors based on the orientation, scale and location of these keypoints. The SIFT extracted image features can be used for reliable image registration, i.e., they are robust to noise, illumination changes and camera viewpoint differences [11, 12]. Recently, Liu et al. modified the standard optical flow assumption and proposed a SIFT flow alignment algorithm [13]. In their paper, they first applied histogram intersection to select 20 candidates from a video dataset for matching the query image, then used a SIFT flow based method to extract the dense correspondence between different scenes to generate descriptors, based on the assumption that the descriptors are constant with respect to the pixel displacement and a pixel in the query image would be matched to any other pixel(s) in the candidate image. Brox et al. generated descriptors according to the normalized Euclidean distances between the descriptors and the result from boundary based segmentation of both images, and then used the variational method to evolve the optical flow of the descriptors [14, 15].

Preprint submitted to Visual Communication and Image Representation July 9, 2012

Although SIFT descriptors can accurately extract invariant image characteristics around keypoints, the corresponding matching approach for registration needs improvement to explore the geometric information among keypoints, so it can avoid mismatches between two keypoints which are geometrically far away but with similar local image information. Geometric information is one of the most important image features. It has been widely used in image registration and matching [16, 2]. Using geometric information is a popular and effective way to represent the features of an image. Sharghi et al. used geometric information to set up feature points and to match the stereo images. After the feature points were set up, the entire target image was searched for matching points [17]. Lhuillier et al. used local and global geometric constraints to regularize the matching map, where the local geometric constraints are encoded by planar affine applications and the global geometric constraints are encoded by the fundamental matrix [18]. Bian et al. proposed a new feature matching which applied geometric constraints for image registration in weakly calibrated stereo images of curved scenes [19]. Schmid et al. used geometric constraints to generate local descriptors in IP matching [3, 4, 5, 6]. In our method, the image features are first extracted by SIFT, and stored as descriptors. These keypoints represent the significant information of images. The geometric constraints evaluate the geometric relationship between these keypoints, which provides geometric information for image registration and hence improves the registration accuracy.

Given that the physical distances between the keypoints in the same region are usually less than those in different regions, we use the Euclidean distance between two candidate keypoints as a constraint added to the energy function. After setting up the energy function, we apply BP to detect any mismatched keypoints generated because of loosely represented spatial information, and to correctly develop matches according to both descriptor and geometric information. In this paper, the tree-reweighted max-product BP algorithm is used for the following reasons: first, the max-product based belief propagation method works well with discrete variables compared with the divide & concur algorithm [20]; second, the method may yield the lowest cost or most probable configuration for graphs without cycles compared with the sum-product algorithm [21, 22]; moreover, the algorithm we used here needs less computation compared to the methods in [23, 24]; furthermore, the tree-reweighted max-product BP algorithm has shown its advantages in convergence [25, 26].

The main contribution of this paper is the application of BP to improve the matching algorithms of SIFT and its variants by analyzing the geometric distances between keypoints, i.e., by finding the consistent geometric correspondence between two sets of features [27]. We propose a novel registration algorithm named BP-SIFT, where the geometric information of descriptors is intuitively incorporated using BP. Experimental results show significant improvement of our method over conventional SIFT, and also illustrate some SIFT mismatched keypoints which are removed by RANdom SAmple Consensus (RANSAC) [28] but correctly matched by BP-SIFT. Note that the proposed matching approach can be directly applied to other SIFT variants as well.

The rest of the paper is organized as follows. A brief review of SIFT is presented in the next section. In Section 3, we describe our BP improved matching approach, BP-SIFT. Experimental results are provided in Section 4. Computational complexity of BP-SIFT is addressed in Section 5. We conclude and suggest future work in Section 6.

2. SIFT

The main objective of SIFT is to identify locations, called keypoints, of an image where there exist characteristics that are invariant to scaling and rotation. These characteristics are summarized by descriptors. SIFT generates keypoints by finding the extrema of the difference-of-Gaussian function of an image. A keypoint candidate is refined to subpixel level and eliminated if it is found to be unstable. Once a keypoint is located, a descriptor is generated based on the orientation(s), scale and location of the keypoint. For more details, we refer the reader to [8].

Suppose that we have two images, $I_1$ and $I_2$, the reference image and the target image, respectively, that need to be integrated. We denote $D_1(i)$ and $D_2(j)$ as the descriptors/feature vectors of the $i$th and the $j$th keypoints of $I_1$ and $I_2$, respectively. Let $x_{D_1}(i)$ and $x_{D_2}(j)$ be the corresponding locations of the keypoints in $I_1$ and $I_2$, respectively.

The matching problem consists of finding a keypoint in $I_2$ that best corresponds to each keypoint in $I_1$. As described in [8], the best candidate matching can be achieved based on minimum Euclidean distance. In other words, for a descriptor $D_1(i)$, the best match $j(i)$ should satisfy

$$j(i) = \arg\min_j \|D_1(i) - D_2(j)\|_2, \qquad (1)$$

where minimization is over all keypoints $j$ in $I_2$. Note that it is possible that there does not exist a correct matching keypoint in the second image. To identify such a case, Lowe [8] suggested using the second best match (second minimum distance from the target descriptor) to measure the probability of a match. Therefore, $j(i)$ in (1) will be considered as a valid match only if

$$\|D_1(i) - D_2(j(i))\|_2 < T \, \|D_1(i) - D_2(j)\|_2, \quad \forall j \neq j(i), \qquad (2)$$

where $T$ is a constant independent of the two images.
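As an illustration of the matching rule in (1) and (2), the following Python sketch performs nearest-neighbour matching with the ratio test. It is a minimal sketch and not the authors' implementation: the function name `ratio_test_match`, the tuple-based descriptor format, and the default `T = 0.8` are assumptions made for this example (the text only states that $T$ is a constant).

```python
import math


def ratio_test_match(desc1, desc2, T=0.8):
    """Match descriptors of I1 to I2 by rule (1), keeping matches that pass (2).

    desc1, desc2: sequences of equal-length numeric tuples (hypothetical format).
    T: ratio threshold; 0.8 is an assumed illustrative value.
    Returns a dict {i: j(i)} containing only the accepted matches.
    """
    matches = {}
    for i, d in enumerate(desc1):
        # Euclidean distance from D1(i) to every descriptor of I2.
        dists = [math.dist(d, e) for e in desc2]
        order = sorted(range(len(dists)), key=dists.__getitem__)
        if not order:
            continue
        best = order[0]  # j(i) in (1)
        # (2) holds for all j != j(i) exactly when it holds for the second-best match.
        if len(order) == 1 or dists[best] < T * dists[order[1]]:
            matches[i] = best
    return matches
```

A keypoint whose two closest candidates are nearly equidistant is left unmatched, which is the ambiguous case that condition (2) is meant to detect.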

3. Improved Matching for SIFT Using Belief Propagation

While SIFT is highly successful in describing unique features of images, the matching method proposed by the original SIFT algorithm loosely represents the global information between descriptors. Indeed, its locally optimized approach, which minimizes the Euclidean distance, ignores the geometric information among different descriptors. As a result, nearby descriptors in $I_1$ may map to far away descriptors in $I_2$ and vice versa.

Denote $I_{D_1}$ as the set of descriptors for $I_1$ and $m = [m_1, \cdots, m_{|I_{D_1}|}]$ as the vector of mapped indices from $I_1$ to $I_2$. That is, $m_i$ is the index of the keypoint in $I_2$ matched with the keypoint $i$ in $I_1$. To incorporate geometric information, we consider the matching as a global optimization problem and impose a penalty on keypoints that violate geometric invariance. Specifically, our penalty function, $\Phi(m)$, is defined as the sum of the $\ell_2$ norms of differences between the distance from one keypoint to another in $I_1$, and the distance between the corresponding mapped keypoints in $I_2$. That is,

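$$\Phi(m) = \sum_{i \in I_{D_1}} \sum_{j \in I_{D_1}} \phi(i, j; m_i, m_j), \qquad (3)$$

where

$$\phi(i, j; i', j') = \left\| \sqrt{\|x_{D_1}(i) - x_{D_1}(j)\|^2} - \sqrt{\|x_{D_2}(i') - x_{D_2}(j')\|^2} \right\|_2. \qquad (4)$$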

Therefore, we aim to solve a global optimization problem as follows

$$m = \arg\min_m \left( \Psi(m) + \lambda \Phi(m) \right) \qquad (5)$$

with

$$\Psi(m) = \sum_{i \in I_{D_1}} \psi(i, m_i) \qquad (6)$$

and

$$\psi(i, i') = \|D_1(i) - D_2(i')\|, \qquad (7)$$

where $\lambda$ is a Lagrange multiplier that can be adjusted based on prior knowledge.
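To make the objective in (5) concrete, the sketch below evaluates $\Psi(m) + \lambda\Phi(m)$ for one candidate mapping, following (3), (4), (6), and (7). The function name `energy`, the argument layout, and the toy data in the usage note are hypothetical; the outer norm in (4) reduces to an absolute value because the two distances are scalars.

```python
import math


def energy(desc1, desc2, loc1, loc2, m, lam):
    """Evaluate Psi(m) + lam * Phi(m) from (5) for a candidate mapping m.

    desc1, desc2: descriptors of I1 and I2 (sequences of numeric tuples).
    loc1, loc2:   keypoint locations in I1 and I2 (sequences of (x, y) tuples).
    m:            m[i] is the index in I2 assigned to keypoint i of I1.
    lam:          the weight lambda on the geometric penalty.
    """
    n = len(m)
    # Psi(m), eqs. (6)-(7): descriptor mismatch summed over keypoints of I1.
    psi = sum(math.dist(desc1[i], desc2[m[i]]) for i in range(n))
    # Phi(m), eqs. (3)-(4): change in pairwise keypoint distance under the mapping.
    phi = sum(
        abs(math.dist(loc1[i], loc1[j]) - math.dist(loc2[m[i]], loc2[m[j]]))
        for i in range(n)
        for j in range(n)
    )
    return psi + lam * phi
```

For two identical images and the identity mapping the energy is zero, while swapping two keypoints with similar descriptors but different positions relative to the rest is penalised mainly through the $\Phi$ term.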

3.1. Belief Propagation (BP)

The optimization problem in (5) is not convex and appears to have exponential complexity. It is possible to discard some keypoints initially as described by (2), but the number of remaining keypoints is typically over a hundred. Thus, an exhaustive search requires $\sim 100!$ steps and is computationally intractable. Moreover, applying (2) at such an early stage may discard useful information. Furthermore, (2) could be faulty in some cases, i.e., it is possible that a match exists even without satisfying condition (2).

We propose to use BP to solve (5). BP [29, 23, 24, 20], also known as the message passing algorithm, has been widely used in numerous signal processing applications [30, 31, 32, 33, 22, 34]. It is an iterative algorithm to approximate the global optimum of a discrete optimization problem.


First, the optimization problem is divided into a number of simpler (local) problems. At each iteration step, instead of estimating the exact optimum solution, for each local problem we attempt to evaluate the probability (belief) of each possible solution being optimal. These local beliefs are exchanged among “neighboring” problems, where neighborhood is defined based on the specific problem. The beliefs are incorporated in computing the beliefs in the next iteration for each local problem. The algorithm stops either after a fixed predefined number of iterations, or when the most probable beliefs among all local problems converge.

3.2. Descriptors Matching using Belief Propagation

Since the solution of the optimization problem does not change if we exponentiate the negated objective function (the exponential is monotonic), we can rewrite (5) as

$$m = \arg\max_m \left( \exp(-\Psi(m)) \exp(-\lambda \Phi(m)) \right), \qquad (8)$$

which is equivalent to

$$m = \arg\max_m \prod_{i \in I_{D_1}} b^{\mathrm{Des}}_i(m_i) \prod_{i \in I_{D_1},\, j \in I_{D_1}} b^{\mathrm{Dist}}_{i,j}(m_i, m_j) \qquad (9)$$

where

$$b^{\mathrm{Des}}_i(m_i) = \exp(-\psi(i, m_i)/C_{\mathrm{Des}}), \qquad (10)$$

$$b^{\mathrm{Dist}}_{i,j}(m_i, m_j) = \exp(-\phi(i, j; m_i, m_j)/C_{\mathrm{Dist}}), \qquad (11)$$

and $C_{\mathrm{Des}}/C_{\mathrm{Dist}} = \lambda$. $b^{\mathrm{Des}}_i(m_i)$ and $b^{\mathrm{Dist}}_{i,j}(m_i, m_j)$ can be considered approximately as beliefs of the $i$th keypoint in $I_1$ being matched to the $m_i$th keypoint in $I_2$ given the descriptor information and the information that the $j$th keypoint in $I_1$ is being matched to the $m_j$th keypoint in $I_2$, respectively. $C_{\mathrm{Des}}$ and $C_{\mathrm{Dist}}$ adjust the certainty of the prior matches and geometric information. The higher the coefficients, the lower the certainty or belief in the prior information.
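The relation between the two coefficients and $\lambda$ can be checked with a short derivation. The following is a sketch that uses only the definitions in (3), (6), (10), and (11), and assumes $C_{\mathrm{Des}} > 0$ and $C_{\mathrm{Dist}} > 0$:

```latex
\begin{align*}
\prod_{i \in I_{D_1}} b^{\mathrm{Des}}_i(m_i)
\prod_{i, j \in I_{D_1}} b^{\mathrm{Dist}}_{i,j}(m_i, m_j)
&= \exp\!\Big( -\frac{1}{C_{\mathrm{Des}}} \sum_{i} \psi(i, m_i)
   - \frac{1}{C_{\mathrm{Dist}}} \sum_{i, j} \phi(i, j; m_i, m_j) \Big) \\
&= \exp\!\Big( -\frac{1}{C_{\mathrm{Des}}}
   \Big[ \Psi(m) + \frac{C_{\mathrm{Des}}}{C_{\mathrm{Dist}}} \, \Phi(m) \Big] \Big).
\end{align*}
```

Because the exponential is increasing and $1/C_{\mathrm{Des}}$ is a positive constant, maximizing this product over $m$ is the same as minimizing $\Psi(m) + \lambda\Phi(m)$ with $\lambda = C_{\mathrm{Des}}/C_{\mathrm{Dist}}$. With the values reported in Section 4 ($C_{\mathrm{Des}} = 50{,}000$ and $C_{\mathrm{Dist}} = 2{,}000$), this gives $\lambda = 25$.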

The reformulation in (9) does not make the optimization problem tractable. However, descriptors located far away from each other are less likely to be correlated, so their corresponding mapped descriptors are also less likely to maintain the same distance apart. If we relax the problem a little and have the optimization function only include $b^{\mathrm{Dist}}_{i,j}$ in which $i$ and $j$ are close together in the neighborhood (i.e., considering $j \in \mathcal{N}(i) \subset I_{D_1}$ with $|\mathcal{N}(i)| \ll |I_{D_1}|$), the problem can be solved approximately using a type of the BP algorithm known as the max-product algorithm [35]. In other words, we want to solve

$$m = \arg\max_m \left( \prod_{i \in I_{D_1}} b^{\mathrm{Des}}_i(m_i) \prod_{i \in I_{D_1},\, j \in \mathcal{N}(i)} b^{\mathrm{Dist}}_{i,j}(m_i, m_j) \right) \qquad (12)$$

instead.

The max-product algorithm converges to the global optimum if the network graph is a tree. In general, the algorithm is suboptimal but is shown to converge to a very good solution in many applications [30, 31, 32, 33]. Before presenting our algorithm, let us define $\mu^{(n)}_{i,j}(m_j)$ as the belief of keypoint $i$ that the correct match for keypoint $j$ is $m_j$ at the $n$th iteration. This message passes from keypoint $i$ to keypoint $j$ in each iteration. The algorithm is summarized as follows:

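1. Initialize all messages $\mu^{(0)}_{i,j}(m_j)$ as a constant. Set $n = 1$.

2. Update messages $\mu^{(n)}_{i,j}$ iteratively for all $i \in I_{D_1}$, $j \in \mathcal{N}(i)$, and candidate matches $m_j \in I_{D_2}$:

$$\mu^{(n)}_{i,j}(m_j) \leftarrow \kappa_1 \max_{m_i} \, b^{\mathrm{Dist}}_{i,j}(m_i, m_j) \, b^{\mathrm{Des}}_i(m_i) \prod_{j' \in \mathcal{N}(i) \setminus j} \mu^{(n-1)}_{j',i}(m_i). \qquad (13)$$

3. Compute overall beliefs for $i \in I_{D_1}$:

$$b^{(n)}_i(m_i) \leftarrow \kappa_2 \, b^{\mathrm{Des}}_i(m_i) \prod_{j' \in \mathcal{N}(i)} \mu^{(n)}_{j',i}(m_i), \qquad (14)$$

$$m_i = \arg\max_{m_i} b^{(n)}_i(m_i) \qquad (15)$$

4. $n \leftarrow n + 1$ and go to step 2 until $n$ reaches the maximum number of iterations.

$\kappa_1$ and $\kappa_2$ above are normalization constants such that $\sum_{m_j} \mu_{i,j}(m_j)$ and $\sum_{m_i} b_i(m_i)$ are 1. Then, $b_i(m_i)$ can be physically interpreted as the probability of keypoint $i$ matching to $m_i$. Moreover, it becomes reasonable to discard a keypoint when $b_i(m_i)$ is less than some probability threshold $p_{th}$.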
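The four steps above can be sketched in a few lines of Python. This is plain synchronous max-product BP as written in (13)-(15), not the tree-reweighted variant used in the experiments, and it is not the authors' code. The function name `max_product_bp`, the data layout, and the assumption that neighborhoods are symmetric (if $j \in \mathcal{N}(i)$ then $i \in \mathcal{N}(j)$) are choices made for this illustration. Beliefs are computed once after the last iteration rather than at every iteration.

```python
import math


def max_product_bp(unary, pairwise, neighbors, n_iters=5):
    """Loopy max-product BP following steps 1-4 (illustrative sketch).

    unary[i][a]:          b^Des_i for the a-th candidate match of keypoint i.
    pairwise(i, j, a, b): b^Dist_{i,j} for candidate a at i and candidate b at j.
    neighbors[i]:         list of neighbours N(i); assumed symmetric.
    Returns (labels, beliefs); labels[i] indexes into keypoint i's candidate list.
    """
    n = len(unary)
    # Step 1: every message mu[(i, j)] (sent from i to j) starts as a constant.
    mu = {(i, j): [1.0] * len(unary[j]) for i in range(n) for j in neighbors[i]}
    for _ in range(n_iters):
        new_mu = {}
        for (i, j) in mu:
            # b^Des_i(m_i) times all messages into i except the one coming from j.
            pre = [
                unary[i][a] * math.prod(mu[(k, i)][a] for k in neighbors[i] if k != j)
                for a in range(len(unary[i]))
            ]
            # Step 2, eq. (13): maximise over m_i for each candidate m_j.
            msg = [
                max(pairwise(i, j, a, b) * pre[a] for a in range(len(pre)))
                for b in range(len(unary[j]))
            ]
            total = sum(msg)
            new_mu[(i, j)] = [v / total for v in msg]  # kappa_1 normalisation
        mu = new_mu
    # Step 3, eqs. (14)-(15): normalised beliefs and the most probable candidate.
    beliefs = []
    for i in range(n):
        b = [
            unary[i][a] * math.prod(mu[(k, i)][a] for k in neighbors[i])
            for a in range(len(unary[i]))
        ]
        total = sum(b)
        beliefs.append([v / total for v in b])  # kappa_2 normalisation
    labels = [max(range(len(b)), key=b.__getitem__) for b in beliefs]
    return labels, beliefs
```

On a small chain of three keypoints where the middle keypoint's descriptor slightly prefers the wrong candidate, the messages from its two neighbours pull it to the geometrically consistent one. A keypoint whose maximum belief falls below $p_{th}$ could then be discarded as described above.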


3.3. Least-Square Registration

After the matched keypoints are determined, the two images can be registered by simply computing the best projective transform to satisfy the matching points [8]. Given a set of keypoint locations $(x_i, y_i)$ in $I_1$ and the locations of corresponding mapped keypoints $(x'_i, y'_i)$ in $I_2$, we need to find an optimum $3 \times 3$ matrix $A$ such that

$$A \begin{bmatrix} x_i \\ y_i \\ 1 \end{bmatrix} = \begin{bmatrix} x'_i \\ y'_i \\ 1 \end{bmatrix}. \qquad (16)$$

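The equation above can be rearranged to gather the unknowns into a column vector. Thus, we have

$$\underbrace{\begin{bmatrix}
x_i & y_i & 1 & 0 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & x_i & y_i & 1 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 0 & x_i & y_i & 1 \\
 & & & & \vdots & & & &
\end{bmatrix}}_{B}
\cdot
\underbrace{\begin{bmatrix}
A_{1,1} \\ A_{1,2} \\ A_{1,3} \\ A_{2,1} \\ A_{2,2} \\ A_{2,3} \\ A_{3,1} \\ A_{3,2} \\ A_{3,3}
\end{bmatrix}}_{w}
=
\underbrace{\begin{bmatrix}
x'_i \\ y'_i \\ 1 \\ \vdots
\end{bmatrix}}_{c}.$$

The least-squares solution for $A$ can be computed by solving the normal equation

$$w = [B^T B]^{-1} B^T c, \qquad (17)$$

which can be solved efficiently and in a numerically stable manner via QR factorization.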
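A minimal NumPy sketch of this step is given below. It stacks three rows per matched pair to form $B$ and $c$ as in the system above and solves for $w$ through a reduced QR factorization instead of forming $[B^T B]^{-1}$ explicitly. The function name `fit_transform` and the input format are assumptions for the example, and at least three non-collinear matches are assumed so that $B$ has full column rank.

```python
import numpy as np


def fit_transform(src, dst):
    """Least-squares estimate of the 3x3 matrix A in (16) (illustrative sketch).

    src: (n, 2) array-like of keypoint locations (x_i, y_i) in I1.
    dst: (n, 2) array-like of matched locations (x'_i, y'_i) in I2.
    Assumes n >= 3 non-collinear points so that B has full column rank.
    """
    src = np.asarray(src, dtype=float)
    dst = np.asarray(dst, dtype=float)
    n = len(src)
    P = np.hstack([src, np.ones((n, 1))])  # rows [x_i, y_i, 1]
    B = np.zeros((3 * n, 9))
    B[0::3, 0:3] = P  # rows producing x'_i (first row of A)
    B[1::3, 3:6] = P  # rows producing y'_i (second row of A)
    B[2::3, 6:9] = P  # rows producing the constant 1 (third row of A)
    c = np.hstack([dst, np.ones((n, 1))]).ravel()  # [x'_1, y'_1, 1, x'_2, ...]
    # Reduced QR: B = Q R, so the least-squares solution satisfies R w = Q^T c.
    Q, R = np.linalg.qr(B)
    w = np.linalg.solve(R, Q.T @ c)
    return w.reshape(3, 3)
```

Solving $Rw = Q^T c$ gives the same $w$ as (17) when $B$ has full column rank, while avoiding the squared condition number that comes with forming $B^T B$.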

4. Experiments

The proposed algorithm is validated by presenting numerical results using various real images, which represent different registration issues: object motion, missing objects, different viewpoints and different spatial locations. The SIFT algorithm in our implementation is based on the work from Vedaldi [36]. The min-sum (max-product) algorithm of BP, specifically, the tree-reweighted belief propagation algorithm [25, 26], is deemed the most appropriate when taking into consideration its computation, suitability for discrete variables, and its convergence. In our experiment, we set $C_{\mathrm{Des}} = 50{,}000$ and $C_{\mathrm{Dist}} = 2{,}000$. The experimental results are insensitive to the values of $C_{\mathrm{Des}}$ and $C_{\mathrm{Dist}}$, as long as they are not too small. Indeed, as described in Section 3.2, the two coefficients control the certainty of the prior information. Very small $C_{\mathrm{Des}}$ and $C_{\mathrm{Dist}}$ force the estimate to stay at the initial points and thus prevent BP from converging to the global optimum. The number of neighbors for all input keypoints, $|\mathcal{N}(i)|$, is set to 5. To speed up BP, we restrict the possible matches to only ten descriptors with the highest initial (prior) matches. The threshold probability described in Section 3.2, $p_{th}$, is heuristically set to 0.5. The total number of BP iterations for each case is 5. In most of the cases, the algorithm converged in 5 iterations; additional iterations may not improve the performance.

Figure 1: Comparison of matches (upper) and the corresponding registration results (lower) obtained by SIFT (left) and by BP-SIFT (right). The distance constraint used in BP-SIFT apparently helps to fix some potential matching errors.
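The two restrictions used in the experimental setup (five spatial neighbors per keypoint and ten candidate matches per descriptor) can be sketched as follows. The function names, the tuple-based inputs, and the brute-force search are assumptions for illustration; the text does not say how the authors compute these sets or whether the neighbor sets are made symmetric.

```python
import math


def nearest_neighbors(loc1, k=5):
    """For each keypoint of I1, return the indices of its k spatially nearest keypoints."""
    out = []
    for i, p in enumerate(loc1):
        others = sorted(
            (j for j in range(len(loc1)) if j != i),
            key=lambda j: math.dist(p, loc1[j]),
        )
        out.append(others[:k])
    return out


def top_candidates(desc1, desc2, c_des=50_000.0, n_cand=10):
    """For each descriptor of I1, keep the n_cand descriptors of I2 with the highest prior (10).

    Returns, per keypoint, a list of (index_in_I2, prior_belief) pairs.
    """
    cands = []
    for d in desc1:
        # Prior belief b^Des from (10): exp(-||D1(i) - D2(j)|| / C_Des).
        prior = [math.exp(-math.dist(d, e) / c_des) for e in desc2]
        best = sorted(range(len(desc2)), key=lambda j: -prior[j])[:n_cand]
        cands.append([(j, prior[j]) for j in best])
    return cands
```

Pruning to ten candidates per keypoint bounds the cost of each message update by the number of candidate pairs (at most 100) rather than by the full number of descriptors in $I_2$.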

Fig. 1 shows matches and the corresponding registration results obtained by using SIFT and the proposed algorithm. The images used in Fig. 1 are from the INRIA Graffiti dataset [37]. There are a few errors shown in the original SIFT matching. In particular, the keypoints from the original image are all close together; however, SIFT matches one of the keypoints to a keypoint far away from the others. On the other hand, BP-SIFT keeps the distance invariant as shown in the lower figure of Fig. 1.

Fig. 2 demonstrates four comparisons between SIFT and BP-SIFT, where BP-SIFT consistently results in more accurate registration compared to the original SIFT approach. The third test set (lower left in Fig. 2) is especially interesting since it demonstrates that BP-SIFT successfully gives good registration even though the two pictures are not taken at the same time. We can see that the red truck on the runway and yellow car in the parking lot are not in the registered image. Moreover, from the difference of the shadow of the building, we can see that the two input images are taken at different times. The fourth test set (lower right in Fig. 2) illustrates that BP-SIFT can successfully register two images even when one of them is blurred.

Figure 2: Comparison of SIFT and BP-SIFT: the top right image of each set is the registration result of SIFT, and the lower right image of each set is the registration result of BP-SIFT.

Next, we illustrate the differences in registration results obtained using SIFT, RANSAC-SIFT, and BP-SIFT. RANSAC is good for fitting a model to experimental data which contains a significant percentage of gross errors [28]. In this experiment, we apply RANSAC with SIFT in registration, named RANSAC-SIFT, and compare its result with that of BP-SIFT. The RANSAC algorithm here is implemented based on [38], where the distance parameter of this algorithm is set to 0.002 in our experiments. The input keypoints are the output keypoints from the SIFT algorithm.

The image sets are numbered Sample 1, Sample 2, ..., in Figs. 3-6. Specifically, Figs. 3-4 compare the keypoints which are mismatched in SIFT, removed in RANSAC-SIFT, but correctly matched in BP-SIFT. Figs. 5-6 further illustrate the promising registration results of BP-SIFT compared to SIFT and RANSAC-SIFT. The corresponding registration accuracy for each sample is shown in Table 1 and Fig. 7. In this paper, the registration accuracy is obtained by dividing the number of correctly matched pairs of points by the total number of matched pairs.

In each sample of Figs. 3-4, the figures in the left column show the matched points, and the figures in the right column compare the registration result of each method for each sample. Specifically, from left to right, then top to bottom, the figures represent matched points using SIFT, registration result using SIFT, matched points using RANSAC-SIFT, registration result using RANSAC-SIFT, matched points using BP-SIFT, and registration result using BP-SIFT, respectively. Each pair of corresponding points is connected with a red line. In each figure with red lines, the left image represents the target image, i.e., $I_2$, and the right image is the reference image, $I_1$. In the figures showing the registration result of each method, the left image in each figure is the reference image, and the right image shows the registration result of the target image. The keypoints which are mismatched in SIFT, removed by RANSAC-SIFT, but correctly matched in BP-SIFT are shown with green circles. The points in green squares represent the keypoints which are just partially consistent in geometric information and mismatched by RANSAC, but corrected by BP-SIFT. The results of SIFT and RANSAC-SIFT clearly have more incorrect matches than BP-SIFT, especially when the images are blurred, without enough intensity information to generate matches between descriptors, or with significant location changes. BP-SIFT corrects the mismatched descriptors by using geometric information, i.e., distances between descriptors, as a criterion during belief propagation instead of just removing them as RANSAC-SIFT does. Therefore, as shown by the results, BP-SIFT can significantly reduce the number of wrongly matched keypoints, which leads to a better registration result.

Sample 1 in Fig. 3 shows two images taken of the same scene, one of which looks cloudy while the other does not. From the figure with matched points by SIFT, we can see that the circled mismatches happen at locations which have similar local information, e.g., a circled point on the third floor of the building in the target image is mismatched to the point on the fifth floor of the same building in the reference image. Similar errors occur in the images mapped by the RANSAC-SIFT method, which are in green squares. The reference image and target image in Sample 2 of Fig. 3 were taken of the same scene at almost the same time but from a different angle. In this sample, we can see that the obvious mismatched points are removed by RANSAC-SIFT but correctly mapped by BP-SIFT, which are circled in green.

The images of Sample 3 in Fig. 4 were captured in a moving car. Although the vehicles in both images are the same, the backgrounds are different. The circled points in the figure using the SIFT method are incorrectly mapped, e.g., a point under the white car is matched to a point under the SUV and a point under the black car. However, the corresponding green circled points in the figure using the BP-SIFT method are exactly matched. Sample 4 in Fig. 4 shows the registration of two images which have green letters on a light blue background. Though the font styles of these two images are different, their contents are neat, which increases the number of similar descriptors as well as the number of mismatches. The lines which are not used for demonstration are shown as dashed lines in light red. From this sample, we can see that RANSAC-SIFT removes some of the mismatched keypoints generated by SIFT, whereas BP-SIFT corrects these keypoints, as shown in green circles. Moreover, BP-SIFT corrects some of the points which are mapped incorrectly in RANSAC-SIFT, e.g., the point between the letters “A” and “B”. Figs. 5-6 show the registration results of Samples 5 to 10. The figures used here represent different kinds of images, which are captured from different angles, times, locations, etc. For each sample, the figures from left to right then from top to bottom are original images, registration result using SIFT, registration result using RANSAC-SIFT, and registration result using BP-SIFT. These registration results further validate that our algorithm can significantly improve the registration accuracy of SIFT and RANSAC-SIFT.

Table 1 and Fig. 7 show the registration accuracy of Samples 1 to 10, where the data points in Fig. 7 are connected by lines to show the consistency of performance. Most of the registration results obtained by using BP-SIFT have 90% to 100% accuracy, compared to SIFT with 50% to 60% accuracy and RANSAC-SIFT with 80% to 90% accuracy.

Figure 3: Comparison of SIFT, RANSAC-SIFT, and BP-SIFT, Sample 1 and Sample 2: the figures from left to right then from top to bottom represent matched points using SIFT, registration result using SIFT, matched points using RANSAC-SIFT, registration result using RANSAC-SIFT, matched points using BP-SIFT, and registration result using BP-SIFT.

Figure 4: Comparison of SIFT, RANSAC-SIFT, and BP-SIFT, Sample 3 and Sample 4: the figures from left to right then from top to bottom represent matched points using SIFT, registration result using SIFT, matched points using RANSAC-SIFT, registration result using RANSAC-SIFT, matched points using BP-SIFT, and registration result using BP-SIFT.

Figure 5: Comparison of SIFT, RANSAC-SIFT, and BP-SIFT, Sample 5 to Sample 7: the figures from left to right then from top to bottom represent original images, registration result using SIFT, registration result using RANSAC-SIFT, and registration result using BP-SIFT.

Figure 6: Comparison of SIFT, RANSAC-SIFT, and BP-SIFT, Sample 8 to Sample 10: the figures from left to right then from top to bottom represent original images, registration result using SIFT, registration result using RANSAC-SIFT, and registration result using BP-SIFT.

Table 1: Matching accuracy.

Sample number   SIFT   RANSAC-SIFT   BP-SIFT
1               0.54   0.77          1.00
2               0.54   0.76          1.00
3               0.56   0.76          1.00
4               0.48   0.79          1.00
5               0.56   0.85          1.00
6               0.64   0.90          0.93
7               0.76   0.97          1.00
8               0.79   0.97          1.00
9               0.39   0.55          1.00
10              0.55   0.61          1.00

Figure 7: Registration accuracy of SIFT, RANSAC-SIFT, and BP-SIFT.

Figure 8: RMSE (left) and PSNR (right) for Samples 1-10. [Two line plots, titled "RMSE of Different Methods" and "PSNR of Different Methods"; horizontal axis: sample numbers 1 to 10; vertical axes: RMSE (0 to 0.9) and PSNR (0 to 25); series: SIFT, RANSAC-SIFT, BP-SIFT.]

Moreover, Fig. 8 shows the RMSE and PSNR values for each sample. The RMSE value is calculated as the root mean square error between the reference image and the registered image. The PSNR is obtained by $20\log_{10}(I_{max}/\mathrm{RMSE})$, where $I_{max}$ is the maximum intensity value of the reference image. The intensity values in each image are normalized to between 0 and 1 before the calculation of RMSE and PSNR. In this figure, the RMSE values of each method are shown on the left, and the PSNR values are shown on the right. In each plot, the values from SIFT are in blue, the values from RANSAC-SIFT are in green, and the values from BP-SIFT are in red. From this figure, we can see that the results of BP-SIFT have smaller RMSE values and larger PSNR values. In order to illustrate that our method is more robust than other methods, we have applied our algorithm to register target images which have significant differences from the reference images. Therefore, the overall PSNR of these images is low, but the results of our method still have higher PSNR than other methods. Please also note that PSNR only loosely measures the registration errors since it drastically exaggerates alignment problems.
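The two image-quality measures used for Fig. 8 can be written out directly from their definitions. The sketch below assumes grayscale images stored as equally sized nested lists of intensities already normalized to between 0 and 1; the function names `rmse` and `psnr` and the handling of the zero-error case are choices made for this example.

```python
import math


def rmse(ref, reg):
    """Root mean square error between a reference image and a registered image."""
    sq = [
        (a - b) ** 2
        for row_ref, row_reg in zip(ref, reg)
        for a, b in zip(row_ref, row_reg)
    ]
    return math.sqrt(sum(sq) / len(sq))


def psnr(ref, reg):
    """PSNR in dB: 20 * log10(I_max / RMSE), with I_max taken from the reference image."""
    err = rmse(ref, reg)
    if err == 0:
        return math.inf  # identical images: PSNR is unbounded (assumed convention)
    i_max = max(max(row) for row in ref)
    return 20 * math.log10(i_max / err)
```

Because both measures compare intensities pixel by pixel, even a small residual misalignment can change many pixel differences at once, which is consistent with the remark that PSNR only loosely measures registration error.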

The proposed algorithm is based on the assumption that the distances between most of the descriptors do not change drastically between the reference image and the target image. If there are a large number of changes between the reference and target images, or the two images are generated from two completely different scenes, the algorithm may fail, because the distance changes between descriptors become hard to estimate. Fig. 9 shows an unsuccessful registration, where the reference image and target image are placed on the right, and the registration result is on the left.
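The assumption above can be probed with a simple diagnostic. The sketch below is not part of the proposed algorithm: it interprets "distances between descriptors" as spatial distances between matched keypoint locations, and the function name and the 20% relative tolerance are hypothetical choices. It reports the fraction of keypoint pairs whose mutual distance is roughly preserved between the two images.

```python
import math


def pairwise_distance_consistency(pts_ref, pts_tgt, tolerance=0.2):
    """Fraction of keypoint pairs whose mutual distance changes by at most
    `tolerance` (relative) between the reference and target images.

    pts_ref[i] and pts_tgt[i] are the (x, y) locations of the i-th matched
    keypoint in the reference and target image, respectively.
    """
    if len(pts_ref) != len(pts_tgt):
        raise ValueError("point lists must have the same length")
    consistent = 0
    total = 0
    n = len(pts_ref)
    for i in range(n):
        for j in range(i + 1, n):
            d_ref = math.hypot(pts_ref[i][0] - pts_ref[j][0],
                               pts_ref[i][1] - pts_ref[j][1])
            d_tgt = math.hypot(pts_tgt[i][0] - pts_tgt[j][0],
                               pts_tgt[i][1] - pts_tgt[j][1])
            if d_ref == 0:
                continue  # coincident points give no usable distance
            total += 1
            if abs(d_tgt - d_ref) / d_ref <= tolerance:
                consistent += 1
    return consistent / total if total else 0.0


reference_pts = [(0, 0), (10, 0), (0, 10)]
shifted_pts = [(5, 5), (15, 5), (5, 15)]      # pure translation
distorted_pts = [(0, 0), (10, 0), (0, 100)]   # one keypoint moved far away
print(pairwise_distance_consistency(reference_pts, shifted_pts))
print(pairwise_distance_consistency(reference_pts, distorted_pts))
```

A value close to 1 suggests that the geometric constraint is informative for the image pair; a low value corresponds to the failure case described above, where distance changes become hard to estimate.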

5. Computational Time Analysis

The computation time of the BP algorithm varies significantly and generally increases with the number of original descriptors found by the SIFT algorithm; that is, the computational cost increases as the number of image features increases. The number of descriptors for each test set is listed in Table 2, and the computational time for each test set is profiled into four main components as shown in Table 3, where (A) corresponds to the time


Figure 9: An example of an unsuccessful registration.

spent to generate descriptors using SIFT; (B) is the time spent on the simple SIFT matching described by (1) and (2); (C) corresponds to the time spent to compute the initial beliefs of Ψ(i, i′) for all pairs of descriptors from I1 and I2 and to find the nearest-neighbor sets for each keypoint in I1; and (D) is the actual time spent performing BP. Therefore, the time spent for the original SIFT is (A)+(B) and that for BP-SIFT is (A)+(C)+(D), both of which are also listed in Table 3. It is interesting to see that the time spent on BP is very small compared to the rest. All the experiments were conducted on a Pentium 4, 2.8 GHz Dual-Core desktop with 2 GB of memory under Linux.
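To make the comparison of the two totals concrete, the following sketch computes the relative overhead of BP-SIFT over SIFT from the totals reported in Table 3. The dictionary layout and the function name are illustrative assumptions; the numbers are copied from the table.

```python
# Totals in seconds as reported in Table 3:
# (SIFT = (A)+(B), BP-SIFT = (A)+(C)+(D)).
reported_totals = {
    "Set 1": (42.52, 97.05),
    "Set 2": (14.53, 16.07),
    "Set 3": (17.44, 22.63),
    "Set 4": (37.71, 85.77),
    "Set 5": (23.67, 29.38),
}


def relative_overhead(sift_time, bp_sift_time):
    """Extra time of BP-SIFT expressed as a fraction of the SIFT time."""
    return (bp_sift_time - sift_time) / sift_time


overheads = {
    name: relative_overhead(sift, bp_sift)
    for name, (sift, bp_sift) in reported_totals.items()
}

for name, value in overheads.items():
    print(f"{name}: BP-SIFT overhead relative to SIFT = {value:.0%}")
```

With these figures, Set 2 shows the smallest overhead (below 20%), while Sets 1 and 4 take more than twice as long as SIFT alone.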

The algorithm was implemented in several programming languages for convenience; the implementation is briefly described next to allow the reader to make a fair comparison. Steps (A) and (B) use the implementation of [36], which is mainly written in C++ and is optimized. Step (C) is performed with a mixture of MATLAB/C-MEX and MATLAB internal functions. Step (D) is implemented in Java, which is then called from MATLAB.

We can see that the computation time of BP-SIFT relative to SIFT varies quite significantly, from a small increase (less than 20%) to a large one (more than double the time). However, we want to point out that most of the effort is spent in Step (C), i.e., computing the initial beliefs and finding the nearest neighbors for each keypoint, which is currently implemented in a straightforward manner. To obtain the initial beliefs for BP (Step (D)), we first compute beliefs among all possible pairs of keypoints from the two images and then, for each keypoint in I1, sort the computed beliefs to retain the most confident matching keypoints out of I2. Finding the nearest neighbors of keypoints in I1 is implemented


Table 2: The number of descriptors generated by SIFT for each image set.

Image set   ID1     ID2     ID1 × ID2
Set 1       4594    5211    23.94E+6
Set 2       1509     893     1.35E+6
Set 3       1908    1921     3.67E+6
Set 4       4660    4293    20.01E+6
Set 5       1064    5952     6.33E+6

Table 3: Computational time in seconds. (A) Time spent to compute descriptors using SIFT. (B) Conventional SIFT matching based on (1). (C) Compute initial beliefs using (10) and the closest neighbors for each keypoint. (D) Perform BP.

Image set   (A)     (B)    SIFT: (A)+(B)   (C)     (D)    BP-SIFT: (A)+(C)+(D)
Set 1       33.34   9.18   42.52           59.72   3.99   97.05
Set 2       13.81   0.72   14.53            1.49   0.77   16.07
Set 3       16.17   1.27   17.44            4.68   1.68   22.63
Set 4       30.35   7.36   37.71           49.90   5.52   85.77
Set 5       21.14   2.53   23.67            7.33   0.91   29.38

in a similar manner: distances among all pairs of keypoints are computed and then sorted to retain the nearest ones. Currently, computing the pairwise beliefs and distances is implemented using MATLAB/C-MEX, and the sorting steps are performed using a MATLAB internal function. Even though Step (B) requires less computation than Step (C), it includes very similar steps (computing initial beliefs and performing a partial sort that keeps only the first two elements). Since Step (B) is optimized and Step (C) is not, we believe that Step (C) would need less time than currently observed if it were implemented as efficiently as Step (B). Hence, if optimally implemented, BP-SIFT may have a smaller computational overhead than currently observed.
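The difference between a full sort and a partial selection can be illustrated with a short sketch. This is not the MATLAB/C-MEX implementation described above; it is a hypothetical Python example (the function name and the point representation are assumptions) that retains only the k nearest neighbors of each keypoint with `heapq.nsmallest`, which avoids sorting all pairwise distances.

```python
import heapq
import math


def nearest_neighbors(points, k):
    """For each keypoint location, return the indices of its k nearest
    other keypoints, ordered from nearest to farthest.

    `points` is a list of (x, y) keypoint locations in one image.
    """
    neighbors = []
    for i, (xi, yi) in enumerate(points):
        # Lazily generate (distance, index) pairs to every other keypoint.
        candidates = (
            (math.hypot(xi - xj, yi - yj), j)
            for j, (xj, yj) in enumerate(points)
            if j != i
        )
        # nsmallest keeps only k candidates at a time (partial selection),
        # instead of sorting the full list of distances.
        neighbors.append([j for _, j in heapq.nsmallest(k, candidates)])
    return neighbors


keypoints = [(0, 0), (1, 0), (5, 0), (6, 0)]
print(nearest_neighbors(keypoints, 2))
```

The same pattern applies to retaining the most confident candidate matches per keypoint: when only a few entries per keypoint are needed, selecting them directly costs less than a full sort of every row of the pairwise table.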


6. Conclusion

In this paper, we propose a BP-SIFT registration model which improves the traditional SIFT-based method by correcting wrong matches using BP according to geometrical information. Experimental results show significant improvement over the traditional approach. We also provide a comparison of computational time between SIFT and BP-SIFT. Our BP-based approach can be applied to other SIFT variants such as PCA-SIFT, which will be part of future work.

References

[1] L. G. Brown, A survey of image registration techniques, ACM Computing Surveys 24 (4) (1992) 325–376.

[2] B. Zitova, J. Flusser, Image registration methods: a survey, Image and Vision Computing 21 (11) (2003) 977–1000.

[3] K. Mikolajczyk, C. Schmid, A performance evaluation of local descriptors, IEEE Transactions on Pattern Analysis and Machine Intelligence 27 (10) (2005) 1615–1630.

[4] F. Rothganger, S. Lazebnik, C. Schmid, J. Ponce, Object modeling and recognition using local affine-invariant image descriptors and multi-view spatial constraints, International Journal of Computer Vision 66 (3) (2006) 231–260.

[5] J. Ponce, M. Hebert, C. Schmid, A. Zisserman, Toward Category-Level Object Recognition, Springer, 2007.

[6] I. Laptev, T. Lindeberg, Local descriptors for spatio-temporal recognition, in: ECCV’04 Workshop on Spatial Coherence for Visual Motion Analysis, Vol. 3667, 2004, pp. 91–103.

[7] D. G. Lowe, Object recognition from local scale-invariant features, in: IEEE International Conference on Computer Vision, Vol. 2, 1999, pp. 1150–1157.

[8] D. G. Lowe, Distinctive image features from scale-invariant keypoints, International Journal of Computer Vision 60 (2) (2004) 91–110.

[9] Y. Ke, R. Sukthankar, PCA-SIFT: A more distinctive representation for local image descriptors, in: Proc. IEEE International Conference on Computer Vision and Pattern Recognition 2004, Vol. 2, 2004.

[10] M. Grabner, H. Grabner, H. Bischof, Fast approximated SIFT, Lecture Notes in Computer Science 3851 (2006) 918.


[11] J. Bustard, M. Nixon, Robust 2D ear registration and recognition based on SIFT point matching, in: IEEE/RSJ International Conference on Biometrics: Theory, Applications and Systems, 2008, pp. 1–6.

[12] C. Tang, Y. Dong, X. Su, Automatic registration based on improved SIFT for medical microscopic sequence images, 2008, pp. 580–583.

[13] C. Liu, J. Yuen, A. Torralba, J. Sivic, W. T. Freeman, SIFT flow: dense correspondence across different scenes, in: European Conference on Computer Vision, Vol. 5304, 2008, pp. 28–42.

[14] T. Brox, A. Bruhn, N. Papenberg, J. Weickert, High accuracy optical flow estimation based on a theory for warping, in: European Conference on Computer Vision, Vol. 3024, 2004, pp. 25–36.

[15] T. Brox, C. Bregler, J. Malik, Large displacement optical flow, in: IEEE Conference on Computer Vision and Pattern Recognition, 2009.

[16] L. G. Brown, A survey of image registration techniques, ACM Computing Surveys 24 (4) (1992) 325–376.

[17] S. Sharghi, F. Kamangar, Geometric feature-based matching in stereo images, in: Proc. Information, Decision and Control 99, Adelaide, SA, Australia, 1999, pp. 65–70.

[18] M. Lhuillier, L. Quan, Robust dense matching using local and global geometric constraints, in: International Conference on Pattern Recognition, 2000, pp. 968–972.

[19] H. Bian, J. Su, Feature matching based on geometric constraints in stereo views of curved scenes, in: IEEE International Conference on Intelligent Robots and Systems, 2005, pp. 313–318.

[20] V. Elser, I. Rankenburg, P. Thibault, Searching with iterated maps, in: Proc. Nat. Acad. Sci. USA, Vol. 104, 2007, pp. 498–519.

[21] M. Wainwright, M. Jordan, Graphical Models, Exponential Families, and Variational Inference, Now Publishers, 2008.

[22] F. Kschischang, B. Frey, H. Loeliger, Factor graphs and the sum-product algorithm, IEEE Transactions on Information Theory 47 (2) (2001) 498–519.

[23] J. S. Yedidia, W. T. Freeman, Y. Weiss, Understanding belief propagation and its generalizations, in: Exploring Artificial Intelligence in the New Millennium, Science and Technology Books, 2003.

[24] J. S. Yedidia, W. T. Freeman, Y. Weiss, Constructing free-energy approximations and generalized belief propagation algorithms, IEEE Transactions on Information Theory 51 (7) (2005) 2282–2312.

[25] M. Wainwright, T. Jaakkola, A. Willsky, Tree-reweighted belief propagation algorithms and approximate ML estimation by pseudomoment matching, in: Workshop on Artificial Intelligence and Statistics, 2003.

[26] V. Kolmogorov, Convergent tree-reweighted message passing for energy minimization, IEEE Transactions on Pattern Analysis and Machine Intelligence 28 (10) (2006) 1568–1583.

[27] M. Leordeanu, M. Hebert, A spectral technique for correspondence problems using pairwise constraints, in: IEEE International Conference on Computer Vision, Vol. 2, 2005, pp. 1482–1489.

[28] M. A. Fischler, R. C. Bolles, Random sample consensus: A paradigm for model fitting with applications to image analysis and automated cartography, Communications of the ACM 24 (6) (1981) 381–395.

[29] J. Pearl, Probabilistic reasoning in intelligent systems: networks of plausible inference, Morgan Kaufmann Publishers, Inc., San Francisco, 1988.

[30] D. J. C. MacKay, Good error-correcting codes based on very sparse matrices, IEEE Trans. Inform. Theory 45 (1999) 399–431.

[31] D. J. C. MacKay, R. J. McEliece, J. F. Cheng, Turbo decoding as an instance of Pearl’s belief propagation algorithm, IEEE Journal on Selected Areas in Communications.

[32] J. J. Hopfield, C. D. Brody, What is a moment? Transient synchrony as a collective mechanism for spatiotemporal integration, in: Proceedings of the National Academy of Sciences, 2001, pp. 1282–1287.

[33] J. Sun, N. N. Zheng, H. Y. Shum, Stereo matching using belief propagation, IEEE Transactions on Pattern Analysis and Machine Intelligence 25 (7) (2003) 787–800.

[34] P. Felzenszwalb, D. Huttenlocher, Efficient belief propagation for early vision, International Journal of Computer Vision 70 (1) (2006) 41–54.

[35] D. J. C. MacKay, Information Theory, Inference, and Learning Algorithms, Cambridge, 2003.

[36] A. Vedaldi, An open implementation of the SIFT detector and descriptor, Tech. Rep. 070012, UCLA CSD (2007).

[37] Viewpoint change sequences. URL http://www.inrialpes.fr/movi/

[38] Random Sample Consensus (RANSAC) Algorithm. URL http://www.csse.uwa.edu.au/~pk/Research/MatlabFns


Image Registration Using BP-SIFT

Research Highlights

>We apply geometric information as constraints in image registration by BP with SIFT

>BP corrects mismatched keypoints according to descriptors and geometric information

>BP-SIFT finds the consistent geometric correspondence between two sets of features

>Results show significant improvement of our method over conventional SIFT

>Mismatched keypoints, which are just removed by RANSAC, are correctly matched by BP-SIFT