A practical two-step image registration method for two-dimensional images
Information Fusion 5 (2004) 283–298
www.elsevier.com/locate/inffus
A practical two-step image registration method for two-dimensional images
Xiaoming Peng a,*, Mingyue Ding a, Chengping Zhou a, Qian Ma b
a State Education Commission Key Lab for Image Processing & Intelligent Control, Institute for Pattern Recognition & Artificial Intelligence,
Huazhong University of Science and Technology (HUST), Wuhan 430074, China
b 5th Lab, Institute of Optics and Electronics, Chinese Academy of Sciences, Chengdu 610209, China
Received 12 June 2003; received in revised form 16 December 2003; accepted 19 December 2003
Available online 31 January 2004
Abstract
In this paper, a practical method for registering two-dimensional images involving 2D affine transformations is proposed. This
method, whose parameters are provided by the user, consists of two steps. The first step is a robust feature-based method that does
not need to establish a correspondence between the features of images. Practically, an efficient Hausdorff distance-based algorithm is
developed to yield an initial transformation in this step. Then in the second step, an area-based method (Irani et al.’s method) is
utilized to refine the initial transformation and enhance the registration accuracy. The combination of both feature-based and area-
based methods tends to take advantage of the benefits of individual methods. Experiments have demonstrated that this two-step
method can be successfully applied to registering both multi-sensor images and images from a single sensor.
© 2004 Elsevier B.V. All rights reserved.
Keywords: Image registration; Feature-based; Area-based; Affine transformation; The Hausdorff distance
1. Introduction
Image registration, sometimes called image alignment,
is an important step for a great variety of applications
such as remote sensing, medical imaging and multi-sensor
fusion-based target recognition. It is a prerequisite
step prior to image fusion or image mosaicking. Its purpose is
to overlay two or more images of the same scene taken at
different times, from different viewpoints and/or by different
sensors [1]. In this paper, we confine our
investigation to the scope of two-dimensional (2D)
images, where the spatial transformation between the images
can be viewed as a 2D affine transformation.

The image registration techniques developed in the
early 1990s or earlier are surveyed in [1].
Registration can be performed either manually or
automatically. The former refers to human operators
manually selecting corresponding features in the images
*Corresponding author. Tel.: +86-27-87544512.
E-mail address: [email protected] (X. Peng).
1566-2535/$ - see front matter © 2004 Elsevier B.V. All rights reserved.
doi:10.1016/j.inffus.2003.12.004
to be registered. In order to get reasonably good
registration results, an operator has to choose a
considerably large number of feature pairs across the whole
images, which is not only tedious and wearing but also
subject to inconsistency and limited accuracy. Thus,
there is a natural need to develop automated techniques
that require little or no operator supervision. The current
automatic registration techniques generally fall
into two categories: feature-based methods and area-based
methods. Feature-based methods utilize extracted
features to estimate the registration parameters. The
most widely used features include regions [2,3], lines or
curves [4–7], and points [8–12]. The primary merits of
feature-based methods are their ability to handle large
misalignments with relatively short execution times.
For most feature-based methods to be successfully
applied, two conditions must be satisfied: (i) features are
extracted robustly and (ii) feature correspondences are
established reliably. Failure to meet either of these
requirements will cause a method of this type to fail. In
[2,3] closed regions are used to compute the affine
transformation between the reference and sensed images.
The methods in these articles require that at least three
closed-contour regions be available in each image. This
condition is rarely satisfied in many applications. Li
et al. present a scheme that first matches closed contours
to give an initial estimation of the transformation
and then matches open edges [4]. It only works well
when the closed contours are well preserved in both
images. In Coiras et al.'s visual/infrared registration
method, edges are approximated as straight-line segments
using Ramer's algorithm [5]. The segments are
then grouped to form triangles, and the method looks
for the transformation that best matches the triangles
from the source and destination images. However, for a
contour, the approximated line segments obtained by
Ramer's algorithm are greatly influenced by the end
positions of the contour. Furthermore, the allowed
deviation of the approximated segment from the contour
also plays an important role in the process.
Therefore, unless the contours can be perfectly extracted
and the allowed deviation is correctly chosen,
there is no guarantee that the triangles formed by the
line segments are reliable. Hsieh et al. propose to
first estimate the orientation difference between images
by calculating an "angle histogram" [7]. Then matching
point pairs are found as features to compute the
translation and scaling. This method uses area correlation
as the matching measure. Since a gray-level
intensity-based correlation-matching criterion does not
work for images of dissimilar intensity characteristics,
this method, along with those in [8,9], is not suitable for
multi-sensor image registration. In [10] Yang and Cohen
propose to establish correspondences between the
convex hull vertices of the test and reference images,
and then to recover the affine transformation between
the images based on these correspondences. This
method partly solves the problem of occlusion or
addition of features, but it requires four or more corners
to form consecutive vertices of the convex hull on
each image. In [12] matching pairs are determined
simultaneously with the transformation parameters.
This method is able to establish a reliable correspondence
between the points. However, it is limited to the
set of similarity transformations.
Several methods have been developed recently that
avoid establishing correspondence between features.
Yang and Cohen proposed to use affine invariants
and cross-weighted moment affine invariants as match
measures for registration [11]. The features are corners.
Although this method does not require feature correspondence,
it is sensitive to outliers. Shekhar et al. use
multiple features (primarily points and straight lines) to
solve for the unknown transformation parameters [14].
The transformation parameters are estimated by locating
the peaks of the "feature consensus" functions. This
method works well when the images are scenes of artificial
objects where points and lines can be easily and
accurately extracted. Among the correspondence-less
feature-based methods, the Hausdorff distance-based
methods are of particular interest to us. A measure defined
between two finite point sets, the Hausdorff distance
is robust to outliers (extra features), missing
features, and noise. This characteristic is particularly
important to multi-sensor image registration, where
some features may be present in one image while absent
in the other.
In contrast to feature-based methods, area-based
methods use the whole image content to estimate the
transformation parameters. The main virtue of such
methods is their ability to provide a high registration
accuracy, up to a fraction of a pixel. Brown divided them
into three types: correlation-like methods, Fourier
methods, and mutual information methods [1]. We pay
special attention to those methods that, with proper
modifications, can be used for registering multi-sensor
images. These methods include: (1) the minimization of
the sum of squared brightness differences (SSD) [18,19]; (2)
the maximization of normalized correlation [20]; and (3)
the optimization of mutual information [21,22]. There
are two commonalities among these methods: (i) an optimization
method (e.g., Newton's method [18–20], the Marquardt–Levenberg
method [22]) is used for recursive
iteration; and (ii) a hierarchical data pyramid is adopted
to propagate the transformation parameters from a
coarser level to a finer level, compensating for the initial
displacement between the source images. Unfortunately,
area-based methods cannot handle large misalignments
between images. The reasons are twofold: (1) given an
image of definite size, the number of levels of a data
pyramid is always limited, and the size of the image at the
coarsest level cannot be too small, otherwise aliasing will
arise. (2) An optimization method (Newton's method,
the Marquardt–Levenberg method, etc.) needs an initial
point that is sufficiently close to the true solution for
convergence. This means that large displacements between
corresponding levels of the image pyramids are
not allowed.
In this paper we bring forward a two-step image
registration method that combines the merits of both
feature-based and area-based methods. It can be used
to register images from similar as well as different types
of sensors. The rest of the paper is organized as follows.
In Section 2, the first step of the method is addressed
in detail. This step includes a feature extraction
scheme and a Hausdorff distance-based registration
approach that provides an initial estimation of the
transformation parameters. In addition, various other
methods based on the Hausdorff distance are compared
with our method. In Section 3 we use an area-based
method [20] to refine the transformation parameters
obtained in Section 2. Experimental results are
presented in Section 4, and conclusions and discussions
are given in Section 5.
2. First step––feature extraction and initial transformation estimation
Let two 2D images $I_1$ and $I_2$ respectively denote the
reference image and the test image. The geometric
mapping between two points $(x_1, y_1) \in I_1$ and
$(x_2, y_2) \in I_2$ can be expressed as

$$(x_1, y_1) = g((x_2, y_2)) \qquad (1)$$

where $g$ is the geometric transformation between the
reference and the test images, which is determined by the
a priori knowledge of the scene as well as the sensor
geometries. In this paper, we focus on the six-parameter
2D general affine transformation:

$$\begin{pmatrix} x_1 \\ y_1 \end{pmatrix} =
\begin{pmatrix} a_{00} & a_{01} \\ a_{10} & a_{11} \end{pmatrix}
\begin{pmatrix} x_2 \\ y_2 \end{pmatrix} +
\begin{pmatrix} t_x \\ t_y \end{pmatrix} \qquad (2)$$

The vector form of Eq. (2) is $\mathbf{x}_1 = A\mathbf{x}_2 + \mathbf{t}$,
where $\mathbf{x}_1 = [x_1, y_1]^T$, $\mathbf{x}_2 = [x_2, y_2]^T$,
$A = \begin{pmatrix} a_{00} & a_{01} \\ a_{10} & a_{11} \end{pmatrix}$,
and $\mathbf{t} = [t_x, t_y]^T$.

The general affine transformation is a good approximation
to a wide class of imaging models. For the
perspective projection model, which is a suitable model
for most cameras, if the sensor is far enough from the
imaged surface, the affine transformation is frequently
an acceptable approximation of the actual geometric
transformation. Moreover, the set of affine transformations
includes some common linear transformations,
such as pure translations, similarity transformations, etc.
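As a small illustration (ours, not the paper's code), Eq. (2) applied to an array of points; the function name and array layout are our own choices:

```python
import numpy as np

def affine_map(points, A, t):
    """Apply the 2D affine model of Eq. (2): x1 = A @ x2 + t.

    points: (n, 2) array of (x2, y2) coordinates.
    A: 2x2 matrix; t: length-2 translation vector.
    """
    points = np.asarray(points, dtype=float)
    return points @ np.asarray(A, dtype=float).T + np.asarray(t, dtype=float)
```

A pure translation, for example, is $A = I$ with $t = (t_x, t_y)$, and a similarity transformation is a scaled rotation matrix plus a translation.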
2.1. Feature extraction
The purpose of the feature extraction process is to
provide feature sets as input for a Hausdorff distance-based
registration method. In view of the speed of execution,
we would prefer sets of small-sized features. On
the other hand, the features must be robust. Both
aspects should be taken into consideration when
selecting features. We have tested several point feature
extraction methods, such as the wavelet-based methods
[7,13,15], and found that they failed at times to supply
reliable features in multi-sensor image registration
cases. However, we have observed that, more often than
not, structural edges are preserved in both the reference
and the test images, even if the image pairs are
from sensors of different modalities. To preserve the
primary edges and suppress the clutter edges, an "edge
focusing" methodology [16] combined with a Canny
edge detector [29] is adopted in our feature extraction
process. The details are as follows:
(1) Create an initial edge image $E_i(i, j, \sigma_0)$ by applying the Canny edge detector to the image $I_i$ ($i = 1, 2$), where $\sigma_0$ is the scale of the Gaussian used in the Canny edge detector. Choose a proper step length $\Delta\sigma$.
(2) $\sigma_k = \sigma_{k-1} - \Delta\sigma$. Detect edges using the Canny edge detector with scale $\sigma_k$. The edge detection is only performed in a series of 3 × 3 windows centered at every edge point in $E_i(i, j, \sigma_{k-1})$. A new edge image $E_i(i, j, \sigma_k)$ is thus obtained.
(3) Repeat Step (2) until $\sigma_k$ reaches some preset value.
During the above process, no thresholds are used to
eliminate weak edges. The edges obtained by this process
are used as input to a hysteresis approach [29], in
which two thresholds $T_1$ and $T_2$ are used ($T_1 > T_2$). The
input edges are dealt with as follows: (1) all edge points with
magnitude greater than $T_1$ are marked as correct and
retained; (2) all edge points with magnitude less than $T_2$
are removed; (3) scan all edge points with magnitude in
the range $[T_2, T_1]$; if such a point borders another already
marked as correct, then mark it too as correct.
Repeat this last step until stability is achieved. After this
processing, all edges shorter than some length threshold
are removed.
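The hysteresis step can be sketched as follows (a minimal version of ours, not the authors' implementation; it assumes a gradient-magnitude array `mag` and a binary edge map `edges` from the edge-focusing stage):

```python
import numpy as np
from collections import deque

def hysteresis(mag, edges, t_low, t_high):
    """Keep edge points with magnitude > t_high; grow into points in
    [t_low, t_high] that 8-connect to an already-kept point; drop the rest."""
    strong = edges & (mag > t_high)
    weak = edges & (mag >= t_low) & (mag <= t_high)
    keep = strong.copy()
    q = deque(zip(*np.nonzero(strong)))   # seeds: the "correct" points
    h, w = mag.shape
    while q:
        y, x = q.popleft()
        for dy in (-1, 0, 1):
            for dx in (-1, 0, 1):
                ny, nx = y + dy, x + dx
                if 0 <= ny < h and 0 <= nx < w and weak[ny, nx] and not keep[ny, nx]:
                    keep[ny, nx] = True   # borders a correct point: mark correct
                    q.append((ny, nx))
    return keep
```

The breadth-first growth is equivalent to repeating step (3) until stability.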
2.2. Hausdorff distance-based image registration algorithm
The benefits of the Hausdorff distance as a matching
measure have already been mentioned in Section 1.
However, the running time has always been a bottleneck
for most Hausdorff distance-based algorithms. The
major contribution of this paper is the presentation of
an efficient Hausdorff distance-based algorithm that
gives competitive results compared with other available
methods.

Given two finite 2D point sets $A$ and $B$, the directed
partial Hausdorff distance $h_f(B, A)$ from $B$ to $A$ is defined
as

$$h_f(B, A) \overset{\mathrm{def}}{=} f^{\,\mathrm{th}}_{\,b \in B}\, \min_{a \in A} \| b - a \| \qquad (3)$$

In the above definition, $f^{\,\mathrm{th}}_{\,x \in X}\, g(x)$ denotes the
$f$th quantile value of $g(x)$ over the set $X$, for some value of
$f$ between zero and one [17]. For example, the quantile value
for $f = 1$ is the maximum and the quantile value for $f = 1/2$
is the median. $\|b - a\|$ is the Euclidean distance
between the points $b$ and $a$. The physical interpretation of
$h_f(B, A)$ is as follows: there is at least a fraction $f$ of
the points in set $B$ such that the distances from these
points to their nearest neighbors in set $A$ do not exceed
$h_f(B, A)$. Note that the above definition is asymmetric.
A directed partial Hausdorff distance $h_f(A, B)$ from $A$
to $B$ can be analogously defined. In image registration
applications it is sufficient to consider one of them
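A direct numpy sketch of Eq. (3) (ours, for illustration only; practical implementations avoid the pairwise distance matrix by using distance transforms, as in Section 2.2):

```python
import numpy as np

def directed_partial_hausdorff(B, A, f):
    """Directed partial Hausdorff distance h_f(B, A) of Eq. (3):
    the f-th quantile, over b in B, of the distance from b to its
    nearest neighbour in A. f = 1 gives the classical directed
    Hausdorff distance."""
    B = np.asarray(B, dtype=float)
    A = np.asarray(A, dtype=float)
    # Pairwise Euclidean distances, |B| x |A|; fine for modest set sizes.
    d = np.linalg.norm(B[:, None, :] - A[None, :, :], axis=2)
    nearest = np.sort(d.min(axis=1))      # each b's nearest-neighbour distance
    k = max(int(np.ceil(f * nearest.size)) - 1, 0)
    return nearest[k]
```

Note the outlier robustness: for $f < 1$, the largest nearest-neighbour distances (extra or missing features) simply never enter the result.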
[13].

In this section, we seek a general affine transformation
$t$ such that

$$h_f(t(E_2), E_1) \le \tau + \varepsilon \qquad (4)$$

where $E_1$ and $E_2$ are the edge images from the feature
extraction algorithm described in Section 2.1, $\tau$ is a
preset threshold, $t(E_2)$ denotes the transformed version
of $E_2$ by $t$, and $\varepsilon = \sqrt{2}$. For convenience we use a six-tuple
$(a_{00}, a_{01}, a_{10}, a_{11}, t_x, t_y)$ to represent $t$. Since six
parameters are needed to define a transformation, the
transformation space is six-dimensional. A rectilinear,
axis-aligned region of the six-dimensional transformation
space is called a cell [17], which can be uniquely
represented by a pair of lower and upper transformations
$t^l = (a^l_{00}, a^l_{01}, a^l_{10}, a^l_{11}, t^l_x, t^l_y)$ and
$t^h = (a^h_{00}, a^h_{01}, a^h_{10}, a^h_{11}, t^h_x, t^h_y)$. Given a cell $R$,
$t^l$ denotes the transformation whose parameters
$a^l_{00}$ through $t^l_y$ all take the lowest values of $R$
in each dimension. Similarly, $t^h$ is the transformation
whose parameters all take the highest values of $R$.
Given a point $P \in E_2$ and a transformation $k \in R$,
the transformed point $k(P)$ is bounded in a rectangle on
the plane of $E_1$. This bounding rectangle, whose top left
and bottom right corners are respectively $t^l(P)$ and
$t^h(P)$, is called the "uncertainty region". Moreover, define
the size of an uncertainty region to be the length of
its longest side. In this way, each cell is associated with a
collection of uncertainty regions, one for each point
$P \in E_2$.

Fig. 1. Expandable array PQ.

Define the distance transform $D[x, y]$ of $E_1$ as

$$D[x, y] = \min_{i \in E_1} \| (x, y) - i \| \qquad (5)$$

It can be seen from the above definition that $D[x, y]$ is
simply the distance of a point $(x, y)$ to its closest
neighbor in $E_1$. In this paper, we call a point $(x, y)$ an
interesting point if it satisfies $D[x, y] \le \tau$.
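To make Eq. (5) concrete, here is a brute-force sketch (ours; the paper uses the fast distance-transform algorithm of [25], so this quadratic-cost version is only meant to pin down the definition):

```python
import numpy as np

def distance_transform(edge_img):
    """Brute-force Euclidean distance transform of a binary edge image:
    D[r, c] = distance from pixel (r, c) to the nearest edge pixel (Eq. (5))."""
    pts = np.argwhere(edge_img)                       # edge-pixel coordinates
    rr, cc = np.indices(edge_img.shape)
    grid = np.stack([rr, cc], axis=-1).astype(float)  # every pixel coordinate
    d = np.linalg.norm(grid[:, :, None, :] - pts[None, None, :, :], axis=-1)
    return d.min(axis=-1)
```

The "interesting" points of the paper are then simply the pixels where `distance_transform(E1) <= tau`.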
2.2.1. Hausdorff distance-based image registration algorithm
(1) Compute the distance transform of $E_1$ using a fast method given in [25].
(2) Construct an expandable array $PQ$. $PQ$ is special in that its elements are priority queues of cells (see Fig. 1). At the beginning, $PQ$ has just one element $PQ[0]$ that contains the initial cell $R$. Ensure that $R$ is large enough that a target transformation $t$ satisfying inequality (4) is contained in it. Let $P_j = (x_j, y_j)$ represent an edge point in $E_2$ ($j = 1, 2, \ldots, |E_2|$, where $|E_2|$ denotes the total number of edge points of $E_2$) and set two integers $x_{\max} = \max_{j}(x_j)$ and $y_{\max} = \max_{j}(y_j)$. Initialize an integer LEVEL to zero.
(3) Find a target transformation $t$ (see Fig. 2).
Keep in mind that in Fig. 2 the element $PQ[\mathrm{LEVEL}]$
is a priority queue and the size of the cells stored in the
priority queue varies with LEVEL. The explanations of
the annotations A, B, C and D in Fig. 2 are as follows:

(A) For a cell $R$ in question, compute its difference
transformation $\Delta t = t^h - t^l = (\Delta_{00}, \Delta_{01}, \Delta_{10}, \Delta_{11}, \Delta t_x, \Delta t_y)$.
Then the maximum uncertainty region size associated with
$R$ is computed as

$$\max_{j = 1, 2, \ldots, |E_2|} \bigl( \max(\Delta_{00} x_j + \Delta_{01} y_j + \Delta t_x,\;\; \Delta_{10} x_j + \Delta_{11} y_j + \Delta t_y) \bigr) + 1 \qquad (6)$$

(B) We decompose a cell $R$ into equally sized sub-cells
by splitting the cell through its midpoint in each eligible
dimension. An eligible dimension is defined as follows:
there are six terms that correspond to the six dimensions
$a_{00}$ through $t_y$ of $R$, namely $a_{00} \to \Delta_{00} x_{\max}$,
$a_{01} \to \Delta_{01} y_{\max}$, $a_{10} \to \Delta_{10} x_{\max}$,
$a_{11} \to \Delta_{11} y_{\max}$, $t_x \to \Delta t_x$, and $t_y \to \Delta t_y$.
If any of these terms is larger than 2/3, then the
corresponding dimension is an eligible dimension.
For instance, the $a_{00}$ dimension is eligible if
$\Delta_{00} x_{\max} > 2/3$. Assume that a cell $R$ has two eligible
dimensions $a_{00}$ and $t_x$, and the midpoint transformation
of $R$ is $t^m = (t^l + t^h)/2 = (a^m_{00}, a^m_{01}, a^m_{10}, a^m_{11}, t^m_x, t^m_y)$.
Then we obtain four sub-cells, represented by the following
lower and upper transformation pairs:

$\{(a^l_{00}, a^l_{01}, a^l_{10}, a^l_{11}, t^l_x, t^l_y),\; (a^m_{00}, a^h_{01}, a^h_{10}, a^h_{11}, t^m_x, t^h_y)\}$,
$\{(a^m_{00}, a^l_{01}, a^l_{10}, a^l_{11}, t^l_x, t^l_y),\; (a^h_{00}, a^h_{01}, a^h_{10}, a^h_{11}, t^m_x, t^h_y)\}$,
$\{(a^l_{00}, a^l_{01}, a^l_{10}, a^l_{11}, t^m_x, t^l_y),\; (a^m_{00}, a^h_{01}, a^h_{10}, a^h_{11}, t^h_x, t^h_y)\}$,
and
$\{(a^m_{00}, a^l_{01}, a^l_{10}, a^l_{11}, t^m_x, t^l_y),\; (a^h_{00}, a^h_{01}, a^h_{10}, a^h_{11}, t^h_x, t^h_y)\}$.

The idea of eligible dimensions enables us to decompose
a cell into a flexible number of sub-cells. It can easily be
seen that a cell can be subdivided into at most $2^6 = 64$
sub-cells.

(C) A cell is labeled as promising if it is possible that
the cell contains a target transformation. The following
cell evaluation algorithm is used to evaluate a cell.

Fig. 2. Flow chart of seeking a target transformation.
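The midpoint splitting over eligible dimensions can be sketched as follows (our illustration; a cell is represented as a pair of 6-vectors $(t^l, t^h)$, and `eligible` lists the dimension indices to split):

```python
import itertools
import numpy as np

def split_cell(tl, th, eligible):
    """Split the cell [tl, th] through its midpoint in each eligible
    dimension, yielding 2**len(eligible) equally sized sub-cells."""
    tl, th = np.asarray(tl, float), np.asarray(th, float)
    tm = (tl + th) / 2.0
    subs = []
    for choice in itertools.product((0, 1), repeat=len(eligible)):
        lo, hi = tl.copy(), th.copy()
        for bit, dim in zip(choice, eligible):
            if bit == 0:
                hi[dim] = tm[dim]   # keep the lower half in this dimension
            else:
                lo[dim] = tm[dim]   # keep the upper half
        subs.append((lo, hi))
    return subs
```

With all six dimensions eligible this produces the paper's worst case of $2^6 = 64$ sub-cells; with two eligible dimensions it reproduces the four sub-cell pairs listed above.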
2.2.2. Cell evaluation algorithm
(a) Compute the uncertainty regions for every point $P_j \in E_2$ ($j = 1, 2, \ldots, |E_2|$) with respect to the cell $R$ in question. If an uncertainty region contains at least one interesting point, mark this uncertainty region as qualified.
(b) If the fraction of qualified uncertainty regions associated with $R$ is not smaller than $f$, label $R$ as promising.

We developed the uncertainty region evaluation algorithm
below to quickly judge whether an uncertainty region is
qualified.

2.2.3. Uncertainty region evaluation algorithm
(a) Initialize a 2D integer array $M_1$ of the same size as $E_1$. Assume that the size of $M_1$ is $J$ (rows) × $I$ (columns).
(b) If $D[0, 0] > \tau$, $M_1(0, 0) = 0$; else $M_1(0, 0) = 1$.
For $i = 1, 2, \ldots, I - 1$:
  if $D[i, 0] > \tau$, $M_1(i, 0) = M_1(i - 1, 0)$; else $M_1(i, 0) = M_1(i - 1, 0) + 1$.
For $j = 1, 2, \ldots, J - 1$:
  if $D[0, j] > \tau$, $M_1(0, j) = M_1(0, j - 1)$; else $M_1(0, j) = M_1(0, j - 1) + 1$.
For $j = 1, 2, \ldots, J - 1$ and $i = 1, 2, \ldots, I - 1$:
  if $D[i, j] > \tau$, $M_1(i, j) = M_1(i - 1, j) + M_1(i, j - 1) - M_1(i - 1, j - 1)$;
  else $M_1(i, j) = M_1(i - 1, j) + M_1(i, j - 1) - M_1(i - 1, j - 1) + 1$.

Fig. 3. Cells at different levels. The lines connect a parent cell and its decomposed sub-cells. In this illustration there are five levels; cells at the same level have equal sizes.

It follows from the above algorithm that the value of
the element $M_1(i, j)$ is the number of interesting points
contained in the rectangle on the plane of $E_1$ whose top
left corner is $(0, 0)$ and whose bottom right corner is
$(i, j)$. Thus, given an uncertainty region $UR_j$ for $P_j \in E_2$,
whose top left and bottom right corners are
$t^l(P_j) = (x_l, y_l)$ and $t^h(P_j) = (x_h, y_h)$,¹ respectively, the
number of interesting points contained in $UR_j$ is
immediately available as
$M_1(x_l - 1, y_l - 1) - M_1(x_h, y_l - 1) - M_1(x_l - 1, y_h) + M_1(x_h, y_h)$.
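In modern terms, $M_1$ is a summed-area (integral) table of the interesting-point indicator. A numpy sketch (ours; generic row/column index names, not the paper's $i$/$j$ convention):

```python
import numpy as np

def interesting_count_table(D, tau):
    """Build M1 as a summed-area table of the indicator (D <= tau), so
    M1[a, b] counts interesting points in the rectangle from (0, 0)
    to (a, b) inclusive."""
    return (D <= tau).astype(np.int64).cumsum(axis=0).cumsum(axis=1)

def count_in_rect(M1, lo, hi):
    """Interesting points in the rectangle [lo, hi] (inclusive corners),
    by the same inclusion-exclusion as in the paper."""
    (al, bl), (ah, bh) = lo, hi
    total = M1[ah, bh]
    if al > 0:
        total -= M1[al - 1, bh]
    if bl > 0:
        total -= M1[ah, bl - 1]
    if al > 0 and bl > 0:
        total += M1[al - 1, bl - 1]
    return int(total)
```

This is what makes the uncertainty region evaluation fast: after one pass to build $M_1$, each region is tested in constant time.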
(D) All the promising sub-cells are ordered by the fraction
of qualified uncertainty regions, i.e., the more qualified
uncertainty regions a sub-cell possesses, the higher the
priority it is given.

It is easy to prove that when all the sizes of the
uncertainty regions associated with a promising cell $R$
are less than or equal to 3, the partial Hausdorff distance
$h_f(t(E_2), E_1)$ will exceed $\tau$ by no more than $\varepsilon = \sqrt{2}$,
where $t$ is the midpoint of $R$.

¹ Since points on a grid plane have integer coordinates, we round
transformed points to the integer-coordinate points on the plane of $E_1$
that are nearest to them (as in [17]).
Let us discuss Step (3) of the above registration
algorithm in somewhat greater detail. Firstly, it is
apparent that a depth-first search strategy is adopted in
this step, and at the current level only the cell with the
highest priority (the most promising cell) is decomposed
to generate successors. Secondly, cells at the
same level are all equally sized (see Fig. 3). Although
they differ from each other in their $t^l$s and $t^h$s, they
share the common difference transformation $\Delta t$. Thus,
at a given level, we compute the maximum uncertainty
region size of one cell and the result applies to all the
other cells. Analogously, we can determine the eligible
dimensions of all the cells at a given level by determining
the eligible dimensions of one cell at that level.
Finally, this step can be processed in parallel. Currently
we partition the initial cell into several consecutive
and non-overlapping parts and assign each part to
a different computer. We have found that this parallel
processing finds the target transformation more
quickly than assigning the whole work to a single
computer.
2.3. Comparison with other Hausdorff distance-based methods

In this subsection we compare our algorithm with
some other Hausdorff distance-based registration/matching
methods. These methods include: (1) the safe
branch-and-bound approximation algorithm [13]; (2)
the bounded alignment algorithm [13]; (3) the multi-resolution
image registration method [28]; (4) the
method of locating objects in an affine transformation
space [17]; and (5) the Hausdorff distance-based horizon
registration method [6].

In [13] Mount et al. proposed two registration
methods. The first one is a branch-and-bound (B&B)
approximation algorithm. This algorithm splits a
promising cell into two sub-cells and uses the midpoint
of a cell to update the best similarity and transformation.
For each cell, it computes a lower distance bound
and an upper distance bound, and kills the cell if
the lower bound is not significantly lower than the
current best similarity. One drawback of this algorithm
lies in its inability to control the directed partial Hausdorff
distance $h_f(t(B), A)$ at a given quantile $f$, where $t$ is
the best transformation found by this algorithm. To put
this another way, we cannot predict the value of
$h_f(t(B), A)$ prior to applying the algorithm;
$h_f(t(B), A)$ is known only after the algorithm stops. We
ran an experiment to demonstrate this. In this experiment,
the parameters are $\varepsilon_r = 0.1$, $\varepsilon_a = 0.3$, $\varepsilon_q = 0.2$, and
$q = 0.6$ and $1.0$. These parameters are adjustable non-negative
quantities that act as input arguments for the
algorithm. Fig. 4 depicts the curves of the directed
partial Hausdorff distance $h_f(t(B), A)$ versus the quantile
$f$. Let us take $f = 0.6$ for example. It can be seen from
Fig. 4 that at this quantile, the values of $h_f(t(B), A)$
corresponding to $q = 0.6$ and $q = 1.0$ are quite different
from each other. We cannot anticipate these results in
advance. In order for the transformation parameter
refinement algorithm in the next section to work well, a
sufficiently good estimation of the initial transformation
is required. As a matching measure, $h_f(t(B), A)$
in a sense reflects the quality of the estimated initial
transformation $t$: its value tells us how close, under the
transformation $t$, a fraction $f$ of the structural edges of
the test image are to the structural edges of the reference
image. For this reason we would prefer that the value of
$h_f(t(B), A)$ be kept under a desired value.

Compared with the B&B algorithm, our method
differs in quite a number of ways:
Fig. 4. The directed partial Hausdorff distance versus the quantile value. The partial Hausdorff distance is computed using the best transformation found in the B&B algorithm [13]. We test the algorithm on two parameter sets. Among the parameters, $\varepsilon_r = 0.1$, $\varepsilon_a = 0.3$, $\varepsilon_q = 0.2$ are held unchanged for both sets while $q$ takes different values.
(i) Search strategy. Our method adopts a depth-first
(hill-climbing) search strategy while the B&B
algorithm is based on a best-first search strategy.
It is difficult to determine theoretically which
strategy is better, because the efficiency of both
methods depends on the situation. In the
experiments presented in this paper, our
depth-first based method runs much faster than
the B&B algorithm. Another important benefit
of the depth-first strategy is that, compared with
the best-first strategy, it requires much less memory.
Assuming that the quantity of memory a cell
occupies is $m$, the total memory required for
storing the cells in our method is not larger than
$64mL$, where $L$ is the total number of levels of the
cells.
(ii) Way of killing a cell. We use a fast uncertainty region
evaluation algorithm to quickly test an uncertainty
region and kill a cell if it does not have
enough qualified uncertainty regions, while the
B&B algorithm computes lower and upper bounds
for this purpose.
(iii) Way of splitting a promising cell. Our method splits
a promising cell into a variable number of sub-cells
(depending on the number of eligible dimensions
the cell has), while the B&B algorithm only splits
a promising cell into two sub-cells.
(iv) Parallel processing. Recall from Section 2.2 that we
can implement our method through parallel processing
by splitting the initial cell into several parts
and assigning each part to a different processor/computer.
Once a target transformation is found in
one processor/computer, all the tasks in the other
processors/computers are immediately terminated.
However, if the B&B algorithm is implemented under
the same condition, one has to wait until all the
tasks in the different processors/computers are finished
and then choose the best transformation $t$ according
to $h_f(t(B), A)$. Parallel processing can also help alleviate
the possible problems arising from the drawbacks
of a hill-climbing search strategy (e.g., local
maxima, plateaux, and ridges).
(v) Controllability of $h_f(t(B), A)$. Recall from Section 2.2
that we can ensure that $h_f(t(B), A) \le \tau + \varepsilon$.
A second algorithm given by Mount et al. combines
the above branch-and-bound algorithm with the
computation of point alignments (the BA algorithm for
short). By finding enough alignable uncertainty regions²
and sampling triple pairs, the BA algorithm can be
much faster than the B&B algorithm in point matching
examples. However, we have found in our experiments
that the BA algorithm is even slower than the B&B
algorithm when applied to edge registration. The main
reason for this slowdown is that, when registering edges,
it is more difficult than in point matching cases for the
BA algorithm to find enough alignable uncertainty regions
that contain at most one feature point of the other
set. This is due to the connectivity of edges.
The multi-resolution image registration method [28]
has some similarities with the B&B algorithm of [13].
When evaluating a cell, both algorithms compute two
bounds and split a promising cell into two sub-cells.
Instead of using distances as the bounds, this method
uses fractions. The authors of [28] also proposed the
idea of the multi-class Hausdorff fraction (MCHF). By
segmenting edges into multiple classes (straight lines and
curves in their paper) and replacing the conventional
single-class Hausdorff fraction with the MCHF, they
achieved higher efficiency, as fewer cells were visited. It
must be noted that the MCHF idea can also be directly
applied to our algorithm. Besides, one should note
that the multi-resolution registration algorithm of [28]
is designed for the set of similarity transformations rather
than for general affine transformations.
The object location method in [17] can be extended
to image registration. In this method, the continuous
affine transformation space is first digitized to a discrete
one. Then each cell at a lower-resolution level is subject
to an evaluation-and-decomposition process. If a cell is
promising, it is divided into 64 sub-cells (all of the same
size) that are stored at a higher-resolution level. The cell
decomposition continues until the finest resolution level
is reached, when each cell contains just a discrete
transformation. Since a promising cell is decomposed
into a fixed number of sub-cells, a cell is required to be
as square as possible. In fact, the author suggested that
the edge lengths of a cell always be a power of 2 and
equal to each other in each dimension. However, in many
cases the side lengths of a cell might not be "well balanced",
i.e., some side lengths can be much longer or
shorter than others, which decreases the efficiency of
this method. In contrast to this method, we do not
need the digitization operation on the continuous affine
transformation space. More importantly, the number
of sub-cells generated by a parent cell is variable, making
our algorithm more flexible in dealing with the "not-well-balanced"
cases.

² For a point $b \in B$, if the number of points of $A$ that lie within $b$'s
uncertainty region is at most one, or if this number is zero and there is
at least one point of $A$ within distance $g$ of the region, this uncertainty
region is labeled as alignable.
A Hausdorff distance-based method is proposed in [6]
to register horizons in visual/infrared image pairs.
Strictly speaking, this method is not a general scheme
for solving affine transformation registration problems.
The global affine transformation between two images is
calculated from three local translations that minimize the
partial Hausdorff distances between corresponding sub-image
pairs. The transformation between each sub-image
pair is assumed to be a translation, which is obviously
not proper for relatively complex circumstances. Thus
the application scope of this method is quite limited.
3. Second step: transformation parameter refinement [20]
The transformation obtained from the first step in Section 2 may not meet the requirement of high registration accuracy. We therefore use an area-based method, based on the maximization of the normalized correlation given in [20], to refine the transformation parameters and further increase the registration accuracy. This method is completely different from traditional area-correlation methods that use image intensities directly, and it can be used for multi-sensor image registration.
Given an image $f$, a Laplacian-energy image $f_{le}$ of $f$ is formed by first filtering $f$ with the 3 × 3 convolution mask

\[
\begin{bmatrix} 1 & 1 & 1 \\ 1 & -8 & 1 \\ 1 & 1 & 1 \end{bmatrix}
\]

and then squaring the filtered image pixel-wise. A Gaussian pyramid [26] $f^k_{le}$ ($k = 0, 1, 2, \ldots$) of $f_{le}$ is constructed subsequently; $f^0_{le}$ ($= f_{le}$) is the highest-resolution (bottom) level of the pyramid.
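The two preprocessing steps can be sketched as follows. This is a minimal illustration, not the paper's implementation: the border handling and the 2 × 2 block-average reduction are our simplifying assumptions standing in for the Gaussian reduction of [26].

```python
import numpy as np

LAPLACIAN = np.array([[1.0, 1.0, 1.0],
                      [1.0, -8.0, 1.0],
                      [1.0, 1.0, 1.0]])

def laplacian_energy(f):
    """Filter f with the 3x3 Laplacian mask, then square pixel-wise.
    Borders are handled by replicating the edge rows/columns."""
    padded = np.pad(f.astype(float), 1, mode="edge")
    out = np.zeros(f.shape, dtype=float)
    for dy in range(3):          # the mask is symmetric, so correlation
        for dx in range(3):      # and convolution coincide here
            out += LAPLACIAN[dy, dx] * padded[dy:dy + f.shape[0],
                                              dx:dx + f.shape[1]]
    return out ** 2

def gaussian_pyramid(img, levels):
    """Level 0 is full resolution; each coarser level is a 2x2 block
    average followed by 2x subsampling (a crude stand-in for a proper
    Gaussian reduction)."""
    pyr = [img.astype(float)]
    for _ in range(levels - 1):
        g = pyr[-1]
        h, w = (g.shape[0] // 2) * 2, (g.shape[1] // 2) * 2
        g = g[:h, :w]
        pyr.append(0.25 * (g[0::2, 0::2] + g[1::2, 0::2] +
                           g[0::2, 1::2] + g[1::2, 1::2]))
    return pyr
```

On a constant image the Laplacian response, and hence the energy, is zero everywhere, as expected for a second-derivative operator.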
Given a reference image $I_1$ and a test image $I_2$, a point $(x - u, y - v)$ in $I_2$ is mapped to the point $(x, y)$ in $I_1$ by an affine motion vector $\vec u(x, y; \vec p) = [u(x, y; \vec p),\, v(x, y; \vec p)]^T$ expressed as

\[
\vec u(x, y; \vec p) =
\begin{bmatrix} u(x, y; \vec p) \\ v(x, y; \vec p) \end{bmatrix}
= X(x, y)\,\vec p
= \begin{bmatrix} p_1 + p_2 x + p_3 y \\ p_4 + p_5 x + p_6 y \end{bmatrix}
\tag{7}
\]

where

\[
X(x, y) = \begin{bmatrix} 1 & x & y & 0 & 0 & 0 \\ 0 & 0 & 0 & 1 & x & y \end{bmatrix},
\qquad
\vec p = [p_1, p_2, p_3, p_4, p_5, p_6]^T.
\]

A simple relationship exists between $(a_{00}, a_{01}, a_{10}, a_{11}, t_x, t_y)$ and $[p_1, p_2, p_3, p_4, p_5, p_6]$ (see footnote 3).

We use $I_{1le}$ and $I_{2le}$ to denote the Laplacian-energy images of $I_1$ and $I_2$, respectively, and $I^k_{1le}$ and $I^k_{2le}$ for the $k$th levels of their Gaussian pyramids. Define a normalized-correlation matching measure $S^k_{(x,y)}(u, v)$, computed over a small window $W$ with respect to the pixel $(x, y)$ in $I^k_{1le}$ and the displacement $(u, v)$, as

\[
S^k_{(x,y)}(u, v) = I^k_{1le}(x + u,\, y + v) \otimes_N I^k_{2le}(x, y)
= \frac{\sum_{(i,j)\in W} \bigl(I^k_{1le}(x + i + u,\, y + j + v) - \bar f_W\bigr)\bigl(I^k_{2le}(x + i,\, y + j) - \bar g_W\bigr)}
{\sqrt{\sum_{(i,j)\in W} \bigl(I^k_{1le}(x + i + u,\, y + j + v) - \bar f_W\bigr)^2}\;\sqrt{\sum_{(i,j)\in W} \bigl(I^k_{2le}(x + i,\, y + j) - \bar g_W\bigr)^2}}
\tag{8}
\]
where $\bar f_W$ and $\bar g_W$ denote the mean brightness values within the corresponding windows around the pixels $(x + u, y + v)$ in $I^k_{1le}$ and $(x, y)$ in $I^k_{2le}$, respectively.

Our aim is to find the parameter vector $\vec p$ that maximizes the global similarity measure $M^k(\vec p)$:

\[
M^k(\vec p) = \sum_{(x,y)} S^k_{(x,y)}\bigl(u(x, y; \vec p),\, v(x, y; \vec p)\bigr)
= \sum_{(x,y)} S^k_{(x,y)}\bigl(\vec u(x, y; \vec p)\bigr)
\tag{9}
\]
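The window-level measure inside Eq. (8) can be sketched as follows (the function name is ours):

```python
import numpy as np

def ncc(window_a, window_b):
    """Normalized correlation of two equal-size windows (the core of
    Eq. (8)): subtract each window's mean, correlate, and normalize.
    Returns a value in [-1, 1]. A constant window (zero variance)
    makes the denominator zero and must be skipped by the caller."""
    a = window_a.astype(float) - window_a.mean()
    b = window_b.astype(float) - window_b.mean()
    denom = np.sqrt((a * a).sum() * (b * b).sum())
    return float((a * b).sum() / denom)
```

Because both the mean and the scale are normalized away, the measure is invariant to affine changes of brightness, which is what makes it usable across sensors.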
The solution to the above problem is obtained using Newton's method. Let $\vec p_0$ denote the parameter vector computed in the previous iteration step; $M^k(\vec p)$ can be expanded in a second-order Taylor series (see footnote 4):

\[
M^k(\vec p) = M^k(\vec p_0) + \bigl(\nabla_{\vec p} M^k(\vec p_0)\bigr)^T \delta\vec p + \tfrac{1}{2}\,\delta\vec p^{\,T} H_{M^k}(\vec p_0)\,\delta\vec p \tag{10}
\]

where $\delta\vec p = \vec p - \vec p_0$, $\nabla_{\vec p} M^k$ denotes the gradient of $M^k$, and $H_{M^k}$ is the Hessian matrix of $M^k$. Differentiating the right-hand side of Eq. (10) with respect to $\delta\vec p$ and setting the derivative to zero gives

\[
\nabla_{\vec p} M^k(\vec p_0) + H_{M^k}(\vec p_0)\,\delta\vec p = 0 \tag{11}
\]

Solving for $\delta\vec p$, we have

\[
\delta\vec p = -\bigl(H_{M^k}(\vec p_0)\bigr)^{-1} \nabla_{\vec p} M^k(\vec p_0) \tag{12}
\]

Thus a better estimate of $\vec p$ is $\vec p = \vec p_0 + \delta\vec p$. In Eq. (12), $\nabla_{\vec p} M^k(\vec p)$ and $H_{M^k}(\vec p)$ are computed as

\[
\nabla_{\vec p} M^k(\vec p) = \sum_{(x,y)} X^T\, \nabla_{\vec u} S^k_{(x,y)}(\vec u), \qquad
H_{M^k}(\vec p) = \sum_{(x,y)} X^T\, H_{S^k_{(x,y)}}(\vec u)\, X \tag{13}
\]

where

\[
\nabla_{\vec u} S^k_{(x,y)}(\vec u) = \Bigl[\frac{\partial S^k_{(x,y)}}{\partial u},\; \frac{\partial S^k_{(x,y)}}{\partial v}\Bigr]^T, \qquad
H_{S^k_{(x,y)}}(\vec u) = \begin{bmatrix}
\dfrac{\partial^2 S^k_{(x,y)}}{\partial u^2} & \dfrac{\partial^2 S^k_{(x,y)}}{\partial u\,\partial v} \\[1ex]
\dfrac{\partial^2 S^k_{(x,y)}}{\partial u\,\partial v} & \dfrac{\partial^2 S^k_{(x,y)}}{\partial v^2}
\end{bmatrix} \tag{14}
\]

Substituting Eq. (13) into Eq. (12) provides

\[
\delta\vec p = -\Bigl(\sum_{(x,y)} X^T\, H_{S^k_{(x,y)}}(\vec u_0)\, X\Bigr)^{-1} \Bigl(\sum_{(x,y)} X^T\, \nabla_{\vec u} S^k_{(x,y)}(\vec u_0)\Bigr) \tag{15}
\]

where $\vec u_0 = \vec u(x, y; \vec p_0)$.

Footnote 3. It follows from Eqs. (2) and (7) that

\[
\begin{bmatrix} x + u \\ y + v \end{bmatrix} =
\begin{bmatrix} a_{00} & a_{01} \\ a_{10} & a_{11} \end{bmatrix}
\begin{bmatrix} x \\ y \end{bmatrix} +
\begin{bmatrix} t_x \\ t_y \end{bmatrix}
\;\Rightarrow\;
\begin{bmatrix} u \\ v \end{bmatrix} =
\begin{bmatrix} a_{00} - 1 & a_{01} \\ a_{10} & a_{11} - 1 \end{bmatrix}
\begin{bmatrix} x \\ y \end{bmatrix} +
\begin{bmatrix} t_x \\ t_y \end{bmatrix} =
\begin{bmatrix} t_x + (a_{00} - 1)x + a_{01}y \\ t_y + a_{10}x + (a_{11} - 1)y \end{bmatrix}.
\]

Thus $p_1 = t_x$, $p_2 = a_{00} - 1$, $p_3 = a_{01}$, $p_4 = t_y$, $p_5 = a_{10}$, and $p_6 = a_{11} - 1$.

Footnote 4. Note that [20] writes $M^k(\vec p) = M^k(\vec p_0) + (\nabla_{\vec p} M^k(\vec p_0))^T \delta\vec p + \delta\vec p^{\,T} H_{M^k}(\vec p_0)\,\delta\vec p$, missing the factor 1/2 before the third term.

The steps of the transformation parameter refinement algorithm are summarized as follows:

(1) Warp the test image $I_2$ towards the reference image $I_1$ using the initial transformation obtained from the first step (Section 2) to generate a new image $I'_2$. Construct two Gaussian pyramids based on $I_{1le}$ and $I'_{2le}$, the Laplacian-energy images of $I_1$ and $I'_2$, respectively. Set $\vec p_0$ to zero and proceed to Step (2) at the coarsest resolution level of the pyramids.
(2) At the $k$th level of the Gaussian pyramids, for each pixel $(x, y)$ of $I^k_{1le}$, compute a correlation matching measure surface $S^k_{(x,y)}(\vec u)$ around $\vec u_0$, where $\vec u = (u, v)$ satisfies $\|\vec u - \vec u_0\| \le d$. Use Beaudet's masks [27] to compute $\nabla_{\vec u} S^k_{(x,y)}(\vec u)$ and $H_{S^k_{(x,y)}}(\vec u)$.
(3) Compute the increment $\delta\vec p$ of $\vec p$ (Eq. (15)) and update $\vec p$ by $\vec p = \vec p_0 + \delta\vec p$.
(4) After repeating Steps (2) and (3) for a few iterations (four in our implementation), propagate $\vec p$ to the $(k - 1)$th level of the Gaussian pyramids and repeat the computation at that level. This iteration-and-propagation process stops when the computation at the finest resolution level ($k = 0$) finishes.
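Steps (2) and (3) amount to accumulating 6 × 6 normal equations over the contributing pixels and solving Eq. (15). A minimal sketch, assuming the per-pixel gradients and Hessians of the correlation surface have already been estimated (e.g. with Beaudet's masks); the function and container names are ours:

```python
import numpy as np

def refine_step(grads, hessians, coords):
    """One parameter update per Eq. (15). grads[(x, y)] is the 2-vector
    (dS/du, dS/dv) and hessians[(x, y)] the 2x2 Hessian of the
    correlation surface at pixel (x, y). Accumulate the 6x6 system
    and solve for the increment delta_p."""
    A = np.zeros((6, 6))
    b = np.zeros(6)
    for (x, y) in coords:
        X = np.array([[1, x, y, 0, 0, 0],
                      [0, 0, 0, 1, x, y]], dtype=float)
        A += X.T @ hessians[(x, y)] @ X   # sum of X^T H X
        b += X.T @ grads[(x, y)]          # sum of X^T grad
    return -np.linalg.solve(A, b)         # delta_p of Eq. (15)
```

At least three non-collinear pixels are needed for the accumulated 6 × 6 matrix to be invertible.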
In our implementation, the window size for computing the normalized correlation is 5 × 5, and 3 × 3 Beaudet masks are used for computing $\nabla_{\vec u} S^k_{(x,y)}(\vec u)$ and $H_{S^k_{(x,y)}}(\vec u)$ (accordingly, in Step (2), $d = \sqrt{2}$). An image-warping operation is added before each iteration, i.e., $I'^k_{2le}$ is warped towards $I^k_{1le}$ according to the current $\vec p_0$. After warping, $\vec p_0$ is set to zero and $\delta\vec p$ is estimated from the reference and warped test images.
To further boost the robustness of the iteration step of Eq. (15), an "outlier rejection" mechanism is adopted: only pixels $(x, y)$ for which the surface $S^k_{(x,y)}(\vec u)$ around $\vec u_0$ is concave are used in solving for $\delta\vec p$. However, [20] does not mention how to judge whether this surface is concave, so we developed a simple criterion for the purpose: if the Hessian matrix $H_{S^k_{(x,y)}}(\vec u)$ is negative semi-definite, we judge the surface $S^k_{(x,y)}(\vec u)$ to be concave.

When the transformation parameter refinement algorithm terminates, we have a final motion vector $\vec p_a$. Following the relationship described in footnote 3, we obtain a corresponding affine transformation whose parameters are contained in a 2 × 2 matrix $A_a$ and a translation vector $t_a$. Keep in mind that this transformation is obtained on top of the initial affine transformation from the first step (Section 2), which we represent by a 2 × 2 matrix $A_b$ and a translation vector $t_b$. Combining the two transformations gives

\[
x_1 = A_a (A_b x_2 + t_b) + t_a = A_a A_b x_2 + (A_a t_b + t_a) \tag{16}
\]

Hence, the final transformation is represented by $A = A_a A_b$ and $t = A_a t_b + t_a$.
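The negative semi-definiteness test used for the concavity judgment, and the composition of Eq. (16), can be sketched as follows (function names are ours):

```python
import numpy as np

def is_concave(hessian, tol=1e-12):
    """Negative semi-definite test for a symmetric 2x2 Hessian:
    the surface is judged concave iff no eigenvalue is positive."""
    return bool(np.all(np.linalg.eigvalsh(hessian) <= tol))

def compose_affine(Aa, ta, Ab, tb):
    """Eq. (16): apply (Ab, tb) first, then (Aa, ta);
    the composite is (Aa @ Ab, Aa @ tb + ta)."""
    return Aa @ Ab, Aa @ tb + ta
```

A flat direction (zero eigenvalue) still passes the test, matching the semi-definite rather than strictly definite criterion in the text.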
Fig. 5. The performance evaluation experiment. (a) The reference image (200 × 200 pixels). (b) The test image (300 × 300 pixels). The test image is generated by scaling and re-sampling the reference image and is further corrupted by Gaussian white noise (SNR = 13.01).
4. Experimental results
We first demonstrate the performance of our approach with an experiment in which the true alignment is known a priori but the gray-level correspondence between the images is uncontrolled; this gives, in a sense, an objective evaluation of the quality of our algorithm. Our second example registers two multi-sensor, multi-temporal satellite images. In the third experiment, the approach is used to register a visual/infrared image pair with a distinct disparity in image resolution. A fourth example presents the results of applying our approach to the registration of images from a single sensor taken at different moments.
In all the experiments, the parameters for the feature extraction are set to σ₀ = 3 and Δσ = 0.4, and the feature extraction stops when σ = 1. The feature images and the Laplacian-energy images are normalized so that the maximum edge strength is 255. In the first step, the parameters f, t_l and t_h are provided by the user. The choice of these parameters depends mostly on a priori knowledge about the characteristics of the images. For example, if we know beforehand that a majority of the structural edges in the test image can be matched to those in the reference image, we can set a relatively high value for f; otherwise, we set a relatively low value. Generally speaking, f takes smaller values in multi-sensor registration cases than in cases where the images to be registered come from sensors of a similar type. t_l and t_h should be chosen to cover the target transformation. These two parameters do not influence the outcome (the target transformation is bound to be found as long as it is contained in the initial cell determined by t_l and t_h), but they do influence the running time: with a narrow span between t_l and t_h the time taken is small, while a wider span may require more time to find the target transformation. A rough estimate of the target transformation therefore helps in choosing t_l and t_h. In the second step, we use three-level Gaussian pyramids for the first three experiments and four-level Gaussian pyramids for the fourth. After a test image is registered to a reference image, we use a wavelet transform-based image fusion algorithm [23] to fuse the overlapping parts; the wavelet transform filters for the fusion algorithm are listed in Table 2 of Appendix A. All the experiments are implemented in Visual C++ 6.0 on a Pentium III PC running Windows 2000.
4.1. Performance evaluation experiment
In this experiment, we use a 200 × 200 Lena image as the reference image. The reference image is scaled to 1.5 times its original size and re-sampled to generate the test image (Fig. 5), which is then further corrupted with Gaussian white noise. We measure the amount of added noise as a signal-to-noise ratio (SNR) expressed in decibels, $10 \log_{10}\bigl(\sigma^2(f)/\sigma^2(n)\bigr)$, where $\sigma^2(f)$ is the variance of the image and $\sigma^2(n)$ is the variance of the added noise; in this experiment the SNR = 13.01. The optimal transformation between the test and reference images is (0.6667, 0.0, 0.0, 0.6667, 0.0, 0.0). We set s = 2, f = 0.8, t_l = (0.5, 0.0, 0.0, 0.5, −20, −20), and t_h = (0.8, 0.0, 0.0, 0.8, 20, 20) for the first step of the proposed method. The final transformation obtained is (0.6667, 10^−4, 7.49 × 10^−4, 0.6671, −0.2767, −0.1666). Transforming the test image to the reference image with this transformation, we obtain a registration accuracy of between 0.27 and 0.35 pixels. This result also demonstrates the noise resistance of our method.
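The SNR measure used above can be sketched as follows (the function name is ours; the added noise is taken as the difference between the noisy and clean images):

```python
import numpy as np

def snr_db(image, noisy):
    """SNR in decibels: 10 * log10(var(image) / var(added noise))."""
    noise = noisy.astype(float) - image.astype(float)
    return 10.0 * np.log10(np.var(image.astype(float)) / np.var(noise))
```

Note that this definition uses the variance of the clean signal, so a constant offset in the noise does not change the reported SNR.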
4.2. Satellite imagery registration experiment
In this experiment, the images to be registered are
respectively from SPOT band 3 (taken on 08/08/95) and
Landsat TM band 4 (taken on 06/07/94) of Brasilia,
Brazil. Both images have a resolution of 256 × 256 pixels. The thresholds used in the hysteresis approach for the reference and test images are T₁ = 100, T₂ = 50 and T₁ = 120, T₂ = 60, respectively. After thresholding with hysteresis, the remaining edges shorter than 15 and 20 pixels in the reference and test images, respectively, are deleted. The original images and the corresponding edge images are shown in Fig. 6(a)–(d); it can be seen from these images that this pair poses a great challenge to most existing feature-correspondence registration methods. The other parameters for the first step of our method are s = 2, f = 0.5, t_l = (1.0, 0.0, 0.0, 1.0, −100, −100), and t_h = (1.0, 0.0, 0.0, 1.0, 100, 100). It took our Hausdorff distance-based image registration algorithm 2.94 s to find an initial transformation (1.0, 0.0, 0.0, 1.0, −7.0313, −78.9063). The mosaic of Fig. 6(c) and (d) using this transformation is shown in Fig. 6(e). For a desirable visual effect of the mosaic image, in this example as well as in the following examples, the intensity of the edge pixels from the reference image is set to 80, and that of the edge pixels from the test image to 255. An additional 43.73 s was taken to refine the transformation parameters. The final transformation is (1.0004, 0.0271, −0.0198, 1.0005, −12.1249, −76.2377). The mosaic of Fig. 6(a) and (b) using the final transformation is shown in Fig. 6(f). Note that the rotation component not included in the initial transformation is recovered, which also indicates the necessity of refining the initial transformation.

Fig. 6. The satellite imagery registration experiment. (a) The reference image (SPOT band 3, 256 × 256 pixels). (b) The test image (TM band 4, 256 × 256 pixels). (c) Edge image of (a). (d) Edge image of (b). (e) Mosaic of (c) and (d) using the initial transformation from the first step. (f) Mosaic of (a) and (b) using the final transformation from the second step.
4.3. Visual/infrared image pair registration experiment
In this experiment, the image pair is from the Fort Carson data set, which is publicly available from the website http://www.cs.colostate.edu/~vision. The original visual image is a color image, which we converted into a 256-level gray-scale image for use in our experiment. The reference image is an infrared image of 256 × 256 pixels; the visual image, used as the test image, has a resolution of 480 × 720 pixels. The thresholds used in the hysteresis approach for the reference and test images are T₁ = 40, T₂ = 20 and T₁ = 100, T₂ = 50, respectively. After thresholding with hysteresis, the edges shorter than 10 pixels in the test image are deleted. The original images and the corresponding edge images are shown in Fig. 7(a)–(d). The distinct contrast between the dimensions of the two images makes it impossible to apply area-based methods directly. The other parameters for the first step of our method are s = 2, f = 0.5, t_l = (0.5, 0.0, 0.0, 0.5, −100, −50), and t_h = (1.0, 0.0, 0.0, 1.0, 0.0, 0.0). It took our Hausdorff distance-based image registration algorithm 3.95 s to find an initial transformation (0.5732, 0.0, 0.0, 0.5791, −75.9766, −13.0859). The mosaic of Fig. 7(c) and (d) using this transformation is shown in Fig. 7(e). An additional 66.98 s was taken to refine the transformation parameters, yielding the final transformation (0.5685, 0.0068, −0.0041, 0.5832, −74.2895, −13.5573). The mosaic of Fig. 7(a) and (b) using the final transformation is shown in Fig. 7(f).

Fig. 7. The visual/infrared image pair registration experiment. (a) The reference image (infrared image, 256 × 256 pixels). (b) The test image (visual image, 480 × 720 pixels). (c) Edge image of (a). (d) Edge image of (b). (e) Mosaic of (c) and (d) using the initial transformation from the first step. (f) Mosaic of (a) and (b) using the final transformation from the second step.
4.4. Single-sensor image registration experiment
In this experiment, the images for registration are two pictures of an airport taken prior and posterior to an air strike. The registration is a prerequisite to a battlefield damage assessment study carried out in our lab. In the test image there are some craters on the runway plus some white arrows. In order to retain the craters in the registered and fused image, we did this experiment a little differently from the previous ones.

We first obtain the negative images of the original reference and test images. Formally, a negative image g'(x, y) is acquired by applying the negative transformation g'(x, y) = 255 − g(x, y) to an original image g(x, y). The original reference and test images and their negative counterparts are shown in Fig. 8(a)–(d). Our method is then applied to Fig. 8(c) and (d). The thresholds used in the hysteresis approach for the reference and test images are T₁ = 80, T₂ = 40 and T₁ = 60, T₂ = 30, respectively. After thresholding with hysteresis, the remaining edges shorter than 30 pixels in the reference and test images are deleted. The other parameters for the first step of our method are s = 3, f = 0.6, t_l = (−0.6, 0.8, −0.6, −0.5, 350, 420), and t_h = (−0.5, 0.9, −0.5, −0.4, 380, 450). It took our Hausdorff distance-based image registration algorithm 7.13 s to find an initial transformation (−0.5242, 0.8445, −0.5258, −0.4133, 363.359, 432.891). The mosaic of Fig. 8(e) and (f) using this transformation is shown in Fig. 8(g). An additional 202.40 s was taken to refine the transformation parameters, yielding the final transformation (−0.5460, 0.8880, −0.5068, −0.4253, 363.809, 430.577). The registered and fused image of Fig. 8(a) and (b) using the final transformation is shown in Fig. 8(h). The outliers in the test image, the craters and arrows, confirmed the correctness of our concave-surface judgment criterion for the "outlier rejection" mechanism.

Fig. 8. The single-sensor image registration experiment. (a) The reference image (taken prior to an air strike, 412 × 683 pixels). (b) The test image (taken posterior to an air strike, 447 × 720 pixels). (c) The negative of (a). (d) The negative of (b). (e) Edge image of (c). (f) Edge image of (d). (g) Mosaic of (e) and (f) using the initial transformation from the first step. (h) Mosaic of (a) and (b). We first register and fuse (c) and (d) using the final transformation from the second step; the intensity levels of the result are then reversed using the negative transformation to give (h).
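The negative transformation and the register-then-reverse workflow above can be sketched as follows. Note that `register_via_negatives` and its `register` argument are hypothetical placeholders for the two-step registration-and-fusion method, not functions from the paper:

```python
import numpy as np

def negative(img):
    """Negative transformation g'(x, y) = 255 - g(x, y) for an 8-bit image."""
    return (255 - img.astype(np.int32)).astype(np.uint8)

def register_via_negatives(ref, test, register):
    """Register/fuse the negatives of the two images, then reverse the
    intensities of the result, as in the experiment above. `register`
    stands in for the two-step method applied to an image pair."""
    fused_neg = register(negative(ref), negative(test))
    return negative(fused_neg)
```

Since the negative transformation is its own inverse (applying it twice restores the original image), reversing the fused negative recovers the original intensity convention while keeping the dark craters.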
We also ran the B&B algorithm and the BA algorithm for comparison. The parameters are e_r = 0.1, e_a = 0.3, e_q = 0.2 for all experiments (these values are recommended by the authors of [13]). We use q = 0.625 for the satellite imagery and visual/infrared registration experiments and q = 0.8 for the single-sensor image registration experiment. In the BA algorithm, the parameters are g = 0.5, N_s = 20, and q_s|E₂| = 100. The lower and upper transformations t_l and t_h are the same as those used in our method. The times taken by the two algorithms on the last three experiments, together with the execution times of our algorithm, are summarized in Table 1.
Table 1
A comparison of execution times of different Hausdorff distance-based algorithms (wall-clock time, s)

Experiment                        B&B algorithm   BA algorithm   Our Hausdorff distance-based algorithm
Satellite imagery registration    11.19           27.41          2.94
Visual/infrared registration      79.85           152.37         3.95
Single-sensor registration        1228.96         2415.28        7.13
It can be seen from Table 1 that, in these examples, our Hausdorff distance-based image registration algorithm is much faster than both of the other algorithms.
The second step of our method may take most of the total execution time. Nevertheless, we argue that this step is necessary. On the one hand, the parameter refinement step helps to recover components not assumed in the initial transformation, which is particularly important when the initial estimate of the transformation is not very accurate; recall the rotation recovered in the satellite imagery registration example. On the other hand, the registration accuracy of the initial transformation from the first step may not be high enough, especially when s is relatively large or f is relatively small, so there is a natural need to refine the transformation parameters further for higher accuracy. Acceleration techniques such as the Lewis algorithm [30] can be used to reduce the running time of this step.
5. Conclusions and discussions
In this paper, we present a practical two-step method for registering two-dimensional images under 2D affine transformations. The method requires the user to provide a number of parameters as input. Our major contribution is the first step, a Hausdorff distance-based algorithm that searches the 2D affine transformation space efficiently using a depth-first strategy; it can also be implemented on parallel processors for even higher efficiency. We have compared this algorithm with several other methods based on the Hausdorff distance.

The Hausdorff distance-based algorithm yields an initial transformation, on the basis of which the transformation parameters are refined to obtain a final transformation. In this way, we achieve both efficiency and accuracy by combining the two methods.
We have found in the literature a hybrid method [24] that also attempts to combine feature-based and area-based methods. However, the feature-based method in [24] is a feature-correspondence one, in that it uses the gravity centers of smoothed images to estimate the initial transformation; it is therefore not as robust as our Hausdorff distance-based algorithm. Another problem with this hybrid method is that it adopts a gradient-based optical-flow method to refine the initial transformation, which does not work well when large changes in illumination and contrast exist between the images. As a result, its applicability is severely limited.
The problem of image registration is not at all trivial. While the affine transformation is quite general and applicable to a large number of real-world applications, it is inadequate for modeling distortions caused by certain sensor peculiarities. Our method cannot be directly applied to imagery from push-broom sensors (e.g., the HYDICE hyper-spectral digital imagery collection experiment sensor) or whiskbroom sensors (e.g., the HyMap sensor), whose geometric models are much more complex. To use our method on such images, some preprocessing of the raw image data is required; one possibility is to use the techniques described in [31] to ortho-rectify the raw imagery first, after which our method might apply.
Our future work will involve more complicated transformation models, including non-linear ones.
Acknowledgements
This work is supported in part by China's National Science Foundation (grant no. 60135020 F F 030405). The authors would like to thank the anonymous reviewers for their valuable comments.
Appendix A
The wavelet transform filters for the image fusion
algorithm [23].
Table 2
h and g are the forward wavelet transform filters; h_i and g_i are the reverse wavelet transform filters

  n       h        g        h_i      g_i
 -12                                 0.002
 -11   -0.002            -0.002   -0.003
 -10   -0.003   0.002    -0.003   -0.006
  -9    0.006  -0.003     0.006    0.006
  -8    0.006  -0.006     0.006    0.013
  -7   -0.013   0.006    -0.013   -0.012
  -6   -0.012   0.013    -0.012   -0.030
  -5    0.030  -0.012     0.030    0.023
  -4    0.023  -0.030     0.023    0.078
  -3   -0.078   0.023    -0.078   -0.035
  -2   -0.035   0.078    -0.035   -0.307
  -1    0.307  -0.035     0.307    0.542
   0    0.542  -0.307     0.542   -0.307
   1    0.307   0.542     0.307   -0.035
   2   -0.035  -0.307    -0.035    0.078
   3   -0.078  -0.035    -0.078    0.023
   4    0.023   0.078     0.023   -0.030
   5    0.030   0.023     0.030   -0.012
   6   -0.012  -0.030    -0.012    0.013
   7   -0.013  -0.012    -0.013    0.006
   8    0.006   0.013     0.006   -0.006
   9    0.006   0.006     0.006   -0.003
  10   -0.003  -0.006    -0.003    0.002
  11   -0.002  -0.003    -0.002
  12            0.002
References
[1] L.G. Brown, A survey of image registration techniques, ACM
Computing Surveys 24 (4) (1992) 326–376.
[2] J. Flusser, T. Suk, A moment-based approach to registration of
images with affine geometric distortion, IEEE Transactions on
Geoscience and Remote Sensing 32 (2) (1994) 382–387.
[3] X. Dai, S. Khorram, A feature-based image registration algorithm
using improved chain-code representation combined with invari-
ant moments, IEEE Transactions on Geoscience and Remote
Sensing 37 (5) (1999) 2351–2362.
[4] H. Li, B.S. Manjunath, S.K. Mitra, A contour-based approach to
multisensor image registration, IEEE Transactions on Image
Processing 4 (3) (1995) 320–334.
[5] E. Coiras, J. Santamaria, C. Miravet, Segment-based registration
techniques for visual-infrared images, Optical Engineering 39 (1)
(2000) 282–289.
[6] Y. Sheng, X. Yang, D. McReynolds, Z. Zhang, L. Gagnon, L. Sévigny, Real-world multisensor image alignment using edge
focusing and Hausdorff distances, Proceedings of SPIE 3719
(1999) 173–185.
[7] J.W. Hsieh, H.Y.M. Liao, K.C. Fan, M.T. Ko, Y.P. Hung, Image
registration using a new edge based approach, Computer Vision
and Image Understanding 67 (2) (1997) 112–130.
[8] Q. Zheng, R. Chellappa, A computational vision approach to
image registration, IEEE Transactions on Image Processing 2 (3)
(1993) 311–326.
[9] H.H. Li, Y.T. Zhou, Automatic visual/IR image registration,
Optical Engineering 35 (2) (1996) 391–400.
[10] Z. Yang, F.S. Cohen, Image registration and object recognition
using affine invariants and convex hulls, IEEE Transactions on
Image Processing 8 (7) (1999) 934–946.
[11] Z. Yang, F.S. Cohen, Cross-weighted moments and affine
invariants for image registration and matching, IEEE Transac-
tions on Pattern Analysis and Machine Intelligence 21 (8) (1999)
804–814.
[12] S.H. Chang, F.H. Cheng, W.H. Hsu, G.Z. Wu, Fast algorithm for
point pattern matching: invariant to translation, rotations and
scale changes, Pattern Recognition 30 (2) (1997) 311–320.
[13] D.M. Mount, N.S. Netanyahu, J.L. Moigne, Efficient algorithms for
robust feature matching, Pattern Recognition 32 (1) (1999) 17–38.
[14] C. Shekhar, V. Govindu, R. Chellappa, Multisensor image
registration by feature consensus, Pattern Recognition 32 (1)
(1999) 39–52.
[15] B.S. Manjunath, C. Shekhar, R. Chellappa, A new approach to
image feature detection with applications, Pattern Recognition 29
(4) (1996) 627–640.
[16] F. Bergholm, Edge focusing, IEEE Transactions on Pattern Analysis and Machine Intelligence 9 (6) (1987) 726–741.
[17] W.J. Rucklidge, Efficiently locating objects using the Hausdorff
distance, International Journal of Computer Vision 24 (3) (1997)
251–270.
[18] J.R. Bergen, P. Anandan, K.J. Hanna, R. Hingorani, Hierarchical
model-based motion estimation, in: Proceedings of the 2nd
European Conference on Computer Vision, Santa Margherita,
Italy, May 1992, pp. 237–252.
[19] R.K. Sharma, M. Pavel, Multisensor image registration, SID Digest of Society for Information Display (XXVIII) (1997) 951–954.
[20] M. Irani, P. Anandan, Robust multi-sensor image alignment, in:
Proceedings of 6th International Conference on Computer Vision,
Bombay, India, January 1998, pp. 959–966.
[21] P. Viola, W.M. Wells III, Alignment by maximization of mutual
information, in: Proceedings of 5th International Conference on
Computer Vision, Boston, MA, USA, June 1995, pp. 16–23.
[22] P. Thévenaz, M. Unser, Optimization of mutual information for multiresolution image registration, IEEE Transactions on Image Processing 9 (12) (2000) 2083–2099.
[23] H. Li, B.S. Manjunath, S.K. Mitra, Multisensor image fusion
using the wavelet transform, Graphical Models and Image
Processing 57 (3) (1995) 235–245.
[24] Z. Zhang, R.S. Blum, A hybrid image registration technique for a
digital camera image fusion application, Information Fusion 2
(2001) 135–149.
[25] G. Borgefors, Hierarchical Chamfer matching: a parametric edge matching algorithm, IEEE Transactions on Pattern Analysis and Machine Intelligence 10 (6) (1988) 849–865.
[26] P.J. Burt, E.H. Adelson, The Laplacian pyramid as a compact image code, IEEE Transactions on Communications 31 (4) (1983) 532–540.
[27] P.R. Beaudet, Rotationally invariant image operators, in: 4th
International Joint Conference on Pattern Recognition, Kyoto,
Japan, November 1978, pp. 579–583.
[28] H.S. Alhichri, M. Kamel, Multi-resolution image registration
using multi-class Hausdorff fraction, Pattern Recognition Letters
23 (2002) 279–286.
[29] M. Sonka, V. Hlavac, R. Boyle, Image Processing, Analysis, and Machine Vision, second ed., Brooks/Cole, California, USA, 1999.
[30] J. Lewis, Fast template matching, Vision Interface (1995) 120–
123.
[31] C. Lee, J. Bethel, Georegistration of airborne hyperspectral image data, IEEE Transactions on Geoscience and Remote Sensing 39 (7) (2001) 1347–1351.