
Machine Vision and Applications (2010) 21:365–376. DOI 10.1007/s00138-008-0170-y

ORIGINAL PAPER

A performance study for camera pose estimation using visual marker based tracking

Madjid Maidi · Jean-Yves Didier · Fakhreddine Ababsa · Malik Mallem

Received: 16 February 2007 / Accepted: 10 September 2008 / Published online: 1 October 2008
© Springer-Verlag 2008

Abstract Vision-based tracking systems are widely used for augmented reality (AR) applications. Their registration can be very accurate, and there is no delay between the real and virtual scenes. However, vision-based tracking often suffers from limited range, errors and heavy processing time, and can behave erratically due to numerical instability. Robust methods are required to overcome these shortcomings. In this paper, we survey classic vision-based pose computations and present a method that offers increased robustness and accuracy in the context of real-time AR tracking. We aim to determine the performance of four pose estimation methods in terms of errors and execution time. We developed a hybrid approach that mixes an iterative method based on the extended Kalman filter (EKF) and an analytical method with direct resolution of the pose parameter computation. The direct method initializes the pose parameters of the EKF algorithm, which then optimizes these parameters. The pose estimation methods were evaluated using a series of tests and an experimental protocol. The analysis of the results shows that our hybrid algorithm improves the stability, convergence and accuracy of the pose parameters.

M. Maidi (B) · J.-Y. Didier · F. Ababsa · M. Mallem
IBISC Laboratory, 40 Rue du Pelvoux, CE 1455 Courcouronnes, 91020 Évry Cedex, France
e-mail: [email protected]

J.-Y. Didier
e-mail: [email protected]

F. Ababsa
e-mail: [email protected]

M. Mallem
e-mail: [email protected]

1 Introduction

Camera pose estimation is a crucial step for augmented reality (AR) applications: it allows the projection of synthetic models at the right location on real images. Accurate and robust camera pose parameters are a prerequisite for a variety of applications including dynamic scene analysis and interpretation, 3D scene structure extraction and video data compression [13]. AR, in which synthetic objects are inserted into a real scene, is a prime candidate, since a potentially restricted workspace demands robust and fast pose estimation from few feature points. Several approaches have been formulated to solve for the camera pose parameters. The problem is considered nonlinear and is solved by least squares methods or nonlinear optimization algorithms, typically the Gauss–Newton [3,14] or Levenberg–Marquardt method [17]. Most solutions are iterative and depend on nonlinear optimization of some geometric constraints, either on the world coordinates or on the projections to the image plane. For real-time applications, we are interested in linear or closed-form solutions free of initialization.

In this work, our primary interest is to develop a pose estimation algorithm for real-time applications in which only four points are required to determine the pose. Since the motion of the camera is usually unpredictable in such scenarios, our algorithm does not require initialization.

Dhome et al. [9] developed an analytical pose estimation method based on the interpretation of a triplet of arbitrary image lines and on the search for the model attitude. Ansar and Daniilidis [2] estimated the camera pose from an image of n points or lines with known correspondences. The authors presented a general framework which allows pose estimation for both n points and n lines. Lu et al. [15] developed a fast and globally convergent pose estimation algorithm, called


orthogonal iteration (OI). The pose estimation problem is formulated as a problem of error minimization based on object-space collinearity. In [16], Maidi et al. used an extended Kalman filter (EKF) to estimate the transformation between the object and camera coordinate frames. Based on the knowledge of the feature point positions in the camera frame, the perspective projection matrix of the camera is computed and solved using the two steps of the EKF. Didier [10] presented a new analytical method based on a direct parameter computation algorithm. The method uses four coplanar points of the target object: the author computed the real depth of the vertices, then determined the rotation and translation of the object coordinate frame with respect to the camera coordinate frame.

Several methods based on photogrammetry and using closed-form solutions for three points were developed in the literature [4,11,12,19]. Quan and Lan [18] proposed a family of linear methods that yield a unique solution to four- and five-point pose determination for generic reference points. The authors showed that their methods do not degenerate for coplanar configurations and even outperform the special linear algorithm for coplanar configurations in practice.

In this paper, we develop a new pose estimation algorithm based on a combination of an analytical and an iterative method. We use an EKF to perform a nonlinear optimization of the pose parameters, which are initialized by an analytical algorithm. We present a comparative study between four pose estimation algorithms, then evaluate the performance of these methods using a series of tests. The mean errors and standard deviations are estimated, and we also compute the reconstruction and generalization errors. For real distance estimation, we use an experimental robot bench to compare the camera-target distance computed by the different algorithms against the real distance given by the robot.

Our hybrid method proves its efficiency and accuracy for pose parameter estimation. Robustness is obtained by integrating the analytical method into the EKF to initialize the first guesses of the pose parameters, and then performing a minimization using a numerical process to fit these parameters. The method presented in this paper has been validated on several image sequences; the results show that the method is robust and presents the best performance in terms of reconstruction error, generalization error and real distance estimation.

The remainder of the paper is organized as follows. Section 2 details the pose estimation methods. In Sect. 3, we show the experimental results obtained. A discussion is presented in Sect. 4. We conclude in Sect. 5, where we present conclusions and future work.

2 Pose estimation algorithms

Pose estimation is the determination of the transformation between the measured data coordinate frame and the model data coordinate frame (Fig. 1). Several pose estimation algorithms have been developed in the literature; however, there are usually two kinds of algorithms: analytical and iterative. Analytical methods are direct algorithms admitting a finite set of solutions. On the other hand, iterative methods are based on the minimization of a certain error criterion.

Fig. 1 Pose parameters: rotation and translation of the object coordinate frame with respect to the camera coordinate frame

Camera pose estimation is based on the geometrical extraction of primitives, which allows matching 2D points extracted from the image (expressed in the image reference frame) with known 3D points of the object (expressed in the associated workspace reference frame). To determine the pose, the 2D–3D point pairs must be known and the camera is assumed to be calibrated.

The camera calibration determines the geometrical model of an object and the corresponding image formation system, which is described by the following equation [21]:

$$\begin{pmatrix} su \\ sv \\ s \end{pmatrix} = I_x \begin{pmatrix} R & T \end{pmatrix} \begin{pmatrix} X \\ Y \\ Z \\ 1 \end{pmatrix} \qquad (1)$$

where $s$ is an arbitrary scale factor; $(R, T)$, called the extrinsic parameters, are the rotation and translation which relate the world coordinate system to the camera coordinate system; and $I_x$ is the camera intrinsic matrix given by:

$$I_x = \begin{pmatrix} \alpha_u & 0 & u_0 \\ 0 & \alpha_v & v_0 \\ 0 & 0 & 1 \end{pmatrix} \qquad (2)$$

with $(u_0, v_0)$ the coordinates of the principal point and $\alpha_u$ and $\alpha_v$ the scale factors along the $u$ and $v$ image axes. $I_x$ is computed by the camera calibration procedure and remains unchanged during the experiments. The purpose of this work is to compute the extrinsic matrix $(R, T)$, which represents the camera pose.
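To make the projection model concrete, the following Python sketch applies Eqs. (1) and (2) to a single 3D point (the intrinsic and extrinsic values are placeholders, not the calibration used in this paper):

```python
import numpy as np

# Intrinsic matrix Ix of Eq. (2); placeholder values, not this paper's camera.
alpha_u, alpha_v, u0, v0 = 700.0, 730.0, 384.0, 288.0
Ix = np.array([[alpha_u, 0.0, u0],
               [0.0, alpha_v, v0],
               [0.0, 0.0, 1.0]])

# Extrinsic parameters (R, T): identity rotation, target 1 m in front of the camera.
R = np.eye(3)
T = np.array([0.0, 0.0, 1.0])

def project(P, Ix, R, T):
    """Project a 3D point P onto the image plane following Eq. (1)."""
    su_sv_s = Ix @ (R @ P + T)       # homogeneous coordinates (su, sv, s)
    return su_sv_s[:2] / su_sv_s[2]  # divide by the arbitrary scale factor s

print(project(np.array([0.05, 0.05, 0.0]), Ix, R, T))  # pixel coordinates (u, v)
```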

Once the internal parameters of the camera are determined, the pose can be computed using a set of 2D–3D matching points. Given the number of pose estimation algorithms, we will not review all the methods proposed in the literature exhaustively. We present only the most representative ones that can be computed in real time, this factor being critical for AR systems.

2.1 Analytical algorithm

Analytical methods use a reduced number of points. The complexity of these methods and their execution time are low. Their pose parameters are accurate; however, the depth is not well estimated. There are many analytical algorithms [7,9,11]; these methods differ essentially in the resolution technique and the number of points used. Didier [10] developed a new analytical algorithm based on coded square targets. The position and orientation are computed once the correct code of the target is found. The method requires the knowledge of:

− Intrinsic parameters of the camera.
− Coordinates of the four corners of the fiducial in the image.
− The real measurement of a fiducial side.

The algorithm is composed of two parts: the first part consists of computing the real depth of the fiducial vertices, and the second part is the pose computation. The fiducial has a square shape, so one has the following property:

$$\overrightarrow{AB} = \overrightarrow{DC}$$

Applying the perspective model of the camera, one gets the following expression:

$$\begin{pmatrix} u_B & -u_C & u_D \\ v_B & -v_C & v_D \\ -1 & 1 & -1 \end{pmatrix} \begin{pmatrix} Z_B \\ Z_C \\ Z_D \end{pmatrix} = \begin{pmatrix} u_A \\ v_A \\ -1 \end{pmatrix} \qquad (3)$$

Solving Eq. 3, the depths of the four square corners are given by the following formulas:

$$\begin{aligned} Z_B &= \frac{1}{\det M}\left[u_A(v_C - v_D) + v_A(u_D - u_C) + (u_C v_D - u_D v_C)\right] \\ Z_C &= \frac{1}{\det M}\left[u_A(v_B - v_D) + v_A(u_D - u_B) + (u_D v_B - u_B v_D)\right] \\ Z_D &= \frac{1}{\det M}\left[u_A(v_B - v_C) + v_A(u_C - u_B) + (u_B v_C - u_C v_B)\right] \\ \det M &= (u_C v_D - u_D v_C) + (u_D v_B - u_B v_D) + (u_B v_C - u_C v_B) \end{aligned} \qquad (4)$$

Once the real depths are known, one determines the translation and orientation of the fiducial with respect to the camera. The translation is determined using the fiducial center computed from the coordinates of the fiducial vertices A, B, C and D. The rotation matrix is given by the three following vectors:

$$r_1 = \frac{\overrightarrow{AC} + \overrightarrow{DB}}{\left\|\overrightarrow{AC} + \overrightarrow{DB}\right\|}, \quad r_2 = \frac{\overrightarrow{AC} - \overrightarrow{DB}}{\left\|\overrightarrow{AC} - \overrightarrow{DB}\right\|}, \quad r_3 = r_1 \wedge r_2 \qquad (5)$$
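As an illustration, here is a minimal Python sketch of this direct resolution, assuming normalized image coordinates for the four corners and fixing $Z_A = 1$ (the known side length of the fiducial then fixes the overall scale); the function name and structure are ours, not the author's implementation:

```python
import numpy as np

def analytical_pose(c):
    """c: dict mapping 'A'..'D' to normalized image coordinates (u, v).
    Returns (R, T) up to scale, following Eqs. (3)-(5) with Z_A = 1."""
    (uA, vA), (uB, vB), (uC, vC), (uD, vD) = (c[k] for k in "ABCD")

    # Determinant of the system of Eq. (3), as written in Eq. (4)
    detM = (uC*vD - uD*vC) + (uD*vB - uB*vD) + (uB*vC - uC*vB)

    # Relative depths of B, C, D (Eq. 4)
    ZB = (uA*(vC - vD) + vA*(uD - uC) + (uC*vD - uD*vC)) / detM
    ZC = (uA*(vB - vD) + vA*(uD - uB) + (uD*vB - uB*vD)) / detM
    ZD = (uA*(vB - vC) + vA*(uC - uB) + (uB*vC - uC*vB)) / detM

    # Back-project each corner into the camera frame: q = Z * (u, v, 1)
    qA = np.array([uA, vA, 1.0])
    qB = ZB * np.array([uB, vB, 1.0])
    qC = ZC * np.array([uC, vC, 1.0])
    qD = ZD * np.array([uD, vD, 1.0])

    # Rotation from the diagonal combinations of Eq. (5); AC + DB and AC - DB
    # are orthogonal because the diagonals of a square have equal length.
    r1 = (qC - qA) + (qB - qD); r1 /= np.linalg.norm(r1)
    r2 = (qC - qA) - (qB - qD); r2 /= np.linalg.norm(r2)
    R = np.column_stack([r1, r2, np.cross(r1, r2)])

    # Translation from the fiducial center (mean of the four vertices)
    T = (qA + qB + qC + qD) / 4.0
    return R, T
```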

2.2 Hybrid orthogonal iteration algorithm

In this method, pose estimation is formulated as the minimization of an error metric based on collinearity in object space. Using the object-space collinearity error (Fig. 2), an iterative algorithm is derived to compute orthogonal rotation matrices [15].

The mapping from 3D reference points to 2D image coordinates is formalized as follows. Given a set of noncollinear 3D reference points $P_i = (x_i, y_i, z_i)^t$, $i = 1 \ldots n$, $n \geq 3$, expressed in an object-centered reference frame, the corresponding camera-space coordinates $q_i = (x_i', y_i', z_i')^t$ are related by a rigid transformation $q_i = R P_i + T$, where $R$ and $T$ are, respectively, the rotation matrix and the translation vector. The reference points $P_i$ are projected onto the image plane. Let the image point $p_i = (u_i, v_i)^t$ be the projection of $P_i$ on the normalized image plane. Under the idealized pinhole imaging model, $p_i$, $q_i$ and the center of projection are collinear. This fact is expressed by the following equations:

$$u_i = \frac{r_1^t P_i + t_x}{r_3^t P_i + t_z} \qquad (6)$$

$$v_i = \frac{r_2^t P_i + t_y}{r_3^t P_i + t_z} \qquad (7)$$

and

$$p_i = \frac{1}{r_3^t P_i + t_z}\,(R P_i + T) \qquad (8)$$

Fig. 2 Object-space and image-space collinearity errors


The OI algorithm dynamically determines the external camera parameters using 2D–3D matchings established by the 2D fiducial tracking algorithm from the current video image [1]. The OI algorithm first computes the object-space collinearity error vector [15]:

$$e_i = \left(I - \hat{V}_i\right)(R P_i + T) \qquad (9)$$

where $\hat{V}_i$ is the observed line-of-sight projection matrix defined by:

$$\hat{V}_i = \frac{\hat{p}_i \hat{p}_i^t}{\hat{p}_i^t \hat{p}_i} \qquad (10)$$

then, a minimization of the squared error is performed:

$$E(R, T) = \sum_{i=1}^{n} \|e_i\|^2 = \sum_{i=1}^{n} \left\|\left(I - \hat{V}_i\right)(R P_i + T)\right\|^2 \qquad (11)$$

The OI algorithm converges to an optimum for any set of observed points and any starting point [1]. However, in order to ensure convergence to the correct pose in minimum time, a good initialization of the pose parameters is required. The OI algorithm can be initiated using an initial rotation guess $R_0$ of $R$ and computing $T_0$. The initial pose $(R_0, T_0)$ is then used to establish a set of hypothesized scene points $V_i(R_0 P_i + T_0)$, which are used to start the first absolute orientation iteration. Although the OI algorithm is globally convergent, this does not guarantee that it will converge efficiently, or at all, to the correct solution. Therefore, to ensure the convergence of the OI algorithm, the analytical pose estimator presented in Sect. 2.1 is used to compute a good initial guess of the pose parameters so that the algorithm converges to an optimum solution (Fig. 3).
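To make the criterion explicit, the sketch below evaluates the object-space error of Eq. (11) for a candidate pose; in the hybrid scheme the starting pose $(R_0, T_0)$ comes from the analytical estimator and the OI iterations then decrease this value (names and structure are illustrative, not the paper's code):

```python
import numpy as np

def objectspace_error(R, T, P, p):
    """Object-space collinearity error E(R, T) of Eq. (11).
    P: (n, 3) array of object points; p: (n, 2) normalized image points."""
    E = 0.0
    for Pi, pi in zip(P, p):
        ph = np.append(pi, 1.0)               # homogeneous image point p_i
        V = np.outer(ph, ph) / (ph @ ph)      # line-of-sight projector, Eq. (10)
        e = (np.eye(3) - V) @ (R @ Pi + T)    # collinearity error vector, Eq. (9)
        E += e @ e
    return E
```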

Fig. 3 Hybrid orthogonal iteration pose estimation diagram

2.3 Extended Kalman filter algorithm

In this approach, we use the EKF to estimate the position and orientation of the object with respect to the camera coordinate frame. The Kalman filter is a set of mathematical equations that provides an efficient computational model to estimate the state of a process, in a way that minimizes the mean of the squared error [20]. The EKF is applied to nonlinear systems with zero-mean Gaussian process and measurement noise. The process model is given by the following equation:

$$x_k = f(x_{k-1}, \omega_{k-1}) \qquad (12)$$

where $x_k$ is the state vector and $\omega_k$ the process noise. Measurements of the desired variables are made according to the equation:

$$z_k = h(x_k, \nu_k) \qquad (13)$$

where $z_k$ is the measurement vector and $\nu_k$ the measurement noise.

In the first step of the EKF, the time update, the state vector and the error covariance matrix are predicted using initial estimates of $\hat{x}_k$ and $P_k$. Once this step is finished, they become the inputs for the measurement update (correction) step. With the updated information, the time update step projects the state vector and the error covariance matrix to the next time step. By executing these two steps recursively, we estimate the state vector that represents the pose parameters.

As described before, the time update projects the system state vector $\hat{x}$ and its covariance matrix from the current time step $k$ to the next step $k + 1$. The measurement model represents the relationship between the system state vector and the camera measurement inputs.

First we need to define the state vector for the EKF. Since our goal is to estimate the camera pose, we use the rotation angles and the translation components $(\psi, \theta, \varphi, t_X, t_Y, t_Z)$ to represent the system state. The measurement input is provided by the camera. We have to estimate the six variables of the state vector; the total measurement input is an $8 \times 1$ vector:

$$z = \begin{pmatrix} u_1 & u_2 & u_3 & u_4 & v_1 & v_2 & v_3 & v_4 \end{pmatrix}^t \qquad (14)$$

Applying the camera perspective model to the 3D points, we have the following equations:

$$u_i = \frac{M_1 P_i + M_{14}}{M_3 P_i + M_{34}} \qquad (15)$$

$$v_i = \frac{M_2 P_i + M_{24}}{M_3 P_i + M_{34}} \qquad (16)$$

$P_i$ represents the 3D point in the object reference frame and $M_i = (M_{i1}, M_{i2}, M_{i3})$ are the rows of the perspective projection matrix of the camera, given by:

$$M = \begin{pmatrix} \alpha_u r_{11} + u_0 r_{31} & \alpha_u r_{12} + u_0 r_{32} & \alpha_u r_{13} + u_0 r_{33} & \alpha_u t_x + u_0 t_z \\ \alpha_v r_{21} + v_0 r_{31} & \alpha_v r_{22} + v_0 r_{32} & \alpha_v r_{23} + v_0 r_{33} & \alpha_v t_y + v_0 t_z \\ r_{31} & r_{32} & r_{33} & t_z \end{pmatrix} \qquad (17)$$

2.3.1 Time update

The time update produces estimates of the state vector $\hat{x}$ and of the error covariance matrix $P$. The projection equations are given by:

$$\hat{x}_{k+1}^- = \hat{x}_k \qquad (18)$$

$$P_{k+1}^- = A_k P_k A_k^t + Q_k \qquad (19)$$

where $Q$ represents the covariance matrix of the process noise and $A$ is the transition matrix, represented by:

$$A = I_6 \qquad (20)$$

2.3.2 Measurement update

The measurement update model relates the state vector to the measurement vector. The measurement vector is represented by the 2D feature points. Based on the knowledge of the feature point positions in the camera frame, we use the perspective projection model as follows:

$$u_i = f(M, P_i) \qquad (21)$$

$$v_i = f(M, P_i) \qquad (22)$$

The measurement function is given by:

$$z_{k+1} = h(\hat{x}_k) + \nu_k \qquad (23)$$

where $h$ is given by:

$$h(\hat{x}_k) = M \times P_i \times x_k \qquad (24)$$

and $x_k$ is the state vector defined before.

To perform the measurement update, first we compute the Kalman gain:

$$K_k = P_k^- H_k^T \left(H_k P_k^- H_k^T + V_k R_k V_k^T\right)^{-1} \qquad (25)$$

where

$$H_{ij} = \frac{\partial h_i}{\partial x_j}(\tilde{x}_k, 0), \qquad V_{ij} = \frac{\partial h_i}{\partial \nu_j}(\tilde{x}_k, 0) \qquad (26)$$

The estimate is then updated with the measurement $z_k$:

$$\hat{x}_k = \hat{x}_k^- + K_k\left(z_k - h\left(\hat{x}_k^-, 0\right)\right) \qquad (27)$$

Finally, we update the error covariance:

$$P_k = (I - K_k H_k)\, P_k^- \qquad (28)$$

where

$$H = \begin{pmatrix} \dfrac{\partial h_1}{\partial \psi} & \dfrac{\partial h_1}{\partial \theta} & \dfrac{\partial h_1}{\partial \varphi} & \dfrac{\partial h_1}{\partial t_x} & \dfrac{\partial h_1}{\partial t_y} & \dfrac{\partial h_1}{\partial t_z} \\[2mm] \dfrac{\partial h_2}{\partial \psi} & \dfrac{\partial h_2}{\partial \theta} & \dfrac{\partial h_2}{\partial \varphi} & \dfrac{\partial h_2}{\partial t_x} & \dfrac{\partial h_2}{\partial t_y} & \dfrac{\partial h_2}{\partial t_z} \end{pmatrix} \qquad (29)$$

The state vector and the error covariance matrix are updated using the sets of measurement inputs from the camera. Once this step is finished, they become the inputs for the time update step. The time update step then projects the state vector and the error covariance matrix to the next time step. By executing these two steps recursively, we estimate the rotation angles and the translation vector of the camera coordinate frame with respect to the workspace coordinate frame.
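The following Python sketch gathers one predict/correct cycle under the assumptions above (constant-pose model with $A = I_6$ and $V_k = I$); here `h` and `jac_H` stand for the measurement function of Eq. (23) and the Jacobian of Eq. (29) stacked over the four corners, and all names are illustrative:

```python
import numpy as np

def ekf_step(x, P, z, h, jac_H, Q, R_noise):
    """One EKF cycle for the pose state x = (psi, theta, phi, tx, ty, tz).
    z is the 8x1 measurement of Eq. (14); Q and R_noise are the process
    and measurement noise covariances."""
    # Time update (Eqs. 18-19) with transition matrix A = I6 (Eq. 20)
    x_pred = x
    P_pred = P + Q

    # Measurement update (Eqs. 25, 27, 28), assuming V_k = I
    H = jac_H(x_pred)                                        # 8x6 Jacobian
    K = P_pred @ H.T @ np.linalg.inv(H @ P_pred @ H.T + R_noise)
    x_new = x_pred + K @ (z - h(x_pred))
    P_new = (np.eye(len(x)) - K @ H) @ P_pred
    return x_new, P_new
```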

2.3.3 Hybrid extended Kalman filter algorithm

This method is simply the combination of the two algorithms presented before: the EKF and the analytical algorithm. Indeed, as already noted, the weakness of the EKF lies in the first guesses of the parameters, so we use the analytical algorithm to initialize the pose values and accurately estimate the EKF states (Fig. 4).

Fig. 4 Hybrid Kalman filter pose estimation diagram


Fig. 5 Fiducial detection process: (1) contour detection, (2) image smoothing, (3) image dilation, (4) contour approximation

3 Experimental results

Before starting our evaluation of the four pose estimation algorithms, we describe our method for detecting and identifying targets in order to extract feature points and determine the camera pose.

To estimate the camera pose, it is necessary to have a set of 2D points and their 3D counterparts. These 2D–3D matchings are determined after detecting and identifying the object of interest in the image. To extract objects of interest from the scene, images are pre-processed into an acceptable form before carrying out any image analysis, in order to reduce the detection error rate. Many operations are applied to process the image and detect the object shape (Fig. 5). Our object detection algorithm is composed of the following steps:

1. Apply a Canny filter to detect contours in the image [5].
2. Smooth the image using a Gaussian filter to eliminate pixel variations along the contour segments by joining the average values.
3. Dilate the smoothed image to remove potential holes between edge segments.
4. Approximate contours with an accuracy proportional to the contour perimeter.
5. Find the number of object vertices.
6. Identify object boundaries as four intersecting lines by testing the collinearity of vertices.
7. Find the minimum angle between joint edges; if the cosines of the four angles are near zero, then a square is detected.

Finally, only objects with four vertices and right angles are retained and considered as square shapes.
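A minimal sketch of this detection pipeline is given below, written with OpenCV (our choice of library; the paper does not name its implementation, and the thresholds are placeholders):

```python
import cv2
import numpy as np

def detect_squares(gray):
    """Steps 1-7 above on a grayscale image; returns candidate square corners."""
    edges = cv2.Canny(gray, 50, 150)               # step 1: contour detection
    edges = cv2.GaussianBlur(edges, (5, 5), 0)     # step 2: smoothing
    edges = cv2.dilate(edges, None)                # step 3: dilation
    contours, _ = cv2.findContours(edges, cv2.RETR_LIST,
                                   cv2.CHAIN_APPROX_SIMPLE)  # OpenCV >= 4
    squares = []
    for cnt in contours:
        # step 4: polygonal approximation, tolerance proportional to perimeter
        approx = cv2.approxPolyDP(cnt, 0.02 * cv2.arcLength(cnt, True), True)
        # steps 5-6: keep convex quadrilaterals only
        if len(approx) == 4 and cv2.isContourConvex(approx):
            pts = approx.reshape(4, 2).astype(float)
            # step 7: cosines of the four corner angles must be near zero
            cosines = []
            for i in range(4):
                a = pts[i] - pts[(i + 1) % 4]
                b = pts[(i + 2) % 4] - pts[(i + 1) % 4]
                cosines.append(abs(a @ b / (np.linalg.norm(a) * np.linalg.norm(b))))
            if max(cosines) < 0.1:
                squares.append(pts)
    return squares
```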

Fig. 6 Models of fiducials

Once a square object is detected, the next step is to identify this object and match it with a defined template. Our goal is to design fiducials which can be robustly extracted from the scene in real time. Therefore, we use two kinds of square fiducials with patterns inside (Fig. 6); these fiducials contain a code used for template matching.

The internal code of the fiducial is computed by spatial sampling of the 3D fiducial model. Then, we project the sample points onto the 2D image using the homography computed from the four vertices of the detected fiducial, with the following formula:

$$\begin{pmatrix} su \\ sv \\ s \end{pmatrix} = \begin{pmatrix} h_{11} & h_{12} & h_{13} \\ h_{21} & h_{22} & h_{23} \\ h_{31} & h_{32} & h_{33} \end{pmatrix} \begin{pmatrix} x \\ y \\ 1 \end{pmatrix} \qquad (30)$$

We compute the corresponding fiducial code from the sampling grid; this code is composed of 16 bits and represents the colors of the fiducial samples. The fiducial code can take four values, corresponding to the four possible fiducial orientations. The target system must respect a strong constraint, which is to allow the detection of the target orientation: each target rotated by a quarter turn has a different code in the identification phase. Thus, targets have four codes according to their orientations and, consequently, the number of target classes is divided by 4, which reduces the number of possible codes (Fig. 7). Moreover, targets should not have a central symmetry, in order to lift the target orientation ambiguities.

Fig. 7 Codes corresponding to different target orientations
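The sampling step can be sketched as follows, where `H` is the homography of Eq. (30) mapping fiducial-plane coordinates in $[0, 1]^2$ to pixels; the 4 × 4 grid, the grayscale threshold and the names are our assumptions for illustration:

```python
import numpy as np

def fiducial_code(H, gray, grid=4):
    """Read the 16-bit code of a detected fiducial from a grayscale image."""
    code = 0
    for j in range(grid):
        for i in range(grid):
            x, y = (i + 0.5) / grid, (j + 0.5) / grid  # center of sample cell
            su, sv, s = H @ np.array([x, y, 1.0])      # Eq. (30)
            u, v = int(round(su / s)), int(round(sv / s))
            bit = 1 if gray[v, u] > 127 else 0         # black/white sample
            code = (code << 1) | bit
    return code
```

Matching then compares this code, and its three quarter-turn permutations, against the stored templates.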

We now present the experimental results and a detailed evaluation of the different localization methods. A comparison between these methods is performed in order to determine the performance and weaknesses of each one. We compared our hybrid EKF method to the three other algorithms: the analytical algorithm, the hybrid OI and the EKF. The comparison between these algorithms is carried out according to the following criteria:

– Execution time.
– Reconstruction error: measures the pixel difference between the feature points of the detected target in the image and the projection of the 3D target model using the computed pose parameters.
– Generalization error: consists of projecting onto the image plane the targets which were not used for pose computation, and measuring the deviation in pixels between the projected points of the 3D models and the corresponding targets detected in the image.
– Real camera-target distance estimation: the difference between the distance estimated by the pose algorithm and the real distance given by the robot.

The experimental tests were carried out using the following material configuration:

– Pentium III 1.1 GHz
– 1 GB RAM
– Matrox Meteor II framegrabber
– Sony XC-555 camera

The camera is calibrated using Zhang's method [21]. The intrinsic parameters are given in Table 1.

Our first analysis, concerning the execution time of the different algorithms, shows that the analytical algorithm is the fastest method, with 19 µs for one pose estimation; the hybrid EKF takes 112 µs to estimate the same pose, followed by 153 µs for the hybrid OI; finally, 13,530 µs are necessary for the EKF to determine the pose parameters. So, in terms of computation time, the analytical algorithm is better than the other methods, unlike the EKF, which is very slow and could compromise the visual rendering.

Table 1 Intrinsic parameters of the Sony XC-555P used in the experiments

Image size: 736 × 571

Projection parameters                Distortion parameters
Scale factors:                       Radial distortion coefficients:
  αu = 706.1                           k1 = −0.2279
  αv = 731.1                           k2 = 0.1479
Optical center projection:           Tangential distortion coefficients:
  u0 = 388.0                           p1 = −0.0007985
  v0 = 269.6                           p2 = 0.0006245

Table 2 Results of the different experiments performed for the reconstruction error

Algorithm                 Anal. algo.   H. OI     EKF       H. EKF
Reconst. error (pixel)    0.5421        0.4671    3.2050    0.4651
Variance                  0.0694        0.0373    0.5272    0.0360
Standard deviation        0.2634        0.1932    0.7261    0.1897


3.1 Reconstruction error

In this experiment, the camera is moved around the target object, the four algorithms estimate the pose parameters and we evaluate the reconstruction error in the image. The four algorithms computed 1,400 poses; the error is estimated by reprojecting the object model onto the image. For each pose computation, we reproject the target model onto the image and measure the deviation between the real target corners and the projected corners. In Table 2, we notice that the hybrid EKF is the most stable and accurate method compared to the other algorithms. From Fig. 8, we see that when the distance between fiducials belongs to [0.10, 0.45] m, the analytical method and the hybrid EKF present the lowest reconstruction error; the two algorithms are accurate and stable in this interval. However, when this distance becomes greater than 0.45 m, the hybrid OI is more accurate than the other methods. The reconstruction error is large for the EKF, because the algorithm did not converge to the optimal solution due to bad parameter initialization.
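For reference, this criterion amounts to a few lines of Python under the projection model of Eq. (1) (the function name is ours):

```python
import numpy as np

def reconstruction_error(corners_2d, model_3d, R, T, Ix):
    """Mean pixel deviation between detected corners and the reprojected model."""
    err = []
    for P, c in zip(model_3d, corners_2d):
        q = Ix @ (R @ P + T)                      # Eq. (1)
        err.append(np.linalg.norm(q[:2] / q[2] - c))
    return np.mean(err)
```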


Fig. 8 Reconstruction error according to distance between fiducials

Fig. 9 Generalization error according to distance between fiducials

3.2 Generalization error

To determine the generalization error, we used a sheet of paper on which we printed four square targets with 5 cm sides. One of the targets is used to compute the pose parameters and the three others are used for the generalization error. This generalization error is computed by taking the models of the objects which were not used to estimate the pose and projecting them onto the image. The results obtained for the generalization error are represented in Fig. 9. Table 3 shows that the hybrid EKF and the analytical method present the best performance in terms of generalization error compared to the other algorithms. The overall error behavior for these two algorithms is stable and does not present jitter in the images.

3.3 Real camera-target distance estimation

In order to evaluate the camera-target distance errors of the different algorithms, we use a calibration robot bench which moves in two directions, X and Y (Fig. 10).

Table 3 Results of the different experiments performed for the generalization error

Algorithm               Anal. algo.   H. OI     EKF        H. EKF
Gener. error (pixel)    10.9562       17.4712   16.0005    9.8435
Variance                48.0473       93.5641   450.2530   28.9559
Standard deviation      6.9316        9.6729    21.2192    5.3811

The camera is mounted on the robot bench and the target is fixed on the other side of the bench. This bench allows controlling the motion of the robot and comparing its distance with the pose estimated by the different algorithms. We sample the robot displacement space in order to compute the corresponding pose with the different pose estimators. We have 1,939 robot positions for which each algorithm estimates the pose parameters and computes the distance between the optical center of the camera and the target.

We classified the obtained pose results into ten classes and computed the mean errors and variances of the pose estimation methods. The results are illustrated in Fig. 11, which compares the errors between the real distance given by the robot (robot position) and the position estimated by the pose algorithms. We notice that the analytical method presents a large mean error compared to the other methods; however, its variance is quite small. The hybrid EKF and hybrid OI present the best performances, unlike the EKF algorithm, which presents a large variance around its mean error. Figure 12 represents the real distances computed by the robot against the distances estimated by the different pose algorithms. Indeed, this evaluation determines, with accuracy, the distance error generated by each pose estimator. The interpretation of the errors is performed by approximating the curves represented in Fig. 12 with a nonlinear regression for the hybrid OI, the EKF and the hybrid EKF, and a quadratic regression for the analytical algorithm. The mean error of the analytical algorithm is 0.81% (a mean error of 8.1 mm for a distance of 1 m). The hybrid OI error is estimated at 0.84%, while the EKF degenerates and presents a mean error of 2.6%. The best error value is obtained with the hybrid EKF, where it is estimated at 0.72%. We conclude that the hybrid EKF is the best real-distance estimator. The data are fitted using 1,939 data points; the regression parameters used are given in Table 4, and the standard fit of the theoretical and experimental distributions is given in Table 5. In Table 6, the best values for each comparison criterion are highlighted in green, whereas the worst are in red.
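One plausible reading of these percentages, consistent with the slopes of Table 4, is as the deviation of the fitted slope from 1 (e.g. $a_1 = 1.0081$ gives 0.81%); the sketch below reproduces that computation on synthetic data (our interpretation, with hypothetical arrays):

```python
import numpy as np

# Hypothetical data: d_robot holds the real distances from the bench,
# d_est the distances recovered by one pose estimator.
d_robot = np.linspace(0.3, 1.2, 50)
d_est = 1.0081 * d_robot + 0.0068        # e.g. the analytical fit of Table 4

a, b = np.polyfit(d_robot, d_est, 1)     # fit d_est ~ a * d_robot + b
print(f"a = {a:.4f}, b = {b:.4f}, relative error ~ {abs(a - 1) * 100:.2f}%")
```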

3.4 Virtual object rendering results

Once the pose parameters have been determined, we project a virtual cube onto the detected real target in order to visually evaluate the stability of the virtual object rendering.


Fig. 10 Robot bench used for distance evaluation

In this experiment, the camera is freely moved around the fiducials. The identification algorithm detects and tracks targets in the frames, and the hybrid EKF estimates the position and orientation of the camera. We can see that the virtual objects are well superimposed on the real image (Fig. 13) and remain laid on the target for different camera poses. We also have some results on the generalization of our algorithm to 3D objects. First, we used our fiducial model to estimate the camera pose and overlaid a virtual model on a 3D object: a cylinder head (Fig. 14). Then, we used the feature points of a 3D box and reprojected the 3D model onto another box (Fig. 15). These experimental tests proved the capacity and accuracy of the hybrid EKF in a generalization task.

4 Discussion

In this study, we compared the performance of four pose estimation algorithms. We evaluated these methods using an experimental protocol to compute several error sources and estimate real distances. We used three iterative methods depending on nonlinear optimization and a new analytical method based on direct computation of the parameters.

The main accomplishments of this paper are:

– A new identification algorithm for coded fiducials.
– A hybrid pose estimation algorithm based on a combination of an analytical and an iterative method.
– A comparison of different methods in terms of execution time, errors and real distance estimation.

Previously published papers on vision-based pose estimation used direct or iterative methods, and some authors were interested in the comparison and evaluation of these methods. DeMenthon and Davis [8] compared several approximate methods for the perspective three-point problem of solving for the pose estimation parameters. A synthesis work was carried out in [2]: the authors developed a fast and accurate analytical pose estimation algorithm for a limited number of points or lines. Their method was tested and compared to linear algorithms as well as some iterative methods.

In Table 7, we compare different pose estimation methods, specifying the year, the nature of the algorithm and the conditions of application.

We quantitatively analyzed the tracking and localization errors, and we proposed a new method combining a numerical and an analytical algorithm to overcome the drawbacks of each method and enhance the accuracy and robustness of our hybrid algorithm.


Fig. 11 Mean errors and variances of the classified data

Fig. 12 Evaluation of measured distances according to real distances


Indeed, the two kinds of algorithms have complementary advantages and shortcomings. Iterative methods are accurate, but suffer from computational expense due to bad initialization and local minima problems. On the other hand, analytical methods are fast, but their major disadvantage is a lack of accuracy. We exploit the complementary nature of these two pose estimation methods to compensate for the weaknesses of each type. Finally, the analysis and experimental results demonstrate the system's effectiveness.

Table 4 Parameters of the regressions used for fitting

Algorithm     Final set of parameters    Asymptotic standard error
Anal. algo.   a1 = 1.0081                0.0009
              b1 = 0.0068                0.0007
H. OI         a2 = 1.0084                0.0004
              b2 = 0.0068                0.0005
EKF           a3 = 2.6034                0.2849
              b3 = −1.1006               0.3488
H. EKF        a4 = 1.0071                0.0003
              b4 = 0.0087                0.0003

Table 5 Standard fit of the theoretical and experimental distributions

              RMS of residuals   Variance of residuals
Anal. algo.   0.0069             4.8205e-05
H. OI         0.0050             2.5045e-05
EKF           3.3241             11.0498
H. EKF        0.0033             1.0808e-05

Table 6 Results of the different experiments performed for distance estimation

Algorithm            Anal. algo.   H. OI       EKF        H. EKF
Mean error (m)       0.0168        0.0057      0.0030     0.0046
Variance             6.9574e-6     7.0165e-6   0.3567     3.6445e-6
Standard deviation   0.0026        0.0026      0.5973     0.0019
Time (µs)            660           21420       1894200    15680


5 Conclusion

In this paper, we presented four algorithms intended to solve the problem of camera localization. First, we proposed an algorithm for the detection and identification of the object of interest in the image; then, we developed a new approach for pose estimation based on the combination of two methods: the analytical algorithm and the EKF. The analytical method computes the first guesses of the pose parameters; these parameters are then used to initialize a second pose estimator based on the EKF. We called this method the hybrid EKF. Thereafter, we performed a comparative study of several camera pose estimation methods using coded targets and evaluated the performance of our localization system compared to known algorithms.


Fig. 13 Virtual object overlay in a tracking sequence using various fiducials

Fig. 14 Virtual object overlay on a generalized 3D object using a coplanar fiducial

Fig. 15 Virtual object overlay on a generalized 3D object using a 3D model

This study addressed the following performance criteria: execution time, reconstruction error, generalization error and distance estimation error. The experimental tests to estimate real distances between the camera and the targets were carried out using a robot bench.

Table 7 Summary of pose estimation methods

Method                     Year   Type                       Application condition
Dhome et al. [9]           1989   Analytical                 Three lines
DeMenthon and Davis [8]    1992   Analytical                 Three points
POSIT                      1995   Iterative                  Four noncoplanar points
OI [15]                    2000   Iterative                  Three points
EKF [6]                    2002   Iterative                  Three points
Ansar and Daniilidis [2]   2003   Analytical                 Four points
Didier [10]                2005   Analytical                 Four coplanar points
Hybrid EKF                 2007   Analytical and iterative   Four coplanar points

The results obtained with our algorithm were efficient and robust, and proved that our system provides an interesting solution for camera localization using coded targets. Finally, our system was tested to manage augmentations in AR applications; the overlay results obtained were accurate.

References

1. Ababsa, F., Mallem, M.: Robust camera pose estimation using 2D fiducials tracking for real-time augmented reality systems. In: Proceedings of the ACM SIGGRAPH International Conference on Virtual-Reality Continuum and its Applications in Industry (VRCAI 2004), pp. 431–435. Nanyang, Singapore (2004)

2. Ansar, A., Daniilidis, K.: Linear pose estimation from points or lines. IEEE Trans. Pattern Anal. Mach. Intell. 25(5), 578–589 (2003)

3. Araujo, H., Carceroni, R., Brown, C.: A fully projective formulation for Lowe's tracking algorithm. Technical Report 641, University of Rochester, New York, USA (1996)

4. Horn, B.K.P., Hilden, H., Negahdaripour, S.: Closed-form solution of absolute orientation using orthonormal matrices. J. Opt. Soc. Am. A 5, 1127–1135 (1988)

5. Canny, J.: A computational approach to edge detection. IEEE Trans. Pattern Anal. Mach. Intell. 8(6), 679–698 (1986)

6. Chai, L., Hoff, W.A., Vincent, T.: Three-dimensional motion and structure estimation using inertial sensors and computer vision for augmented reality. Presence: Teleoper. Virtual Environ. 11(5), 474–492 (2002)

7. Chen, J.H., Chen, C.S., Chen, Y.S.: Fast algorithm for robust template matching with M-estimators. IEEE Trans. Signal Process. 51(1), 36–45 (2003)

8. DeMenthon, D., Davis, L.S.: Exact and approximate solutions of the perspective-three-point problem. IEEE Trans. Pattern Anal. Mach. Intell. 14, 1100–1105 (1992)

9. Dhome, M., Richetin, M., Lapreste, J.T., Rives, G.: Determination of the attitude of 3D objects from a single perspective view. IEEE Trans. Pattern Anal. Mach. Intell. 11, 1265–1278 (1989)

10. Didier, J.Y.: Contributions à la dextérité d'un système de réalité augmentée mobile appliqué à la maintenance industrielle. PhD Thesis, Université d'Évry, France (2005)

11. Fischler, M., Bolles, R.: Random sample consensus: a paradigm for model fitting with applications to image analysis and automated cartography. In: Graphics and Image Processing, vol. 24, pp. 381–395 (1981)

12. Forstner, W.: Reliability analysis of parameter estimation in linear models with applications to mensuration problems in computer vision. In: Computer Vision, Graphics and Image Processing, vol. 40, pp. 273–310 (1987)

13. Jiang, B., You, S., Neumann, U.: Camera tracking for augmented reality media. In: IEEE International Conference on Multimedia and Expo (III), pp. 1637–1640 (2000)

14. Lowe, D.G.: Three-dimensional object recognition from single two-dimensional images. J. Artif. Intell. 31, 355–395 (1987)

15. Lu, C.P., Hager, G.D., Mjolsness, E.: Fast and globally convergent pose estimation from video images. IEEE Trans. Pattern Anal. Mach. Intell. 22(6), 610–622 (2000)

16. Maidi, M., Ababsa, F., Mallem, M.: Active contours motion based on optical flow for tracking in augmented reality. In: Proceedings of the 8th Virtual Reality International Conference, VRIC 2006. Laval, France (2006)

17. More, J.J.: The Levenberg–Marquardt algorithm: implementation and theory. J. Numer. Anal. 630, 105–116 (1977)

18. Quan, L., Lan, Z.D.: Linear n-point camera pose determination. IEEE Trans. Pattern Anal. Mach. Intell. 21(8), 774–780 (1999)

19. Haralick, R.M., Lee, C., Ottenberg, K., Nolle, M.: Analysis and solutions of the three point perspective pose estimation problem. In: Proc. IEEE Conf. Computer Vision and Pattern Recognition, pp. 592–598. Maui, Hawaii (1991)

20. Welch, G., Bishop, G.: An introduction to the Kalman filter. Technical Report TR 95-041, Department of Computer Science, University of North Carolina, USA (2004)

21. Zhang, Z.: Flexible camera calibration by viewing a plane from unknown orientations. In: International Conference on Computer Vision, vol. 1, p. 666. Corfu, Greece (1999)
