Machine Vision and Applications (2006) 17(1): 51–67. DOI 10.1007/s00138-006-0014-6

ORIGINAL PAPER

Wei Sun · Jeremy R. Cooperstock

An empirical evaluation of factors influencing camera calibration accuracy using three publicly available techniques

Received: 23 November 2004 / Accepted: 3 January 2006 / Published online: 25 February 2006
© Springer-Verlag 2006

Abstract This paper presents an empirical study investigating the effects of training data quantity, measurement error, pixel coordinate noise, and the choice of camera model on camera calibration accuracy, based on three publicly available techniques developed by Tsai, Heikkilä and Zhang. Results are first provided for a simulated camera system and then verified through carefully controlled experiments using real-world measurements. Our aims are to provide suggestions to researchers who require an immediate solution for camera calibration and a warning of the practical difficulties one may encounter in a real environment. In addition, we offer some insight into the factors to keep in mind when selecting a calibration technique. Finally, we hope that this paper can serve as an introduction to the field for those newly embarking upon calibration-related research.

Keywords Camera calibration · Camera parameters · Coordinate transformations · Distortion models · Accuracy evaluation

1 Introduction

The decreasing cost of computers and cameras has brought rapid growth in stereo-based applications. In consequence, an ever-increasing population of researchers is coming to depend on camera calibration for their projects.

Although camera calibration is a mathematically well-defined problem and numerous methods have been developed to provide solutions, these methods make different simplifying assumptions as to how they should be used in practice and which variables need to be measured. It is also not clear how certain factors such as training data quantity, measurement error, and the choice of camera model influence

W. Sun (B) · J. R. Cooperstock
Department of Electrical and Computer Engineering, Centre for Intelligent Machines, McGill University, 3480 University Street, Montreal, Quebec H3A 2A7, Canada
E-mail: {wsun, jer}@cim.mcgill.ca
Fax: +1-514-398-7348

calibration accuracy. The effect of noise has been studied by Lavest et al. [1] and Zhang [2]; however, their experiments involved a small volume of 3D space and did not use separate test data to verify the scalability of calibrated results. Moreover, a major practical concern is the degree of effort involved in providing a given camera calibration algorithm with the training data it needs to achieve some required accuracy.

Using three publicly available techniques, through extensive experimentation with separate training and test data, we conducted an empirical study of the impact of noise, either in world or pixel coordinates, and training data quantity, on calibration accuracy. We also included a detailed comparison of various models to determine the relative importance of different distortion components, whether the addition of higher-order radial terms or decentering distortions actually improves calibration accuracy, and if so, to what degree. Our contributions are first, to provide a quick answer to researchers who need an immediate camera calibration solution; and second, to give some insight into practical difficulties one may encounter in a real environment, and which factors to keep in mind when looking for a suitable calibration technique. Finally, we hope to offer an introduction to the field for those newly embarking upon calibration-related research.

The calibration methods chosen for experimentation were developed by Tsai [3], Heikkilä [4] and Zhang [5]. An important reason for this choice is that the source code for the three methods is publicly available, well developed, and well tested, and therefore provides a fair comparison. In fact, only open source algorithms were serious candidates for a study of this nature, because they are open to inspection and readily integrated into a larger system.

Tsai's method represents a conventional approach that is based on the radial alignment constraint (RAC) and requires accurate 3D coordinate measurement with respect to a fixed reference. Among the conventional calibration methods surveyed by Salvi et al. [6], including those developed by Hall et al. [7], Faugeras–Toscani [8], and Weng et al. [9], Tsai's was reported to exhibit the best performance. His method


has been widely used in multi-camera applications [10, 11]. Heikkilä's method, also world-reference based, although not included in that survey, employs the more general direct linear transformation (DLT) technique by making use of prior knowledge of intrinsic parameters. It also involves a more complete camera model for lens distortions. Zhang's method is a different special case of Heikkilä's formulation. It combines the benefits of world-reference based and auto-calibration approaches, which enables the linear estimation of all supposedly constant intrinsic parameters. This method is flexible in that either the camera or the planar pattern can be moved freely and the calibration procedure is easily repeatable without redoing any measurements. These three methods of course form part of a large space of potential algorithms and can each be generalized in a variety of ways. We have applied the researchers' names to particular forms of their algorithms as embodied in published code.

2 Calibration methods

This section first provides a brief review of the existing calibration literature, which focuses mainly on camera modeling and calibration algorithm development, and then explains the mathematical details of the methods we investigate, followed by our evaluation criteria.

2.1 A brief literature review

Camera calibration has received increased attention in the computer vision community during the past two decades [6, 12]. According to the nature of training data, existing calibration methods can be classified as coplanar or non-coplanar. The coplanar approaches perform calibration on data points limited to a planar surface of a single depth. These methods are either computationally complex or fail to provide solutions for certain camera parameters, such as the image center, the scale factor, or lens distortion coefficients [13]. In contrast, the non-coplanar approaches, to which our study is confined, use training points scattered in 3D space to cover multiple depths, and do not exhibit such problems.

Non-coplanar approaches fall into a number of categories. World-reference based calibration is a conventional approach requiring the 3D world coordinates and corresponding 2D image coordinates of a set of feature points [3, 4, 6, 8, 9]. The disadvantage of this approach is that either a well-engineered calibration object is required, or the environment has to be carefully set up to achieve accurate 3D measurements. Geometric-invariant-based methods use parallel lines and vanishing points as calibration features. Although world coordinate measurement is not required, special equipment may be necessary for measuring certain variables, for example, the ratio of focal lengths [14]. Implicit calibration methods [15] have no explicit camera model and may achieve high accuracy, but are computationally expensive because of the large number

of unknown variables involved, and they do not reveal the physical parameters of a camera. Auto-calibration or self-calibration approaches determine camera parameters directly from multiple uncalibrated views of a scene by matching corresponding features between views, despite unknown camera motion and even changes in some of the intrinsic parameters [16, 17]. Unfortunately, due to the difficulty of initialization [12], auto-calibration results tend to be unstable [18]. Planar auto-calibration [19] addresses this initialization difficulty by using multiple views of a planar scene, taking advantage of the fact that planes are simple to process and allow reliable and precise feature or intensity-based matching. Sturm–Maybank [20] and Zhang [5] both extend this idea by taking into account the relative geometric information between planar feature points, with Zhang's method estimating lens distortion coefficients, a factor not included in the former work. According to Sturm and Maybank's singularity analysis [20], degenerate situations can be easily avoided. Another extension of the multiplanar approach suggests the use of angles and length ratios on a plane but provides no experimental results [21].

It is worth noting that these multiplanar calibration methods differ from the coplanar methods reviewed by Chatterjee and Roychowdhury [13]. The methods described here rely on a planar calibration pattern positioned at various orientations. They exploit the coplanarity constraint on the points in each sample set to reduce or eliminate the need for 3D measurement, but still sample a 3D region. Similarly, a one-dimensional object can be positioned at various spots and in various orientations in a 3D space to provide non-coplanar points for calibration [22].

The choice of camera model [23], varying mainly in its characterization of lens distortion, may also influence calibration results. Tsai used the second-order radial distortion model [3] while Zhang employed both the second- and fourth-order terms [5]. Heikkilä included two further decentering distortion components [4], while Lavest et al. even added the sixth-order radial term [1]. Weng et al. introduced a thin prism distortion whose coefficients could be merged with those of the decentering distortion in actual calibration [9]. Most camera models assume zero skewness, i.e., the angle between the x and y image axes is 90° [3, 4, 9], but Lavest et al. [1] and Zhang [5] estimate skewness as a variable.

2.2 Calibration methods for experimentation

After reviewing these calibration methods, we chose Tsai [3], Heikkilä [4] and Zhang's [5] methods for experimentation. The mathematical details are now provided.

2.2.1 Camera model

Tsai, Heikkilä and Zhang's methods all use the pinhole projective model to map 3D scenes to the 2D camera image plane. Despite different formulations for lens distortion, the mapping between world and image points proceeds


Table 1 Coordinate transformations in Tsai, Heikkilä and Zhang's camera models

World to camera coordinates (identical in all three models):

$$\begin{bmatrix} X_c \\ Y_c \\ Z_c \end{bmatrix} = R \begin{bmatrix} X_w \\ Y_w \\ Z_w \end{bmatrix} + \mathbf{t}$$

Projection: Tsai and Heikkilä use $[x_u\ y_u]^T = \frac{f}{Z_c}[X_c\ Y_c]^T$; Zhang uses $[x_u\ y_u]^T = \frac{1}{Z_c}[X_c\ Y_c]^T$.

Lens distortion:
Tsai: $[x_u\ y_u]^T = \left(1 + k_1^{(T)} r^2\right)[x_d\ y_d]^T$, where $r = \sqrt{x_d^2 + y_d^2}$.
Heikkilä: $[x_u\ y_u]^T = \left(1 + k_1^{(H)} r^2 + k_2^{(H)} r^4\right)[x_d\ y_d]^T + \begin{bmatrix} 2 p_1^{(H)} x_d y_d + p_2^{(H)} (r^2 + 2 x_d^2) \\ p_1^{(H)} (r^2 + 2 y_d^2) + 2 p_2^{(H)} x_d y_d \end{bmatrix}$, where $r = \sqrt{x_d^2 + y_d^2}$.
Zhang: $[x_d\ y_d]^T = \left(1 + k_1^{(Z)} r^2 + k_2^{(Z)} r^4\right)[x_u\ y_u]^T$, where $r = \sqrt{x_u^2 + y_u^2}$.

Distorted image to pixel coordinates:
Tsai: $[x_p\ y_p\ 1]^T = \begin{bmatrix} s_x/d_x & 0 & c_x \\ 0 & 1/d_y & c_y \\ 0 & 0 & 1 \end{bmatrix} [x_d\ y_d\ 1]^T$
Heikkilä: $[x_p\ y_p\ 1]^T = \begin{bmatrix} s_x D_x & 0 & c_x \\ 0 & D_y & c_y \\ 0 & 0 & 1 \end{bmatrix} [x_d\ y_d\ 1]^T$
Zhang: $[x_p\ y_p\ 1]^T = \begin{bmatrix} \alpha & \gamma & c_x \\ 0 & \beta & c_y \\ 0 & 0 & 1 \end{bmatrix} [x_d\ y_d\ 1]^T$

through the same four transformations, from world coordinates (Xw, Yw, Zw), via camera coordinates (Xc, Yc, Zc), undistorted image coordinates (xu, yu), and distorted image coordinates (xd, yd), to pixel coordinates (xp, yp), as shown in Table 1.

The first transformation between world coordinates, (Xw, Yw, Zw), and camera coordinates, (Xc, Yc, Zc), is determined by camera position and orientation, expressed by the extrinsic parameters, R and t, where t = [tx ty tz]^T is a translation vector, and

$$R = \begin{bmatrix} r_1 & r_2 & r_3 \\ r_4 & r_5 & r_6 \\ r_7 & r_8 & r_9 \end{bmatrix}$$

is a 3 × 3 rotation matrix which can also be expressed in terms of the roll-pitch-yaw angles, $\mathbf{r}_\theta = [\theta_x\ \theta_y\ \theta_z]^T$, as

$$R = \begin{bmatrix} \cos\theta_y \cos\theta_z & \sin\theta_x \sin\theta_y \cos\theta_z - \cos\theta_x \sin\theta_z & \cos\theta_x \sin\theta_y \cos\theta_z + \sin\theta_x \sin\theta_z \\ \cos\theta_y \sin\theta_z & \sin\theta_x \sin\theta_y \sin\theta_z + \cos\theta_x \cos\theta_z & \cos\theta_x \sin\theta_y \sin\theta_z - \sin\theta_x \cos\theta_z \\ -\sin\theta_y & \sin\theta_x \cos\theta_y & \cos\theta_x \cos\theta_y \end{bmatrix}.$$

The other three transformations from camera coordinates, (Xc, Yc, Zc), to pixel coordinates, (xp, yp), are controlled by internal camera configuration, expressed by the intrinsic parameters. In Tsai and Heikkilä's models, these are: the camera's principal point in pixels, or the image center, (cx, cy); the effective focal length, f; the image scale factor, sx; the (supposedly known) center-to-center distances between adjacent pixels in the x and y directions, dx and dy, or their inverses, Dx and Dy; the coefficient of Tsai's radial distortion, $k_1^{(T)}$; and Heikkilä's radial and decentering distortions, $k_1^{(H)}$, $k_2^{(H)}$, $p_1^{(H)}$ and $p_2^{(H)}$. In Zhang's model, the intrinsic parameters are: the image center, (cx, cy); the pixel focal lengths along the x and y axes, α and β; the radial distortion coefficients, $k_1^{(Z)}$ and $k_2^{(Z)}$; and the skew parameter, γ, describing the relative skewness of the x and y image axes of a camera. The focal lengths in the three models are related to each other as follows:

$$\alpha = f s_x / d_x = f s_x D_x, \qquad \beta = f / d_y = f D_y. \tag{1}$$

Note that precise values are not needed for dx, dy and Dx, Dy in Tsai and Heikkilä's models, as any error will be compensated for by the focal length, f, and the image scale factor, sx, after calibration.
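To make the chain of transformations in Table 1 concrete, here is a minimal Python/NumPy sketch (ours, for illustration; the function name and signature are hypothetical) that projects one world point to pixel coordinates under Zhang's parameterization:

```python
import numpy as np

def project_zhang(Pw, R, t, K, k1, k2):
    """Map a 3D world point to pixel coordinates via the four steps of
    Table 1 (Zhang's model): world -> camera -> undistorted normalized
    image -> distorted image -> pixel. Illustrative sketch only."""
    Xc, Yc, Zc = R @ Pw + t                  # world -> camera coordinates
    xu, yu = Xc / Zc, Yc / Zc                # pinhole projection (1/Zc scaling)
    r2 = xu**2 + yu**2                       # r^2 with r = sqrt(xu^2 + yu^2)
    d = 1.0 + k1 * r2 + k2 * r2**2           # radial distortion factor
    xd, yd = d * xu, d * yu                  # undistorted -> distorted
    xp, yp, _ = K @ np.array([xd, yd, 1.0])  # pixel mapping; K is upper triangular
    return xp, yp
```

By Eq. (1), substituting α = f sx Dx, β = f Dy and γ = 0 into K recovers the Tsai/Heikkilä pixel mapping, though those two models apply their distortion in the opposite direction, computing (xu, yu) from (xd, yd).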

2.2.2 Calibration algorithms

All three calibration algorithms estimate an initial closed-form solution by solving an over-determined system of linear equations. The initial estimates then proceed through a non-linear optimization process, such as the standard Levenberg–Marquardt algorithm as implemented in Minpack [24], to minimize residual error. The objective function used in the optimization step is usually the error of distorted pixel coordinates, which is described in detail in Sect. 2.3, Eq. (10). As the three algorithms differ mainly in their estimation of the closed-form solution, this section offers further detail as to how the initial solution is obtained in each algorithm.

In Tsai's algorithm, given the 3D world and 2D distorted image coordinates of n ≥ 7 feature points and assuming


the camera's image center, (cx, cy), to be the center pixel of a captured image, a linear equation is formed based on the radial alignment constraint (RAC):

$$\begin{bmatrix} y_d X_w & y_d Y_w & y_d Z_w & y_d & -x_d X_w & -x_d Y_w & -x_d Z_w \end{bmatrix} \begin{bmatrix} t_y^{-1} s_x r_1 \\ t_y^{-1} s_x r_2 \\ t_y^{-1} s_x r_3 \\ t_y^{-1} s_x t_x \\ t_y^{-1} r_4 \\ t_y^{-1} r_5 \\ t_y^{-1} r_6 \end{bmatrix} = x_d. \tag{2}$$

With n such equations, an over-determined linear system can be established to solve for the unknowns, sx, tx, ty, r1, . . . , r6. The third row of the rotation matrix, r7, . . . , r9, can be computed as the cross product of the first two rows. With the same n feature points, the focal length, f, and the last element of the translation vector, tz, can be estimated from another over-determined system of n equations of the following form:

$$\begin{bmatrix} r_4 X_w + r_5 Y_w + r_6 Z_w + t_y & -d_y (y_p - c_y) \end{bmatrix} \begin{bmatrix} f \\ t_z \end{bmatrix} = d_y (y_p - c_y)(r_7 X_w + r_8 Y_w + r_9 Z_w), \tag{3}$$

where dx and dy are the distances between adjacent pixels in the x and y directions. The estimates of f and tz, together with an assumed value of zero for the radial distortion coefficient, $k_1^{(T)}$, then serve as an initial estimate for the Levenberg–Marquardt algorithm. Finally, all parameters derived from linear estimation, along with the image center, (cx, cy), are optimized iteratively by the Levenberg–Marquardt algorithm to refine the solution.

Heikkilä's algorithm is based on the direct linear transformation (DLT) method [17], which is a general form of Tsai's algorithm. Ignoring nonlinear distortions, a linear projective transformation, P, maps 3D world points, (Xw, Yw, Zw), to their corresponding pixel points, (xp, yp), up to a scale, µ:

$$\mu [x_p\ y_p\ 1]^T = P [X_w\ Y_w\ Z_w\ 1]^T, \tag{4}$$

where $P = K[R\ \ \mathbf{t}]$, $R$ and $\mathbf{t}$ are the extrinsic parameters, and

$$K = \begin{bmatrix} f s_x D_x & 0 & c_x \\ 0 & f D_y & c_y \\ 0 & 0 & 1 \end{bmatrix}$$

determines the coordinate transformation between distorted image coordinates, (xd, yd), and pixel coordinates, (xp, yp),

as shown in Table 1. Given n ≥ 5 feature points, P is obtained from an over-determined system of n pairs of the following equation by eliminating µ:

$$\begin{bmatrix} X_w & Y_w & Z_w & 1 & 0 & 0 & 0 & 0 & -x_p X_w & -x_p Y_w & -x_p Z_w & -x_p \\ 0 & 0 & 0 & 0 & X_w & Y_w & Z_w & 1 & -y_p X_w & -y_p Y_w & -y_p Z_w & -y_p \end{bmatrix} \begin{bmatrix} \mathbf{p}_1^T \\ \mathbf{p}_2^T \\ \mathbf{p}_3^T \end{bmatrix} = \mathbf{0}, \tag{5}$$

where $\mathbf{p}_i$ (i = 1, 2, 3) is the i-th row vector of P. Setting f to an initial estimate, sx to 1, and (cx, cy) to the center pixel of an image, the extrinsic parameters R and t are computed by

$$R = K^{-1} [\mathbf{p}_1\ \mathbf{p}_2\ \mathbf{p}_3], \qquad \mathbf{t} = K^{-1} \mathbf{p}_4, \tag{6}$$

where $\mathbf{p}_i$ (i = 1, 2, 3, 4) is the i-th column vector of P. The Levenberg–Marquardt algorithm is then applied to find exact values of the intrinsic parameters, including the distortion coefficients, $k_1^{(H)}$, $k_2^{(H)}$, $p_1^{(H)}$, $p_2^{(H)}$.

In Zhang's calibration process, a planar calibration pattern is presented to a camera in various orientations and is always assumed to lie on the Zw = 0 plane of a changing world coordinate system. The calibration algorithm starts similarly to the DLT method, except that the projective transformation, P, in Eq. (4) now degenerates to a 3 × 3 homography, H:

$$\mu [x_p\ y_p\ 1]^T = H [X_w\ Y_w\ 1]^T, \tag{7}$$

where

$$H = K [\mathbf{r}_1\ \mathbf{r}_2\ \mathbf{t}], \qquad K = \begin{bmatrix} \alpha & \gamma & c_x \\ 0 & \beta & c_y \\ 0 & 0 & 1 \end{bmatrix}$$

determines the coordinate transformation between distorted image coordinates, (xd, yd), and pixel coordinates, (xp, yp), as shown in Table 1, and ri (i = 1, 2, 3) is the i-th column vector of the rotation matrix, R. Hence, for each view of the pattern, a homography, H, is estimated from n feature points on the pattern as in Eq. (5) and refined by the Levenberg–Marquardt algorithm. Based on the constraint that the images of the two circular points must lie on the image of the absolute conic, the proof of which can be found elsewhere [17], a linear equation pair is formed as follows:

$$\begin{bmatrix} \mathbf{v}_{12}^T \\ (\mathbf{v}_{11} - \mathbf{v}_{22})^T \end{bmatrix} \mathbf{b} = \mathbf{0}, \tag{8}$$

where $\mathbf{v}_{ij} = [h_{i1} h_{j1},\ h_{i1} h_{j2} + h_{i2} h_{j1},\ h_{i2} h_{j2},\ h_{i3} h_{j1} + h_{i1} h_{j3},\ h_{i3} h_{j2} + h_{i2} h_{j3},\ h_{i3} h_{j3}]^T$, $[h_{i1}\ h_{i2}\ h_{i3}]^T$ is the i-th column vector of a homography, H, and $\mathbf{b} = [b_{11}\ b_{12}\ b_{22}\ b_{13}\ b_{23}\ b_{33}]^T$ is a 6D vector representation of the symmetric matrix


$$B = \lambda K^{-T} K^{-1} = \lambda \begin{bmatrix} b_{11} & b_{12} & b_{13} \\ b_{12} & b_{22} & b_{23} \\ b_{13} & b_{23} & b_{33} \end{bmatrix} = \lambda \begin{bmatrix} \dfrac{1}{\alpha^2} & -\dfrac{\gamma}{\alpha^2 \beta} & \dfrac{c_y \gamma - c_x \beta}{\alpha^2 \beta} \\ -\dfrac{\gamma}{\alpha^2 \beta} & \dfrac{\gamma^2}{\alpha^2 \beta^2} + \dfrac{1}{\beta^2} & -\dfrac{\gamma (c_y \gamma - c_x \beta)}{\alpha^2 \beta^2} - \dfrac{c_y}{\beta^2} \\ \dfrac{c_y \gamma - c_x \beta}{\alpha^2 \beta} & -\dfrac{\gamma (c_y \gamma - c_x \beta)}{\alpha^2 \beta^2} - \dfrac{c_y}{\beta^2} & \dfrac{(c_y \gamma - c_x \beta)^2}{\alpha^2 \beta^2} + \dfrac{c_y^2}{\beta^2} + 1 \end{bmatrix}, \tag{9}$$

where λ is an arbitrary scale factor. Given m ≥ 3 views of the pattern, i.e., the m homographies obtained above, an over-determined system of m pairs of Eq. (8) can be established to obtain the vector, b, from which the intrinsic parameters can be extracted. The extrinsic parameters with respect to each orientation of the pattern plane are computed by Eq. (6). Finally, the Levenberg–Marquardt algorithm optimizes the result while taking into account the radial distortions, $k_1^{(Z)}$, $k_2^{(Z)}$.

2.3 Evaluation of calibration accuracy

In our experiments, a camera is calibrated by Tsai, Heikkilä and Zhang's methods using a set of training data,¹ and then validated with a neutral test set covering a wide distance range. Some well developed, freely available calibration toolboxes are used as implementations of the three algorithms in order to achieve a fair comparison: Reg Willson's code² for Tsai's algorithm, Janne Heikkilä's toolbox³ for Heikkilä's algorithm, and Jean-Yves Bouguet's toolbox⁴ for Zhang's algorithm. For evaluating both training and testing accuracies, four of the most frequently used methods [6] were adopted, based on their applicability to the single-camera calibration case.

The error of distorted pixel coordinates, Ed, is measured by computing the discrepancy between estimated pixel coordinates, $(\hat{x}_{pi}, \hat{y}_{pi})$, projected from measured world coordinates by the camera model with lens distortions, and observed pixel coordinates, $(x_{pi}, y_{pi})$, obtained from captured images:

$$E_d = \frac{1}{n} \sum_{i=1}^{n} \sqrt{(\hat{x}_{pi} - x_{pi})^2 + (\hat{y}_{pi} - y_{pi})^2}, \tag{10}$$

where n is the number of feature points.

The error of undistorted pixel coordinates, Eu, is measured by computing the discrepancy between estimated undistorted pixel coordinates, $(\hat{x}_{upi}, \hat{y}_{upi})$, projected from measured world coordinates without lens distortions, and observed undistorted pixel coordinates, $(x_{upi}, y_{upi})$, computed by removing distortions from observed pixel coordinates:

$$E_u = \frac{1}{n} \sum_{i=1}^{n} \sqrt{(\hat{x}_{upi} - x_{upi})^2 + (\hat{y}_{upi} - y_{upi})^2}. \tag{11}$$

¹ In real data experiments, training sets are acquired separately for each of the three methods according to their varying requirements.
² http://www-2.cs.cmu.edu/afs/cs.cmu.edu/user/rgw/www/TsaiCode.html
³ http://www.ee.oulu.fi/~jth/calibr/
⁴ http://www.vision.caltech.edu/bouguetj/calib_doc

The distance with respect to the optical ray, Eo, is measured between 3D points in camera coordinates, (Xci, Yci, Zci), and the optical rays back-projected from the corresponding undistorted image points on the camera image plane, (xui, yui). For Tsai and Heikkilä's models, Eo is expressed as:

$$E_o^{(T,H)} = \frac{1}{n} \sum_{i=1}^{n} \sqrt{(X_{ci} - x_{ui} t)^2 + (Y_{ci} - y_{ui} t)^2 + (Z_{ci} - f t)^2},$$
$$t = (X_{ci} x_{ui} + Y_{ci} y_{ui} + Z_{ci} f) / (x_{ui}^2 + y_{ui}^2 + f^2); \tag{12}$$

and for Zhang’s model,

$$E_o^{(Z)} = \frac{1}{n} \sum_{i=1}^{n} \sqrt{(X_{ci} - x_{ui} t)^2 + (Y_{ci} - y_{ui} t)^2 + (Z_{ci} - t)^2},$$
$$t = (X_{ci} x_{ui} + Y_{ci} y_{ui} + Z_{ci}) / (x_{ui}^2 + y_{ui}^2 + 1). \tag{13}$$

These three measurements are intuitive but sensitive to digital image resolution, camera field-of-view, and object-to-camera distance. The normalized calibration error (NCE), proposed by Weng et al. [9], overcomes this sensitivity by normalizing the discrepancy between estimated and observed 3D points with respect to the area each back-projected pixel covers at a given distance from the camera (see Fig. 1). The NCE is calculated as follows:

$$E_n = \frac{1}{n} \sum_{i=1}^{n} \left[ \frac{(\hat{X}_{ci} - X_{ci})^2 + (\hat{Y}_{ci} - Y_{ci})^2}{\hat{Z}_{ci}^2 (\alpha^{-2} + \beta^{-2}) / 12} \right]^{1/2}, \tag{14}$$

where $(\hat{X}_{ci}, \hat{Y}_{ci}, \hat{Z}_{ci})$ represent 3D camera coordinates as estimated by back-projection from 2D pixel coordinates to depth $\hat{Z}_{ci}$, and $(X_{ci}, Y_{ci}, Z_{ci})$ represent observed 3D camera coordinates computed from measured 3D world coordinates. In Tsai and Heikkilä's methods, the values of α and β can be calculated using Eq. (1).


Fig. 1 Back-projection of a pixel to the object surface. The back-projected area on the object surface is represented by a pixel rectangle at the center depth. The error between the back-projected area and the pixel rectangle is measured to assess the calibration accuracy. Based on Fig. 6 of Weng et al. [9]
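Two of these criteria are straightforward to compute once point correspondences are in hand; the sketch below (ours, with hypothetical signatures) implements Ed of Eq. (10) and the NCE of Eq. (14):

```python
import numpy as np

def distorted_pixel_error(pred, obs):
    """E_d of Eq. (10): mean Euclidean distance between projected and
    observed pixel coordinates. pred, obs: (n, 2) arrays."""
    return np.mean(np.linalg.norm(pred - obs, axis=1))

def normalized_calibration_error(Pc_est, Pc_obs, alpha, beta):
    """NCE of Eq. (14): 3D error normalized by the area a back-projected
    pixel covers at the estimated depth. Pc_est: (n, 3) back-projected
    camera coordinates; Pc_obs: (n, 3) observed camera coordinates."""
    dx2 = (Pc_est[:, 0] - Pc_obs[:, 0])**2
    dy2 = (Pc_est[:, 1] - Pc_obs[:, 1])**2
    denom = Pc_est[:, 2]**2 * (alpha**-2 + beta**-2) / 12.0
    return np.mean(np.sqrt((dx2 + dy2) / denom))
```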

3 Computer simulations

Our simulated camera had the following properties, chosen based on empirical data: center-to-center distances between adjacent pixels in the x and y directions of dx = 1/Dx = 0.016 mm and dy = 1/Dy = 0.010 mm, an image scale factor sx = 1.5, and an effective focal length f = 8 mm, resulting in pixel focal lengths of α = 750 pixels and β = 800 pixels. A second-order radial distortion was simulated with the coefficients $k_1^{(Z)}$ = −0.32 mm⁻² and $k_2^{(Z)}$ = 0 in Zhang's model, or equivalently, $k_1^{(T)} = k_1^{(H)}$ = 0.005709 mm⁻² and $k_2^{(H)} = p_1^{(H)} = p_2^{(H)}$ = 0 in Tsai and Heikkilä's models. The skew parameter, γ, was set to 0. The image resolution was 512 × 512 pixels with the image center at (cx, cy) = (264, 280) pixels.

The training points of all three methods, covering 30–55 cm from the simulated camera, were obtained from the grid corners of 20 cm × 20 cm simulated checkerboard patterns, placed at 16 different orientations in front of the virtual camera at a 45° angle with respect to the image plane.⁵ The number of grid corners contained in a simulated pattern varies according to the requirements of each experiment.

The 4,108 testing points, covering a distance of 10–300 cm, were generated by sampling a 3 m × 3 m × 3 m cubic space at intervals of 10 cm in all three dimensions and keeping points visible to a camera located at the center of one face of the cube.
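A sketch of this sampling scheme (ours; the visibility test, which depends on the simulated camera's pose and field of view, is omitted):

```python
import numpy as np

def cube_grid(side_m=3.0, step_m=0.1):
    """Sample a side x side x side cube at fixed intervals along all three
    dimensions. Culling against the camera's field of view, which leaves
    the 4,108 visible points used in the paper, is omitted here."""
    axis = np.arange(0.0, side_m + 1e-9, step_m)
    X, Y, Z = np.meshgrid(axis, axis, axis, indexing="ij")
    return np.column_stack([X.ravel(), Y.ravel(), Z.ravel()])
```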

3.1 Effect of noise on calibration accuracy

We simulated 16 views of a 10 × 10 checkerboard pattern to generate 1,600 training points. Different levels of Gaussian noise were added to study its effect on calibration accuracy.

⁵ The number of orientations and the angle of the pattern plane were chosen according to Zhang [2], in which the best performance was reported with more than 10 orientations and an angle near 45°.

3.1.1 Calibration accuracy vs. pixel coordinate noise

Figure 2 shows the decrease in training and testing accuracies of all tested methods as pixel noise increases.⁶ However, Zhang's algorithm was found to be more sensitive to pixel coordinate noise, and hence more dependent on high-accuracy calibration feature detection, than either Tsai or Heikkilä's.

3.1.2 Calibration accuracy vs. world coordinate noise

Figure 3 illustrates the expected decrease of calibration accuracy as world coordinate noise increases. Zhang's calibration error was, again, the highest for the same noise range. While these results might at first seem discouraging for Zhang's method, it is important to note that unlike the other algorithms, which require absolute coordinate measurement with respect to a fixed world reference, Zhang's relative world coordinate measurement allows for a minimal equipment requirement, which, in practice, can simply be a laser-printed checkerboard pattern on a letter-sized sheet. As a result, most setups can easily achieve measurement noise levels of σ < 0.5 mm, obtaining a reasonably high calibration accuracy, as shown in the last column of Fig. 3. In contrast, Tsai and Heikkilä's world coordinate measurements are inherently prone to higher noise; the fact that their testing errors increased significantly when σ > 2 mm poses a strong constraint on real-world setups for accurate measurement. Although largely similar, Heikkilä's algorithm performed better than Tsai's at higher noise levels, but slightly worse at lower noise levels. This may be due to Heikkilä's use of fixed empirical values to initialize the linear estimation of intrinsic parameters, which, while more robust to measurement errors, fails to exploit the potential of accurate measurements.

3.2 Effect of training data quantity on accuracy

As the effect of the number of orientations and the angle of the pattern plane on calibration accuracy has been studied by Zhang [2],⁵ this section explores the effect of the number of feature points per pattern on calibration accuracy. In the absence of noise, a small number of training points can yield 100% accuracy. As some existing corner detection algorithms claim an accuracy of 0.1 pixels, Gaussian noise of zero mean and σ = 0.1 pixels was added to the pixel coordinates of training data. The average results of 10 trials are illustrated in Fig. 4.

As each checkerboard pattern was viewed at 16 different orientations, a pattern of 3 × 3 grid corners could produce 144 training points, sufficient for Tsai's algorithm to achieve reasonable accuracy; however, the calibration error stabilized further when more than 256 training points were used.

⁶ All four evaluation methods described in Sect. 2.3 were used and the results demonstrated very similar trends. Due to limited space, only the normalized calibration error (NCE) is plotted.


[Figure 2: six panels plotting normalized calibration error against the standard deviation of Gaussian noise (pixels), for Tsai's, Heikkilä's and Zhang's training accuracy and testing accuracy.]

Fig. 2 Effect of pixel coordinate noise on calibration accuracy. Gaussian noise of µ = 0, σ of 0.02–1.0 pixels added to training data pixel coordinates. No noise added to test data. Note that zero testing error was achieved when adding zero noise

[Figure 3: six panels plotting normalized calibration error against the standard deviation of Gaussian noise (mm), for Tsai's, Heikkilä's and Zhang's training accuracy and testing accuracy.]

Fig. 3 Effect of world coordinate noise on calibration accuracy. Gaussian noise of µ = 0, σ of 0.2–5.0 mm added to 3D world coordinates of Tsai and Heikkilä's training data and 2D world coordinates of Zhang's training data, as Zhang's algorithm assumes all feature points fall on the Zw = 0 plane. No noise added to test data


[Figure 4: six panels plotting normalized calibration error against the number of points per pattern, for Tsai's, Heikkilä's and Zhang's training accuracy and testing accuracy.]

Fig. 4 Effect of training data quantity on calibration accuracy. Training data generated from 16 views of checkerboard patterns containing between 3 × 3 and 24 × 24 grid corners. Gaussian noise of µ = 0, σ = 0.1 pixels added to training data pixel coordinates. No noise added to test data

Although more robust with limited training data quantities, Heikkilä's results demonstrated slightly inferior performance to Tsai's, which might be explained by the low noise level in the training data, as suggested in the previous section. Zhang's calibration error was again higher than those of the other two methods with small training sets, as the former is more sensitive to noise. However, increasing the number of training points per pattern alleviates this sensitivity, resulting in accuracies similar to Tsai's algorithm when employing more than 200 training points per pattern. We also notice in Fig. 4 that testing errors exhibit higher standard deviation with Tsai and Heikkilä's algorithms than with Zhang's. This is likely due to the fact that the former two treat each feature point independently, whereas the latter takes advantage of the coplanar constraint between feature points of each view, thus compensating for the inconsistencies of noisy training points.

3.3 Effect of distortion model on calibration accuracy

Radial and decentering distortions [23] are the most common distortions modeled in camera calibration and can be expressed as follows:

$$\begin{bmatrix} x_d \\ y_d \end{bmatrix} = (1 + k_1 r^2 + k_2 r^4 + k_3 r^6 + \cdots) \begin{bmatrix} x_u \\ y_u \end{bmatrix} + \begin{bmatrix} 2 p_1 x_u y_u + p_2 (r^2 + 2 x_u^2) \\ p_1 (r^2 + 2 y_u^2) + 2 p_2 x_u y_u \end{bmatrix}, \tag{15}$$

where $k_i$ (i = 1, 2, . . .) and $p_1$, $p_2$ are radial and decentering distortion coefficients, and $r = \sqrt{x_u^2 + y_u^2}$.

In this experiment, five types of cameras were simulated, each corresponding to a different distortion characteristic consisting of the first n low-order radial distortion terms, with or without the two decentering terms: Rn (n = 1, 2) and RnD2 (n = 1, 2, 3). The simulated coefficients, listed in Table 2, were chosen from empirical data. All the remaining camera parameters were the same as previously described. Each of the five simulated cameras was calibrated by Zhang's algorithm combined, in turn, with each of the five distortion models, Rn (n = 1, 2) and RnD2 (n = 1, 2, 3), with the skew parameter set to zero.

Figure 5 shows, for each simulated camera, the calibration error versus the distortion models used for calibration on a large low-noise training set. The results indicate that high calibration accuracy is obtained provided that the distortion model assumed in calibration includes all the distortion components of the camera, although the sixth-order radial term does not benefit accuracy.

Table 2 Distortion coefficients of simulated cameras

Coefficient   | R1    | R2    | R1D2  | R2D2  | R3D2
k1 (mm^−2)    | −0.3  | −0.3  | −0.3  | −0.3  | −0.3
k2 (mm^−4)    | 0     | 0.15  | 0     | 0.15  | 0.15
k3 (mm^−6)    | 0     | 0     | 0     | 0     | 0.1
p1 (mm^−2)    | 0     | 0     | 0.02  | 0.02  | 0.02
p2 (mm^−2)    | 0     | 0     | 0.015 | 0.015 | 0.015


[Figure 5: panels for each simulated camera (R1, R2, R1D2, R2D2, R3D2) plotting normalized calibration error (log scale) against the distortion model used in calibration (R1, R2, R1D2, R2D2, R3D2).]

Fig. 5 Calibration accuracy vs. distortion model. Trained from 16 views of a 20 × 20 pattern with Gaussian noise of µ = 0, σ = 0.1 pixels added to pixel coordinates. No noise added to test data. Logarithmic scale used for y-axis. Measured with Zhang's algorithm


[Figure 6: panels for each simulated camera (R1, R2, R1D2, R2D2, R3D2), one column per training pattern size, plotting normalized calibration error (log scale) against the distortion model used in calibration.]

Fig. 6 Testing accuracy vs. distortion model. Trained from 16 views of 10 × 10 (left) and 20 × 20 (right) patterns with Gaussian noise of µ = 0, σ = 0.5 pixels or 0.5 mm added to pixel and world coordinates. No noise added to test data. Logarithmic scale used for y-axis. Measured with Zhang's algorithm


Table 3 Simulated camera R2D2 and calibration results using different distortion models

                     Simulated    Distortion models used in calibration
Camera parameters    R2D2         R1        R2        R1D2      R2D2      R3D2
α (pixels)           750          754.36    754.32    749.46    749.98    749.98
β (pixels)           800          802.98    802.88    799.51    799.98    799.98
cx (pixels)          264          220.06    224.67    263.80    263.82    263.82
cy (pixels)          280          218.50    217.23    279.94    279.95    279.95
k1 (mm^−2)           −0.3         −0.1422   −0.1019   −0.2812   −0.3005   −0.3001
k2 (mm^−4)           0.15         –         −0.1460   –         0.1538    0.1472
k3 (mm^−6)           0            –         –         –         –         0.03055
p1 (mm^−2)           0.02         –         –         0.01995   0.01996   0.01996
p2 (mm^−2)           0.015        –         –         0.01497   0.01497   0.01497
Normalized calibration error (training)   188.25   166.44   0.6179   0.5699   0.5701
Normalized calibration error (testing)    196.14   174.20   0.6537   0.5102   0.5113

Moreover, adding higher-order radial terms affects the estimation of lower-order terms, as can be observed in Table 3. This correlation between radial distortion components may, unfortunately, degrade calibration performance when only a limited amount of noisy training data is available. Figure 6 illustrates the same experiment using noisy training data. The addition of higher-order radial terms yielded a higher error, especially with small training sets, as shown in the left column. Nonetheless, including the two decentering distortion components generally guaranteed a high calibration accuracy for a camera with unknown lens distortions.

4 Real data experiments

The real data experiments were carried out in our Shared Reality Environment (SRE), a laboratory space equipped with three vertical projection screens, each approximately 2.5 m × 1.8 m and raised about 0.7 m above the ground, semi-enclosing a space of 5.4 m², as shown in Fig. 7. A 3Com U.S. Robotics BigPicture Camera with fixed focal length was placed along one screen at a height of 0.5 m, facing the other two adjacent screens. The image resolution was 640 × 480 pixels. To investigate the influence of experimental setup, two configurations, the casual setup and the elaborate setup, were studied.

Fig. 7 A node of the Shared Reality Environment

4.1 Casual setup

The training data for Tsai and Heikkilä's calibration was obtained from 600 grid corners of checkerboard patterns of 8 × 8 cm squares, projected onto the two visible screens, as in Fig. 8a.⁷ Assuming that the projectors were accurately calibrated and the patterns were projected as rectangular shapes, the 3D world coordinates of the four pattern corners on each screen were measured according to a fixed world reference system, referred to as SRE coordinates, and the remaining points were interpolated. Due to the calibration error between projectors and screens, the limited image resolution of the projected pattern, and the non-rigid material of the screens, verifying with a few interior points indicated an average interpolation error of 4.1 mm, equivalent to approximately 0.28% of the pattern size. This data set covered a distance of 200–325 cm from the camera and will be referred to as Screen Data in the following text.

The training data for Zhang's calibration was generated by printing checkerboard patterns of 8 × 6, 12 × 9, 15 × 14, and 20 × 20 grid corners onto letter-sized sheets. Each was attached to a rigid card⁸ and viewed at 16 different orientations at roughly 45° with respect to the camera image plane, as explained in Sect. 3 and demonstrated in Fig. 8b. This produced four data sets of 768–6,400 points. The 2D relative world coordinates of these points were measured with respect to one of the four corners of each pattern, i.e., the origin of a changing world reference system. The four corners were measured first and the interior points interpolated, due to the regularity of the printed pattern. As our ruler was accurate only to 1 mm, we assume a maximum measurement error of 0.5 mm, approximately 0.29% of the pattern size.

⁷ Due to the difficulty of achieving 3D measurements of low error on an arbitrarily positioned small checkerboard pattern, we did not use 16 views of a hand-held checkerboard pattern for Tsai and Heikkilä's training data collection as in the simulation experiments, Sect. 3. Instead, we used large checkerboard patterns projected on big screens, which is likely to produce highly accurate measurements to the advantage of Tsai and Heikkilä's calibration.

⁸ According to Zhang's study [2], the effect of systematic non-planarity can be ignored in our experiments.


Fig. 8 A demonstration of the casual setup for generating a Tsai and Heikkilä's training data, b Zhang's training data, and c test data

[Figure 9: six panels plotting normalized calibration error against the number of training points, for Tsai's, Heikkilä's and Zhang's training accuracy and testing accuracy.]

Fig. 9 Effect of training data quantity on calibration accuracy in the casual setup. Admittedly, more training points were used for Zhang's calibration than for Tsai and Heikkilä's. However, as indicated by the simulation results in Fig. 4, there was no accuracy improvement in Tsai and Heikkilä's results beyond 256 training samples. This is also evident in the present figure and Fig. 14 of our elaborate setup

These four data sets each covered a distance of 25–55 cm from the camera and will be referred to as Board48, Board108, Board210, and Board400 Data, respectively.

The test set for all three algorithms was created by moving a wooden board bearing a 100 cm × 85 cm printed pattern of 17 × 15 checks along a specially constructed rail at different locations within the SRE space, as pictured in Fig. 8c, to provide 1,872 accurately measured points in the SRE coordinates. This data set covered a distance of 100–265 cm and will be referred to as Rail Data.

4.1.1 Effect of training data quantity

Tsai and Heikkilä's algorithms were trained using between 50 and 600 points, selected from Screen Data to be evenly distributed across the screens, and then tested on Rail Data. The average results of 10 trials for each quantity of training data are shown in Fig. 9. With Tsai's algorithm, no significant improvements in testing accuracy were observed beyond 300 training samples. Similar to the simulation results, Heikkilä's algorithm produced better results than Tsai's with small training data quantities. The overall lower testing errors of Heikkilä's method relative to Tsai's suggest the presence of high noise in the training data, as discussed in Sect. 3.1.

Zhang's algorithm was trained separately on Board48, Board108, Board210, and Board400 Data, and tested on Rail Data. As Zhang's extrinsic parameters were calibrated with respect to a corner point of the calibration pattern, whose position in the SRE coordinates varied over different views, while the Rail Data were measured in the fixed SRE coordinates, the extrinsic parameters needed to be recalibrated to the same coordinate system. The training data for this extrinsic recalibration was generated by placing a printed 12 × 15 checkerboard pattern at a location along the rail to produce feature points measured in SRE coordinates. After extrinsic recalibration, the testing accuracies on Rail Data were obtained, as shown in Fig. 9. Both training and testing accuracies increased with the number of training data per pattern, with no significant improvement beyond 100 samples per pattern, which is consistent with the simulations. Recalling that the test data (100–265 cm from the camera) was obtained from a much further distance range than the training data (25–55 cm), one might expect large errors in the test results given the need for extrapolation from training data. However, despite this large discrepancy, Zhang's algorithm achieved impressive testing accuracies. This suggests that the parameters calibrated by Zhang's algorithm are scalable to test data over a larger range than that of the training data.


Table 4 The best calibration results obtained in casual setup

Camera parameters    Tsai^a                         Heikkilä^a                     Zhang^b
Intrinsic
cx (pixels)          338.56                         319.55                         332.39
cy (pixels)          240.34                         263.12                         268.01
f (mm)               6.9333                         6.7617                         –
sx                   1.5575                         1.5504                         –
dx (mm)              0.016 (given)                  –                              –
dy (mm)              0.010 (given)                  –                              –
Dx (mm^−1)           –                              62.5 (given)                   –
Dy (mm^−1)           –                              100.0 (given)                  –
α (pixels)           –                              –                              675.89
β (pixels)           –                              –                              695.15
γ                    –                              –                              0.000533
k1^(T) (mm^−2)       0.007121                       –                              –
k1^(H) (mm^−2)       –                              0.006477                       –
k2^(H) (mm^−4)       –                              0.000333                       –
k1^(Z) (mm^−2)       –                              –                              −0.3230
k2^(Z) (mm^−4)       –                              –                              0.08667
p1^(H) (mm^−2)       –                              0.000627                       –
p2^(H) (mm^−2)       –                              0.001838                       –
Extrinsic
t (mm)               [−1982.9, −1455.5, 2324.9]^T   [−1910.2, −1535.9, 2256.4]^T   [−1963.5, −1560.0, 2269.7]^T
rθ (°)               [153.18, 19.44, 91.25]^T       [154.73, 17.69, 91.81]^T       [153.07, 17.33, 91.45]^T

^a Trained on Screen Data
^b Trained on Board210 Data with extrinsic recalibration


Table 4 lists the best calibration results of each algorithm from Fig. 9, with the corresponding accuracy shown in Table 6, casual setup. Under our experimental conditions, with the interpolation measurement errors as described previously, Zhang's algorithm outperformed Tsai and Heikkilä's by approximately four and three times, respectively, under all evaluation measures.

Table 5 Comparing calibration accuracies assuming skewness γ ≠ 0 and γ = 0

                                   Training             Testing
Accuracy evaluation                γ ≠ 0     γ = 0      γ ≠ 0     γ = 0
2D distorted error (pixels)        0.2776    0.2782     1.0028    0.8959
2D undistorted error (pixels)      0.2898    0.2905     1.0470    0.9395
Distance from optical ray (mm)     0.1648    0.1653     2.7689    2.5153
Normalized calibration error       0.7099    0.7118     2.5602    2.3076


4.1.2 Effect of distortion model

The 3Com U.S. Robotics BigPicture Camera exhibits obvious lens distortions, visible in the images of Fig. 8. We studied the effect of the distortion model on calibration accuracy.

Skewness. Although included in a linear transformation and not part of the actual distortion model, skewness should nevertheless be considered as a type of distortion. In Zhang's method, the skewness was estimated, as expressed by γ in Table 4, but was essentially zero. For comparison, camera parameters were calibrated by Zhang's algorithm on Board210 Data with γ either estimated or fixed at zero. The calibration accuracies are compared in Table 5, showing no improvement when estimating γ. This result can also be seen in Fig. 10.

Lens distortion. Zhang's algorithm was integrated separately with each of the five distortion models described in Sect. 3.3 to calibrate camera parameters on Board210 Data. Calibration accuracies are displayed in Fig. 11. While all models achieved almost the same training accuracy, those considering decentering distortions performed marginally better in testing. As shown in Fig. 12, all five models successfully recovered the original image from distortions, with little difference in the results.

4.2 Elaborate setup

To investigate how much improvement can be realized by increasing the measurement accuracy of training data and by reducing the discrepancy of distance coverage between training and test set, the rail structure for obtaining the test data in Sect. 4.1 was used to generate a total of 3,810 accurately measured feature points in an attempt to cover the volume of the camera's working space within the SRE. From this set, 50–2,000 evenly distributed points, covering 85–245 cm from the camera, were selected as Tsai and Heikkilä's training data, and the remaining points, covering the same range, were used as a neutral test set for all three algorithms, as demonstrated in Fig. 13a. Zhang's training data was generated in the same manner as described in Sect. 4.1, but replacing the letter-sized cardboard pattern with the 100 cm × 85 cm wooden board pattern of the rail structure, producing 16 views of 20–255 planar points covering 95–225 cm from the camera, as demonstrated in Fig. 13b. The extrinsic parameters were then recalibrated with respect to the SRE coordinates by aligning this wooden board pattern with the projection screens at a known location on the rail.


Fig. 10 Removing distortions from a original image using Zhang's model assuming b skewness γ ≠ 0 and c γ = 0, with d their difference, i.e. |(b) − (c)|, which is barely visible. (We use white to denote values of zero difference)

[Figure 11: two panels plotting normalized calibration error against the distortion model used in calibration (R1, R2, R1D2, R2D2, R3D2), for training accuracy and testing accuracy.]

Fig. 11 Calibration accuracy vs. distortion model used in calibration. Trained from 16 views of a 15 × 14 checkerboard pattern

Fig. 12 Removing lens distortions from a original image using distortion model b R1, c R2, d R1D2, e R2D2 and f R3D2. The differences of b, c, d and e with respect to f are shown in g, h, i, and j, respectively. (We use white to denote values of zero difference)


Table 6 Accuracy comparison of Tsai, Heikkilä and Zhang's calibration algorithms

                                                    Training                       Testing
Experiment setup  Accuracy evaluation               Tsai     Heikkilä  Zhang       Tsai      Heikkilä  Zhang
Casual            2D distorted error (pixels)       0.8843   0.4104    0.2776      3.9206    3.2311    1.0028
                  2D undistorted error (pixels)     0.9410   0.6677    0.2898      4.0865    3.2949    1.0470
                  Distance from optical ray (mm)    3.5752   2.4291    0.1648      9.9358    7.7337    2.7689
                  Normalized calibration error      2.3150   1.6397    0.7099      10.1184   8.0920    2.5602
Elaborate         2D distorted error (pixels)       0.3498   0.3480    0.1816      0.3445    0.3505    0.6301
                  2D undistorted error (pixels)     0.3730   1.0765    0.1907      0.3701    1.1900    0.6856
                  Distance from optical ray (mm)    0.9456   2.5730    0.4135      0.9244    2.8083    1.6926
                  Normalized calibration error      0.9141   2.6484    0.4667      0.9079    2.9267    1.6767

Fig. 13 A demonstration of the elaborate setup for generating a Tsai and Heikkilä's training data and test data, b Zhang's training data

[Figure 14: six panels plotting normalized calibration error against the number of training points, for Tsai's, Heikkilä's and Zhang's training accuracy and testing accuracy.]

Fig. 14 Effect of training data quantity on calibration accuracy in the elaborate setup

Zhang's training data was generated in the same manner as described in Sect. 4.1, but replacing the letter-sized cardboard pattern with the 100 cm × 85 cm wooden board pattern of the rail structure, producing 16 views of 20–255 planar points covering 95–225 cm from the camera, as demonstrated in Fig. 13b. The extrinsic parameters were then recalibrated with respect to the SRE coordinates by aligning this wooden board pattern with the projection screens at a known location on the rail.
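A minimal sketch of the coordinate-frame composition implied by this recalibration step, assuming the calibration yields a board-to-camera transform and the board's pose in SRE coordinates is known from the rail; the names and notation are ours, not the paper's.

import numpy as np

def recalibrate_extrinsics(R_bc, t_bc, R_bs, t_bs):
    # Compose SRE-world-to-camera extrinsics from the calibrated
    # board-to-camera transform (R_bc, t_bc) and the board's known pose in
    # SRE coordinates (R_bs, t_bs):
    #   X_cam = R_bc X_board + t_bc  and  X_sre = R_bs X_board + t_bs
    # imply X_cam = (R_bc R_bs^T) X_sre + (t_bc - R_bc R_bs^T t_bs).
    R = R_bc @ R_bs.T
    return R, t_bc - R @ t_bs

# Sanity check: a board frame coinciding with the SRE frame leaves the
# calibrated extrinsics unchanged.
R, t = recalibrate_extrinsics(np.eye(3), np.array([0.0, 0.0, 1000.0]),
                              np.eye(3), np.zeros(3))
assert np.allclose(R, np.eye(3)) and np.allclose(t, [0.0, 0.0, 1000.0])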

The training and testing accuracies of all three algorithms are shown in Fig. 14 and Table 6 (elaborate setup). Compared to the best results obtained in the casual setup, Tsai and Heikkila's testing errors decreased by approximately 90% and 65%, while Zhang's error decreased by only 35% as, understandably, there was essentially no increase in the training data accuracy of the latter. However, the change in training sample distances from the casual setup (25–55 cm) to the elaborate setup (95–225 cm) yielded a modest improvement in Zhang's results, as the test data (85–245 cm) was now closer in range to the latter.

5 Conclusion

An empirical study on camera calibration was carried out in the Shared Reality Environment to investigate how such factors as noise level, training data quantity, and distortion model affect calibration accuracy. Three of the most popular and representative methods, developed by Tsai, Heikkila and Zhang, were chosen for experimentation on both simulated and real data. Four commonly used criteria were applied to evaluate accuracy on separate training and test sets.

Results indicated that the conventional world-reference-based approach, exemplified by Tsai's method, can achieve high accuracy when trained on data of low measurement error. However, this requires accurate 3D measurement, typically involving hundreds of samples with respect to a fixed reference system; this measurement process is prone to noise and, as our experiments on actual data confirmed, a casual setup yields a disappointing NCE of 10.1. After a careful and time-consuming setup and measurement process, we managed to limit the NCE to approximately 0.9, indicating that the back-projected 3D space error due to the camera parameters was lower, on average, than the error introduced by quantizing real-world 3D coordinates to the level of individual image pixels [9]. However, we note that the effort required to achieve this level of accuracy may well be inordinate for most researchers. Heikkila's results demonstrated a similar trend, despite that method's slightly greater robustness to small training data quantities and large measurement errors.

In contrast, the planar calibration approach, exemplified by Zhang's method, makes efficient use of world information by taking into account the planar constraint on the calibration pattern, and requires neither a laborious measuring task nor specialized equipment. With a hand-held pattern placed approximately 40 cm from the camera, we obtained an NCE of around 2.6, suggesting an acceptable calibration in which the residual error was almost negligible compared to the pixel quantization error discussed above [9]. Training with a pattern placed closer to the location of the test data yielded improved results, with the NCE falling to 1.7.
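For readers who want to reproduce this style of calibration, OpenCV's calibrateCamera implements the same planar, Zhang-style approach. The sketch below is illustrative only; the pattern dimensions and file paths are placeholders, and this is not the toolbox used in our experiments.

import glob
import cv2
import numpy as np

cols, rows, square = 9, 6, 30.0            # inner corners and square size (mm)
board = np.zeros((rows * cols, 3), np.float32)
board[:, :2] = np.mgrid[0:cols, 0:rows].T.reshape(-1, 2) * square

obj_pts, img_pts, size = [], [], None
for fname in glob.glob('views/*.png'):     # several views of a hand-held pattern
    gray = cv2.imread(fname, cv2.IMREAD_GRAYSCALE)
    found, corners = cv2.findChessboardCorners(gray, (cols, rows))
    if found:
        obj_pts.append(board)
        img_pts.append(corners)
        size = gray.shape[::-1]            # (width, height)

# Recovers the intrinsic matrix K, distortion coefficients, and one
# extrinsic pose (rvec, tvec) per accepted view.
rms, K, dist, rvecs, tvecs = cv2.calibrateCamera(obj_pts, img_pts, size, None, None)
print('RMS reprojection error (pixels):', rms)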

Although the comparison between the three methods required that the extrinsic parameters obtained by Zhang's method be recalibrated with respect to our fixed SRE coordinates, this process may not be relevant for stereo or multi-camera calibration, in which a camera reference system can be used instead of a world reference system. Moreover, the sensitivity of Zhang's algorithm to noise may be overcome by adding training points, simply by printing a checkerboard pattern containing more grid corners. In summary, these results demonstrate the flexibility of the multiplanar approach and its suitability for calibration in dynamic environments.

Our study also included a detailed comparison of distortion models to determine the relative importance of various coefficients given unknown lens distortion. The zero-skewness assumption made in many methods was confirmed as reasonable, at least for the cameras of average quality that we tested, and the second-order term was found sufficient for modeling radial distortion [6]. Estimating the fourth-order radial term may be desirable at low noise levels, although including the sixth-order term can degrade calibration performance for small, noisy training sets. For a camera with unknown lens distortion, adding decentering components, in general, increases the likelihood of accurate calibration.
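(For reference, these terms take the standard Brown-style form, with k1, k2, k3 the second-, fourth- and sixth-order radial coefficients and p1, p2 the decentering coefficients; sign and ordering conventions vary between implementations:

δx = x(k1 r² + k2 r⁴ + k3 r⁶) + 2 p1 x y + p2(r² + 2x²)
δy = y(k1 r² + k2 r⁴ + k3 r⁶) + p1(r² + 2y²) + 2 p2 x y,   where r² = x² + y².)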

Acknowledgements This work was supported by a Phase IIIb research grant from the Institute for Robotics and Intelligent Systems (IRIS). The authors thank Jianfeng Yin for help with image capture and coordinate measurement; Kaleem Siddiqi, James Clark and Stephen Spackman for insightful advice; and Don Pavlasek and Jozsef Boka for construction of the measurement assembly. The authors acknowledge the contribution of Reg Willson's calibration code, Janne Heikkila's calibration toolbox and Jean-Yves Bouguet's calibration toolbox. The authors also wish to thank the journal reviewers for providing numerous helpful suggestions and considerable insights.

References

1. Lavest, J.-M., Viala, M., Dhome, M.: Do we really need an accurate calibration pattern to achieve a reliable camera calibration? In: Proceedings of the European Conference on Computer Vision, pp. 158–174 (1998)

2. Zhang, Z.: A flexible new technique for camera calibration. Technical Report MSR-TR-98-71, Microsoft Research, http://research.microsoft.com/∼zhang/Calib/ (1998)

3. Tsai, R.Y.: A versatile camera calibration technique for high-accuracy 3D machine vision metrology using off-the-shelf TV cameras and lenses. IEEE J. Robotics Automat. 3(4), 323–344 (1987)

4. Heikkila, J.: Geometric camera calibration using circular control points. IEEE Trans. Pattern Anal. Machine Intell. 22(10), 1066–1077 (2000)

5. Zhang, Z.: A flexible new technique for camera calibration. IEEE Trans. Pattern Anal. Machine Intell. 22(11), 1330–1334 (2000)

6. Salvi, J., Armangue, X., Batlle, J.: A comparative review of camera calibrating methods with accuracy evaluation. Pattern Recognit. 35, 1617–1635 (2002)

7. Hall, E.L., Tio, J.B.K., McPherson, C.A., Sadjadi, F.A.: Measuring curved surfaces for robot vision. Computer 15(12), 42–54 (1982)

8. Faugeras, O., Toscani, G.: The calibration problem for stereo. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 15–20 (1986)

9. Weng, J., Cohen, P., Herniou, M.: Camera calibration with distortion models and accuracy evaluation. IEEE Trans. Pattern Anal. Machine Intell. 14(10), 965–980 (1992)

10. Kanade, T., Rander, P., Narayanan, P.J.: Virtualized reality: constructing virtual worlds from real scenes. IEEE Multimedia, Immersive Telepresence 4(1), 34–47 (1997)


11. Raskar, R., Welch, G., Cutts, M., Lake, A., Stesin, L., Fuchs, H.: The office of the future: a unified approach to image-based modeling and spatially immersive displays. In: SIGGRAPH 98, Proceedings of the 25th Annual Conference on Computer Graphics and Interactive Techniques, pp. 179–188 (1998)

12. Fusiello, A.: Uncalibrated Euclidean reconstruction: a review. Image Vis. Comput. 18, 555–563 (2000)

13. Chatterjee, C., Roychowdhury, V.P.: Algorithms for coplanar camera calibration. Machine Vis. Appl. 12(2), 84–97 (2000)

14. Caprile, B., Torre, V.: Using vanishing points for camera calibration. Int. J. Comput. Vis. 4(2), 127–140 (1990)

15. Wei, G., Ma, S.: A complete two-plane camera calibration method and experimental comparisons. In: Proceedings of the 4th International Conference on Computer Vision, pp. 439–446 (1993)

16. Faugeras, O., Luong, T., Maybank, S.: Camera self-calibration: theory and experiments. In: Proceedings of the European Conference on Computer Vision, pp. 321–334 (1992)

17. Hartley, R., Zisserman, A.: Multiple View Geometry in Computer Vision. Cambridge University Press, Cambridge (2000)

18. Bougnoux, S.: From projective to Euclidean space under any practical situation, a criticism of self-calibration. In: Proceedings of the 6th International Conference on Computer Vision, pp. 790–796 (1998)

19. Triggs, B.: Autocalibration from planar scenes. In: European Conference on Computer Vision, pp. 89–105 (1998)

20. Sturm, P., Maybank, S.: On plane-based camera calibration: a general algorithm, singularities, applications. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 432–437 (1999)

21. Liebowitz, D., Zisserman, A.: Metric rectification for perspective images of planes. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 482–488 (1998)

22. Zhang, Z.: Camera calibration with one-dimensional objects. IEEE Trans. Pattern Anal. Machine Intell. 26(7), 892–899 (2004)

23. Slama, C.C. (ed.): Manual of Photogrammetry, 4th edn. American Society of Photogrammetry, Falls Church, VA (1980)

24. More, J.: The Levenberg–Marquardt algorithm: implementation and theory. In: Watson, G.A. (ed.) Numerical Analysis. Springer-Verlag, New York (1977)

Wei Sun received the B.Eng. degree in Electronic Engineering from Shanghai Jiao Tong University in 1995 and the M.Sc. degree in Computer Science from Fudan University in 1998. At present, she is a Ph.D. candidate working at the Centre for Intelligent Machines, Department of Electrical and Computer Engineering of McGill University. Her research interests include image and video processing, computer vision, machine learning and computer graphics.

Jeremy R. Cooperstock (Ph.D. Toronto, 1996) is a professor of Electrical and Computer Engineering, a member of the Centre for Intelligent Machines, and a founding member of the Centre for Interdisciplinary Research in Music, Media and Technology at McGill University. Cooperstock is a member of the ACM and chairs the AES Technical Committee on Network Audio Systems. Cooperstock's research interests focus on computer mediation to facilitate high-fidelity human communication and the underlying technologies that support this goal.