Vision-based sensing of UAV attitude and altitude from downward in-flight images


Article

Vision-based sensing of UAV attitude and altitude from downward in-flight images

Nathir A Rawashdeh 1, Osamah A Rawashdeh 2 and Belal H Sababha 3

Abstract

Autonomous unmanned aerial vehicles (UAVs) often carry video cameras as part of their payload. Outdoor video captured by such cameras can be used to estimate the attitude and altitude of the UAV by detecting the location of the horizon in the video frames. This paper presents a video frame processing algorithm for estimating the pitch and roll of a UAV, as well as its altitude. The frames are obtained from a downward pointing video camera equipped with a fisheye lens. These open-loop estimates can serve as redundant data used to implement graceful degradation in the event that the main closed-loop control sensors fail, or for fault-tolerance purposes to augment inertial sensors for increased accuracy. The estimated values had a mean error of ±0.7 angular degrees for roll and ±0.9 angular degrees for pitch, while the altitude estimation from the video had a mean error of ±0.9 meters. The results are presented and compared to actual attitude and altitude values obtained from a traditional inertial measurement unit and, in the case of altitude comparison, an absolute air pressure sensor. The algorithm was developed on a personal computer to work at 10 frames per second and uses only simple image processing functions that can be deployed using open source libraries on lightweight computing boards capable of image processing.

Keywords

Altitude, aerial images, fisheye, inertial measurement unit, pitch, roll, UAV, vision, wide angle

1. Introduction

Unmanned aerial vehicles typically use inertial measurement units (IMUs) containing gyroscopes and accelerometers to estimate roll and pitch angles and their rates of change (Mostafa and Hutton, 2001; Grinstead et al., 2005). In addition, various technologies commonly depending on air pressure measurements are used for estimating altitude (De Leo and Hagen, 1978; West et al., 1983). Given that many unmanned aerial vehicles (UAVs) carry a camera for surveillance or other imaging purposes, it can be advantageous to extract information from the aerial video to aid in-flight control. This information may be redundant with regard to providing fault-tolerance, and can be used to augment onboard inertial measurement units and altitude sensors to increase their accuracy, or to replace IMUs and altitude sensors in order to reduce cost, size, weight and power consumption. The latter factor may become more significant with the increased interest in micro aerial vehicles (MAVs).

This paper describes an image processing algorithm applied to aerial video recorded through a downward pointing camera equipped with a fisheye lens. The algorithm enables the in-flight estimation of UAV pitch, roll and altitude. With the fisheye lens, it is possible to capture the whole, or part, of the earth's horizon, which will appear curved. Figure 1 shows four different sample frames captured by the UAV's fisheye camera indicating different pitch and roll

1 Department of Mechatronics Engineering, German Jordanian University, Amman, Jordan
2 Department of Electrical and Computer Engineering, Oakland University, Rochester, MI, USA
3 Department of Computer Engineering, Princess Sumaya University for Technology, Amman, Jordan

Corresponding author:
Nathir A Rawashdeh, Department of Mechatronics Engineering, German Jordanian University, PO Box 35247, Amman, 11180 Jordan.
Email: [email protected]

Received: 30 August 2014; accepted: 10 April 2015

Journal of Vibration and Control 1–15
© The Author(s) 2015
Reprints and permissions: sagepub.co.uk/journalsPermissions.nav
DOI: 10.1177/1077546315586492
jvc.sagepub.com


angles. For example, in the top right image, the center of the Earth is located at the bottom left relative to the center of the image. This suggests that the UAV is moving upwards and rolling to the left. It is also possible to estimate the vehicle's yaw angle changes, but not absolute yaw, through detection and tracking of one or more reference points/objects in successive video frames. However, this use of temporal information requires more memory and image processing power on-board and was not implemented in the present work.

The aerial images in Figure 1 show that the vanishing line, or the horizon, is severely curved due to the use of the fisheye lens. This is also the case when the UAV is not flying very high. In the top left image, for example, the whole Earth is visible at a modest altitude around 50 m.

The algorithm detects several points on the (curved) horizon and finds the radius and center of the circle that best fits these points, after removing outliers. It must be emphasized that the spherical shape of the Earth and the flying altitude of the UAV do not adversely affect the algorithm's ability to estimate the Earth's center coordinates relative to the center of the lens. In addition, when the UAV flies level and the Earth is visible in the fisheye image, the detected Earth center will coincide with the center of the image frame. In this case, the algorithm will properly report zero angular values for pitch and roll. In contrast, when the Earth is only partially visible, the estimates of pitch and roll become less accurate because parts of the horizon line will be affected by the uneven distortion of the fisheye lens towards the edge of the lens.

The algorithm estimates the UAV's altitude through measuring the radius of the Earth in pixels and translating the value into meters. As the altitude increases, the Earth's radius appears smaller in the image. At some point, when the altitude is too low, it is not possible to see the horizon through the downward pointing camera, and it is likely that large objects such as buildings or mountains will obscure the horizon. The algorithm is aware of these limitations and in such circumstances reports its inability to detect the horizon.

The processing stages of the algorithm are described in this paper and the resulting estimates are compared to actual IMU and altitude sensor data. The algorithm's estimated roll angles had a mean error of ±0.7 angular degrees. For pitch, the error was ±0.9 angular degrees, while the maximum altitude error estimated from the video was ±3 meters. The latter can be improved by using a higher-resolution imaging sensor.

Figure 1. Sample frames captured by the fisheye camera showing different roll and pitch of the UAV.


2. Related work

Research on vision-based obstacle avoidance and estimation of camera position with respect to the vanishing line, i.e. the horizon, is well established (Nordberg et al., 2002; Grinstead et al., 2005; Hrabar et al., 2005; Wang et al., 2008; Li and Hai, 2010). Processing video frames in real-time to estimate the distances of objects from a vehicle has also been studied (Betke and Nguyen, 1998; Cavallo et al., 2001; Gurtner et al., 2007). Other related research has been conducted in the area of vision-based systems for vertical take-off and landing (VTOL) aircraft (Saripalli et al., 2002; Mejias et al., 2006). In this latter work a known target geometry, such as the 'H' on a helipad, is detected in video frames and used to control the lateral position of the vehicle. Gyroscopes and accelerometers are still necessary for estimation and control of attitude. In other work, a dual-camera setup was simulated for estimation and control of a quadrotor's six degrees of freedom position (Altug et al., 2003). One camera was on board the vehicle and a second camera was part of the ground station. Streams from both cameras were processed and the attitude control proved to be successful in simulations. Other research has focused on vision-guided stability by means of horizon detection in the images of a forward-pointing camera (Ettinger et al., 2003); however, the limited angle of view of stock lenses limits the captured horizon area, imposing various restrictions. Fowers et al. (2007) used corner detection methods for controlling small drift rates that are not detectable by means of the IMU; in that instance, field programmable gate array boards were used. Angeletti et al. (2008) proposed an approach for an indoor quadrotor position controller based on off-board vision sensing, where a ground station running a vision algorithm sends the current position and yaw angle back to the quadrotor.

Other work has applied optical flow and pattern recognition techniques that focus on using Moiré analysis and other feature tracking techniques in aerial images to estimate the six-degrees-of-freedom position of an aerial vehicle (Tournier et al., 2006). In addition, other work has modeled the relationship between altitude and change in distance in the context of fisheye camera calibration (Brauer-Burchardt and Voss, 2001; Gutwin and Fedak, 2004; Ho et al., 2005; Kannala and Brandt, 2006). In contrast, our work used a downward pointing fisheye lens to ensure the capture of the horizon position in video frames for the estimation of a wide range of pitch and roll angles (AbouSleiman et al., 2009; Rawashdeh et al., 2009). This is achieved by relating the center of the Earth in the images to the center of the video frame. Both points overlap when pitch and roll are zero. The size of the Earth in pixels is obtained from the detected curved horizon in the fisheye video and is used for altitude estimation after calibrating Earth radius values to actual altitudes, in calibration flights. The solution that has been developed features simple image processing and effectiveness in estimating the actual roll and pitch angles, as well as altitude.

3. Background

Reducing the size, weight and aerodynamic drag of a UAV is necessary for reducing power consumption and increasing flight stability. Using the video frames acquired by the onboard imaging system of a UAV for attitude and altitude estimation can help augment attitude data from inertial measurement units, in order to increase reliability when the UAV flies sufficiently high for the imaging algorithm to engage. Moreover, in the context of aerial imaging and photography, the use of fisheye lenses featuring wide fields of view (FOV) can eliminate the need for using additional gimbal systems for ground imaging. As such, it is argued that the study of the characteristics of fisheye lenses and other related factors will help improve research in various fields of study. This section investigates the characteristics, features, parameters and distortion of fisheye lenses.

3.1. Fisheye lenses

Fisheye lenses are gaining popularity in various applications due to their wide FOV. Unlike conventional lenses with narrow fields of view, an aerial image at sufficient altitude from a fisheye lens will, as already noted, show the whole or part of the Earth's horizon as well as parts of the sky. The tradeoff to having a wide FOV is the loss of spatial resolution. For a fisheye lens, the spatial resolution is at a maximum at the center and reduces radially outwards from the center of the lens. The application presented in this paper makes use of the wide FOV of the lens, while not requiring high spatial resolution. More details on the FOV of fisheye lenses and how image distortion affects the practical use of these lenses in this application are presented in Section 4.

3.2. Fisheye lens camera model and calibration

In order to use a fisheye lens camera, it is necessary to define the intrinsic and extrinsic calibration of such systems. Intrinsic calibration is the process of estimating important parameters such as the principal point, focal length and the aspect ratio of the camera. Intrinsic calibration is necessary for obtaining a correct mapping of the lens coordinates onto the image sensor coordinates. Extrinsic calibration, in contrast, is the process of


resolving the projection model of 3D objects into the image's 2D domain. Both methods are explained here in more detail.

3.2.1. Intrinsic calibration. Commonly, the lens manufacturer will provide the parameters required for intrinsic calibration; however, if these are not available, or are insufficient, the following techniques could be considered to deduce the focal length and principal point and to improve the understanding of the properties of the lens. The focal length is defined as the distance between the lens center and the sensor plane. The focal length f can be found using the following relationship:

f/h = D/H \quad (1)

where h is the object's image size, f is the focal length, D is the distance of the object from the lens and H is the object's actual size. Because the fisheye lens introduces distortion, determining its focal length is more difficult. This problem can be overcome by selecting a small object in the FOV, where the distortion is minimal, and using it to calculate an approximate focal length.
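As a numerical illustration of Equation (1), using hypothetical values rather than measurements from this setup: an object of actual size H = 40 mm at distance D = 100 mm that forms an image of size h = 0.5 mm on the sensor implies

f = \frac{h \, D}{H} = \frac{0.5 \times 100}{40}\ \text{mm} = 1.25\ \text{mm},

which is of the same order as the 1.2 mm fisheye lens described in Section 4.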

In order to illustrate the principal point, one must first define the principal surface: this is the imaginary surface that contains all the points where light rays are refracted inside the lens. The principal point is the intersection of this imaginary surface with the optical axis. Many different techniques can be used to estimate the principal point. The fisheye lens used in this research produces an image smaller than the plane of the CCD (charge-coupled device) and, as such, the lens projects a symmetrical circular image around the center, leaving the CCD corners unexposed, i.e. black. The center of the circle that is generated is considered to be the principal point (Shah and Aggarwal, 1996; Schwalbe, 2005; Van Den Heuvel et al., 2006).
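A minimal sketch of this idea follows, assuming a grayscale frame in which the unexposed corners are near-black; it is an illustration rather than the authors' implementation, and the dark-level threshold is an assumed parameter.

```python
import numpy as np

def estimate_principal_point(gray, dark_level=10):
    """Estimate the principal point as the centroid of the exposed circular
    image that the fisheye lens projects onto the CCD.

    gray       : 2D uint8 array (grayscale frame)
    dark_level : intensity below which a pixel is treated as unexposed
                 (assumed value; tune for the actual camera)
    """
    exposed = gray > dark_level                # pixels inside the lens circle
    ys, xs = np.nonzero(exposed)
    x0, y0 = xs.mean(), ys.mean()              # centroid ~ principal point
    radius = np.sqrt(exposed.sum() / np.pi)    # radius of the exposed circle
    return x0, y0, radius
```

The same circle radius can also be reused as the barrel-mask radius required later (233 pixels for the camera configuration in Section 6).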

3.2.2. Extrinsic calibration. Several projection models for fisheye lenses have been presented in the literature (see, for example, Shah and Aggarwal, 1996; Schwalbe, 2005; Van Den Heuvel et al., 2006). A polynomial is normally used to describe these models. A general projection model has the form (Van Den Heuvel et al., 2006):

r(\theta) = a\theta + b\theta^2 + c\theta^3 + d\theta^4 + e\theta^5 + \dots \quad (2)

where θ is the angle between the optical axis and the incoming ray, r is the distance between the image point and the principal point, and f is the focal length; however, the number of terms is clipped for computational simplicity. The coefficients a, b, c, ... will then be such that r(θ) is monotonically increasing between 0 and 90 degrees.

The perspective projection of a pinhole camera is described by:

r = f \tan\theta \quad (3)

while fisheye lenses, depending on their design, could have one of the following projection characteristics (Van Den Heuvel et al., 2006):

r = 2f \tan(\theta/2) \quad (4)

r = f\theta \quad (5)

r = 2f \sin(\theta/2) \quad (6)

r = f \sin\theta \quad (7)

In practice, fisheye lenses may not obey these formulae exactly, due to distortion. The fisheye lens used in this work follows approximately the orthogonal projection model given in Equation (7); however, an experimentally derived linear relationship was used, and is compared to this model in Section 4.
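The following sketch, given purely for orientation, evaluates the pinhole model of Equation (3) and the four fisheye models of Equations (4) to (7) for a given focal length; the conventional model names in the comments are added for context and are not taken from this paper.

```python
import numpy as np

def projection_radii(theta_deg, f):
    """Radial image distance r (same units as f) predicted by the
    pinhole model (Eq. 3) and the four fisheye models (Eqs. 4-7)."""
    t = np.radians(theta_deg)
    return {
        "pinhole       r = f tan(theta)":     f * np.tan(t),
        "stereographic r = 2f tan(theta/2)":  2 * f * np.tan(t / 2),
        "equidistant   r = f theta":          f * t,
        "equisolid     r = 2f sin(theta/2)":  2 * f * np.sin(t / 2),
        "orthogonal    r = f sin(theta)":     f * np.sin(t),
    }

# Example: compare the models at a 60 degree incidence angle with f = 255 pixels,
# the focal length value quoted in Section 4.
for name, r in projection_radii(60.0, 255.0).items():
    print(f"{name}: {r:.1f} px")
```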

An illustration of the fisheye lens camera model is shown in Figure 2. By knowing the Cartesian coordinates of a point in the real object plane, and by converting these coordinates into polar coordinates, all the angles shown in the diagram can be calculated.

Figure 2. Illustration of the fisheye lens camera projection model.


Thus, the coordinates of a point in the image plane will be x = r cos(ω) and y = r sin(ω), where the vector length r is found from the fisheye lens projection model given by Equation (7), and ω is the angle in polar coordinates.

To convert into pixel coordinates, the number of pixels per unit distance must be defined by means of a simple experiment, which involves choosing a point on a plane and then changing the angle of view θ and recording the changes in pixel locations (x, y). This experiment is described in more detail in Section 4.

By knowing the pixel–distance ratio and the principal point's coordinates, the actual pixel coordinates of the point in the image plane can be found using the following relationships:

x_{pixel} = k_x + x_0 \quad (8)

y_{pixel} = k_y + y_0 \quad (9)

where x_pixel and y_pixel are the x and y coordinates in pixels, respectively, k_x and k_y are the distance changes in the x and y directions, respectively, and x_0 and y_0 are the (x, y) coordinates of the principal point.

By knowing the intrinsic parameters, the extrinsic calibration can be performed by specifying a number of 3D points and their 2D projections. Then, using a nonlinear least-squares optimization, an approximation of the projection model of the lens can be determined.
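A sketch of such a fit is shown below, assuming SciPy is available and that a set of measured angle/radius correspondences has been collected; the truncation order, initial guess and data handling are illustrative choices, not the authors' procedure.

```python
import numpy as np
from scipy.optimize import least_squares

def fit_projection_polynomial(theta_deg, r_pixels, order=3):
    """Fit r(theta) = a*theta + b*theta^2 + ... (Eq. 2, truncated) to
    measured angle/radius pairs by nonlinear least squares."""
    theta = np.radians(np.asarray(theta_deg, dtype=float))
    r_pixels = np.asarray(r_pixels, dtype=float)

    def residuals(coeffs):
        # model prediction minus measured radial distance
        powers = np.vstack([theta ** (k + 1) for k in range(order)])
        return coeffs @ powers - r_pixels

    result = least_squares(residuals, x0=np.ones(order))
    return result.x  # polynomial coefficients a, b, c, ...
```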

4. Experimental setup

The video acquisition setup consists of a camera with a mounted fisheye lens. The system contains a CCD camera with an 8.5 mm lens, weighing 22 g, and with a 420 TV lines horizontal resolution. The fisheye lens is a 1.2 mm aspheric lens with a 188 degree viewing angle. As noted earlier, the wide viewing angle provides for full capture of the Earth's horizon at relatively low altitudes. The drawback of using a wide angle lens is that such lenses suffer from a certain amount of barrel distortion, which affects the image by reducing magnification as pixels move farther from the orthogonal optical axis, i.e. the center of the image frame. As illustrated in Figure 1, the barrel distortion of the lens warps the imaged regular rectangular grid to a spherical appearance. Nevertheless, fisheye lenses can map very wide angles of the object plane onto the limited area of the CCD sensor. Some of the equipment used is shown in Figure 3.

The UAV was a tele-operated model airplane equipped with an IMU and the downward pointing fisheye lens camera mounted on the IMU. The IMU telemetry data were transmitted to a ground station computer using a wireless aerial module. Similarly, the flight video was streamed to the ground station in real-time and recorded in synchronization with the telemetry data for later off-line algorithm development. The video frames were in RGB color with a size of 640 × 480 pixels; the frame rate was 29 frames per second.

The algorithm that was developed is intended to be used in aerial vehicles that do not experience more than 60 degrees of roll and pitch, because we can assume a linear trend for the lens distortion in this range. To examine this distortion, we conducted an experiment in which images of a regular grid target on a sheet of graphing paper were taken through the fisheye lens. The viewing distance was 10 cm and the target grid line spacing was 6 mm. The experiment entailed changing the viewing angle of the camera, that is, the angle between the line perpendicular to the plane and the optical axis

Figure 3. The employed UAV, IMU, fisheye lens, camera, and transmitters.


where the camera is pointing. The distance between the center of the lens/frame and the center of the target (in pixels) was recorded as a function of viewing angle. The results, shown in Figure 4, reveal a close to linear trend for angles of interest between zero and 60 degrees. Figure 4 shows a superimposed linear model given in Equation (10) and the trigonometric model given in Equation (7), with f = 255 and an offset of 400. The linear degrees-to-pixels relationship equation used is:

p(\theta) = 3.68\,\theta + 412.33 \quad (10)

and conversely

\theta = \frac{p(\theta) - 412.33}{3.68} \quad (11)

where θ is the viewing angle in degrees of the camera (the angle between the perpendicular line on the image plane and the camera's optical axis), and p(θ) is the pixel value at angle θ. The assumption of linearity in Equation (11) simplifies the conversion of the detected displacement, in pixels, between the Earth's center and the center of the video frame, to pitch and roll values that can be used by the avionics system.
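As a worked illustration of Equation (11), with a hypothetical pixel value rather than a measured one: a horizon center that maps to a pixel value of p = 450 corresponds to a viewing angle of

\theta = \frac{450 - 412.33}{3.68} \approx 10.2\ \text{degrees}.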

A second experiment was conducted to model the behavior of the fisheye lens with respect to viewing distance variation. Similar to the first experiment, target points were set on a sheet of graphing paper. The distance between the camera and the target plane was varied from 40 mm to 300 mm. The results show how the x and y pixel coordinates change as a function of camera elevation, i.e. altitude. As shown in Figure 5 and Figure 6, the numbered targets are moving towards

Figure 4. Fisheye lens projection model points with linear and trigonometric models superimposed.

Figure 5. Target point movement through a fisheye lens with varying elevation, i.e. altitude.


the center of the image where the optical axis and target plane intersect.

Target point 1 is in the center and, as expected, does not move as the camera distance is increased. The remaining target points move in both the x and y directions. The rate of change, with distance, of the target location coordinates, as a function of proximity to the lens center, must be taken into account when calculating absolute altitude values. This can be achieved by scaling the Earth's diameter measured in pixels. The conclusion from the angle data in Figure 4 and the elevation relationships in Figure 6 is that the lens distortion is significant. The algorithm uses the angle data to translate pixel displacements to roll and pitch angles; however, for the altitude, ground truth altitude was related to Earth diameter values in pixels, as explained in Section 5.

5. Video frame processing

The algorithm processes frame images from the aerial video in order to detect the horizon curvature and, from it, the Earth's center and radius in pixels. The execution frequency and frame rate necessary for successful flight control must be determined experimentally and can be a function of aircraft dynamics and airspeed. The first processing step is the extraction of a video frame and its conversion from a color RGB image to a grayscale, i.e. intensity, image. Such an example frame image is shown in Figure 7. The Earth's horizon is partially visible in this example and the sky appears as a lighter crescent above it. In addition, as also evident in Figure 5, there are dark corners in the image due to the fisheye lens creating a circular image on the rectangular imaging sensor. The algorithm proceeds to whiten these corners, as shown in Figure 7. This step requires manual specification of the x–y coordinates of the lens center, marked with an 'x' in the figure, in addition to setting a variable that is the radius of the lens barrel in pixels. A circular mask is then applied to whiten all pixels outside the fisheye lens's view. In this example, the Earth center is slightly lower than, and to the left of, the lens/frame center. This is a result of the UAV rising (having positive pitch) and banking to the left (with negative roll).

Figure 6. Target point movement in pixels with viewing distance, i.e. altitude.

Figure 7. Intensity frame image after barrel whitening showing the coordinate system origin, and the image center marked with an 'x' and arrow.
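A sketch of these first steps (grayscale conversion, barrel whitening and binarization) is given below, under the assumption of an RGB frame stored as a NumPy array; the lens-center coordinates and barrel radius are the manually specified parameters mentioned above, and the 55% threshold is the daytime value quoted in Section 6.

```python
import numpy as np

def preprocess_frame(rgb, lens_center, barrel_radius, bin_threshold=0.55):
    """Convert an RGB frame to grayscale, whiten everything outside the
    fisheye barrel, and binarize for horizon-edge detection.

    rgb           : H x W x 3 uint8 frame
    lens_center   : (x0, y0) pixel coordinates of the lens center (manual)
    barrel_radius : radius of the exposed lens circle in pixels (manual)
    bin_threshold : gray-level threshold as a fraction of full scale
    """
    gray = rgb.astype(np.float32) @ np.array([0.299, 0.587, 0.114], np.float32)
    h, w = gray.shape
    yy, xx = np.mgrid[0:h, 0:w]
    x0, y0 = lens_center
    outside = (xx - x0) ** 2 + (yy - y0) ** 2 > barrel_radius ** 2
    gray[outside] = 255.0                    # whiten the dark barrel corners
    binary = (gray > bin_threshold * 255).astype(np.uint8)  # 1 = light (sky), 0 = dark (earth)
    return gray.astype(np.uint8), binary
```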

In order to detect the Earth center and relate it to the lens center, the algorithm proceeds to detect several points on the horizon. From the coordinates of these points it is possible to specify the center and radius of a circle that passes through the horizon points and is centered at the Earth's center, using a least-squares fit after removing outlier points caused by noise or irregularities in the horizon from (for example) mountains or tall buildings. The horizon points are iteratively determined through edge detection in the frame image columns after the frame is binarized to values of 0 (black) and 1 (white), as shown in Figure 8. Detected horizon points are shown on the example frame in Figure 9. These points include outliers that lie inside, not on, the horizon.

A nontraditional approach was used to detect the horizon edge pixels. It is explained here through an illustrative example, supported by Equations (12) and (13). Suppose we want to detect horizon points in the upper half of the image. After binarization, the algorithm looks over the image's columns. In each column, it looks for a pixel that has, say, seven white pixels directly above it, as well as seven black pixels directly below it. Pixels that satisfy this condition, that is Equation (12), are labeled as horizon points. A similar approach is employed for the horizon point search in the lower image half using Equation (13). It is sufficient to perform edge detection in the columns of the upper and lower image halves to guarantee horizon detection. The upper half columns will contain horizon points when the sky is in the first two quadrants. Similarly, processing the lower half of the columns will detect the horizon points when the sky is in the two lower quadrants. It is possible to reduce processing requirements by processing only a subset of columns. In addition, detecting a horizon point by averaging several columns will increase robustness to noise. The binary image simplifies horizon detection, since light area pixels (like the sky) have a value of 1 and dark areas (such as the earth) have a value of 0. A horizon point (x_h, y_h) in the upper two quadrants satisfies the following two conditions:

\sum_{i=1}^{\alpha} f(x_h, y_h + i) = \alpha \quad \text{and} \quad \sum_{i=1}^{\alpha} f(x_h, y_h - i) = 0 \quad (12)

and similarly for a horizon point in the lower two quadrants:

\sum_{i=1}^{\alpha} f(x_h, y_h + i) = 0 \quad \text{and} \quad \sum_{i=1}^{\alpha} f(x_h, y_h - i) = \alpha \quad (13)

where f is the intensity value of the pixel and can be either 0 (if black) or 1 (if white), x_h is the index of the column being processed, y_h is the row index of the edge (or horizon point), and α can be regarded as a sensitivity threshold. A value of α = 5 was used. In other words, Equation (12) ensures that a horizon point (x_h, y_h) in the upper two quadrants has α white pixels above it, and α black pixels below it. Any points within the column of search, with an index x_h that does not satisfy the conditions in Equation (12), are not considered horizon points. Similarly, Equation (13) ensures that detected horizon points in the lower two quadrants have α white pixels below, and α black pixels above them.

Figure 8. Frame image after binarization to white (value 1) and black (value 0).

Figure 9. Several detected horizon points, including outliers, in an example frame.
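A sketch of the column scan described by Equations (12) and (13) is given below, assuming the binary image from the preprocessing step (1 = light, 0 = dark) with row index 0 at the top of the array; the paper's bottom-left origin convention is only adopted later, when pitch and roll are computed. The column step is an assumed optimization, following the paper's remark that only a subset of columns needs to be processed.

```python
import numpy as np

def detect_horizon_points(binary, alpha=5, column_step=8):
    """Scan image columns for horizon edge pixels: a pixel is kept when the
    alpha pixels on one side are all sky (1) and the alpha pixels on the
    other side are all ground (0), per Eqs. (12)/(13).

    binary      : H x W array of 0/1 values (1 = light/sky, 0 = dark/ground)
    alpha       : sensitivity threshold (the paper uses 5)
    column_step : process every column_step-th column to save computation
    """
    h, w = binary.shape
    points = []                                   # list of (x, y) pixel coordinates
    for x in range(0, w, column_step):
        col = binary[:, x]
        for y in range(alpha, h - alpha):
            above = col[y - alpha:y]              # rows with smaller index (image top)
            below = col[y + 1:y + 1 + alpha]      # rows with larger index (image bottom)
            sky_above = above.all() and not below.any()   # sky in the upper quadrants
            sky_below = below.all() and not above.any()   # sky in the lower quadrants
            if sky_above or sky_below:
                points.append((x, y))
    return np.array(points)
```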

Figure 9 shows an example frame with detected horizon points in the upper two quadrants. There are several mis-detected horizon points, where the edge detection found edges below the horizon. This can occur when the sky is bright at the horizon, making the horizon bright as well, and if bodies of water (which reflect the light sky) or white buildings are present. An additional situation where horizon points can be mis-detected is when the UAV's altitude is low and the horizon contains mountains or tall buildings, or when an object is in the sky, or the lens is contaminated with dust or water droplets. In order to remove the outliers, that is, the horizon points that lie outside the horizon curvature that most other detected points will follow, the algorithm calculates a linear fit to the gradient of the horizon points and discards all points that are far away. The detected horizon points in Figure 9 are first displayed as a curve across the image, as shown in Figure 10.

Here, the curvature and the outlier points are more evident. A least-squares circle fit would estimate the Earth's center only poorly in the presence of the outliers shown.

The outliers are removed by fitting a line to the approximate derivative of the detected horizon points. Horizon points that lie on the fitted line are suitable because they also lie on the horizon curvature. Any points far from the line do not follow the horizon curvature and are deemed outliers, and are removed. The approximate derivative of the detected horizon points is calculated as follows:

y'_i = \frac{y_i - y_{i+1}}{x_i - x_{i+1}}, \quad i = 1, 2, 3, \dots, k \quad (14)

where i is the index into the detected horizon points array and k is the number of detected horizon points, including outliers. This process is illustrated in Figure 11.


Figure 11. Linear fit to approximate derivative of detected horizon points (Y vs. X pixel coordinates).


Figure 10. Graph view of detected horizon points to highlight curvature and outliers (Y vs. X pixel coordinates).


It can be seen that the approximate derivative remains close to zero and follows a straight line for points (x, y) that follow the horizon curvature. Whenever a point does not follow the curvature, the approximate derivative will exhibit a peak value and lie outside the fitted line. A tunable threshold is used to discard all points that are too far away from the fitted line, leaving only points that follow the horizon curvature.
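A sketch of this outlier rejection is given below, under the assumption that the detected horizon points are ordered by their x coordinate and come from distinct columns (so consecutive x values differ); the rejection threshold is a tunable parameter, as in the text.

```python
import numpy as np

def reject_outliers(points, residual_threshold=5.0):
    """Discard horizon points whose approximate derivative (Eq. 14) departs
    from a straight-line fit by more than residual_threshold.

    points : N x 2 array of (x, y) horizon points sorted by x
    """
    x, y = points[:, 0].astype(float), points[:, 1].astype(float)
    dy = (y[:-1] - y[1:]) / (x[:-1] - x[1:])     # approximate derivative, Eq. (14)
    xm = x[:-1]
    slope, intercept = np.polyfit(xm, dy, 1)      # linear fit to the derivative
    residuals = np.abs(dy - (slope * xm + intercept))
    keep = residuals < residual_threshold         # derivative samples on the fitted line
    # drop both endpoints of any derivative sample flagged as an outlier
    mask = np.ones(len(points), dtype=bool)
    mask[:-1] &= keep
    mask[1:] &= keep
    return points[mask]
```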

Once the noisy horizon points (outliers) are discarded, it is possible to approximate the horizon through a least-squares circle fit to the remaining horizon points, while ensuring a good fit because the large residuals have been removed, as indicated in Figure 12. The figure also shows the estimated Earth center, i.e. the fitted circle center, relative to the image center.

The least-squares circle fit procedure of detected horizon points finds the best circle while minimizing the sum S of orthogonal distances of horizon points to the fitted circle (Crawford, 1983; Moura and Kitney, 1991; Coope, 1993; Nievergelt, 1994). This can be expressed as:

S = \sum_i \left[ (x_i - x_e)^2 + (y_i - y_e)^2 - r_e^2 \right]^2 \quad (15)

where the sum index i runs over all horizon points, x_i and y_i are the horizon points' coordinates, x_e and y_e are the Earth's (that is, the fitted circle's) center coordinates, and r_e is the circle (or Earth's) radius in pixels. After estimating the Earth center coordinates, the algorithm estimates the pitch P, roll R, and altitude A in pixels as follows:

P = y_c - y_e \quad (16)

R = x_e - x_c \quad (17)

A = r_e \quad (18)

where x_c and y_c are the pixel coordinates of the lens center. The origin of the coordinate system is the bottom left corner of the frame, as shown in Figure 7. The result illustrated in Figure 12 indicates that the UAV needed to lower its nose (it has positive pitch) and roll slightly to the right (it has negative roll) in order to fly level. One advantage of this approach is the ability to estimate the horizon, by fitting a minimum of three points, even if a large part of the Earth is outside the field of view. When fewer than three horizon points are detected, the algorithm assumes that the altitude is too low for an accurate estimation and that the Earth is filling the view of the lens. A flow diagram of the video frame processing algorithm is shown in Figure 13.
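A sketch of these remaining steps follows: an algebraic least-squares circle fit (in the spirit of the cited circle-fitting literature, e.g. Coope, 1993) followed by the pitch, roll and altitude read-outs of Equations (16) to (18). This is an illustrative formulation, not the authors' exact solver; note the comment on the y-axis direction.

```python
import numpy as np

def fit_circle(points):
    """Algebraic least-squares circle fit: solve x^2 + y^2 = 2a x + 2b y + c
    for the center (a, b) and radius sqrt(c + a^2 + b^2)."""
    x, y = points[:, 0].astype(float), points[:, 1].astype(float)
    A = np.column_stack([2 * x, 2 * y, np.ones_like(x)])
    b = x ** 2 + y ** 2
    (xe, ye, c), *_ = np.linalg.lstsq(A, b, rcond=None)
    re = np.sqrt(c + xe ** 2 + ye ** 2)
    return xe, ye, re

def attitude_altitude(points, frame_center):
    """Pitch and roll (pixel displacements) and altitude proxy (pixel radius),
    Eqs. (16)-(18); at least three horizon points are required.

    Coordinates are assumed to use the paper's bottom-left origin (y up);
    flip the sign of the pitch term if working in top-left image coordinates.
    """
    if len(points) < 3:
        return None                       # too low / Earth fills the view
    xc, yc = frame_center
    xe, ye, re = fit_circle(points)
    pitch = yc - ye                       # Eq. (16)
    roll = xe - xc                        # Eq. (17)
    return pitch, roll, re                # Eq. (18): altitude proxy = r_e
```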

Figure 13. Algorithm flow diagram.

Figure 12. Fitted circle to horizon points after removal of outliers. The image and estimated Earth centers are indicated with a diamond and a small circle (and arrows), respectively.


The algorithm was developed using MATLAB and implemented on a Pentium 4 based laptop computer running the Windows XP operating system. The video was streamed in real-time with 640 × 480 pixel frames transmitted from the UAV to the ground at 29 frames per second. The algorithm processed at 10 frames per second on the laptop and performed well, resulting in reliable attitude and altitude estimations. Because the algorithm only uses simple image processing functions such as color conversion, gray-level threshold pixel masking, counting and subtraction, the authors believe it is feasible to use the algorithm on board the UAV using light-weight computing boards capable of running open source image processing libraries such as C-based OpenCV.

It is important to point out that the higher the UAV flies, the smoother the Earth horizon appears in the fisheye image and the smaller its radius. A higher altitude also increases the probability of the whole Earth being visible, thus improving the algorithm's estimation of roll and pitch. However, altitude estimation becomes less sensitive with increasing altitude, because the higher the UAV, the less pronounced is the decrease of the Earth radius in terms of image pixels.

Experimental calibration is required for altitude estimation. As discussed in Section 4, pixel coordinates of target points on the image plane move towards the center as altitude increases. Accordingly, the Earth's estimated radius in the video frame will decrease with altitude. However, there is another parameter that comes into play; that is, the horizon view distance as a function of altitude. Because, for the purpose of this program, the Earth can be regarded as spherical, the distance to the horizon changes with height, because points more distant are visible as the altitude increases. This relationship is governed by the following equation and illustrated in Figure 14 (Ho et al., 2005):

d = \sqrt{h(2r + h)} \quad (19)

where h is the height above sea level, 2r is the Earth's diameter (r ≈ 6371 km), and d is the distance to the horizon. Figure 15 shows a plot of the distance to the horizon as a function of altitude in the range of interest according to Equation (19).
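For orientation, evaluating Equation (19) with r ≈ 6371 km at a few altitudes in the flight range gives

d(50\ \text{m}) \approx 25.2\ \text{km}, \quad d(100\ \text{m}) \approx 35.7\ \text{km}, \quad d(300\ \text{m}) \approx 61.8\ \text{km},

so the visible horizon recedes quickly with altitude, which contributes to the shrinking pixel radius of the Earth as the UAV climbs.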

The estimation of altitude from the fisheye image is performed by finding the Earth radius in pixels and relating this value to the calibration altitude data. The horizon view distance, as well as the lens distortion, affects this pixel-to-altitude mapping. An applied experimental approach was adopted, where calibration flights using a balloon were used to find the relationship. Figure 16 shows how the estimated Earth's radius decreases as altitude increases in the calibration flights.

A linear fit was used to relate altitude in meters to the Earth's radius in pixels, as shown in Figure 16 and governed by the equation:

r_e = -0.096189\,h + 427.32 \quad (20)

and conversely

h = \frac{r_e - 427.32}{-0.096189} \quad (21)

where r_e is the Earth radius in pixels and h is the UAV altitude in meters.
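As a worked illustration of Equation (21), with a hypothetical fitted radius rather than a measured one: an Earth radius of r_e = 420 pixels corresponds to an altitude of

h = \frac{420 - 427.32}{-0.096189} \approx 76\ \text{meters}.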

6. Calibration and results

Several thresholds used by the algorithm must be defined for the specific camera/lens configuration used and the type of terrain over which the UAV flies. First, the field-of-view radius of the fisheye lens is required, to mask the dark barrel corners of the image on the rectangular CCD; the radius used was 233 pixels. Second, the terrain and time of day affect the threshold value used to binarize the image to black and white; the threshold used was 55% for daytime video. Finally, the threshold number of black/white pixels that constitute a horizon point edge was 5 (pixels). In addition, roll, pitch and altitude estimates from the video are in units of pixels and, to be converted into physical units, require mapping through the curves in Figure 4, that is, Equation (11) for pitch and roll angles, and Figure 16 (Equation (21)) for altitude.
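For reference, the tunable quantities named in this section and in Section 4 can be grouped in one place; the dictionary below is an illustrative collection of the stated values, not part of the original code.

```python
# Illustrative parameter set for the daytime camera/lens configuration
# described in the paper (values taken from Sections 4 and 6).
PARAMS = {
    "barrel_radius_px": 233,              # field-of-view radius of the fisheye image
    "binarize_threshold": 0.55,           # gray-level threshold for daytime video
    "edge_alpha_px": 5,                   # black/white run length defining a horizon point
    "deg_per_px_slope": 1 / 3.68,         # inverse slope of Eq. (10) for pitch/roll
    "alt_slope_m_per_px": 1 / -0.096189,  # inverse slope of Eq. (20) for altitude
    "alt_offset_px": 427.32,              # Eq. (20) intercept: Earth radius at zero altitude
}
```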

The algorithm was applied to several aerial fisheye camera videos. IMU data were also recorded and

Figure 14. Illustration of horizon view distance due to Earth's curvature.


Figure 15. Horizon distance as a function of UAV altitude.

Figure 16. Altitude calibration data and linear model.


correlated with captured flight video frames. Figure 17 shows the estimated and real pitch and roll values for a portion of a test flight. There is good agreement between the estimates and the measurements: the mean error was around ±0.7 and ±0.9 angular degrees for roll and pitch respectively. In addition, the maximum errors are ±2 and ±1.9 degrees respectively for estimated roll and pitch.

Figure 17. Estimated pitch (a) and roll (b) values from a test flight in comparison to IMU baseline measurements.

Figure 18 shows the estimated altitude error relative to the ground truth. The maximum altitude error is ±3 meters, and the average error is ±0.9 meters. The altitude estimate error appears to be larger at lower altitudes and decreases notably above 80 meters. This is probably due to the fact that the higher the UAV flies, the smoother the horizon appears, yielding a better circle fit to the horizon in the image. Conversely, at low altitudes the horizon line appears less smooth, because mountains and tall buildings are relatively large in the images, causing more variability in the circle fit from frame to frame. The error ranges for estimated roll, pitch and altitude relative to the ground truth from an inertial measurement unit are summarized in Table 1.

Figure 18. Error in estimated altitude relative to ground truth.

Table 1. Summary of estimation errors relative to ground truth.

Estimated quantity   Average error    Maximum error
Roll                 ±0.7 degrees     ±2 degrees
Pitch                ±0.9 degrees     ±1.9 degrees
Altitude             ±0.9 meters      ±3 meters

The algorithm was tested under various daylight and weather conditions: sunny, cloudy, dark, rain and snow. The contour of the Earth was clear in most conditions except in darkness and snow, because in these conditions the contrast between sky and Earth was decreased. Since the processing steps are relatively simple pixel-level operations, we assume that the algorithm can be deployed in real-time on appropriate hardware on board the UAV.

It is worth noting that the attitude and altitude estimates are based on the Earth horizon circle fit obtained by processing the video frames. The position of the circle center in the frame produces the attitude estimates; similarly, the circle radius produces the altitude estimate. As such, the estimates are not affected by each other but, rather, are correlated, in the sense that a poor circle fit induces errors into all three estimates of roll, pitch and altitude.

7. Conclusions

A fisheye lens on an aerial downward pointing camera was used with a video frame processing algorithm to obtain estimates of UAV pitch, roll and altitude. Although several algorithm parameters, such as binarization and edge detection thresholds, are likely to vary in different environments, the experimental results showed that the proposed vision-based system can serve as a viable attitude and altitude estimator for UAVs. The algorithm, which is based on detecting the Earth's contour, was successful in most weather conditions. The estimated UAV roll and pitch angles had a mean error of ±0.7 and ±0.9 angular degrees, respectively. In addition, the estimated altitude had a mean error of ±0.9 meters.

Funding

This research received no specific grant from any funding agency in the public, commercial, or not-for-profit sectors.

References

AbouSleiman R, Sababha B, Yang H, Rawashdeh N and Rawashdeh O (2009) Real-time estimation of UAV attitude from aerial fisheye video. Proceedings of the AIAA Infotech@Aerospace Conference, Paper #1933. Reston, VA: American Institute of Aeronautics and Astronautics.
Altug E, Ostrowski J and Taylor C (2003) Quadrotor control using dual camera visual feedback. ICRA'03: IEEE International Conference on Robotics and Automation, 14–19 September, pp. 4294–4299. New York, NY: IEEE.
Angeletti G, Valente J, Iocchi L and Nardi D (2008) Autonomous indoor hovering with a quadrotor. SIMPAR Workshop Proceedings, Venice, Italy, 3–4 November, pp. 472–481.
Betke M and Nguyen H (1998) Highway scene analysis from a moving vehicle under reduced visibility conditions. Proceedings of the International Conference on Intelligent Vehicles, Stuttgart, Germany, 28–30 October, pp. 131–136. New York: IEEE Industrial Electronics Society.
Brauer-Burchardt C and Voss K (2001) A new algorithm to correct fish-eye- and strong wide-angle-lens-distortion from single images. International Conference on Image Processing, 7–10 October, Thessaloniki, Greece, pp. 225–228. New York, NY: IEEE.
Cavallo V, Colomb M and Dore J (2001) Distance perception of vehicle rear lights in fog. Human Factors: Journal of the Human Factors and Ergonomics Society 43(3): 442–451.
Coope I (1993) Circle fitting by linear and nonlinear least squares. Journal of Optimization Theory and Applications 76(2): 381–388.
Crawford J (1983) A non-iterative method for fitting circular arcs to measured points. Nuclear Instruments and Methods in Physics Research 211(1): 223–225.
De Leo R and Hagen F (1978) Pressure sensor for determining airspeed, altitude and angle of attack. US Patent 4096744.
Ettinger S, Nechyba M, Ifju P and Waszak M (2003) Vision-guided flight stability and control for micro air vehicles. Advanced Robotics 17(7): 617–640.
Fowers S, Lee D, Tippetts B, Lillywhite K, Dennis A and Archibald J (2007) Vision aided stabilization and the development of a quad-rotor micro UAV. CIRA 2007: International Symposium on Computational Intelligence in Robotics and Automation, 20–23 June, pp. 143–148. New York, NY: IEEE.
Grinstead B, Koschan A and Abidi M (2005) A comparison of pose estimation techniques: Hardware vs. video. Proceedings of SPIE Unmanned Ground Vehicle Technology VII (5804), 28 March, Orlando, FL, pp. 166–173. Bellingham, WA: SPIE.
Gurtner A, Boles W and Walker R (2007) A performance evaluation of using fish-eye lenses in low-altitude UAV mapping applications. Proceedings of the 12th Australian International Aerospace Congress (AIAC), 19–22 March, Melbourne, Australia.
Gutwin C and Fedak C (2004) A comparison of fisheye lenses for interactive layout tasks. Proceedings of the 2004 Graphics Interface Conference, 17–19 May, London, Ontario, Canada, pp. 213–220. Waterloo, Ontario: Canadian Human-Computer Communications Society.
Ho T, Davis C and Milner S (2005) Using geometric constraints for fisheye camera calibration. Proceedings of the IEEE OMNIVIS Workshop, 21 October, Beijing, China.
Hrabar S, Sukhatme G, Corke P, Usher K and Roberts J (2005) Combined optic-flow and stereo-based navigation of urban canyons for a UAV. 2005 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS 2005), 2–6 August, Edmonton, Canada, pp. 3309–3316. New York, NY: IEEE.
Kannala J and Brandt S (2006) A generic camera model and calibration method for conventional, wide-angle, and fish-eye lenses. IEEE Transactions on Pattern Analysis and Machine Intelligence 28(8): 1335–1340.
Li S and Hai Y (2010) Estimating camera pose from H-pattern of parking lot. 2010 IEEE International Conference on Robotics and Automation (ICRA), 3–7 May, pp. 3954–3959. New York, NY: IEEE.
Mejias L, Campoy P, Usher K, Roberts J and Corke P (2006) Two seconds to touchdown: vision-based controlled forced landing. 2006 IEEE/RSJ International Conference on Intelligent Robots and Systems, 9–15 October, pp. 3527–3532. New York, NY: IEEE.
Mostafa M and Hutton J (2001) Direct positioning and orientation systems: How do they work? What is the attainable accuracy? Proceedings of the American Society of Photogrammetry and Remote Sensing Annual Meeting, St. Louis, MO, 24–27 April, pp. 23–27. Bethesda, MD: ASPRS.
Moura L and Kitney R (1991) A direct method for least-squares circle fitting. Computer Physics Communications 64(1): 57–63.
Nievergelt Y (1994) Computing circles and spheres of arithmetic least squares. Computer Physics Communications 81(3): 343–350.
Nordberg K, Doherty P, Farneback G, Forssen P, Granlund G, Moe A and Wiklund J (2002) Vision for a UAV helicopter. International Conference on Intelligent Robots and Systems (IROS), Workshop on Aerial Robotics, 1 October, Lausanne, Switzerland, pp. 29–34. New York, NY: IEEE/RSJ.
Rawashdeh O, Rawashdeh N, Sababha B and Yang H (2009) Altitude and attitude estimation from aerial fisheye video. International Conference on Information and Communications Systems (ICICS), 20–23 December, Irbid, Jordan, Paper #419.
Saripalli S, Montgomery J and Sukhatme G (2002) Vision-based autonomous landing of an unmanned aerial vehicle. ICRA'02: IEEE International Conference on Robotics and Automation, Volume 3, 11–15 May, Washington, DC, pp. 2799–2804. New York, NY: IEEE.
Schwalbe E (2005) Geometric modelling and calibration of fisheye lens camera systems. Proceedings of the 2nd Panoramic Photogrammetry Workshop, International Archives of Photogrammetry and Remote Sensing 36(5/W8).
Shah S and Aggarwal J (1996) Intrinsic parameter calibration procedure for a (high-distortion) fish-eye lens camera with distortion model and accuracy estimation. Pattern Recognition 29(11): 1775–1788.
Tournier G, Valenti M, How J and Feron E (2006) Estimation and control of a quadrotor vehicle using monocular vision and moiré patterns. AIAA Guidance, Navigation and Control Conference, 21–24 August, Keystone, CO. Reston, VA: AIAA.
Van Den Heuvel F, Verwaal R and Beers B (2006) Calibration of fisheye camera systems and the reduction of chromatic aberration. International Archives of Photogrammetry, Remote Sensing and Spatial Information Sciences 36(Part 5).
Wang G, Wu J and Ji Z (2008) Single view based pose estimation from circle or parallel plane. Pattern Recognition Letters 29(7): 977–985.
West J, Lahiri S, Maret K, Peters Jr R and Pizzo C (1983) Barometric pressures at extreme altitudes on Mt. Everest: Physiological significance. Journal of Applied Physiology 54(5): 1188–1194.