
applied sciences

Article

MViDO: A High Performance Monocular Vision-Based System for Docking a Hovering AUV

André Bianchi Figueiredo * and Aníbal Coimbra Matos

Faculty of Engineering, University of Porto and INESC TEC - INESC Technology and Science, 4200-465 Porto, Portugal; [email protected] * Correspondence: [email protected]

Received: 13 February 2020; Accepted: 30 March 2020; Published: 25 April 2020

Abstract: This paper presents a high performance (low computationally demanding) monocular vision-based system for a hovering Autonomous Underwater Vehicle (AUV) in the context of the autonomous docking process: the MViDO system (Monocular Vision-based Docking Operation aid). The MViDO consists of three sub-modules: a pose estimator, a tracker and a guidance sub-module. The system is based on a single camera and a target of three spherical color markers that signals the docking station. The MViDO system allows the pose estimation of the three color markers even in situations of temporary occlusions, and it also rejects outliers and false detections. This paper also describes the design and implementation of the MViDO guidance module for the docking manoeuvres. We address the problem of driving the AUV to a docking station with the help of the visual markers detected by the on-board camera, and show that by adequately choosing the references for the linear degrees of freedom of the AUV, the AUV is conducted to the dock while keeping those markers in the field of view of the on-board camera. The main concepts behind the MViDO are provided and a complete characterization of the developed system is presented from the formal and experimental points of view. To test and evaluate the MViDO detector and pose estimator module, we created a ground truth setup. To test and evaluate the tracker module we used the MARES AUV and the designed target in a four-meter tank. The performance of the proposed guidance law was tested in Simulink/MATLAB.

Keywords: hovering AUV; docking; vision-based tracking system; monocular; guidance system

1. Introduction

This work presents a monocular vision-based system for a hovering Autonomous Underwater Vehicle (AUV) in the context of the autonomous docking process. In this work, we developed a complete approach to bring a hovering AUV to a docking station using only one sensor: a single camera. The aim was to explore the development of a system to assist autonomous docking based on a single cost-effective sensor only. It was intended to realize how far we can get by using only the image to determine the relative pose and track the target. Space limitations onboard the AUV and energy limitations dictated the requirements for the development of this work. Based only on the data collected by a low-cost camera it was possible to reach a guidance law that drives the AUV to the cradle of a docking station. This work was intended to observe the target at a rate of at least 10 frames per second (processing time less than 100 ms). This is due to the fact that the control operates at 20 Hz and it is not desired that there is more than one estimation between two consecutive frames. As can be seen from the results (Section 5, Section 5.2.2, Figure 31), this objective has been achieved. The detector and relative pose estimation algorithm provides 12 frames per second (83 ms), to which is added the processing time of the developed filter, which for 1000 particles per marker occupies 13 ms of the total processing time. We propose algorithms that should be able to run in a

Appl. Sci. 2020, 10, 2991; doi:10.3390/app10092991 www.mdpi.com/journal/applsci


low power consumption and low-cost system with minimal interference with the existing HW modules of the AUV MARES [1]. Since our goal was to develop high-level algorithms and not to develop hardware, we chose a generic platform, the Raspberry Pi 2, which we can classify in general as a low power consumption and low-cost system. So, instead of considering data sources from other sensors and increasing the computation cost for sensor fusion, we chose to explore the maximum capabilities of the data provided by the image sensor, as we explain in Section 2. Our approach, considering the Raspberry Pi 2 processing constraints, should be able to get more than 10 estimations per second (a requirement for smooth control of MARES). In this work, in addition to presenting the developed vision-based system that detects and tracks the docking station visual markers, we also address the problem of defining a guidance law for a hovering AUV when performing a docking maneuver. We used as a case study the docking of MARES, a small size modular AUV [1], but our approach is quite general and can be applied to any hovering vehicle. The results presented in this paper are the outcome of a continuous effort made by the center for robotics at INESC TEC for autonomous docking of AUVs. This work started with the localization and positioning of an Autonomous Surface Vehicle (ASV), assuming parallel target and camera planes [2]. Then the efforts were devoted to the preliminary characterization of relative position estimation and vehicle positioning. In that work [3], it was demonstrated that the hovering AUV MARES (Figure 1) was capable of tracking and positioning itself with regard to the target, at a given pre-specified depth. The estimation of the relative position, based solely on the image, is subject to abrupt variations as well as indeterminate solutions in the absence of one of the markers. There was a need to persist with the tracking even in the case of partial or momentary occlusions of the markers that compose the target. The solution we found to overcome this problem is presented in Section 2 (Section 2.3). The proposed solution allowed us not only to maintain the estimates of the relative pose even during the occlusion of one or more markers, but also to reduce the number of outlier detections (Section 5). In order to drive the AUV MARES from any point within the camera field of view to the point where the cradle of the docking station is, we designed and developed a guidance law (Section 4). The proposed law allows the markers to always be kept in line of sight during the docking maneuver. We show that by adequately choosing the references for the linear degrees of freedom (DOF) of the AUV it is possible to dock while keeping the target markers in the field of view of the on-board camera.

Figure 1. MARES tracking the visual target.

This document is organized as follows:

• Section 1 contains a survey of the state of the art related to our work, the contributions of our work and the methodology followed for the development of the presented system.

• Section 2 contains the developed system for docking the AUV MARES using a single camera, in particular the pose estimation method, the developed tracking system and the guidance law,


• Section 3 contains a theoretical characterization of the used camera-target set,

• Section 4 describes the developed guidance law,

• Section 5 describes the experimental setup and presents the experimental results: an experimental validation of the developed algorithms under real conditions such as sensor noise, lens distortion and the illumination conditions,

• Section 6 contains the results of tests of the guidance law performed in a simulation environment,

• Section 7 presents a discussion about the results,

• Section 8 presents the conclusions.

1.1. Related Works

In the context of autonomously docking a hovering AUV, our work was concerned with four main issues:

• build a target composed of well identifiable visual markers,

• visually detect the target through image processing and estimate the relative pose of the AUV with regard to the target: the success of the docking process relies on accurate and fast target detection and estimation of the target relative pose,

• ensure target tracking: the success of the docking process depends on a robust tracking of the target even in situations of partial target occlusions and the presence of outliers,

• define a strategy to guide the AUV to the station's cradle without ever losing sight of the target.

Taking these concerns into account, a survey of the state of the art related to each of these issues was made and the different approaches were analyzed.

1.1.1. Vision-Based Approaches to Autonomously Dock an AUV

In recent years, the use of artificial vision and identifiable markers for pose estimation in an underwater environment has become a subject of interest for the scientific community working in subaquatic robotics. A recent example is the work of [4], where a solution using active light-marker localization for docking an AUV is proposed. For that, the authors use a module based on a monocular system to detect a set of active light markers. Since the light-markers are placed in known positions of the docking station, the authors in [4] can estimate the pose between the docking station and the camera. In order to detect the light-markers, the authors in [4] use a blinking pattern to allow the markers to be discerned. Visual markers are one of the most used systems to help guide the AUV to the dock, and other approaches have been proposed by [5]. In [5] the authors propose a docking method for a hovering AUV at a seafloor station. The proposed method is based on visual positioning and makes use of several light sources located on the docking station to visually guide the AUV. The proposed method is divided into two stages by distance: a first stage in which the vehicle is visually guided based on the light sources, and a final stage in which the AUV just keeps its attitude and moves forward into a cone-shaped station. This method is not applicable to our case because the entrance of our docking station is not cone-shaped. Note that in the method proposed in [5], the authors suspend vision-based guidance when the AUV is at the dock entrance since the light sources are too close for the AUV to see. The solution we propose in our work does not need to suspend vision-based guidance, allowing the docking maneuver to remain vision-aware throughout. The use of a single camera on board an AUV and multiple light markers mounted around the entrance of a docking station in order to visually guide the AUV is also a solution presented in [6]. In this type of approach, the estimation of the relative pose required for autonomous docking is done using the marker geometry. As in the case of the work mentioned above ([5]), also in the work presented in [6] there was a position where the lights on the dock were outside of the camera viewing range when the AUV was close to the dock, and vision-based guidance is invalid in this area. In the literature, we also find works that are based on solutions that use passive targets to signal the dock. In the case of [7] the passive targets are docking poles. Passive markers in an underwater environment limit the process both in terms of minimum distance to the target and visibility conditions. In the solution we propose, we tried to circumvent these limitations


by building a hybrid target (active and passive). In other works, such as [8], the authors proposed a method to estimate the relative pose of a docking station using a single camera. They assume that the dock is circular, so the proposed method is a 5-DOF pose estimation neglecting the roll effects, because all the markers are physically identical and there is no way to distinguish them. Furthermore, by not considering the roll effect, the method proposed in [8] presents ambiguities in angle estimation.

1.1.2. Vision-Based Related Works

For purposes other than docking, other works based on artificial vision were found. In [9] an approach is proposed for estimating the relative pose between AUVs in tight formation. For a pose estimation based on visual information, they use active light markers as optical markers placed on the follower AUV. The pose estimation method in [9] is then based on the minimization of a non-linear cost function comprising error terms from light marker projection, plus range and attitude measurements when available. The use of a camera to estimate the relative position of an underwater vehicle also serves other purposes, such as the work in [10], where an AUV is intended to perform autonomous homing onto a subsea panel in order to perform an intervention. For the subsea panel detection and subsequent relative position estimation, the authors in [10] propose a method in which the images gathered by a single camera on board the AUV are compared with an a priori known template of the subsea panel. There are other works whose approach is based on stereo vision, such as the work in [11], where the authors use dual-eye cameras and an active 3D target. The real-time pose estimation is made by means of 3D model-based recognition and a multi-step genetic algorithm. Although we recognize the advantage of using stereo vision, the energy and space requirements available in our vehicle dictated a solution based only on a single camera. Vision-based homing has been widely implemented in ground and aerial robotics [12] and its concepts have been translated to underwater robotics in some works such as [13].

1.1.3. Approaches Based on Different Sensors

An alternative homing method (non vision-based) was proposed by [14]. The method proposed in [14] uses electromagnetic waves emitted and received by means of large coils placed in both the docking station and the AUV. The overall system makes it possible to compute the bearing to the dock at distances up to 35 m in seawater. Bearing was also employed by [15] using a USBL (ultra-short baseline) sensor carried on the vehicle. The derived control law just has to ensure that the bearing angle is null along the trajectory to home. However, it is well known that, in USBL systems, the angle resolution decreases with the distance and the sensor may not be omnidirectional, thus it is only possible to measure angles and ranges when the beacon is located at a given angular position with regard to the USBL sensor. In [16], an extremum-seeking algorithm was introduced to find the maximum approach rate to a beacon. Nevertheless, the method in [16] requires initialization, otherwise the AUV may be driven to a stable equilibrium point in the opposite direction of the beacon. In works such as [17], a priori information on the area is used to generate an artificial potential field combined with a sliding mode control law to home an AUV to its docking station.

1.1.4. Vision-Based Relative Localization

The detection of markers and the computation of the position and attitude of the AUV with respect to them have been addressed in several works [5,18]. In the context of vision-based relative localization estimation, there are strategies such as camera-based ego-motion. Ego-motion is defined as the 3D motion of an agent in relation to its environment; that strategy consists in estimating a camera motion relative to a rigid scene. Ego-motion using a single camera, Monocular Visual Odometry (MVO), consists of a sequential estimation process of camera motions depending on the perceived movements of pixels in the image sequence. MVO operates by incrementally estimating the pose of the vehicle through the examination of the changes that motion induces on the images of its onboard camera. For MVO to work effectively, there should be sufficient illumination in the


environment and a static scene with enough texture to allow apparent motion to be extracted [19]. Furthermore, consecutive frames should be captured ensuring that they have sufficient scene overlap. For accurate motion computation, it is of utmost importance that the input data do not contain outliers. Outlier rejection is a very delicate step, and the computation time of this operation is strictly linked to the minimum number of points necessary to estimate the motion [19]. The work [20] proposes robust methods for estimating camera ego-motion in noisy, real-world monocular image sequences in the general case of unknown observer rotation and translation with two views and a small baseline. Recent works have demonstrated that the computational complexity of ego-motion algorithms is a challenge for embedded systems because there are iterative operations that limit the processing speed. However, the recent work [21] proposes a hardware architecture for the “ego-motion challenge” that consists of using look-up tables and a new feature matching algorithm. The camera motion is estimated with no iterative loop and no geometrical constraints. The authors in [21] claim that their algorithm complexity is low and that a small size GPU-based implementation is feasible and suitable for embedded applications. The major drawback of the camera-based ego-motion approach is due to the image analysis, which is typically computationally expensive [21]. Perspective-n-point (PnP) algorithms can accurately solve point-based pose estimation (see work [22]), but a drawback of PnP algorithms, just like the other methods found in the literature, is that they fail to estimate poses in cases where not all markers are detected, and a partial occlusion of the target, i.e., an incomplete observation, is highly possible in a docking process. In our work, to estimate the relative position of the AUV with regard to the target we consider the algorithm presented in [23]. The algorithm presented in [23] serves our needs with regard to position estimation; however, in terms of the relative attitude estimation, the algorithm presented in [23] has some assumptions and simplifications that do not fit our problem. There are two ambiguities when using a target with only three markers: an ambiguity in the sign when the target is rotated around the x axis and another ambiguity when the target is rotated around the y axis. For those reasons, we decided to develop a new alternative algorithm for attitude estimation (presented in Section 2.2).

1.1.5. Tracking: Filtering Approaches

In the literature we could find some classical approaches to Bayesian nonlinear filtering, such as the ones described in [24,25]. The most common choice is the Kalman filter, but it does not serve our case considering the nonlinearity of the particular problem that we want to solve and the non-Gaussian sensor noise. On the other hand, the Extended Kalman filter could be applied to nonlinear non-Gaussian models; however, the linearized model considers Gaussian noise. Although other approaches such as the point mass filter apply to any nonlinear and non-Gaussian model [24], the main limiting issue is the dimensionality of the grid and the algorithm is of quadratic complexity in the grid size. In fact, in grid-based systems, adjusting the grid by varying the cell size in order to lower the computational cost is not a good choice, because increasing the cell size would reduce the system resolution. A particle filter approach has the advantages of simple calculation and fast processing. Particle filters can be implemented for an arbitrary posterior probability density function, which arises in particular when faced with nonlinear observations and evolution models or non-Gaussian noises [26]. On the other hand, in a particle filter, the tradeoff between estimation accuracy and computational load comes down to adjusting the number of particles [26]. Particle filters can propagate more general distributions and that is of particular benefit in underwater visual tracking.

1.1.6. Docking: Guidance Systems

In the literature we found some works focused on the purpose of guiding an AUV to a dock using a vision-based system. We highlight here two:

• In [27] a monocular vision guidance system is introduced, considering no distance information. The relative heading is estimated and an AUV is controlled to track a docking station axis line with a constant heading, and a traditional PID control is used for yaw control,


• In another work [28], two phases compose the final approach to the docking station: a crabbed approach, where the AUV is supposed to follow the dock centerline path and the cross-track error is computed and fed back; and a final alignment to eliminate contact between the AUV and the docking station.

None of the works found in the literature takes into account our concern to never lose sight of the target until the AUV is fully anchored in the cradle. For this reason, a new solution was developed, which is presented in this paper.

1.2. Contributions of This Work

Taking into account the state of the art, we highlight the following points as being the contributions of our work:

• A module for detection and attitude estimation of an AUV dock station based on a single camera and a 3D target: for this purpose, a target was designed and constructed whose physical characteristics maximize its observability. The developed target is a hybrid target (active/passive) composed of spherical color markers which can be illuminated from the inside, allowing the visibility of the markers to be increased at a distance or in very poor visibility situations. An algorithm was also designed for detecting the target that responds to the need for low computational cost and that can run on low power, low size computers. A new method for estimating the relative attitude was also developed in this work.

• A novel approach for tracking by visual detection in a particle filtering framework. In order to make the pose estimation more resilient to marker occlusions, a solution based on particle filters was designed and implemented that considers geometric constraints of the target and constraints of the markers in the color space. These specific approaches have improved the pose estimator performance, as presented in the results section. The innovation in our proposal for the tracking system consists of the introduction of the geometric restrictions of the target, as well as the restrictions in the color space, as a way to improve the filtering performance. Another contribution is the introduction, in each particle filter, of an automatic color adjustment. This allowed not only the size of the region of interest (ROI) to be reduced, saving processing time, but also the likelihood of outliers to be reduced.

• A method was developed for real-time color adjustments during the target tracking process, which improves the performance of the markers detector through better rejection of outliers.

• An experimental process, with hardware-in-the-loop, was designed and implemented to characterize the developed algorithms.

• A guidance law was designed to guide the AUV with the aim of maximizing the target's visibility during the docking process. This law was designed from a generalist perspective and can be adapted to any system that bases its navigation on monocular vision, or on another sensor whose field of view is known.

1.3. Requirements

For the purpose of designing and implementing the MViDO system, it was necessary to find a specific solution that meets certain requirements. One of the requirements for the development of the system was that its implementation be done on a computer small enough to be accommodated onboard the AUV-embedded low-cost hardware. Such a requirement implied a more limited processing capability, i.e., a high performance (low computationally demanding) system. By commitment to the control system of the AUV, the MViDO system should not exceed 0.011 s to process each frame received from the camera. Another requirement was that the system be based only on a single camera so that the AUV dynamics were changed as little as possible, i.e., it was not intended to alter the MARES body, so any extra equipment (sensor) to be installed in the vehicle would necessarily have to be incorporated into the housing available for that purpose. On the other hand,


the space on board the vehicle is limited and so is the available energy. The housing available to add a sensor only supports a camera and a light source. Furthermore, using more than one camera would create more drag and consume more energy. These were the starting points for the development of the MViDO system.

Components Specifications

The developed system is integrated as an independent module in the MARES AUV system. All the developed algorithms presented here were implemented on a Raspberry Pi 2 model B, a computer that is small enough to be accommodated onboard the MARES AUV. The single camera used for the vision system was a Bowtech underwater camera with a Sony 1/3" EX-View HAD CCD sensor. The camera and lens specifications are shown in Tables 1 and 2.

Table 1. Camera sensor specifications.

Sensor Size (mm): 3.2 × 2.4
Resolution (pixels): 704 × 576
Pixel Size (µm): 6.5 × 6.25

Table 2. Camera lens specifications.

Focal length (mm): 3.15
Diagonal Field of View (degrees): 65
Maximum Aperture: f/1.4

1.4. Methodology

We had to deal with a specific problem and we proposed a concrete solution. Taking into account the requirements, it was necessary to find a solution based on a monocular system and with limited processing capacity (a solution that can run on low power, smaller computers). Since it was intended to estimate the relative position and attitude of the AUV in relation to a dock using a single camera, and taking into account that, in the vicinity of the dock, the AUV's camera may not necessarily be facing the dock, we chose to construct a target to attach to the dock whose markers are visible from different points of view. On the other hand, we chose to use different colors for each of the markers in order to avoid ambiguities in the relative pose estimation. In order to estimate the 3D relative pose of the AUV in relation to the built target, we propose an algorithm of low computational cost that detects, identifies and locates the spherical color markers in the image and then estimates the position and orientation of the target with regard to the camera frame mounted onboard the AUV. The module developed for the detection of markers and relative pose estimation provides fairly satisfactory estimates, with sufficient accuracy to feed the control, especially the accurate orientation information needed in the final docking phase. We propose here a particular solution to deal with situations of partial occlusion of the target (absence of one or more markers) or the appearance of outliers that affect the estimate of the 3D relative pose. Our particular solution, based on particle filters, rather than following the more common approach of throwing particles over the target as a whole, considers a particle filter for each of the markers and considers the geometric constraints of the target and the constraints of the markers in the color space. The proposed solution also considers real-time color adjustments which ensure that, in a scenario with features of color similar to our markers, a greater number of outliers is avoided, that the weight of each particle of the filter has a more adjusted value, and that the results are more accurate.


2. MViDO: A Monocular Vision-Based System for Docking a Hovering AUV

This section presents three components of the modular MViDO system developed in order to assist the autonomous docking process of an AUV. Here are three of our contributions: the target, the attitude estimator, and the tracking system. The complete MViDO system is also composed of another contribution, the developed guidance law, which is presented separately in Section 4.

2.1. Target and Markers

The docking station choice plays an important role in vision-based docking. We believe that choosing a target composed of well-identifiable markers to attach to the dock aids 3D pose estimation. In the context of this work, a target was designed and built. The developed target is formed by three color spherical markers rigidly connected. Our solution for the dock's target based on spherical markers, as shown in Figures 2 and 3, allows the target to be viewed from different points of view and without shadows, to make image segmentation easier. We chose to use different colors for each of the markers in order to avoid ambiguities in the relative pose estimation. The first solution was based on passive visual markers, occasionally illuminated from the outside only in certain circumstances of visibility. This solution proved not to be the best since lighting from outside created unwanted shadows. A second solution was developed (Figure 2) based on hybrid visual markers. In this hybrid target, the markers can be illuminated from the inside, allowing the visibility of the markers to be increased at a distance or in very poor visibility situations. The interior lighting intensity is controllable from the docking station. The fact that the lighting is controllable allows us to adjust the intensity so that the colors continue to be noticeable. In a limit situation, where a stronger interior lighting is required, when the camera is facing the color lighting marker, what we see are very bright ellipses and a colored halo around those ellipses, and in this way we continue to have color information. In situations of good visibility or in close proximity to the target, interior lighting is almost unnecessary and the target becomes quasi-passive, avoiding energy consumption. In certain situations, the fact that the markers can be illuminated can also serve as a first beacon signaling the presence of the target. The choice of dimensions for the target (see Figure 2a) was related to three factors: the maximum operating distance camera-target, the field of view of the lens and the constructive characteristics of the dock (see Figure 3). The target presents different distances on the two axes a and b (see Figure 2a) in order to avoid ambiguities in the relative position estimation.

Figure 2. The hybrid target. (a) The target dimensions (0.100 m; a = 0.150 m; b = 0.240 m): different distances on the two axes. (b) The developed hybrid target: the interior lighting intensity is controllable from the docking station.


Figure 3. Docking station.

2.2. Attitude Estimator

In our previous work [3], we developed an algorithm to visually detect the target formed by the three color spherical markers rigidly connected. The detection algorithm is based on an image color segmentation which detects the center of mass of each of the markers after an iterative process where, for each captured frame, a threshold is applied to the range of the reference colors of the markers. When the three points in the image are identified, we proceed to relative pose estimation. For the 3D pose estimation, the developed algorithm is based on the u, v image coordinates and determines the target position and orientation with regard to the camera frame mounted onboard the AUV.

To estimate the relative position of the AUV MARES with regard to the target (see Figure 4), we use the algorithm presented in [23] and we consider the pinhole camera model presented in Figure 5. A relative position vector is defined as represented below:

d = \begin{bmatrix} x_r & y_r & z_r \end{bmatrix}^T    (1)

where x_r, y_r, z_r are the estimated relative distances of the centre of the color markers from the camera along the x_c, y_c, z_c directions. The estimate of z_r can be obtained from the following equation in z_r^2:

z_r^4 (k_2 k_3 - k_1 k_4)^2 - z_r^2 (k_1^2 + k_2^2 + k_3^2 + k_4^2) + 1 = 0,    (2)

where k_1, k_2, k_3 and k_4 are known constants, determined as a function of the coordinates of the markers in the image, the focal length of the camera and the fixed real distances between the markers on the target. These constants are given by:

k_1 = \frac{u_C - u_A}{2 a f}    (3)

k_2 = \frac{v_C - v_A}{2 a f}    (4)

k_3 = \frac{u_A + u_C - 2 u_B}{2 b f}    (5)

k_4 = \frac{v_A + v_C - 2 v_B}{2 b f},    (6)

where (u_A, v_A), (u_B, v_B), (u_C, v_C) are the image coordinates of the markers, a is the distance from markers A and C to the center of the target reference frame, b is the distance from marker B to the center of the target reference frame, and f is the focal length of the lens. More details of the algorithm can be found in [23].
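The following is a minimal Python sketch of how Equations (2)-(6) could be evaluated in practice; the function name, the assumption that the marker coordinates are already expressed in metric units on the image plane, and the rule for selecting among the positive roots of Equation (2) are ours, not the paper's.

```python
import numpy as np

def estimate_zr(uv_A, uv_B, uv_C, a, b, f):
    """Sketch of the z_r estimate of Equations (2)-(6).

    uv_A, uv_B, uv_C : (u, v) image coordinates of markers A, B, C
                       (assumed already converted to metric units on the image plane).
    a, b             : distances from markers A/C and B to the target frame centre (Figure 4).
    f                : lens focal length (metres).
    Returns the positive candidate roots of Eq. (2); keeping the one inside the
    expected working range is an assumption, the paper does not state the rule.
    """
    (uA, vA), (uB, vB), (uC, vC) = uv_A, uv_B, uv_C

    # Constants of Equations (3)-(6).
    k1 = (uC - uA) / (2.0 * a * f)
    k2 = (vC - vA) / (2.0 * a * f)
    k3 = (uA + uC - 2.0 * uB) / (2.0 * b * f)
    k4 = (vA + vC - 2.0 * vB) / (2.0 * b * f)

    # Equation (2) is a quadratic in w = z_r^2:
    #   (k2 k3 - k1 k4)^2 w^2 - (k1^2 + k2^2 + k3^2 + k4^2) w + 1 = 0
    coeffs = [(k2 * k3 - k1 * k4) ** 2,
              -(k1 ** 2 + k2 ** 2 + k3 ** 2 + k4 ** 2),
              1.0]
    w = np.roots(coeffs)
    w = w[np.isreal(w)].real
    return np.sort(np.sqrt(w[w > 0.0]))
```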

Figure 4. Definitions of the camera and target reference frames for relative localization.

Figure 5. Pinhole camera model.

In terms of the relative orientation estimation, the algorithm presented in [23] has some assumptions and simplifications that do not fit our problem. There are two ambiguities when using a target with only three markers: an ambiguity in the sign when the target is rotated around the x axis and another ambiguity when the target is rotated around the y axis. For those reasons, we decided to develop a new algorithm for attitude estimation. For that, we considered Figure 6. First, we rotate the 'plane r' (the plane of the target) in order to place it parallel to the 'plane c' (a mirror plane of the CCD sensor of the camera). Then, a scale factor is applied in order to obtain the same limits for both planes. Finally, a translation is applied to match the two planes.


Figure 6. Pose estimator. In the figure, the 'plane r' is the plane of the target and the 'plane c' is a mirror plane of the sensor of the camera.

This allows us to transform any point from one reference frame into the other:

p_c = \alpha R p_r + t_r    (7)

In this perspective projection α is a scale factor (which depends on the focal length), R is the rotation matrix, t_r is the translation term, and p_c and p_r are a point in the plane c (a mirror plane of the sensor of the camera) and in the plane r (plane of the target), respectively. With reference to Figure 4 and according to Figure 6, the vectors p_c and p_r are defined as:

p_c \in R^3 : p_c = \begin{bmatrix} p_{cu} \\ p_{cv} \\ 0 \end{bmatrix}    (8)

p_r \in R^3 : p_r = \begin{bmatrix} p_{rx} \\ p_{ry} \\ p_{rz} \end{bmatrix}.    (9)

Beginning by eliminating the translation term in Equation (7):

p_c^A - p_c^B = \alpha R (p_r^A - p_r^B)
p_c^A - p_c^C = \alpha R (p_r^A - p_r^C)
p_c^B - p_c^C = \alpha R (p_r^B - p_r^C)
R R^T = I    (10)

In matricial form:

P_{pc} = \begin{bmatrix} p_c^A - p_c^B & p_c^A - p_c^C & p_c^B - p_c^C \end{bmatrix} \in R^{3 \times 3}    (11)

P_{pr} = \begin{bmatrix} p_r^A - p_r^B & p_r^A - p_r^C & p_r^B - p_r^C \end{bmatrix} \in R^{3 \times 3}.    (12)

Once

P_{pc} = \alpha R P_{pr},    (13)

then,

\begin{bmatrix} p_c^A - p_c^B & p_c^A - p_c^C & p_c^B - p_c^C \end{bmatrix} =    (14)

= \alpha R \begin{bmatrix} p_r^A - p_r^B & p_r^A - p_r^C & p_r^B - p_r^C \end{bmatrix}.    (15)

In order to solve this, a pseudo-inverse is applied:

P_{pc} P_{pr}^T = \alpha R P_{pr} P_{pr}^T.    (16)


Then we determine αR:

\alpha R = P_{pc} P_{pr}^T \left( P_{pr} P_{pr}^T \right)^{-1},    (17)

converting the rotation matrix R into quaternion elements:

q_0^2 + q_1^2 - q_2^2 - q_3^2 = R_{11}
2 q_1 q_2 + 2 q_0 q_3 = R_{12}
2 q_1 q_2 - 2 q_0 q_3 = R_{21}
q_0^2 - q_1^2 + q_2^2 - q_3^2 = R_{22},    (18)

where R_{11}, R_{12}, R_{21} and R_{22} are the coefficients of the matrix R and q_0, q_1, q_2 and q_3 are the quaternion elements.

Now, solving the equations for the quaternion elements:

q_0^4 - \frac{1}{2} (R_{11} + R_{22}) q_0^2 - \frac{1}{16} (R_{12} - R_{21})^2 = 0
q_2 = \frac{1}{4 q_1} (R_{12} + R_{21})
q_3 = \frac{1}{4 q_0} (R_{12} - R_{21})
-q_1^4 - \frac{1}{2} (R_{22} - R_{11}) q_1^2 + \frac{1}{16} (R_{12} + R_{21})^2 = 0    (19)

q = \begin{bmatrix} q_0 \\ q_1 \\ q_2 \\ q_3 \end{bmatrix} \cdot \frac{1}{\left\| \begin{bmatrix} q_0 & q_1 & q_2 & q_3 \end{bmatrix} \right\|}.    (20)

This is how we compute the attitude.
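A minimal Python sketch of the attitude computation described by Equations (11)-(20) is given below. The pseudo-inverse used in place of (P_{pr} P_{pr}^T)^{-1} (to cope with the rank deficiency of a coplanar target) and the way the scale \alpha is removed before reading R_{11}, R_{12}, R_{21}, R_{22} are our assumptions; the paper does not detail these steps.

```python
import numpy as np

def estimate_attitude(pc, pr):
    """Sketch of the attitude estimator of Section 2.2 (Eqs. (11)-(20)).

    pc : (3, 3) array, columns are the marker points A, B, C in 'plane c'
         (third coordinate equal to zero).
    pr : (3, 3) array, columns are the marker points A, B, C in 'plane r'.
    Returns a unit quaternion [q0, q1, q2, q3].
    """
    # Difference matrices of Eqs. (11)-(12): columns A-B, A-C, B-C.
    Ppc = np.column_stack([pc[:, 0] - pc[:, 1], pc[:, 0] - pc[:, 2], pc[:, 1] - pc[:, 2]])
    Ppr = np.column_stack([pr[:, 0] - pr[:, 1], pr[:, 0] - pr[:, 2], pr[:, 1] - pr[:, 2]])

    # Eq. (17); a Moore-Penrose pseudo-inverse replaces (Ppr Ppr^T)^-1 here
    # because the coplanar target makes that matrix rank deficient (assumption).
    aR = Ppc @ Ppr.T @ np.linalg.pinv(Ppr @ Ppr.T)

    # Remove the scale alpha (assumption: alpha taken as the mean norm of the two
    # in-plane rows of alpha*R, since the rows of R should have unit norm).
    alpha = 0.5 * (np.linalg.norm(aR[0, :]) + np.linalg.norm(aR[1, :]))
    R = aR / alpha
    R11, R12, R21, R22 = R[0, 0], R[0, 1], R[1, 0], R[1, 1]

    # Quartic equations of Eq. (19), solved as quadratics in q0^2 and q1^2.
    q0_sq = 0.5 * (0.5 * (R11 + R22) + np.sqrt(0.25 * (R11 + R22) ** 2 + 0.25 * (R12 - R21) ** 2))
    q1_sq = 0.5 * (-0.5 * (R22 - R11) + np.sqrt(0.25 * (R22 - R11) ** 2 + 0.25 * (R12 + R21) ** 2))
    q0, q1 = np.sqrt(max(q0_sq, 0.0)), np.sqrt(max(q1_sq, 0.0))

    eps = 1e-9  # guard the divisions of Eq. (19) when q0 or q1 vanish
    q2 = (R12 + R21) / (4.0 * q1) if q1 > eps else 0.0
    q3 = (R12 - R21) / (4.0 * q0) if q0 > eps else 0.0

    q = np.array([q0, q1, q2, q3])
    return q / np.linalg.norm(q)   # Eq. (20)
```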

2.3. Resilience to Occlusions and Outlier Rejection

By itself, the color-based detector algorithm proved to be insufficient for a robust subsequent pose estimation because of the false detections, the outliers and the noise in the color detections. On the other hand, partial occlusions of the target imply only a partial detection, which would not allow a pose estimation. To overcome these limitations, we decided to develop a filter algorithm in a color-based context. The implemented algorithm allows an adaptive color detection, in the sense that each marker model is updated over time in order to make it more robust against changes in water turbidity conditions and underwater illumination changes. The main goal is to track the state of each of the three markers in the captured images. In our color tracker, we used a probabilistic Bayesian approach, in particular a sequential Monte Carlo technique, and we considered two types of constraints related to the target in order to improve the performance of the filter: the geometrical constraints of the target and the constraints of the markers in the color space. The developed solution allowed the performance of the relative pose estimator to be improved by rejecting outliers that occur during the detection process, and made it possible to obtain pose estimates in situations of temporary occlusion of the markers.

For the proposed tracking system, we developed the filters based on a particle filter framework. For the formal model of the particle filter we adopt the one presented in [24] and we choose the resampling method presented in [29]. Figure 7 presents the scheme we propose for the tracking system. As shown in Figure 7, the filter block appears between the markers detector block and the pose estimator block. The filter block consists of three particle filters, one filter per marker. Each particle filter receives from the detector block the position u, v and the radius R of the respective marker in the image.


Figure 7. Proposed approach for tracking the target in video captures and estimating the relative pose of the AUV with regard to that target. The camera feeds the detector (color segmentation), which passes [u, v, R] for each marker to one of three particle filters; the filters feed the pose estimator. Data from the IMU, a constant velocity/acceleration model and the control actions are used in the prediction step, while the geometric constraints and the color constraints act on the update step. In the figure, the elements u, v and R are the position of the blob on the image and the blob radius on the image.

2.3.1. Problem Formulation

Consider an image I composed of a set of pixels I\{x_i, y_j\}, where i \in N[0, I_{width}] and j \in N[0, I_{height}], in the Hue, Saturation, Value (HSV) space I_{HSV}, from which a set of color blobs Bl = [Bl_{red}, Bl_{yellow}, Bl_{green}] is extracted. In the HSV color space, the Hue refers to the perceived color, the Saturation measures its dilution by white light (the “vividness” of the color), and the Value is the intensity information. The detector block is used to detect the color blobs in the HSV space. Based on that visual sensing, the position and the radius of each detected color blob Bl_k^i are extracted from the image with some uncertainty. Each detected color blob in the image is represented by Bl_k^i = [u, v, rd_{Bl}], where u, v is the position of the blob on the image and rd_{Bl} is the blob radius on the image. The blobs are selected considering the colour likelihood model, which is defined in a way that privileges the hypotheses with HSV values close to the reference HSV values and the blobs with bigger radius:

P(Bl_{color}^i) = C_1 \cdot P_{color}(Bl_{color}^i) + C_2 \cdot \frac{rd_{Bl_{color}^i}}{\max(rd_{Bl_{color}})},    (21)

where C_1 and C_2 are parameters to weight the values given by the color and radius models. P_{color} is a function that returns the mean probability of all pixels inside the circle defined by [u, v, rd_{Bl}], considering the function (24).

Analyzing Equation (21), the probability P(Bl_{color}^i) of being the color blob i results from the sum (a fusion) of two terms. The first term, C_1 \cdot P_{color}(Bl_{color}^i), concerns the constraints in the color space. The second term, C_2 \cdot \frac{rd_{Bl_{color}^i}}{\max(rd_{Bl_{color}})}, concerns the geometric constraints of the target.

We want to use one particle filter per marker composing the target to determine the position of each marker in an image. For each marker, a particle filter is implemented in order to make the system reliable under one marker occlusion. With three particle filters we increase the system redundancy, making it more reliable. For each color marker BlC_k, we have a state vector as presented below:

BlC_k = [u, v, rd_{Bl}, c_x, c_y, c_{rd_{Bl}}],    (22)

where k is the marker index, u, v is the position of the marker on the image, rd_{Bl} is the marker radius on the image and c_x, c_y, c_{rd_{Bl}} are the corresponding rates of change. BlC_k is initialized with the values of Bl_k^i that have the highest probability, \max(P(Bl_{color})), on the set Bl_k. Each filter uses a constant velocity model (see Figure 7).
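The structure of one per-marker filter (Figure 7, state vector of Equation (22)) could be sketched in Python as below. The class name, the process noise levels and the systematic resampling used as a stand-in for the method of [29] are assumptions; the likelihood passed to the update step would combine the geometric term of Section 2.3.2 and the color term of Section 2.3.3.

```python
import numpy as np

class MarkerParticleFilter:
    """Minimal sketch of one per-marker filter of Figure 7 (one instance per color marker).

    State per particle (Eq. (22)): [u, v, rd, c_u, c_v, c_rd], i.e. image position,
    blob radius and their rates of change. Noise levels are assumptions.
    """

    def __init__(self, init_blob, n_particles=1000,
                 process_noise=(2.0, 2.0, 0.5, 1.0, 1.0, 0.2)):
        u, v, rd = init_blob                      # best detection, max P(Bl_color)
        self.x = np.zeros((n_particles, 6))
        self.x[:, 0:3] = np.array([u, v, rd]) + np.random.randn(n_particles, 3) * 3.0
        self.w = np.full(n_particles, 1.0 / n_particles)
        self.q = np.asarray(process_noise)

    def predict(self, dt):
        # Constant-velocity model: position/radius advance with their rates, plus noise.
        self.x[:, 0:3] += self.x[:, 3:6] * dt
        self.x += np.random.randn(*self.x.shape) * self.q

    def update(self, likelihood):
        # 'likelihood' maps a particle state to a weight, e.g. the product of the
        # color-space term (Eq. (31)) and the geometric term (Eq. (28)).
        self.w = np.array([likelihood(p) for p in self.x]) + 1e-12
        self.w /= self.w.sum()

    def resample(self):
        # Systematic resampling (a stand-in for the resampling method of [29]).
        n = len(self.w)
        positions = (np.arange(n) + np.random.rand()) / n
        idx = np.minimum(np.searchsorted(np.cumsum(self.w), positions), n - 1)
        self.x = self.x[idx]
        self.w = np.full(n, 1.0 / n)

    def estimate(self):
        # Weighted mean of u, v, rd, fed to the pose estimator block.
        return np.average(self.x[:, 0:3], axis=0, weights=self.w)
```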

2.3.2. Considering Geometrical Constraints

In the filter block (see Figure 7) we consider geometrical constraints related to the target in order to improve the performance of the filter. Since the target geometry is well known, such as the distance between the three markers (see Figure 8) and the radius of each marker, we use this information to condition the filter estimates.

Figure 8. Estimated distances for the target (d_{AB}, d_{BC}, d_{AC}).

If the underwater scene contains other objects with a color similar to that of the markers, the colour cues become more prone to ambiguity. For that reason, it is necessary to use the target geometry constraints to allow the markers to be located with low ambiguity. When we take into account the geometric constraints, the procedure we implemented is as follows: considering the first instant t = 0 when the markers A and B are detected, the estimated distance d_{AB} at t = 0 is d_{AB}^0. Supposing a time progression, at the instant t = 1, the distance d_{AB}^1 will change according to the movement of the AUV, that is:

d_{AB}^1 = d_{AB}^0 + u_{AUV}^I,    (23)

where u_{AUV}^I is the movement of the AUV projected in the frame I of the camera. That movement depends on the proximity and pose of the AUV with regard to the target, and on the properties of the camera sensor and lens, namely the sensor size and the focal length:

u_{AUV}^I = F(x, y, z, \theta, \phi, \Psi, f, s_x) = d_{AB},    (24)

where \theta, \phi, \Psi are the angles of pitch, roll and yaw respectively and s_x is the physical dimension of the CCD sensor in the x axis.

Knowing that

(x_c, y_c) = \left( f \frac{x_w}{z_w}, \; f \frac{y_w}{z_w} \right),    (25)

where x_w, y_w, z_w are coordinates in the reference frame associated with the center of the camera lens W, x_c, y_c are the coordinates in the reference frame associated with the CCD of the camera C, and f is the focal length of the camera lens, we could say that

\left( d_{AB} \right)^2 = \left( \frac{f x_w^A s_x}{z_w^A} - \frac{f x_w^B s_x}{z_w^B} \right)^2 + \left( \frac{f y_w^A s_x}{z_w^A} - \frac{f y_w^B s_x}{z_w^B} \right)^2.    (26)

The term u_{AUV}^I depends on the pose of the AUV. However, if the initialized distance is based on a wrong marker position extraction, since the system has three particle filters, one per marker, the system will automatically reset the wrongly initialized particle filter.


Considering that the movement uncertainty is described by a Gaussian function:

G(x) = \frac{1}{\sigma \sqrt{2\pi}} e^{-\frac{(x-\mu)^2}{2\sigma^2}}.    (27)

The probability of the particle pt_i^A is given by:

P\left( pt_i^A \right) = \frac{ e^{ -\frac{ \left( d_{pt_i} - d_{AB}^0 - u_{AUV}^I \right)^2 }{ 2\sigma^2 } } }{ \sigma \sqrt{2\pi} },    (28)

where d_{pt_i} is the distance of particle pt_i to the reference marker (see Figure 9), and \sigma is the value of the standard deviation.

Figure 9. Illustration of the proposed approach to the geometric constraint in the filter.

When there is no AUV velocity feedback in the particle filter, then \sigma = \max(u_{AUV}^I) and u_{AUV}^I = 0.
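A Python sketch of the geometric-constraint weight of Equation (28) for a single particle is given below, assuming image distances in pixels; the small floor added to \sigma when no velocity feedback is available is our assumption.

```python
import numpy as np

def geometric_weight(pt_uv, ref_uv, d0_AB, u_auv=0.0, sigma=None):
    """Gaussian geometric-constraint weight of Eq. (28) for one particle.

    pt_uv  : (u, v) of the particle of marker A.
    ref_uv : (u, v) of the reference marker (e.g. marker B).
    d0_AB  : inter-marker image distance measured when the pair was first detected.
    u_auv  : AUV motion projected in the image frame (Eq. (24)); 0 when no velocity
             feedback is available, as stated at the end of Section 2.3.2.
    sigma  : standard deviation; with no feedback the text takes sigma = max(u_auv),
             so a 1-pixel floor is added here (an assumption) to avoid a degenerate
             Gaussian.
    """
    if sigma is None:
        sigma = max(abs(u_auv), 1.0)
    d_pt = np.linalg.norm(np.asarray(pt_uv, float) - np.asarray(ref_uv, float))
    return (np.exp(-((d_pt - d0_AB - u_auv) ** 2) / (2.0 * sigma ** 2))
            / (sigma * np.sqrt(2.0 * np.pi)))
```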

2.3.3. Considering Constraints in the Color Space

In the filter block (see Figure 7) we also consider constraints in the color space in order to improve the performance of the filter. The assignment of weights to the particles takes into account the color information that is sampled from the image inside each particle. In order to consider the color constraints, we developed and implemented the following procedure: for each particle pt_i with position p_i \in R^2 in the image, N color points poc_j \in R^3 (HSV color space) will be evaluated, at positions o_j \in R^2 such that:

poc_j(o_j), \quad poc_j \in R^3, \; o_j \in R^2, \; j \in \{1, \cdots, N\} : \left\| p_i - o_j \right\| < rd,    (29)

where rd is the estimated radius of the corresponding marker. Consider the particle pt_i^A referring to the marker A and described by the parameters u, v and rd, where u, v is the localization of the particle in the image, and rd is the estimated radius for the marker A (see Figure 10).

The probability P\left( pt_i^A \right) will be the result of nine samples such that:

P\left( pt_i^A \right) = P_A\left( i_{(u,v)} \right) \times P_A\left( i_{(u \mp a, v)} \right) \times P_A\left( i_{(u, v \mp a)} \right) \times P_A\left( i_{(u \mp b, v)} \right) \times P_A\left( i_{(u, v \mp b)} \right).    (30)

P\left( pt_i^A \right) = \prod_{j=1}^{9} P_A\left( o_j \right)    (31)

Appl. Sci. 2020, 10, 2991 16 of 39

o_j = \begin{bmatrix} u \mp rd \mp \delta \\ v \mp rd \mp \delta \end{bmatrix}    (32)

o_0 = \begin{bmatrix} u \\ v \end{bmatrix},    (33)

where \delta is a constant adjustable parameter that serves to place the sample on the periphery of the marker edges. P_A\left( i_{(u,v)} \right) is a value according to the color of the pixel i_{(u,v)}, located at position u, v of the image I, with P_A\left( i_{(u,v)} \right) = F_A(H, S, V), where F_A(H, S, V) is a discrete function with an auto-adjustable value.

Figure 10. Description of a particle by the u, v and rd parameters.

If the components H, S, V of the pixel i_{(u,v)} are within the pre-defined reference HSV range (see Figure 11), then P_A\left( i_{(u,v)} \right) = 1. Otherwise, P_A\left( i_{(u,v)} \right) = \gamma_{min}, where \gamma_{min} is a parameter of non-zero value that ensures that a color outlier does not lead to P\left( pt_i^A \right) = 0.

Figure 11. Reference range of the Hue, Saturation, Value (HSV) components: each component probability (P_H, P_S, P_V) is 1 inside the range [H_{min}, H_{max}], [S_{min}, S_{max}], [V_{min}, V_{max}] and \gamma_{min} outside, over the 0-255 scale.
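A possible Python implementation of the color-space weight of Equations (30)-(33) and of the reference range of Figure 11 is sketched below; the exact placement of the nine samples and the values of gamma_min and delta are interpretations, not taken verbatim from the paper.

```python
import numpy as np

def color_weight(img_hsv, u, v, rd, hsv_min, hsv_max, gamma_min=0.1, delta=1):
    """Sketch of the nine-sample color weight of Eqs. (30)-(33) for one particle.

    img_hsv  : HSV image as an (H, W, 3) array.
    u, v, rd : particle position and estimated marker radius in pixels.
    hsv_min, hsv_max : reference HSV range of the marker (Figure 11).
    gamma_min: non-zero value returned for out-of-range samples, so that a single
               color outlier does not force the particle weight to zero (value assumed).
    delta    : small offset placing samples at the periphery of the marker edge.
    The sample pattern (centre plus eight offsets of +/-(rd + delta)) is an
    interpretation of Eq. (32).
    """
    hsv_min, hsv_max = np.asarray(hsv_min), np.asarray(hsv_max)
    h_img, w_img = img_hsv.shape[:2]
    r = int(round(rd)) + delta
    offsets = [(0, 0), (r, 0), (-r, 0), (0, r), (0, -r),
               (r, r), (-r, -r), (r, -r), (-r, r)]

    weight = 1.0
    for du, dv in offsets:                       # nine samples, Eq. (31)
        uu = int(np.clip(u + du, 0, w_img - 1))
        vv = int(np.clip(v + dv, 0, h_img - 1))
        pixel = img_hsv[vv, uu]
        in_range = np.all(pixel >= hsv_min) and np.all(pixel <= hsv_max)
        weight *= 1.0 if in_range else gamma_min
    return weight
```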

2.3.4. Automatic Color Adjustment

In an underwater environment, the color changes along the water column due not only to the luminosity but also to the turbidity of the water. The idea here is to readjust the graph of Figure 11 to the actual context of the system and, in this way, obtain a more adjusted definition of the color of the marker in order to reduce the probability of a false detection of the marker being assumed to be true.

We pick a sample of the color components H, S, V at the most representative particles, the particles with greater weight. With those samples we construct a histogram of occurrences of each component value. Instead of using the reference intervals for each component H, S, V, we use that histogram, and the reference values for the color components become those which occur most often. Considering for example the marker with the red color, the probability function of the red marker is initialized as:

P_{H,S,V}^{red} = \begin{cases} 1 & \text{if } H_{min}^{red} < H < H_{max}^{red} \wedge S_{min}^{red} < S < S_{max}^{red} \wedge V_{min}^{red} < V < V_{max}^{red} \\ 0.5 & \text{if } S > S_{sat} \wedge V > V_{sat} \\ 0 & \text{otherwise} \end{cases}    (34)

The function is discretized in N values in the space of each component. In this way, an occurrence histogram HI is created for each color component H, S, V (see Figure 12), assuming initially the value 0 in each bin of the histograms \left( HI_H^k, HI_S^k, HI_V^k \right).

Figure 12. Occurrence histograms \left( HI_H^k, HI_S^k, HI_V^k \right) for the H, S and V components.

At the first instant in which the target is tracked, for each marker the N particles of greater probability are selected, using the inner five pixels (the same used for the color constraints, Section 2.3.3): Pixel\left( i_{(u,v)} \right), Pixel\left( i_{(u \mp a, v)} \right) and Pixel\left( i_{(u, v \mp a)} \right). The values of the HSV components of those pixels will be used to increment the histograms \left( HI_H^k, HI_S^k, HI_V^k \right) for each marker. That is,

HI_H^k = HI_H^{k-1} + HI_H^{Pixel\,0,\,Particle\,0} + \cdots + HI_H^{Pixel\,5,\,Particle\,N}
HI_S^k = HI_S^{k-1} + HI_S^{Pixel\,0,\,Particle\,0} + \cdots + HI_S^{Pixel\,5,\,Particle\,N}
HI_V^k = HI_V^{k-1} + HI_V^{Pixel\,0,\,Particle\,0} + \cdots + HI_V^{Pixel\,5,\,Particle\,N}.    (35)

Since each color marker will have its own histograms, \left( HI_H^k, HI_S^k, HI_V^k \right)_{Green}, \left( HI_H^k, HI_S^k, HI_V^k \right)_{Blue} and \left( HI_H^k, HI_S^k, HI_V^k \right)_{Yellow}, then, taking the example of the red marker, at the instant T the function P_{H,S,V}^{red} described in Equation (34) becomes:

P_{H,S,V}^{red} = \frac{HI_H^k(H)}{\max\left( HI_H^k \right)} \times \frac{HI_S^k(S)}{\max\left( HI_S^k \right)} \times \frac{HI_V^k(V)}{\max\left( HI_V^k \right)}.    (36)

The function (36) is the new probability function to be used in the filter with constraints in the color space.
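The automatic color adjustment of Equations (34)-(36) could be organized in Python as below, with one instance per marker; the number of bins and the 0-255 component range are assumptions.

```python
import numpy as np

class AdaptiveColorModel:
    """Sketch of the automatic color adjustment of Section 2.3.4 (Eqs. (34)-(36)).

    One instance per marker. The bin count and the 0-255 HSV range are assumptions;
    the paper feeds the histograms with the five inner pixels of the N most probable
    particles.
    """

    def __init__(self, n_bins=32):
        self.n_bins = n_bins
        # One occurrence histogram per HSV component, initialised to zero.
        self.hist = np.zeros((3, n_bins))

    def update(self, hsv_samples):
        """hsv_samples: (M, 3) array of HSV pixel values sampled at high-weight particles."""
        hsv_samples = np.asarray(hsv_samples)
        for c in range(3):
            h, _ = np.histogram(hsv_samples[:, c], bins=self.n_bins, range=(0, 255))
            self.hist[c] += h                    # Eq. (35): accumulate over time

    def probability(self, hsv_pixel):
        """Eq. (36): product of the normalised histogram values of the three components."""
        p = 1.0
        for c in range(3):
            b = min(int(hsv_pixel[c] / 256.0 * self.n_bins), self.n_bins - 1)
            m = self.hist[c].max()
            p *= self.hist[c][b] / m if m > 0 else 0.0
        return p
```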


3. MViDO: Theoretical Characterization of the Camera-Target System

We consider here the image sensor characterized by its physical dimensions s_x and s_y and by its resolution in pixels, s_{xp} and s_{yp} (see Figure 13). We consider that the target is characterized by its dimensions, sL_x and sL_y, and by the radius of each of the three spheres that compose the target, rd_{sphere} (see Figure 13). For the theoretical characterization of the system, we consider the model presented in Figure 14, which relates the horizontal field of view of the camera HFOV, the horizontal sensor size s_y and the working distance WD for a given angle of view AOV. Assuming that, we start by characterizing the minimum distance at which the camera can see the target.

Figure 13. Image sensor size and target dimensions in xy (sensor: s_x = 0.0024 m, s_y = 0.0032 m; target: sL_x = 0.345 m, sL_y = 0.5 m).

Figure 14. Relationship between the horizontal field of view (HFOV), the horizontal sensor size (s_y) and the working distance (WD) for a given angle of view (AOV).

For that, we consider the rectilinear lens field of view as a function of the lens focal length:

HFOV = WD \cdot \frac{s_y}{f}    (37)


From Equation (37) and the model of Figure 14, we can arrive at the minimum working distance, WD_{min}, which is given by:

WD_{min} = OPA \cdot \frac{f}{OPB}    (38)

where

OPA = \max\left( sL_x, sL_y \right)    (39)

and

OPB = \min\left( s_x, s_y \right).    (40)

We choose for OPA and OPB the worst case, i.e., the case that is more limiting. Therefore, OPA = sL_y because for our target the measured sL_y is greater than the sL_x measurement, and OPB = s_x because for the CCD camera sensor the measured s_x is less than the s_y measurement. Thereby, for the used camera, lens and target, the minimum distance between the camera and the target that ensures that the entire target is visible in the image is:

WD_{min} \approx 0.66 \, m.    (41)
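As a quick numerical check of Equations (38)-(41), using the sensor and target dimensions of Figure 13 and the focal length of Table 2 (a plain Python script, names chosen by us):

```python
# Sketch of Eqs. (38)-(41) with the dimensions of Figure 13 and Table 2.
f = 0.00315                      # focal length (m), Table 2
s_x, s_y = 0.0024, 0.0032        # sensor size (m), Figure 13
sL_x, sL_y = 0.345, 0.5          # target size (m), Figure 13

OPA = max(sL_x, sL_y)            # Eq. (39): the larger target dimension
OPB = min(s_x, s_y)              # Eq. (40): the smaller sensor dimension
WD_min = OPA * f / OPB           # Eq. (38)
print(round(WD_min, 2))          # ~0.66 m, matching Eq. (41)
```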

We also defined the maximum distance at which each marker is detectable, considering ideal conditions of visibility and constant luminosity. For that, let us consider the resolution of the sensor as a function of the field of view:

resolution = \frac{N_{pixels}}{HFOV}    (42)

where N_{pixels} is the horizontal number of pixels of the camera sensor. In that way, the working distance WD in terms of rd_{sphere} and the number of pixels N_{pixels} of the sensor is given by:

WD = \frac{N_{pixels} \cdot s_y}{rd_{sphere} \cdot f}.    (43)

From that, we considered that the maximum working distance (the worst case) is given by:

WD_{max} = \frac{OPA_1 \cdot OPB}{rd_{sphere} \cdot f},    (44)

where

OPA_1 = \min\left( N_{pixels_x}, N_{pixels_y} \right).    (45)

The worst scenario is when the maximum working distance is a small value, so we choose the smaller number of pixels, i.e., N_{pixels_x} = 576 pixels, and the smaller measure of the CCD sensor, s_x = 0.0024 m. Therefore, for the used camera, lens and target, the maximum working distance is:

WD_{max} \approx 4.4 \, m.    (46)

To define the limit pose of the camera without the target leaving the image projected on the sensor, we considered the worst case in terms of clearance between the target and the frame boundaries, that is, we consider the target and the sensor as squares (as illustrated in Figure 15).


Figure 15. Camera-target configuration (camera sensor side s = 0.003 m; target side sL = 0.5 m).

Let us consider Figure 16.

Figure 16. Maximum angle.

Thus, the rotation of the camera around the z-axis has no relevance, and the analysis will be done for the rotations around the x-axis and y-axis, for which the analysis is the same.

Taking into account Figure 16 and Figure 14, then

\theta_{max} = \theta_1 - \theta_2    (47)

\theta_1 = \frac{AOV}{2} = \tan^{-1}\left( \frac{s}{2f} \right)    (48)

\theta_2 = \tan^{-1}\left( \frac{sL}{2\,WD} \right)    (49)

\theta_{max} = \theta_1 - \theta_2    (50)

\theta_{max} = \tan^{-1}\left( \frac{s}{2f} \right) - \tan^{-1}\left( \frac{sL}{2\,WD} \right)    (51)

Given the characteristics of our CCD sensor, our lens and our target, the graph of Figure 17 represents the maximum rotation of the camera in Rx or Ry as a function of the working distance, in order not to lose the target.


Figure 17. Maximum rotation of the camera in Rx or Ry (Angle_max, degrees) as a function of the working distance WD (m).
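The trend of Figure 17 can be reproduced directly from Equation (51); the Python snippet below uses the square sensor/target simplification of Figure 15 (s = 0.003 m, sL = 0.5 m) and the focal length of Table 2, and the function name is ours.

```python
import numpy as np

def theta_max_deg(WD, s=0.003, sL=0.5, f=0.00315):
    """Eq. (51): maximum camera rotation about Rx or Ry before the target leaves the
    image, for a working distance WD (m). Default values follow Figure 15 and Table 2."""
    theta1 = np.arctan(s / (2.0 * f))       # Eq. (48): half angle of view
    theta2 = np.arctan(sL / (2.0 * WD))     # Eq. (49): half angle subtended by the target
    return np.degrees(theta1 - theta2)

# The allowed rotation grows with the working distance, as in Figure 17.
for wd in (1.0, 2.0, 3.0, 4.0):
    print(wd, round(theta_max_deg(wd), 1))
```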

Sensitivity Analysis

We want to know how the error in the location of a marker in the image is reflected in our estimate. The algorithm we use to estimate the relative localization estimates z_e from the following simplified equation:

z_e = \sqrt{ \frac{ f^2 \left( b^2 (u_C - u_A)^2 + b^2 (v_C - v_A)^2 + a^2 (u_A + u_C - 2 u_B)^2 + a^2 (v_C + v_A - 2 v_B)^2 \right) }{ \left( u_C v_B - u_C v_A + u_A v_C + v_A u_B - u_A v_B - v_C u_B \right)^2 + 4 a^2 f^4 b^2 } },    (52)

where (u_A, v_A), (u_B, v_B) and (u_C, v_C) are the coordinates in the image of the markers A, B and C, respectively. The uncertainty in z_e is obtained by taking the partial derivatives of z_e with respect to each variable u_A, u_B, u_C, v_A, v_B and v_C; then, the uncertainty in z_e is given by:

\Delta z_e = \sqrt{ \left( \frac{\partial z_e}{\partial u_A} \Delta u_A \right)^2 + \left( \frac{\partial z_e}{\partial u_B} \Delta u_B \right)^2 + \left( \frac{\partial z_e}{\partial u_C} \Delta u_C \right)^2 + \left( \frac{\partial z_e}{\partial v_A} \Delta v_A \right)^2 + \left( \frac{\partial z_e}{\partial v_B} \Delta v_B \right)^2 + \left( \frac{\partial z_e}{\partial v_C} \Delta v_C \right)^2 }.    (53)

In an ideal case, without noise in the marker detection, the uncertainties associated with each variable \Delta u_A, \Delta u_B, \Delta u_C, \Delta v_A, \Delta v_B and \Delta v_C are the uncertainty associated with the discretization, that is, 1 (one) pixel. From the experimental validation of the detection algorithm, we know that the uncertainty associated with each variable \Delta u_A, \Delta u_B, \Delta u_C, \Delta v_A, \Delta v_B and \Delta v_C is about 10 pixels on average. In order to analyze how the error in the location of a marker in the image is reflected in the estimate of z_e, we proceed with a linearization around a point. In this sense, the red marker was centered at zero in the image, and we vary the projection of a in the image. We know that the real measure of a (see Figure 18), when projected onto the image, becomes a′. In the image, a′ varies depending on the distance of the target from the camera. When the target moves away from the camera, the value of a′ decreases, and vice versa.


Figure 18. Linearization around a certain point: the red marker was centered at zero in the image. The real target dimensions a and b project onto the image as a′ and b′.

The graph in Figure 19 shows the result obtained, considering for each variable the uncertainty associated with the discretization and the uncertainties of the markers' detection:

Figure 19. Sensitivity analysis: error in ze (m) as a function of a′ (µm).

4. MViDO: Guidance Law

It is a challenge to define guidance laws that conduct the AUV to the dock while keeping the visual markers in sight. From our point of view, the solution can be based on basic motion primitives that are already implemented by the vehicle on-board control system. This section describes the implementation of a guidance system for docking maneuvers of a hovering AUV. We address the problem of driving the AUV to a docking station with the help of the visual markers detected by the vehicle on-board camera, and show that by adequately choosing the references for the linear degrees of freedom of the AUV it is possible to conduct it to the dock while keeping those markers in the field of view of the on-board camera. We address the problem of defining a guidance law for a hovering AUV when performing a docking maneuver. We used as a case study the docking of the AUV MARES, but our approach is quite general and can be applied to any hovering vehicle. During the docking, it is of utmost importance that the visual markers are within the field of view of the camera. Considering a


local frame with origin at the center of the markers, with the usual North-East-Down convention, there should be a compromise between the motion in the vertical coordinate z and the motion in the horizontal plane ρ = √(x² + y²) (where x and y are the Cartesian coordinates from the center of the markers to the center of the AUV's camera). We want to ensure that the ratio z/ρ gives a slope greater than the slope of the cone, which is defined by the angle of view of the camera, AOV. This means that we define the condition:

$$ \rho \leq -z\,\tan\!\left(\frac{AOV}{2}\right) \qquad (54) $$

As is commonly the case with AUVs, we consider here that the on-board control system is able to independently control the vehicle's DOFs. Relevant for this work is the ability to keep a fixed position on the horizontal plane and to keep a given depth. Such controllers are part of the MARES on-board control system, as described in [1]. For the purpose of defining the guidance law and testing its performance, we model here the closed-loop behavior of the AUV as a first-order system, both in the horizontal plane and in the vertical coordinate [30,31]. Two decoupled models were considered: a model for the behavior in the horizontal plane ρ and a model for the behavior in the vertical coordinate z. In this way:

$$ \frac{Z(s)}{Z_{ref}(s)} = \frac{p_z}{s + p_z} \qquad (55) $$

$$ \frac{P(s)}{P_{ref}(s)} = \frac{p_h}{s + p_h}, \qquad (56) $$

where P(s) and Z(s) are, respectively, the Laplace transforms of ρ and of the vertical coordinate z. In order to drive the AUV to the cradle while keeping the markers within the camera field of view, we propose a guidance law that ensures a much faster convergence of ρ to zero than of z to zero. This law is defined by:

$$ \rho_{ref}(t) = 0 \qquad (57) $$

and

$$ z_{ref}(t) = -\rho(t)\, e^{\left(|T_{sZ} - T_{sP}|\right)\,\varepsilon_{\rho}\, t}, \qquad (58) $$

where (|TsZ − TsP|) is the difference between the settling times of the closed-loop models (vertical and horizontal) for a unit step input, and ερ is the actual error with respect to the reference of ρ.

From these assumptions, it is possible to conclude that

$$ \lim_{t \to \infty} \frac{\rho(t)}{z(t)} = 0. \qquad (59) $$

Observation: in our law, we impose a restriction on the evolution generated for the reference zref(t): reference values below z0 are not considered, and the actual z is kept instead.
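A minimal simulation sketch of this guidance scheme is given below, assuming the first-order closed-loop models (55)-(56), the reference law (57)-(58) and the visibility condition (54); the pole values, the angle of view, the settling-time approximation Ts ≈ 4/p and the way the restriction of the Observation is enforced are illustrative assumptions, not values taken from the MARES implementation.

```python
import math

# Placeholder closed-loop poles and camera angle of view (illustrative values only)
p_h, p_z = 0.5, 2.0                  # horizontal and vertical closed-loop poles (1/s)
AOV = math.radians(100.0)            # camera angle of view (assumed)
Ts_z, Ts_h = 4.0 / p_z, 4.0 / p_h    # ~2% settling times of the first-order loops
k = abs(Ts_z - Ts_h)                 # |TsZ - TsP| in Eq. (58)

rho, z = 1.0, -1.0                   # initial state (rho0, z0) = (1, -1)
z0 = z
dt, t = 0.01, 0.0

for step in range(3001):
    rho_ref = 0.0                                # Eq. (57)
    eps_rho = rho - rho_ref                      # current error on rho
    z_ref = -rho * math.exp(k * eps_rho * t)     # Eq. (58)
    z_ref = max(z_ref, z0)                       # Observation: references below z0 are not used
    if step % 500 == 0:
        in_cone = rho <= -z * math.tan(AOV / 2.0)   # visibility condition (54)
        print(f"t={t:6.2f} s  rho={rho:6.3f}  z={z:6.3f}  in_cone={in_cone}")
    # First-order closed-loop responses (55)-(56), integrated with explicit Euler
    rho += dt * p_h * (rho_ref - rho)
    z += dt * p_z * (z_ref - z)
    t += dt
```

With these placeholder values, the horizontal error has essentially vanished while z is still held near z0, which is the qualitative behaviour targeted by Eqs. (57)-(59).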

5. Experimental Results: Pose Estimator and Tracking System

5.1. Experimental Setup

5.1.1. Pose Estimator

For the experimental validation of the relative pose estimates, the AUV MARES camera was placed at a fixed position and then, using tube rails, the target was moved along the longitudinal and lateral axes to specified known positions, as illustrated in Figures 20 and 21. At each position, the markers platform was rotated around the longitudinal, lateral and vertical axes by different known angles. All of this experimental procedure was done under a specific lighting condition. The procedure was done with the hardware in the loop, and 109 video captures were made, each with a duration of 1 min, taking into account the on-board storage capacity of the hardware. The resulting logs are compared against the ground truth given by this process.

Figure 20. Pose estimator: experimental validation.

Figure 21. Experimental procedure for the characterization of the developed system: schematic of the camera frame (X, Y, Z), the rotations Rx, Ry and Rz, and the known target positions (dimensions of 0.100 m, 0.150 m and 0.245 m indicated in the schematic).

5.1.2. Tracking System

To test the developed filter, the target was placed on the bottom of a tank, and the AUV MARES started a dive from the surface, four meters from the target, towards the target. The AUV MARES stops when it arrives in close proximity to the target (1 m from the target). A video from the AUV onboard camera was captured during the test. That video was used to test the filter. During the video, an outlier and temporary occlusions of the markers were created, as detailed in Figure 22. Contrary to the experimental validation of the pose estimator, where a ground truth was available, the experimental validation of the filter was essentially qualitative, since no ground truth was produced.


Figure 22. Testing the filter: in this figure, we can observe three images. The left one is an image of the target placed on the bottom of the tank, with the tracking algorithm running with one particle filter per marker; the middle one is an image of the target placed on the bottom of the tank with an outlier (red point) located on the left-hand side; the image on the right is an image of the target placed on the bottom of the tank with an occlusion of the green marker.

5.2. Results

5.2.1. Pose Estimator

Figure 23 gives an idea of the system precision, based on the error with respect to the ground truth in the working distance estimations and on the standard deviation analysis. As expected, the error increases with the working distance, reaching the maximum error for a WD of 2.5 m. The worst case is when the target was positioned at the maximum working distance and rotated 60° in Rx; in this case, the mean error was about eight centimetres. For the perception of the error distribution, Figures 24–26 show the histogram of the error for three working distances: the minimum working distance WD = 0.5 m, the maximum working distance to be operated WD = 2.5 m, and an intermediate working distance WD = 1.3 m. For each histogram, the Gaussian curve that is closest to the obtained data is estimated (using R). In order to analyze the impact on the Ze estimation when the target is within the limit of the field of view of the camera, we present the histogram of the error in Ze (Figure 27) when the target is placed at a critical position, i.e., the maximum working distance to operate and near the border of the field of view of the camera. Since we have developed an algorithm for attitude estimation, we also present the results in terms of error with respect to the ground truth when the target was rotated around the three axes. Figures 28–30 show the histograms that summarize the obtained results. Here we present three cases: Rx = 60°, Ry = 60° and Rz = 90°.


Figure 23. Error and standard deviation as a function of the working distance when the target was rotated by Rx = 60° or not rotated (Rx = 0°). The points in the chart are the mean values of the error and each point is labelled with the respective value of the standard deviation.
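For reference, the Gaussian curve "closest to the data" used to summarize each histogram reduces, for a maximum-likelihood fit, to the sample mean and standard deviation; the sketch below reproduces that step in Python on synthetic error samples (the fits reported here were obtained with R on the logged data).

```python
import random
import statistics

# Synthetic stand-in for the logged Ze errors at one working distance (not real data):
# the experiments reported, for example, mean = 0.007 m and std = 0.004 m at WD = 0.5 m.
random.seed(0)
errors = [random.gauss(0.007, 0.004) for _ in range(500)]

# The Gaussian curve closest to the data (in the maximum-likelihood sense) is fully
# described by the sample mean and standard deviation.
mu = statistics.mean(errors)
sigma = statistics.pstdev(errors)
print(f"fitted Gaussian: mean = {mu:.4f} m, standard deviation = {sigma:.4f} m")
```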


Figure 24. Histogram of the Ze error for WD = 0.5 m (mean = 0.007 m, standard deviation = 0.004 m). The curve in blue is the curve estimated by R as the Gaussian curve that is closest to the obtained data.

Figure 25. Histogram of the Ze error for WD = 1.3 m (mean = 0.015 m, standard deviation = 0.017 m). The curve in blue is the curve estimated by R as the Gaussian curve that is closest to the obtained data.

Figure 26. Histogram of the Ze error for WD = 2.5 m (mean = 0.06 m, standard deviation = 0.03 m). The curve in blue is the curve estimated by R as the Gaussian curve that is closest to the obtained data.


Figure 27. Histogram of the Ze error when the target is placed within the limit of the field of view of the camera (WD = 2.5 m, Y = 1.43 m; mean = 0.04 m, standard deviation = 0.031 m). The curve in blue is the curve estimated by R as the Gaussian curve that is closest to the obtained data.

Figure 28. Histogram of the Rx error for 60° of rotation (mean = 3.62°, standard deviation = 1.86°). The curve in blue is the curve estimated by R as the Gaussian curve that is closest to the obtained data.

Figure 29. Histogram of the Ry error for 60° of rotation (mean = 3.71°, standard deviation = 0.73°). The curve in blue is the curve estimated by R as the Gaussian curve that is closest to the obtained data.


Figure 30. Histogram of the Rz error for 90° of rotation (mean = 0.7°, standard deviation = 0.49°). The curve in blue is the curve estimated by R as the Gaussian curve that is closest to the obtained data.

5.2.2. Tracking System

For the parametrization of the filters, the choice of the number of particles was related to the computational resources, namely to the processing time of each frame. It was necessary to establish a compromise between the number of particles and the processing time. For the particle number adjustment, we chose to analyze the effect of an increase in the number of particles (per marker) at moments in which the AUV was in a hovering situation after dynamic moments. Each time we adjusted the number of particles, the threads time of the algorithm was measured, as shown in Figure 31. This time gives us an idea of the rate at which we provide the estimation data to the control.

In addition to the number of particles, the parametrization of the update (observation) and prediction (static model) uncertainties of the filter was also analyzed. For the update uncertainties, the values were adjusted according to the characterization of the pose estimator. For the adjustment of the prediction uncertainties, the typical velocity of the AUV MARES, the maximal diving velocity of the AUV MARES and the characteristics of the camera were considered.
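To make the role of these two uncertainty sources concrete, the sketch below runs one predict/update/resample cycle of a per-marker particle filter in image coordinates: the prediction spread stands in for a value derived from the vehicle velocities and the camera characteristics, and the observation spread (about 10 pixels) comes from the pose-estimator characterization. It is a generic minimal sketch, not the exact on-board filter, which additionally uses colour and geometric constraints.

```python
import numpy as np

rng = np.random.default_rng(1)

N = 1000            # particles per marker (as in Figure 31)
sigma_pred = 5.0    # prediction noise (px); stand-in for a value derived from the AUV
                    # velocities and the camera characteristics
sigma_obs = 10.0    # observation noise (px), from the pose-estimator characterization

# Particle state: (u, v) position of one marker in the image, plus uniform weights
particles = rng.uniform(0, 640, size=(N, 2))
weights = np.full(N, 1.0 / N)

def predict(particles):
    """Static motion model: Gaussian random walk in image coordinates."""
    return particles + rng.normal(0.0, sigma_pred, size=particles.shape)

def update(particles, weights, detection):
    """Re-weight the particles with a Gaussian likelihood around the detected blob centre."""
    d2 = np.sum((particles - detection) ** 2, axis=1)
    w = weights * np.exp(-0.5 * d2 / sigma_obs**2)
    s = w.sum()
    return w / s if s > 0 else np.full(len(w), 1.0 / len(w))

def resample(particles, weights):
    """Systematic resampling to avoid particle degeneracy."""
    positions = (np.arange(N) + rng.random()) / N
    idx = np.minimum(np.searchsorted(np.cumsum(weights), positions), N - 1)
    return particles[idx], np.full(N, 1.0 / N)

# One filter cycle for a single marker, given one (possibly noisy) detection
detection = np.array([320.0, 240.0])
particles = predict(particles)
weights = update(particles, weights, detection)
estimate = np.average(particles, axis=0, weights=weights)
particles, weights = resample(particles, weights)
print("marker estimate (u, v):", estimate)
```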

With the parameterized filter, we compared the results obtained using the filter with the results obtained without it. Figure 32 represents the descent of the AUV towards the target. In the graph of Figure 32, it is possible to see the performance of the estimation algorithm with and without the use of the filter. To generate the graph of Figure 32, 360 frames were sampled. The outlier occurs from frame 44 to frame 108. The temporary occlusions occur from frame 124 to frame 284. There were several partial occlusions, sometimes of one marker and sometimes of two markers.

Number of particles   Threads time (s)
100                   0.008
200                   0.011
1000                  0.0133

Figure 31. Threads time: in the parametrization of the filter, the adjustment of the number of particles is made taking into account the processing time of each frame. It is intended to observe the target at a minimum rate of 10 frames per second (less than 100 ms per frame). This is due to the fact that the control operates at 20 Hz and it is not desired that there be more than one estimation between two consecutive frames; that is, it is desired that the algorithm has a total processing time of less than 100 ms. The processing time of the developed filter for 1000 particles occupies 13 ms of the total processing time. Added to the detector/estimator processing time (83 ms), this shows that the goal of a total processing time of less than 100 ms has been achieved. The equipment used was the developed vision module (Bowtech camera + Raspberry Pi), whose characteristics were described earlier in this paper.


Figure 32. Descent of the Autonomous Underwater Vehicle (AUV) MARES towards the target (Z in metres versus frame number): estimation algorithm with and without a filter. The black curve is the estimation without the filter and the red curve is the estimation with the filter.

Figure 33 is a detail of the graph of Figure 32, from frame 44 to frame 108, and allows us to observe how the algorithm behaves in the outlier zone. In the graph of Figure 33, the curve resulting from the pose estimator alone is compared with the curve resulting from the estimator with the use of the filter, in a situation of outlier detection.

Figure 34 is a detail of the graph of Figure 32, from frame 124 to frame 284, and allows us to observe how the algorithm behaves in the partial occlusions zone.

As can be seen in Figure 34, the use of the filter allows the continuity of the estimates during situations of temporary and partial occlusions of the markers. However, in areas where occlusions occur more frequently, the performance can be improved, and for this reason the constraints of the target geometry were added to the filter. The graph of Figure 35 is a detail of the graph of Figure 32, from frame 124 to frame 284, where we can observe how the filter behaves in the temporary and partial occlusions zone when the geometric constraints were added to the filter. In the graph of Figure 35, it is possible to compare the estimates in the situation in which the occlusions occur more frequently.

Figure 33. Outlier zone: this graph is a detail of the graph of Figure 32, from frame 44 to frame 108; here we can compare the estimation in the outlier zone with and without the use of the filter. The black curve is the estimation without the filter and the red curve is the estimation with the filter. Frame 44 was renumbered as frame 1.


Figure 34. Occlusion zone: this graph is a detail of the graph of Figure 32, from frame 124 to frame 284; here we can compare the estimation in the temporary occlusions zone with and without the use of the filter. The black curve is the estimation without the filter and the red curve is the estimation with the filter. Frame 124 was renumbered as frame 1.

On the other hand, for the case of outliers and false detections, the graph of Figure 36 is a detail of the graph of Figure 32, from frame 44 to frame 108, and represents the results obtained in the outlier zone. It is possible to compare the estimates without the use of the filter with the estimates made using the filter with and without the color restriction.

Figure 35. Occlusions zone: this graph is a detail of the graph of Figure 32, from frame 124 to frame 284; here we can compare the estimation in the temporary occlusions zone without the filter, with the filter, and with the geometric constraints added to the filter. The black curve is the estimation without the filter, the red curve is the estimation with the filter and the green curve is the estimation using the filter with the geometric constraints of the target. Frame 124 was renumbered as frame 1.


Figure 36. Outlier zone: this graph is a detail of the graph of Figure 32, from frame 44 to frame 108; here we can compare the estimation in the outlier zone without the filter, with the filter, and with the color constraints added to the filter. The black curve is the estimation without the filter, the red curve is the estimation with the filter and the green curve is the estimation using the filter with the color constraints of the target. Frame 44 was renumbered as frame 1.

6. Simulation Results: Guidance Law

The performance of the proposed law was tested in a simulation environment (simulink/Matlab). We consider here a first-order model for the closed-loop behavior of the AUV in the horizontal plane ρ and a first-order model for the closed-loop behavior of the AUV in the vertical coordinate z. Since we propose a guidance law that ensures a much faster convergence of ρ to zero than of z to zero, the best way to test this law was to consider the horizontal behavior slower than the vertical one. That means that, for a first simulation, the relation between the pole of the horizontal model P(s)/Pref(s) and the pole of the vertical model Z(s)/Zref(s) is:

$$ p_z > p_h. \qquad (60) $$

Figure 37 presents the trajectories in the (ρ, z) plane considering three different situations: pz = 2ph; pz = 4ph; pz = 8ph.

If instead we consider that the behaviour in the horizontal plane is faster than the vertical one, pz < ph, the resultant trajectories are those shown in Figure 38, considering three different situations: pz = ph/2; pz = ph/4; pz = ph/8.

Going back to the case where pz > ph, if we consider that the initial ρ0 is farther from the reference ρref = 0 than the initial z0 is from the reference zref = 0, the resultant trajectory is the one shown in Figure 39.

Consider now the realistic hypothesis that the motion in the horizontal plane presents behavioural differences between the x axis and the y axis. In this case, instead of assuming a single model for the horizontal behaviour, P(s)/Pref(s), different behaviours were assumed for x and y, and for that we used two first-order models for the closed-loop horizontal behaviour of the AUV, i.e.,

$$ \frac{X(s)}{X_{ref}(s)} = \frac{p_x}{s + p_x} \qquad (61) $$

$$ \frac{Y(s)}{Y_{ref}(s)} = \frac{p_y}{s + p_y}. \qquad (62) $$



Figure 37. Trajectories in the (ρ, z) plane for the starting point (ρ0, z0) = (1, −1) and considering pz = 2ph; pz = 4ph; pz = 8ph. Even in the situation in which the horizontal behaviour is the slowest one, there is a faster convergence from ρ0 to zero than from z0 to zero, as intended.


Figure 38. Trajectories in the (ρ, z) plane for the starting point (ρ0, z0) = (1, −1) and considering pz = ph/2; pz = ph/4; pz = ph/8. Even in the situation in which the vertical behaviour is the slowest one, the motion remains tangent to the vertical axis.


Figure 39. Trajectories in the (ρ, z) plane for the starting point (ρ0, z0) = (7.1, −1) and pz > ph. The trajectory illustrates that the motion remains tangent to the vertical axis.


In the horizontal plane, the result of assuming a slower behaviour in the y coordinate than in the x coordinate, i.e., px > py, is illustrated in Figure 40.


Figure 40. Trajectories in x, y assuming a slower behaviour in the y coordinate than in the x coordinate, px > py. The trajectories were generated considering that the poles start at the same location relative to the origin, px = py, and then px moves away from the origin in increments of 10 percent, making the behaviour in the x coordinate faster than the behaviour in the y coordinate.

Applying the guidance law that we propose, in 3D space, and assuming now that there is a difference in behavior between the horizontal plane axes, where px > py, we obtain the trajectories presented in Figure 41. Note that in Figure 41 we considered that the behavior in the vertical coordinate z is faster than the behavior in the horizontal plane.


Figure 41. Trajectories in (x, y, z) space considering px > py and that the behaviour in the vertical coordinate z is faster than the behaviour in the horizontal plane. There is convergence to the reference (0, 0, 0) on all axes.

Analyzing Figures 40 and 41, when considering a slower behaviour on the y coordinate, the vehicle behaves as if it were under the action of a disturbance in the direction of that coordinate. Even in the situation where a much slower dynamic is considered for the y coordinate with respect to x, the vehicle remains within the visibility cone in the xy plane.


7. Discussion

7.1. Pose Estimator

A target was constructed whose physical characteristics maximize its observability. The three color markers are spherical in order to be observable from different perspectives. We chose to use a different color for each of the markers in order to avoid ambiguities in the relative pose estimation. The solution is based on passive visual markers, occasionally illuminated externally only in certain visibility conditions. The choice of dimensions for the target was related to three factors: the maximum operating distance WD, the field of view of the lens and the constructive characteristics of the dock. For the relative pose estimation, it was intended to explore as much as possible a minimal system, based only on visual information and with the minimum processing time. The results obtained were more than satisfactory considering that, in the worst case, the accuracy of the position estimation was around 7 cm for a working distance of 2.5 m, with the target rotated 60° around the x axis. That working distance is higher than the intended maximum working distance of 2 m. The maximum average error in the target rotation was less than four degrees for a critical situation where the target was rotated 60 degrees around the y axis. From the experimental characterization of the system, we can also conclude that there is an increase of the error as the target moves away from the camera, that is, as the working distance WD increases. This increase in the value of the error is due to four reasons:

• a misalignment of the camera with respect to the rail: the nonexistence of a mechanical solution that guarantees the correct alignment between the center of the camera and the tube rail. This misalignment has influence especially on the Y axis, which means a yaw rotation of the camera with respect to the rail,

• the increasing offset along the z-axis is related to the fact that the value of the focal length of the camera's lens was not calibrated,

• small variations in the markers' illumination that affect the accuracy in detecting the center of mass of each marker in the image,

• the farther the target is from the camera, the greater the error in the pose estimation caused by any small variation in the detection of the blobs in the image. The pinhole model shows that, as the target moves away, any small variation on the sensor (IMAGE) side implies a greater sensitivity and a greater error on the SCENE side.

Despite these systematic observed errors, which can be easily corrected, the results obtained for the markers detector and the pose estimator are quite satisfactory. However, the pose estimator algorithm has an ambiguity: we know the amplitude of the estimated Rx and Ry angles, but we do not know their sign. That is, when the camera is positioned in relation to the target as shown in Figure 21, there is an ambiguity in the sign of the Rx and Ry rotations; however, in the context of our work, this ambiguous situation does not happen, considering the way the sea-floor docking station is approached (see Figures 1 and 3) by a hovering AUV. This situation can be resolved by adding one more marker to the target. In this case, this fourth marker should be non-coplanar with the other three. In terms of robustness of the relative pose estimates, we recognize that there would be an advantage in fusing the monocular vision information with inertial rate sensor measurements to generate an estimate of the relative position between a moving observer and a stationary object. This fusion would create redundancy and therefore increase the robustness of the estimates. That will be a future line of work.

7.2. Resilience to Occlusions and Outliers Rejection

We have developed an approach for visual tracking in a particle filtering framework in order to improve the performance of the image markers detection, namely to make the algorithm less sensitive to outliers and to allow estimates even in occlusion situations. In order to reduce the computational weight of the relative localization system, the image rectification step was not considered. This option increased the level of complexity of the noise model for the markers' detector.


However, the use of a particle filter-based approach with an adaptive color tracker proved to be a good option, with high levels of accuracy in the positioning estimation, as can be seen in Figures 34–36. In detail, we can conclude from the results that, in situations of partial occlusion of the markers, a longer duration of the occlusion implies a dispersion of the particles, so the filter would eventually diverge. The inclusion of the geometric constraints in the filter solved this issue and good results were obtained. On the other hand, our color-based markers detector uses an oversized (wide) color amplitude so that the detection can occur in a greater number of scenarios. This extended amplitude implies the occurrence of outlier detections. For that reason, and to avoid these outliers, we added to the filter a color histogram that is adjusted along the tracking. This allows, in the presence of features with a color similar to that of our markers, avoiding a greater number of outliers, and the weight of a particle of the filter takes a more adjusted value, which results in a more accurate estimation. To increase the cone of visibility of a single camera, the use of a pan-tilt based solution should be explored in future work. This will imply the use of active perception systems: for example, when the AUV performs a pitch or roll motion, this can be compensated by the pan-tilt system.
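As an illustration of the adaptive colour-histogram idea, the sketch below scores a candidate image patch against a reference marker histogram using the Bhattacharyya coefficient and slowly adapts the reference along the tracking; the similarity measure, the adaptation rate and the synthetic patches are our own illustrative choices rather than the exact ones used in the implemented filter.

```python
import numpy as np

BINS = 16  # hue-histogram bins

def hue_histogram(patch_hue):
    """Normalized hue histogram of an image patch (hue values in [0, 180) as in OpenCV)."""
    hist, _ = np.histogram(patch_hue, bins=BINS, range=(0, 180))
    hist = hist.astype(float)
    return hist / max(hist.sum(), 1.0)

def bhattacharyya(h1, h2):
    """Similarity in [0, 1] between two normalized histograms."""
    return float(np.sum(np.sqrt(h1 * h2)))

def particle_color_weight(candidate_patch_hue, ref_hist, sigma=0.2):
    """Colour term of a particle weight: high when the candidate patch looks like the marker."""
    d = np.sqrt(max(1.0 - bhattacharyya(hue_histogram(candidate_patch_hue), ref_hist), 0.0))
    return np.exp(-0.5 * (d / sigma) ** 2)

def adapt_reference(ref_hist, new_hist, alpha=0.05):
    """Slowly adapt the reference histogram to illumination changes along the tracking."""
    return (1.0 - alpha) * ref_hist + alpha * new_hist

# Example with synthetic patches: a reddish marker patch vs. a greenish outlier patch
rng = np.random.default_rng(2)
ref = hue_histogram(rng.normal(5, 3, size=400) % 180)        # reference built from a red patch
red_patch = rng.normal(6, 3, size=400) % 180
green_patch = rng.normal(60, 3, size=400) % 180
print("weight for marker-like patch:", particle_color_weight(red_patch, ref))
print("weight for outlier patch:   ", particle_color_weight(green_patch, ref))
ref = adapt_reference(ref, hue_histogram(red_patch))          # update after an accepted match
```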

7.3. Guidance Law

From the thorough characterization of the system, we know the zone that guarantees the best observation of the target. Knowing that, our strategy has been to look for a way to make the target remain in this central zone of the pyramid (from the point of view of the camera) when the AUV approaches the docking station. We have proposed a strategy to lead the AUV from any point in the pyramid observation zone to the point where the cradle of the docking station is (in the vicinity of the apex of the pyramid), always keeping the target in line of sight. We show that by adequately choosing the references for the linear degrees of freedom (DOF) of the AUV it is possible to dock while keeping the target markers in the field of view of the on-board camera. The performance of the proposed law was tested in simulink/Matlab. A first-order model for the closed-loop behavior of the AUV in the horizontal plane ρ and a first-order model for the closed-loop behavior of the AUV in the vertical coordinate z were considered for the first tests. Since we propose a guidance law that ensures a much faster convergence of ρ to zero than of z to zero, the best way to test this law was to consider the horizontal behavior slower than the vertical one. From the mathematical point of view and from the simulated tests, we can ensure that, even when the AUV closed-loop behavior in the vertical coordinate z is faster than the behavior in the horizontal plane, the visibility of the markers and the convergence of the AUV to the docking point are always guaranteed. In future work, we will extend the work presented here to more general closed-loop dynamics and acquire data from real experiments with the MARES AUV.

8. Conclusions

In our conceptualization and development of a system to aid the autonomous docking of a hovering AUV, it was intended to realize how far we can get by using only a single camera and a dedicated processing unit small enough to be accommodated onboard the MARES. In this sense, the results showed that this minimal system is very capable. We chose to explore the maximum capabilities of the data provided by the image sensor in an environment where visibility is very low. The solutions found to work around this problem yielded very interesting results.

No target was found in the literature that would meet our needs, namely the following requirements:

• versatility (active in certain situations and passive in others);
• be easily identifiable even in low visibility situations;
• allow the target to be seen from different points of view.

Therefore, a 3D target was designed and built that meets the proposed requirements. The 3D hybrid target maximizes the observability of the markers from the point of view of the AUV camera and proved to be a successful solution.


Since no algorithm for estimating the relative attitude that fit our needs, namely a low computational cost algorithm, was found in the literature, an algorithm was developed from the ground up. Together with the marker detection algorithm, the developed algorithm occupies 83 ms of processing time, which satisfies our requirements.

Since no solution was found in the literature for situations of partial occlusion of the target and for the presence of outliers (which affect the performance of the pose estimator), an innovative solution was developed, which presented very satisfactory results. One of the challenges of this work was to keep tracking the target even in the case of partial or momentary occlusions of the markers; this challenge was met with good results as a consequence of the implementation of the filters that were designed. The particle filter-based solution also allowed a significant reduction in false detections, which improved the pose estimator performance.

Regarding the need for a guidance law for docking the AUV, none of the works found in the literature takes into account our concern of never losing sight of the target until the AUV is fully anchored in the cradle. For that reason, it was necessary to find a solution to this problem. With the characterization of the system, it was possible to gain knowledge of the system as a whole, which allowed us to advance in choosing an approach to the problem of how to ensure that, during docking maneuvers, the vehicle is guided to the dock station without losing sight of the target markers. For this purpose, a guidance law was defined. Such a law ensures a much faster convergence of the horizontal coordinate to zero than that of the vertical one, as intended. Our approach is quite general and can be applied to any hovering vehicle.

Author Contributions: Conceptualization, A.B.F. and A.C.M.; methodology, A.B.F. and A.C.M.; software, A.B.F.; validation, A.C.M.; formal analysis, A.B.F. and A.C.M.; investigation, A.B.F.; resources, A.C.M.; data curation, A.B.F.; writing–original draft preparation, A.B.F.; writing–review and editing, A.B.F. and A.C.M.; funding acquisition, A.C.M. All authors have read and agreed to the published version of the manuscript.

Funding: A. Bianchi Figueiredo acknowledges the support of the Portuguese Foundation for Science and Technology (FCT) through grant SFRH/BD/81724/2011, supported by POPH/ESF funding. This work is financed by National Funds through the Portuguese funding agency, FCT–Fundação para a Ciência e a Tecnologia, within project UIDB/50014/2020. The authors also acknowledge the support of the INTENDU project (ref. MARTERA/0001/2017), funded by FCT within the scope of MarTERA ERA-NET (EU grant agreement 728053), for providing the technical conditions to perform the work.

Conflicts of Interest: The authors declare no conflict of interest.

Abbreviations

The following abbreviations are used in this manuscript:

AUV Autonomous Underwater Vehicle
ASV Autonomous Surface Vehicle
MARES Modular Autonomous Robot for Environment Sampling
MVO Monocular Visual Odometry
MViDO Monocular Vision-based Docking Operation aid
DOF degrees of freedom
EKF extended Kalman filter
GPS global positioning system
IMU inertial measurement unit
FOV lens field of view
HFOV horizontal field of view
AOV angle of view
Npixels number of pixels


References

1. Cruz, N.A.; Matos, A.C. The MARES AUV, a Modular Autonomous Robot for Environment Sampling. In Proceedings of the OCEANS 2008, Washington, DC, USA, 15–18 September 2008; pp. 1–6.
2. Figueiredo, A.; Ferreira, B.; Matos, A. Tracking of an underwater visual target with an autonomous surface vehicle. In Proceedings of the Oceans 2014, St. John's, NL, Canada, 14–19 September 2014; pp. 1–5. [CrossRef]
3. Figueiredo, A.; Ferreira, B.; Matos, A. Vision-based Localization and Positioning of an AUV. In Proceedings of the Oceans 2016, Shanghai, China, 10–13 April 2016.
4. Vallicrosa, G.; Bosch, J.; Palomeras, N.; Ridao, P.; Carreras, M.; Gracias, N. Autonomous homing and docking for AUVs using Range-Only Localization and Light Beacons. IFAC-PapersOnLine 2016, 49, 54–60. [CrossRef]
5. Maki, T.; Shiroku, R.; Sato, Y.; Matsuda, T.; Sakamaki, T.; Ura, T. Docking method for hovering type AUVs by acoustic and visual positioning. In Proceedings of the 2013 IEEE International Underwater Technology Symposium (UT), Tokyo, Japan, 5–8 March 2013; pp. 1–6. [CrossRef]
6. Park, J.Y.; Jun, B.H.; Lee, P.M.; Oh, J. Experiments on vision guided docking of an autonomous underwater vehicle using one camera. Ocean Eng. 2009, 36, 48–61. [CrossRef]
7. Maire, F.D.; Prasser, D.; Dunbabin, M.; Dawson, M. A Vision Based Target Detection System for Docking of an Autonomous Underwater Vehicle. In Proceedings of the 2009 Australasian Conference on Robotics and Automation (ACRA 2009), Sydney, Australia, 2–4 December 2009; University of Sydney: Sydney, Australia, 2009.
8. Ghosh, S.; Ray, R.; Vadali, S.R.K.; Shome, S.N.; Nandy, S. Reliable pose estimation of underwater dock using single camera: A scene invariant approach. Mach. Vis. Appl. 2016, 27, 221–236. [CrossRef]
9. Gracias, N.; Bosch, J.; Karim, M.E. Pose Estimation for Underwater Vehicles using Light Beacons. IFAC-PapersOnLine 2015, 48, 70–75. [CrossRef]
10. Palomeras, N.; Peñalver, A.; Massot-Campos, M.; Negre, P.L.; Fernández, J.J.; Ridao, P.; Sanz, P.J.; Oliver-Codina, G. I-AUV Docking and Panel Intervention at Sea. Sensors 2016, 16. [CrossRef] [PubMed]
11. Lwin, K.N.; Mukada, N.; Myint, M.; Yamada, D.; Minami, M.; Matsuno, T.; Saitou, K.; Godou, W. Docking at pool and sea by using active marker in turbid and day/night environment. Artif. Life Robot. 2018. [CrossRef]
12. Argyros, A.A.; Bekris, K.E.; Orphanoudakis, S.C.; Kavraki, L.E. Robot Homing by Exploiting Panoramic Vision. Auton. Robot. 2005, 19, 7–25. [CrossRef]
13. Negre, A.; Pradalier, C.; Dunbabin, M. Robust vision-based underwater homing using self-similar landmarks. J. Field Robot. 2008, 25, 360–377. [CrossRef]
14. Feezor, M.D.; Sorrell, F.Y.; Blankinship, P.R.; Bellingham, J.G. Autonomous underwater vehicle homing/docking via electromagnetic guidance. IEEE J. Ocean. Eng. 2001, 26, 515–521. [CrossRef]
15. Singh, H.; Bellingham, J.G.; Hover, F.; Lerner, S.; Moran, B.A.; von der Heydt, K.; Yoerger, D. Docking for an autonomous ocean sampling network. IEEE J. Ocean. Eng. 2001, 26, 498–514. [CrossRef]
16. Bezruchko, F.; Burdinsky, I.; Myagotin, A. Global extremum searching algorithm for the AUV guidance toward an acoustic buoy. In Proceedings of the OCEANS'11-Oceans of Energy for a Sustainable Future, Santander, Spain, 6–9 June 2011.
17. Jantapremjit, P.; Wilson, P.A. Guidance-control based path following for homing and docking using an Autonomous Underwater Vehicle. In Proceedings of the Oceans'08, Kobe, Japan, 8–11 April 2008.
18. Wirtz, M.; Hildebrandt, M.; Gaudig, C. Design and test of a robust docking system for hovering AUVs. In Proceedings of the 2012 Oceans, Hampton Roads, VA, USA, 14–19 October 2012; pp. 1–6. [CrossRef]
19. Scaramuzza, D.; Fraundorfer, F. Tutorial: Visual odometry. IEEE Robot. Autom. Mag. 2011, 18, 80–92. [CrossRef]
20. Jaegle, A.; Phillips, S.; Daniilidis, K. Fast, robust, continuous monocular egomotion computation. In Proceedings of the 2016 IEEE International Conference on Robotics and Automation (ICRA), Stockholm, Sweden, 16–21 May 2016; pp. 773–780.
21. Aguilar-González, A.; Arias-Estrada, M.; Berry, F.; de Jesús Osuna-Coutiño, J. The fastest visual ego-motion algorithm in the west. Microprocess. Microsyst. 2019, 67, 103–116. [CrossRef]
22. Li, S.; Xu, C.; Xie, M. A Robust O(n) Solution to the Perspective-n-Point Problem. IEEE Trans. Pattern Anal. Mach. Intell. 2012, 34. [CrossRef] [PubMed]
23. Philip, N.; Ananthasayanam, M. Relative position and attitude estimation and control schemes for the final phase of an autonomous docking mission of spacecraft. Acta Astronaut. 2003, 52, 511–522. [CrossRef]


24. Thrun, S.; Burgard, W.; Fox, D. Probabilistic Robotics (Intelligent Robotics and Autonomous Agents); The MIT Press: Cambridge, MA, USA, 2005.
25. Arulampalam, S.; Maskell, S.; Gordon, N.; Clapp, T. A Tutorial on Particle Filters for On-line Non-linear/Non-Gaussian Bayesian Tracking. IEEE Trans. Signal Process. 2001, 50, 174–188. [CrossRef]
26. Doucet, A.; de Freitas, N.; Gordon, N. (Eds.) Sequential Monte Carlo Methods in Practice; Springer: Berlin, Germany, 2001.


27. Li, B.; Xu, Y.; Liu, C.; Fan, S.; Xu, W. Terminal navigation and control for docking an underactuated autonomous underwater vehicle. In Proceedings of the 2015 IEEE International Conference on Cyber Technology in Automation, Control, and Intelligent Systems (CYBER), Shenyang, China, 8–12 June 2015; pp. 25–30. [CrossRef]
28. Park, J.Y.; Jun, B.H.; Lee, P.M.; Oh, J.H.; Lim, Y.K. Underwater docking approach of an under-actuated AUV in the presence of constant ocean current. IFAC Proc. Vol. 2010, 43, 5–10. [CrossRef]
29. Caron, F.; Davy, M.; Duflos, E.; Vanheeghe, P. Particle Filtering for Multisensor Data Fusion With Switching Observation Models: Application to Land Vehicle Positioning. IEEE Trans. Signal Process. 2007, 55, 2703–2719. [CrossRef]
30. Andersson, M. Automatic Tuning of Motion Control System for an Autonomous Underwater Vehicle. Master's Thesis, Linköping University, Linköping, Sweden, 2019.
31. Yang, R. Modeling and Robust Control Approach for Autonomous Underwater Vehicles. Ph.D. Thesis, Université de Bretagne Occidentale-Brest, Brest, France; Ocean University of China, Qingdao, China, 2016.

© 2020 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).