
MULTISENSOR DATA FUSION FOR

AUTONOMOUS VEHICLE NAVIGATION IN RISKY ENVIRONMENTS

G.L. Foresti, Senior Member, IEEE, and Carlo S. Regazzoni#, Senior Member, IEEE

Department of Mathematics and Computer Science (DIMI)

University of Udine, Via delle Scienze 208, 33100 Udine, Italy

#Department of Biophysical and Electronic Engineering (DIBE)

University of Genoa, Via all'Opera Pia 11A, I-16145 Genova, Italy

Abstract This paper describes a multisensor data-fusion system for driving an autonomous

earthwork vehicle operating in a sanitary landfill1. The system acquires data from a set of

ultrasonic sensors, a laser range finder and several CCD cameras, and produces as output alarms

that indicate potential dangerous situations, e.g., the presence of fixed or mobile obstacles in the

vehicle working area. The proposed system adds to the vehicle important functionalities such as to

avoid terrain holes or down slopes, or to discriminate between heaps of waste to be compacted and

other man-made obstacles. Data fusion allows one to increase the system’s reliability and to

compensate for the inaccuracies and limited operating ranges of individual sensors. Experimental

results show the system’s functioning both under normal operational conditions and in the

presence of dangerous situations. Moreover, the performances of the system in bad environmental

situations (e.g., rain, low lighting) have been evaluated.

Index Terms Autonomous vehicle driving, multisensor data fusion, feature extraction, outdoor

environments, ultrasonic sensors, laser range finder.

I. INTRODUCTION

RISK detection is a basic task for several computer-based systems such as autonomous vehicles

[1-4] and surveillance systems [5,6]. The ability of a robot to interact with and operate in unstructured environments without the complete control of a human operator depends strictly on its capability to detect risky situations (e.g., fixed or mobile obstacles, holes or terrain slopes, etc.). A commonly accepted observation is that such complex tasks cannot be

solved by using a single sensor, and that a synergetic use of multiple sensors is mandatory. The

decreasing costs and the increasing performances of the sensors available today have made the

aforesaid tasks feasible. Moreover, real-time processing and fusion of multisensor information

are becoming possible thanks to the considerable progress made in the areas of distributed

architectures and processing systems.

The major advantage that a robot or an autonomous system can obtain by integrating

information from multiple sensors is that such information is more accurate in terms of features

that may be impossible to acquire by using just individual sensors. The use of multiple sensors

1 This research was partially funded by the CEE-ESPRIT project 6068 ATHENA (Advanced Teleoperation for

eartHwork Equipment Navigation).


can also increase the system’s reliability in the case of the failure of an individual sensor.

Complementary information from a group of disparate sensors can be exploited to compensate

for the inaccuracies and limited operating ranges of individual sensors. As a result, the overall information turns out to be accurate over a very wide range.

Several systems recently developed in the context of autonomous-vehicle driving are focused

on real-time risk detection and/or multisensor-data fusion. A visual-based system for detection

and tracking of multiple vehicles in road traffic scenes was developed by Malik et al. [3]. This

system addresses the problem of occlusions in tracking multiple 3D objects in a known

environment by employing a contour-tracker algorithm based on intensity and motion

boundaries. The positions and motions of contours, represented by closed cubic splines, are

estimated by means of linear Kalman Filters. A multisensor visual-based system, named BVV3,

for autonomous vehicle driving and risk detection was proposed by Graefe [4]. The system

architecture includes multiple parallel processors, a system bus for inter-processor

communications, and a separate video-bus for digital image data transmission. An obstacle

detection module continuously scans an image to search for a feature that may represent the first

sign of a possible obstacle (e.g., another vehicle). Large objects (trucks) can be detected at very

long distances, up to 700 m, and smaller ones (cars) at about 350 m. When an object candidate

has been detected, 2D generic models describing the visual characteristics of objects belonging to

predefined classes (trucks, cars, etc.) are employed to classify road obstacles. The main limitation

on these systems is that they are based only on visual sensors and, consequently, they depend on

environmental conditions (e.g., fast change in lighting, uneven lighting, bad visibility, etc.). Rao

et al. [5] presented a complex multisensor surveillance system using CCD cameras, ultrasonic

sensors and optical barriers to track (in real time) people and robots moving around a factory

room. This system allows a complete decentralization of the Kalman filter equations (employed

for object tracking), thus eliminating computational bottlenecks; moreover, it can degrade slowly

in the case of sensor and processor failures. However, it exhibits some limitations: (a) multiple-

object tracking is not completely reliable due to possible communication deadlocks or livelocks,

and (b) occlusions and shadows limit the tracking capabilities of the system. Corrall [6] proposed

a multisensor system for airport stand surveillance, ground-traffic control and road-traffic

accident detection. In particular, it involves real-time analysis and description of complex scenes

(containing also object occlusions) belonging to image sequences, but is restricted to cases where

scene structures, objects and their expected behaviors are known.

In recent years, a number of researchers have begun to develop methods for integrating and

fusing information from multiple sensors. In order to manage the large amount of data generated


by such sensors, various integration and fusion approaches have been proposed in the literature

(e.g., neural networks [7], Bayesian networks [8], Kalman filter architectures [8], production-rule

systems [9], knowledge-based systems [10], and others [11,12]). An accurate overview of such

methods can be found in [13,14].

An important additional advantage of data-fusion architectures can be obtained by considering

systems based on a distributed organization of computational resources: multiple processors

gather and process information provided by multiple information sources to analyze observed

situations. This approach results in the development of Distributed Sensor Networks (DSNs)

[15], which are systems made up of connected, co-operating information sources organized into

highly structured, message-exchanging networks. In a DSN, sensors may be either physical (i.e.,

they generate information by translating observed physical quantities into electrical signals) or

virtual (i.e., they derive new information from existing data). The main goals of a DSN are to

process data acquired by different sensors, to integrate them (thus reducing the degree of

uncertainty naturally associated with acquired information) and to produce an interpretation of

the environment observed. The main advantages of a DSN are the following:

1. Each local node of the network has a degree of intelligence and knows only a part of the

whole problem to be solved. Therefore, the information representation on each node is

structurally simpler than the corresponding information representation in a centralized system.

2. Imposing local constraints at the lower levels decreases the number of possible solutions

from the lower to the higher levels of the distributed system. At the same time, the network

performance may improve if an appropriate fusion mechanism is used.

3. The amount of information is reduced as only significant information is propagated

through the network.

In this paper, we describe how a DSN can be applied to integrate data coming from multiple

sensors for the purpose of automating the operations currently performed by human drivers of

earthwork vehicles working at waste-disposal sites (sanitary landfills). The main objectives are to

operate such vehicles without need for drivers and to provide the vehicles with autonomous risk-

detection capabilities. Three different kinds of sensors have been employed: ultrasonic sensors

(USs), a laser range finder (LRF) and multiple CCD cameras. USs and the LRF are placed on-

board an earthwork vehicle and aim to detect risky situations in its neighborhood area (a circular

region of radius of about 12 meters). Cameras are placed on high poles around a sanitary landfill,

and are used to detect and track intruders in order to predict possible collisions with the

autonomous vehicle. Data coming from these sensors are integrated to obtain a complete view of


the whole environment. Each detected risky situation is also spatially located on a 2D top-view

map.

II. SYSTEM DESCRIPTION

A. Vehicle automation and control

An autonomous vehicle mainly consists of a bulldozer equipped with a front blade for the

spreading and leveling of waste, and with steel wheels for the breaking and compacting of waste

(Fig. 1a). An on-board computer interfaces the actuators with some sensors in order to close the

vehicle’s control loop. The basic sensor data are represented by the position of the blade, the

steering angle between the two bodies of the vehicle, the speed of the vehicle and the engine

r.p.m. These data can be obtained by two linear potentiometers installed on the two jacks used to

move the blade and to steer the vehicle, by an encoder placed on the power shaft and by an

encoder interfaced with the alternator.

Figure 1b shows the general view of the vehicle control system; it can be roughly divided into

two parts connected through a high-speed radio link: the on-board part (which is in charge of the

real-time control of the vehicle) and the off-board part (which helps the human operator in its

supervision).

The off-board part of the system is equipped with visual cameras placed on high poles along

the perimeter of the working area, and is mainly in charge of detecting, locating and tracking any

intruder moving in the scene. It is also composed of a man-machine interface that provides the

operator with adequate devices for the input of commands and for data visualization, and of a

communication server for managing the data link with the on-board system.

The on-board part is more complex, being the interface between the commands coming from

the man-machine interface and the vehicle’s sensors and actuators. It is equipped with ultrasonic

sensors and a laser range finder to detect risky situations in the neighborhood area of the vehicle

(e.g., fixed or mobile obstacles, terrain slopes or holes, etc.). It is composed of several modules:

the vehicle motion control which decides on the movements of the vehicle, the position

determination system, which provides the vehicle position and speed by means of an advanced

laser-based sensor, and the path planner, which computes the optimum trajectory of the vehicle.

An analog remote radio console is also present in the system architecture as an independent

auxiliary control system able to take the control of the vehicle, at any moment, in case of fault of

the radio link or of some module of the digital control system. For safety reasons, a stop button

on the analog console is always provided that allows the system to override any other command.

B. Multisensor data fusion system


The aim of this paper is to detail the multisensor data fusion system that aims to ensure the

driving capabilities of the vehicle moving in a complex and risky environment.

A distributed sensor network organized into a hierarchical tree structure has been developed

(Figure 2). The tree consists of a root node and as many leaf nodes (sensor level) as the number

of real sensors used by the system (e.g., several CCD cameras, a laser range finder and sixteen

ultrasonic sensors). The second level (feature extraction) is composed of as many nodes as the

number of features extracted by the sensors. The third level of the tree (low-level data fusion) is

composed of a set of virtual sensors that integrate the informational content of one or more

features to accomplish different tasks. As we would like the fusion process to exhibit an

“intelligent-like” behavior, we assume that it is able to detect the existence of an obstacle before

tracking it. To this end, the following tasks are performed at the third level: (a) detection of static

obstacles, (b) detection of terrain holes or down slopes, (c) recognition of specific fixed objects

(e.g., trash heaps), (d) detection and location of mobile objects (e.g., other vehicles, pedestrians,

etc.). Each virtual sensor points out information about the estimated number of detected objects

and their 3D positions computed by an appropriate reference system. In particular, a Cartesian

(Xs,Ys,Zs) reference system is used by both the visual and ultrasonic sensors, whereas a polar (ρ, φ, θ) reference system is used by the laser sensor. The origins of these systems are placed in

the related sensors. Different voting-based strategies are applied to fuse data coming from

multiple virtual sensors of the same type, e.g., data coming from different cameras are merged

into a general reference system (X,Y,Z) related to the working site. The root node (high-level data

fusion) uses the symmetry and perturbation model to integrate the evidences coming from all

virtual sensors. The objective is to obtain a fused information about the number of objects and

their 3D poses in the (X,Y,Z) general reference system. Moreover, a Kalman-based approach is

adopted to track any object moving in the working area of the vehicle and to estimate its motion

parameters. To this end, information from the virtual sensors performing the intruder-location

and intruder-detection tasks is required. Finally, the fused information is used by the vehicle

motion control to decide on the best action to take, e.g., stopping the vehicle, moving the vehicle

in a certain direction, etc.

III. APPLIED SENSORS AND FEATURE EXTRACTION

The proposed system is made up of different kinds of sensors placed both on-board and off-board

the vehicle. Several ultrasonic sensors placed at appropriate boundary positions around the vehicle (i.e., 5 wide beams and 5 narrow beams) (Figs. 3a, 3b and 3c) and a laser range finder are on-board the vehicle (Fig. 3d). The range of the wide-beam sensors is 12 m, their angular spatial resolution is about 20°, their resolution is 1 m and their accuracy is about 1 mm in the existing configuration. The narrow-beam sensors have the same characteristics, except for their range,

which is 6 meters. They are placed on fixtures with 2 degrees of freedom, thus they can move

along the x and y-axes. By changing their angles to the y and x axes, it is also possible to cover

the area around the vehicle and to prevent it from going down a steep slope. Each ultrasonic

sensor gives the distance d_i to an obstacle in its field of view, where 1 m ≤ d_i ≤ 12 m.

The laser range finder sensor is placed on a fixture in the vehicle’s front part. The laser beam

points upwards, is reflected by a mirror at an angle between 0° and 45° to the vertical axis, and rotates in parallel with the vertical axis. Thus, the laser beam scans a spherical segment of 16 m in diameter, limited by the horizontal plane and a cone having its apex in the center of the spherical segment at an angle of 90°. A complete data-acquisition cycle requires 10 complete revolutions and takes about 2 sec. The data returned by the laser scanner are expressed in polar coordinates and have the format (ρ, φ, θ), where ρ is the range measured at the horizontal and vertical angles, φ and θ, respectively (Figure 3e). The laser resolution is about 32 mm and its accuracy is about 100 mm in ρ, 2° on the horizontal plane, and 0.5° on the vertical plane. Each complete revolution

returns about 1600 data triplets.

Several color cameras have been installed on high poles at the entrance of the sanitary landfill

and at each corner of the working area in such a way that their fields of view cover the whole

site. First, the absolute difference D(x,y) between the current image I(x,y) and the background

image BCK(x,y) is computed every k time instant (e.g., 500 msec) as:

$$D(x,y) = \left| I(x,y) - BCK(x,y) \right|, \qquad (x,y) \in [1..N, 1..N] \qquad (1)$$
where N×N is the image size. Then, a hysteresis function with two thresholds, THRin and

THRout, is applied to each point of the difference image D(x,y) to establish whether the related

point of the current image belongs to the background or to a moving object. A binary image

B(x,y) is obtained, where changing pixels are set to 1 and background pixels are set to 0. A

morphological filter is used to eliminate isolated or small groups of pixels [23]; as a result, the

output image is only composed of significant groups of connected pixels, called blobs, bi. As the

scene background is time-varying, a background-updating procedure is applied to estimate

significant changes in the scene [24].
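As an illustration of the change-detection step, the following sketch (a minimal example assuming grayscale NumPy images; the threshold values, the single-step hysteresis growth and the 3×3 structuring element are illustrative, not those of the original system) computes the absolute difference of Eq. (1), applies the two-threshold hysteresis and removes isolated pixels with a simple morphological opening. Blobs can then be obtained from the binary output by connected-component labeling.

```python
import numpy as np

def detect_changes(current, background, thr_in=30, thr_out=15):
    """Two-threshold hysteresis change detection (illustrative threshold values)."""
    diff = np.abs(current.astype(np.int16) - background.astype(np.int16))
    strong = diff >= thr_in           # surely changed pixels
    weak = diff >= thr_out            # possibly changed pixels
    # keep weak pixels only if they touch a strong pixel (one dilation step)
    grown = np.zeros_like(strong)
    grown[1:-1, 1:-1] = (strong[1:-1, 1:-1] | strong[:-2, 1:-1] | strong[2:, 1:-1]
                         | strong[1:-1, :-2] | strong[1:-1, 2:])
    return morphological_open(weak & grown)

def morphological_open(mask):
    """3x3 binary erosion followed by dilation to drop isolated pixels."""
    er = np.zeros_like(mask)
    er[1:-1, 1:-1] = (mask[:-2, :-2] & mask[:-2, 1:-1] & mask[:-2, 2:]
                      & mask[1:-1, :-2] & mask[1:-1, 1:-1] & mask[1:-1, 2:]
                      & mask[2:, :-2] & mask[2:, 1:-1] & mask[2:, 2:])
    di = np.zeros_like(mask)
    di[1:-1, 1:-1] = (er[:-2, :-2] | er[:-2, 1:-1] | er[:-2, 2:]
                      | er[1:-1, :-2] | er[1:-1, 1:-1] | er[1:-1, 2:]
                      | er[2:, :-2] | er[2:, 1:-1] | er[2:, 2:])
    return di
```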

Figure 4a shows a real image (chosen as a running example) representing an earthwork

vehicle moving into an outdoor test site. Figures 4b, 4c and 4d show respectively the background

image and the blobs obtained for different values of the thresholds THRin and THRout.

For each detected blob bi, the following features are computed on the image plane: (a) the center

of mass, ri=(xi,yi), and (b) the dimensions, l and h, of the minimum bounding rectangle (MBR).


Figure 4e shows the result of the focus-of-attention procedure applied to the image of the

running example. Noise effects and an inadequate setting of the THR thresholds can produce

false MBRs on the image plane. In the current case, the trees present in the scene are moved by

the wind; therefore, a false MBR is detected in the upper part of the image in Fig. 4e.

IV. LOW-LEVEL DATA FUSION

The main objective of low-level data fusion is to integrate the informative contents of the

features extracted from one or multiple sensors of the same type in order to obtain a rough

estimate of the number of objects present in the observed scene and their 3D positions. A specific

data-fusion procedure is applied to each kind of sensor. In particular, a weighted function based

on confidence measures is used to integrate the information coming from the different cameras

whose fields of view are partially overlapped. A voting mechanism is also adopted to infer the

presence of obstacles from the 3D data generated by the laser sensor. A simple grouping rule is

used to integrate the information coming from the ultrasonic sensors.

A. Visual-data integration

Multiple cameras, installed on high poles at fixed points (e.g., corners) of the sanitary landfill,

continuously acquire images from the monitored environment. The goal of the virtual sensors

associated with the CCD cameras is to detect and locate any intruder object moving inside the

working area of the autonomous vehicle. The detected blobs in each image are considered as

virtual data to locate the object in the 3D environment. In particular, the position rb=(xb,yb) of

the center of mass of the blob, and the dimensions, l and h, of the related MBRs are used.

The transformation of an image plane point, r0=(x0,y0) into a 3D reference system point,

Rw=(Xw,Yw,Zw), is a well-known ill-posed problem. However, if the intrinsic parameters (camera

position, optical axis orientation and focal length f) and the calibration matrix M [16,17] of the

camera are known, the problem can be regularized [31]. A linear system ri=M.Rw, where the 2D

coordinates ri=(xi,yi) on the image plane and the 3D point Rw=(Xw,Yw,Zw) on the camera

reference system (Figs. 5a and 5b) are known, should be solved. By considering the matrix

M = [m_1, m_2, m_3]^T, where m_i represents the i-th row, a system characterized by two equations and three unknowns is obtained:
$$x_i = \frac{\mathbf{m}_1 \cdot \mathbf{R}_w}{\mathbf{m}_3 \cdot \mathbf{R}_w}, \qquad y_i = \frac{\mathbf{m}_2 \cdot \mathbf{R}_w}{\mathbf{m}_3 \cdot \mathbf{R}_w} \qquad (2)$$


By assuming the ground plane hypothesis [18], the system is reduced to 2 equations and 2

unknowns, i.e., Xw and Zw :

$$x_i\,(M_8 X_w + M_{10} Z_w + M_{11}) - (M_0 X_w + M_2 Z_w + M_3) = 0$$
$$y_i\,(M_8 X_w + M_{10} Z_w + M_{11}) - (M_4 X_w + M_6 Z_w + M_7) = 0 \qquad (3)$$

The ground plane hypothesis is assumed in order to regularize the location problem: all

objects (e.g., cars, lorries, etc.) moving towards the sanitary landfill are considered on the ground

plane (i.e., Y=0).
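A minimal sketch of the ground-plane back-projection of Eqs. (2)-(3) is given below, assuming a 3×4 projection matrix whose entries are indexed row-wise as M_0..M_11 (the numbering used in Eq. (3)); the matrix values in the example are purely illustrative and not the calibration of the real cameras.

```python
import numpy as np

def backproject_ground(M, xi, yi):
    """Recover (Xw, Zw) of an image point under the ground-plane hypothesis (Yw = 0).

    M is a 3x4 projection matrix; its entries M[0..11] (row-wise) follow the
    indexing of Eq. (3). With Yw = 0 the two projection equations become a
    2x2 linear system in Xw and Zw.
    """
    m = M.ravel()
    # xi*(M8*Xw + M10*Zw + M11) - (M0*Xw + M2*Zw + M3) = 0
    # yi*(M8*Xw + M10*Zw + M11) - (M4*Xw + M6*Zw + M7) = 0
    A = np.array([[xi * m[8] - m[0], xi * m[10] - m[2]],
                  [yi * m[8] - m[4], yi * m[10] - m[6]]])
    b = np.array([m[3] - xi * m[11], m[7] - yi * m[11]])
    Xw, Zw = np.linalg.solve(A, b)
    return Xw, 0.0, Zw

if __name__ == "__main__":
    # Illustrative projection matrix (not the calibration of the real cameras).
    M = np.array([[800.0, 0.0, 320.0, 0.0],
                  [0.0, 800.0, 240.0, 1600.0],
                  [0.0, 0.0, 1.0, 0.0]])
    print(backproject_ground(M, 400.0, 300.0))
```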

For each camera, the projection of the blob position onto the 3D-camera reference system is

performed. A suitable and simple confidence measure ci, depending on the quality of the acquired

image, on the accuracy ε_c of the calibration matrix (computed off-line during the camera calibration procedure) and on the estimated distance ẑ of the object from the sensor, is associated with each estimated object position:
$$c_i = \alpha_1\, f(TN, LM) + \alpha_2\, \frac{1}{\varepsilon_c} + \alpha_3\, \frac{1}{\hat{z}} \qquad (4)$$
where the coefficients α_i are normalization constants such that $\sum_i \alpha_i = 1$. The image quality

function is expressed as [31]:

$$f(TN, LM) = w_1\, TN + w_2\, \frac{1}{LM^2} \qquad (5)$$

where w1 and w2 are weights used to normalize the different parameters and to enforce the most

significant contributions. The parameter TN represents the tenengrad measure associated with the

focus of the camera lens, and it can be computed as:

$$TN = \sum_{x=1}^{N} \sum_{y=1}^{N} S(x,y)^2 \qquad \text{for } S(x,y) > Th \qquad (6)$$
where $S(x,y) = \sqrt{(i_x * I(x,y))^2 + (i_y * I(x,y))^2}$, i_x and i_y are the Sobel operators [30] for the computation of the gradient in the x and y directions, respectively, and * is the convolution

operator. The parameter LM represents the average brightness, which is defined as:

$$LM = \sum_{i=0}^{255} l_i\, O_i \qquad (7)$$
where O_i is the number of occurrences of the gray-level l_i in the image.

The estimated object position is considered valid only if the related confidence value is above

a fixed threshold Thc (e.g., Thc =0.7).
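The confidence computation of Eqs. (4)-(7) can be sketched as follows, under the assumptions made in the reconstruction above (tenengrad from Sobel convolutions, average brightness from the gray-level histogram); the weights, normalization constants and thresholds are placeholders, not the values tuned for the real installation.

```python
import numpy as np

SOBEL_X = np.array([[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]], dtype=float)
SOBEL_Y = SOBEL_X.T

def convolve2d(img, kernel):
    """Plain 'valid' 2D convolution (no external dependencies)."""
    kh, kw = kernel.shape
    out = np.zeros((img.shape[0] - kh + 1, img.shape[1] - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(img[i:i + kh, j:j + kw] * kernel[::-1, ::-1])
    return out

def tenengrad(img, th=100.0):
    """Eq. (6): sum of squared gradient magnitudes above a threshold (illustrative Th)."""
    gx = convolve2d(img, SOBEL_X)
    gy = convolve2d(img, SOBEL_Y)
    s = np.sqrt(gx ** 2 + gy ** 2)
    return float(np.sum(s[s > th] ** 2))

def average_brightness(img):
    """Eq. (7), normalized here by the number of pixels."""
    hist, _ = np.histogram(img, bins=256, range=(0, 256))
    return float(np.sum(np.arange(256) * hist)) / img.size

def confidence(img, eps_calib, z_hat, w=(1e-6, 0.01), alpha=(0.4, 0.3, 0.3)):
    """Eqs. (4)-(5) with illustrative weights; larger values mean higher confidence."""
    f_quality = w[0] * tenengrad(img) + w[1] / (average_brightness(img) ** 2 + 1e-9)
    return alpha[0] * f_quality + alpha[1] / eps_calib + alpha[2] / z_hat
```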


If a given blob bi is detected by K cameras, a weighted function is used to obtain a more

accurate object position. Let $\hat{\mathbf{x}}_j = (\hat{x}_j, \hat{y}_j, \hat{z}_j)$ be the 3D blob position estimated by the j-th camera. The final object position is estimated according to the following relation:
$$\hat{\mathbf{x}}_{b_i} = \frac{1}{K} \sum_{j=1}^{K} \gamma_j\, \hat{\mathbf{x}}_j \qquad (8)$$
where $\gamma_j = c_j / \sum_j c_j$ is a weight parameter taking into account the confidence values of the estimated positions computed by each single sensor.
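A compact sketch of a confidence-weighted combination in the spirit of Eq. (8) follows; here the weights are normalized over the K cameras so that the result is a proper weighted average, and the positions and confidences are invented for illustration.

```python
import numpy as np

def fuse_camera_estimates(positions, confidences):
    """Confidence-weighted combination of per-camera 3D estimates (cf. Eq. (8)).

    positions   : list of K estimated (x, y, z) vectors, one per camera
    confidences : list of K confidence values c_j from Eq. (4)
    """
    x = np.asarray(positions, dtype=float)   # shape (K, 3)
    c = np.asarray(confidences, dtype=float)
    gamma = c / c.sum()                      # gamma_j = c_j / sum_j c_j
    return gamma @ x                         # sum_j gamma_j * x_j

if __name__ == "__main__":
    # Two cameras observing the same blob (illustrative numbers, metres).
    print(fuse_camera_estimates([(30.2, 15.1, 0.0), (29.6, 14.7, 0.0)],
                                [0.9, 0.7]))
```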

Figure 4f shows the final MBRs on the image of the running example that survived the object-location phase. The ground plane hypothesis, together with a threshold on the minimum MBR size, allows the system to eliminate false MBRs.

B. Laser-data integration

The virtual sensor associated with the laser range finder is aimed at processing only

information about fixed obstacles. Its main objective is to discriminate between trash heaps to be

compacted and other fixed obstacles (e.g., stopped vehicles, sanitary-landfill fixtures, etc.). The

shapes of trash heaps are generally irregular and their surfaces show a high degree of roughness,

whereas man-made objects are generally characterized by regular shapes and smooth surfaces.

Elementary shapes that indicate the presence of man-made objects are line segments [19] and

plane regions [20]. A measure of entropy of a given region Di in the working area is based on a

rough analysis of the region surface and is applied to infer the presence of trash heaps in the

scene. This entropy measure is calculated on the laser data as:

$$Entropy(D_i) = -\sum_{i=1}^{N_{max}} h_i \log h_i \qquad (9)$$
where $h_i = n_i / (l \cdot l)$ is the distance n_i of the i-th point of the observed region from the laser divided by the total number of points in the region (an l×l square mask), and N_max is the number of laser

data available in the considered region. Areas characterized by entropy values above a fixed

threshold (computed experimentally after several tests) represent possible trash heaps.
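A sketch of the roughness test of Eq. (9) is given below. The original text defines h_i from the individual ranges; the sketch uses the closely related histogram form (fraction of points per quantized range bin), which makes the intended behavior explicit: a flat man-made surface concentrates the ranges in few bins (low entropy), an irregular trash heap spreads them out (high entropy). Window size, bin width and threshold are illustrative.

```python
import numpy as np

def region_entropy(ranges, bin_width=0.1):
    """Entropy of the range distribution in an l x l laser window (histogram variant of Eq. (9))."""
    r = np.asarray(ranges, dtype=float).ravel()
    bins = np.floor(r / bin_width).astype(int)
    _, counts = np.unique(bins, return_counts=True)
    h = counts / r.size                     # h_i = n_i / (l*l)
    return float(np.sum(-h * np.log(h)))

def looks_like_trash_heap(ranges, threshold=1.5):
    """Regions whose entropy exceeds an experimentally tuned threshold (value illustrative)."""
    return region_entropy(ranges) > threshold

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    flat = np.full((5, 5), 6.0)                         # smooth man-made surface
    heap = 6.0 + rng.uniform(-0.8, 0.8, size=(5, 5))    # irregular waste surface
    print(region_entropy(flat), region_entropy(heap))   # low vs high entropy
```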

Moreover, vertical or quasi-vertical planes are searched for in the image in order to find

regular objects. A voting-based mechanism [21] is applied that is similar to the Hough transform

for detecting line segments in edge images. A plane can be described in terms of polar

coordinates by means of two angles (φ, θ) and by its distance (z) from the origin of the reference system centered in the center of mass of the vehicle [22]. More precisely, φ is the slant angle, i.e., the angle between the normal to the plane and the laser direction, θ is the tilt angle, i.e., the

angle formed by the projection of the normal to the plane onto the plane normal to the laser

direction, and z is the distance of the plane from the origin of the reference system. The above

three parameters univocally locate a plane into the 3D reference system of the laser sensor.

Geometric transformations can be easily performed to transform laser data (expressed in polar

coordinates (ρ, φ, θ)) into a 3D Cartesian system (x,y,z) with the origin in the vehicle’s center of mass:
$$x = \rho \sin\theta \cos\varphi, \qquad y = \rho \sin\theta \sin\varphi, \qquad z = \rho \cos\theta \qquad (10)$$

A segmentation of laser data can be done by accumulating evidences in the parameter space

(φ, θ, z). 3D points in the Cartesian reference system (X,Y,Z) vote for a set of buckets in the

parameter space according to a voting equation [22], and at the end of the process, the buckets

receiving a large number of votes can be considered as candidate planes in the observed scene.

The process is implemented on DSP boards in order to obtain in real time a rough 3D reconstruction of the environment. The detection of vertical or quasi-vertical planes is associated with the possible presence of man-made obstacles.
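The voting idea can be sketched as follows: laser triplets (ρ, φ, θ) are converted to Cartesian points with Eq. (10), and each point votes for the (slant, tilt, distance) buckets of the planes passing through it. Everything in the sketch (bucket resolution, candidate normals, vote threshold, the definition of "quasi-vertical") is illustrative and simplified with respect to the voting equation of [22].

```python
import numpy as np

def polar_to_cartesian(rho, phi, theta):
    """Eq. (10): laser (range, horizontal angle, vertical angle) -> (x, y, z)."""
    x = rho * np.sin(theta) * np.cos(phi)
    y = rho * np.sin(theta) * np.sin(phi)
    z = rho * np.cos(theta)
    return np.stack([x, y, z], axis=-1)

def vote_planes(points, n_slant=18, n_tilt=36, z_step=0.25, z_max=16.0):
    """Hough-like accumulation in a (slant, tilt, distance) parameter space."""
    slants = np.linspace(0, np.pi / 2, n_slant, endpoint=False)
    tilts = np.linspace(0, 2 * np.pi, n_tilt, endpoint=False)
    n_dist = int(z_max / z_step)
    acc = np.zeros((n_slant, n_tilt, n_dist), dtype=int)
    for i, s in enumerate(slants):
        for j, t in enumerate(tilts):
            normal = np.array([np.sin(s) * np.cos(t), np.sin(s) * np.sin(t), np.cos(s)])
            d = points @ normal                            # signed distance along the normal
            idx = np.floor(np.abs(d) / z_step).astype(int)
            valid = idx < n_dist
            np.add.at(acc, (i, j, idx[valid]), 1)          # each point votes for one bucket
    return acc

def quasi_vertical_planes(acc, min_votes=200, max_vertical_slant_bins=4):
    """Buckets with many votes and a nearly horizontal normal suggest man-made walls."""
    cand = np.argwhere(acc >= min_votes)
    return [c for c in cand if c[0] >= acc.shape[0] - max_vertical_slant_bins]
```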

In particular, information about static man-made obstacles to be avoided and about trash heaps to

be compacted is sent to the path-planner module to decide on the vehicle navigation strategies.

C. Ultrasonic-data integration

Ultrasonic sensors are evenly distributed along the ring and they are divided into several

groups, sgj, j=1,..,M, where each group is composed of three neighboring sensors [25]. With this

sensor arrangement, the distance, dgj, measured by the j-th sensor group from the vehicle’s center

of mass to the obstacle is expressed as:

$$dg_j = d_{offset} + \min_{i \in \{3j-2,\, 3j-1,\, 3j\}} d_i \qquad \text{for } j = 1,..,M \qquad (11)$$

where doffset is the known distance between the vehicle’s center of mass and the sensors. The

estimated distances, dgj, are the features extracted from the ultrasonic sensors.
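A one-line illustration of the grouping rule of Eq. (11); the sensor readings and offset below are invented.

```python
def group_distances(d, d_offset):
    """Eq. (11): minimum reading of each triplet of neighboring sensors,
    referred to the vehicle's center of mass."""
    assert len(d) % 3 == 0
    return [d_offset + min(d[3 * j:3 * j + 3]) for j in range(len(d) // 3)]

# Example: 12 readings (m) grouped into M = 4 distances from the center of mass.
print(group_distances([4.2, 3.9, 5.1, 12.0, 11.4, 12.0, 2.8, 3.0, 3.3, 7.7, 8.0, 6.9],
                      d_offset=1.5))
```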

D. Low-level risk detection

The virtual sensors that make up the third level of the data-fusion system implement the low-level risk-detection module for the vehicle. Figure 6 shows the general flowchart of the low-level risk-

detection module. The primitive object detector virtual sensor processes data coming from the

ultrasonic sensors. These data are acquired using two different orientations along the y and x axes

(see Figs. 3b and 3c). The first orientation (about 70° with respect to the ground plane) makes it possible to detect the presence of obstacles in the neighborhood of the vehicle, and the second orientation (about 30° with respect to the ground plane) prevents the vehicle from going down a steep down slope.

If the primitive object detector finds a risky situation, (e.g., the presence of a terrain slope), the

vehicle is stopped immediately. A stop message is sent to the vehicle actuators and to the man-machine interface to alert the operator. The positions of the detected slopes or holes are also sent

to the path-planner module to compute a new vehicle trajectory.

If an obstacle is detected, the obstacle motion determination function uses the vehicle velocity

and direction to discriminate between static (sanitary-field boundaries, trash heaps, etc.) and

mobile objects (other industrial vehicles, trucks, pedestrians, etc.). Moreover, the obstacle

position determination function computes the positions of fixed objects and also estimates the

position of mobile obstacles every T1 sec (e.g., T1=500 msec). This information is sent to the

high-level data-fusion module to improve the object-tracking procedure.

The environment perception virtual sensor processes data coming from the laser range finder

in order to detect the presence of planes or irregular surfaces. If vertical or quasi-vertical planes

are detected, the presence of a man-made obstacle is inferred. The distance of the plane from the

laser sensor (z coordinate) is used as an estimate of the obstacle position. If non-flat regions

characterized by large entropy values are detected, they are considered as trash heaps.

The main objective of the intruder detection and location virtual sensor is the control of the

vehicle safety in the whole working area. This goal is achieved through a hierarchy of safety

levels, such that a level can be achieved only if the lower levels have been completely tested (Fig.

7). At the first level, intruder-detection procedures are performed on the images acquired by the

camera(s) placed at the sanitary-landfill gates. If significant blobs are detected, an alarm signal is

sent to both the operator and the vehicle to inform that intruders are coming into the working

area. The second level deals with the location of an intruder on a 2D top-view map of the

working area. Images are acquired by multiple cameras placed at particular points of the sanitary

landfill (e.g., corners) in such a way that their fields of view cover the whole area to be

monitored. First, the best camera to see and track the intruder is selected according to the known

geometry of the working area, the positions of the cameras, and the estimated intruder position at

the next time instant (computed by the tracking module). However, if partial or total occlusions

are present in the field of view of the selected camera (e.g., the dimensions of the MBR of the

blob have significantly changed, as compared with the previous frame, or the blob has

completely disappeared), the intruder location is computed separately by two contiguous

cameras. Then, significant blobs are searched for in the difference image, and the related MBRs

are computed. The 3D position and direction of the vehicle, computed by the position-


determination module (based on the global positioning system (GPS)), are used to find on the image

plane the blob associated with the vehicle. A 3D-into-2D transformation, based on (a) a-priori

known information about the shape and dimensions of the vehicle, (b) perspective law equations,

and (c) the ground plane hypothesis, is performed. Finally, the position(s) of the intruder(s) is

sent to the operator display and to the high-level data-fusion module.

Each virtual sensor provides information about the estimated number of detected objects and

about their related 3D positions computed in the sensor-based reference system.

V. HIGH-LEVEL DATA FUSION

The main objective of high-level data fusion is to integrate all information coming from the

virtual sensors at the lower level in order to obtain an accurate estimate of the number of objects

present in the observed scene as well as their 3D positions.

The symmetry and perturbation model [26], a probabilistic representation model, and an

integration mechanism, based on the extended Kalman filter (EKF) [27], for uncertain geometric

information are used for multisensor data integration. It is worth noting that both real and virtual

sensors render uncertain geometric information in terms of partiality and inaccuracies. Partiality

refers to the degree of freedom associated with different object locations, and to the way such

locations determine the locations of other objects related to them. Inaccuracy refers to the level of

precision in estimating object locations.

A different set of parameters (expressed in different reference systems) are used to represent

object locations by the different virtual sensors. Moreover, different geometric features used to

represent object locations have different degrees of freedom, e.g., the location of a blob on the

image plane is determined by two degrees of freedom, whereas the location of an object by laser

data is determined by three degrees of freedom. A reference system E is associated with any type

of sensor. For example, the location xWE of an object in the image reference system with respect

to the general reference system W=(X,Y,Z) is given by a transformation t_WE composed of two Cartesian coordinates and an angle θ [27] (Fig. 8a):
$$\mathbf{x}_{WE} = (x, y, \theta)^T \qquad \text{where} \qquad t_{WE} = Trans(x,y) \cdot Rot(z,\theta) \qquad (12)$$

The degrees of freedom that determine the location of an object are related to its symmetries of

continuous motion [28]. These symmetries are defined by a set SE of transformations that

preserve the object. For example, the symmetries of a straight line on the image plane are the set

of continuous translations (Tx) along the line (Fig. 8b). It is possible to represent the set of symmetries using a row selection matrix BE, called the binding matrix. According to this

definition, two objects, whose location is represented by the points A and B respectively, have

coincident locations if :

$$B_A\, \mathbf{x}_{AB} = \mathbf{0} \qquad (13)$$
where B_A denotes the binding matrix of both objects. For example, let A and B be two object locations in two different reference systems whose relative location is given by:
$$\mathbf{x}_{AB} = (x, y, \theta)^T, \qquad B_A = B_B = \begin{pmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \end{pmatrix}$$
According to equation (13), we have $B_A\, \mathbf{x}_{AB} = (x, y)^T = \mathbf{0}$. Two points coincide if the relative positions in their associated reference systems are zero.
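As a small numeric illustration of Eq. (13) (values invented): a two-row binding matrix selects the constrained coordinates of the relative location vector, and two locations coincide when the selected components vanish.

```python
import numpy as np

# Binding matrix of a 2D point: x and y are constrained, the orientation is free.
B_A = np.array([[1, 0, 0],
                [0, 1, 0]], dtype=float)

x_AB = np.array([0.0, 0.0, 0.7])           # relative location (x, y, theta) of B w.r.t. A

coincident = np.allclose(B_A @ x_AB, 0.0)  # Eq. (13): B_A x_AB = 0
print(coincident)                          # True: same point, different orientation
```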

Sensors give imprecise information, thus one only obtains an estimate of the location of a

given object. Normally, sensors give measurements with uniform distribution of errors. In the SP

model, the estimate of the location of a given object E is denoted by $\hat{\mathbf{x}}_{WE}$, and the error associated with this estimate is expressed using a differential location vector $\mathbf{d}_E$, relative to the reference associated with the object (see Fig. 8a), so that the true location of E is given by
$$\mathbf{x}_{WE} = \hat{\mathbf{x}}_{WE} \oplus \mathbf{d}_E \qquad (14)$$
where $\oplus$ indicates the composition of two location vectors. A vector $\mathbf{p}_E$ formed by the non-null elements of $\mathbf{d}_E$ is called the perturbation vector. These two vectors are related by the binding matrix $B_E$:
$$\mathbf{d}_E = B_E^T\, \mathbf{p}_E \qquad (15a)$$
$$\mathbf{p}_E = B_E\, \mathbf{d}_E \qquad (15b)$$
To this end, the information associated with the estimated location of the object is represented by a quadruple $L_{WE} = (\hat{\mathbf{x}}_{WE}, \hat{\mathbf{p}}_E, C_E, B_E)$, where $\mathbf{x}_{WE} = \hat{\mathbf{x}}_{WE} \oplus B_E^T\, \mathbf{p}_E$, $\hat{\mathbf{p}}_E = E\{\mathbf{p}_E\}$ and $C_E = Cov\{\mathbf{p}_E\}$.

In this paper, we use the SP model along with the specialized version of the EKF for the SP

model, the SP filter, to estimate the location of obstacles from a set of partial and imprecise

observations in the vehicle-working environment.

Let x be the state vector whose value is to be estimated, and let there be n independent and partial

observations $\mathbf{y}_k$ of $\mathbf{x}$, where $k \in \{1,..,n\}$, affected by white Gaussian noise:
$$\mathbf{y}_k = \hat{\mathbf{y}}_k \oplus \mathbf{u}_k \qquad (16)$$
where $\mathbf{u}_k \sim N(0, S_k)$.
Let each observation $\mathbf{y}_k$ be related to $\mathbf{x}$ by an implicit nonlinear function of the form
$$f_k(\mathbf{x}, \mathbf{y}_k) \simeq \mathbf{h}_k + H_k (\mathbf{x} - \hat{\mathbf{x}}) + G_k (\mathbf{y}_k - \hat{\mathbf{y}}_k) \qquad (17)$$
where $\mathbf{h}_k = f_k(\hat{\mathbf{x}}, \hat{\mathbf{y}}_k)$, $H_k = \left. \frac{\partial f_k}{\partial \mathbf{x}} \right|_{(\hat{\mathbf{x}}, \hat{\mathbf{y}}_k)}$ and $G_k = \left. \frac{\partial f_k}{\partial \mathbf{y}_k} \right|_{(\hat{\mathbf{x}}, \hat{\mathbf{y}}_k)}$.

The estimate $\hat{\mathbf{x}}_n$ of the state vector and its covariance $P_n$ after integrating the n measurements are:
$$\hat{\mathbf{x}}_n = -P_n\, M_n \qquad (18)$$
where
$$P_n = \left[ \sum_{k=1}^{n} H_k^T \left( G_k S_k G_k^T \right)^{-1} H_k \right]^{-1} \qquad (19a)$$
$$M_n = \sum_{k=1}^{n} H_k^T \left( G_k S_k G_k^T \right)^{-1} \mathbf{h}_k \qquad (19b)$$

In our approach, the state vector x represents the object location in the general reference system

W, and the vector yk contains the observations (object location) estimated by each virtual sensor

in its reference system.
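The batch form of Eqs. (18)-(19) can be sketched as follows; the linearized matrices H_k, G_k, the noise covariances S_k and the residuals h_k are assumed to be supplied by the virtual sensors, the example numbers are invented, and the sign convention follows the reconstruction of Eq. (18) above.

```python
import numpy as np

def sp_batch_estimate(H_list, G_list, S_list, h_list):
    """Integrate n partial observations in information form, Eqs. (18)-(19)."""
    dim = H_list[0].shape[1]
    info = np.zeros((dim, dim))
    M = np.zeros(dim)
    for H, G, S, h in zip(H_list, G_list, S_list, h_list):
        W = np.linalg.inv(G @ S @ G.T)   # weight of this observation
        info += H.T @ W @ H              # contribution to the information matrix
        M += H.T @ W @ h                 # contribution to M_n
    P = np.linalg.inv(info)              # Eq. (19a)
    x_hat = -P @ M                       # Eq. (18)
    return x_hat, P

if __name__ == "__main__":
    # Two virtual sensors observing the same 2D ground position (invented values).
    H = [np.eye(2), np.eye(2)]
    G = [np.eye(2), np.eye(2)]
    S = [np.diag([0.5, 0.5]), np.diag([2.0, 2.0])]
    h = [np.array([-30.1, -14.8]), np.array([-29.5, -15.3])]   # linearization residuals
    print(sp_batch_estimate(H, G, S, h))
```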

A. Object tracking

An additional objective of the high-level data-fusion module is to estimate the motion parameters

of each mobile obstacle in order to prevent possible collisions with the autonomous vehicle and

decide the best trajectory to be followed. A Kalman-based tracking module, based on visual

observations, has been developed and applied independently to each camera [29]. It receives as input the displacement Δr_b = (Δx_b, Δy_b) over two consecutive frames of the center of mass of each detected blob, and the variation (Δl, Δh) of the dimensions of the minimum bounding rectangle. The

output is represented by the estimated depth Zb of the detected blob in the general reference

system W, and by the estimated width P and length L of the detected object.

Let us consider the 3D motion of the center of mass $\mathbf{R}_b = (X_b, Y_b, Z_b)^T$ of an object moving with translation and rotation motion parameters $\mathbf{t} = (U, V, W)^T$ and $\boldsymbol{\omega} = (A, B, C)^T$, related to the camera reference system. The following relation holds:
$$\dot{\mathbf{R}}_b = \mathbf{t} + \boldsymbol{\omega} \times \mathbf{R}_b \qquad (20)$$
According to the considered applications, the following constraints can be assumed: (a) local planar motion of the vehicle, V=A=C=0 (ground plane hypothesis), and (b) null rotational vector, ω=0 (we consider only the motion of the object centroid). As a consequence, we obtain:
$$\dot{X}_b = U, \qquad \dot{Y}_b = 0, \qquad \dot{Z}_b = W \qquad (21)$$
The third equation, $\dot{Z}_b = W$, represents the dynamic model for the quantity of interest Z_b. As far as the other QIs are concerned, if one considers the object as a rigid body, it is possible to write: $\dot{L} = 0$ and $\dot{P} = 0$.

An Extended Kalman Filter (EKF) is applied to estimate the quantities of interest, Zb, P and L.

The equations of the Kalman filter for the prediction phase are:

$$\hat{\mathbf{q}}_{k|k-1} = \hat{\mathbf{q}}_{k-1|k-1} + \mathbf{b}_{k-1} \qquad (22a)$$
$$\Sigma^q_{k|k-1} = \Sigma^q_{k-1|k-1} + Q \qquad (22b)$$
where $\mathbf{q}_{k|k} = (Z_b, L, P)^T_{k|k}$ and $\mathbf{b}_k = (W_k, 0, 0)^T$. In order to determine the updating equations, it is necessary to linearize the measure equation around the predicted value:
$$\mathbf{m}_k = f(\mathbf{q}_k) + \mathbf{n}_k \simeq LI_k\, \mathbf{q}_k + \mathbf{n}_k \qquad (23)$$
where the function f describes the relation among the quantities of interest Z_b, P and L and the system observations Δx, Δy, Δl and Δh, $\boldsymbol{\nu}_k$ and $\mathbf{n}_k$ are Gaussian random vectors with zero means and covariance matrices $Q_k$ and $R_k$, respectively, and the matrix $LI_k$ is a 4×3 matrix containing the coefficients of the variables linearized at the k-th step [29].

Finally, we can write the equations related to the updating phase:

$$K_k = \Sigma^q_{k|k-1}\, LI_k^T \left( LI_k\, \Sigma^q_{k|k-1}\, LI_k^T + R_k \right)^{-1} \qquad (24a)$$
$$\Sigma^q_{k|k} = \left( I - K_k\, LI_k \right) \Sigma^q_{k|k-1} \qquad (24b)$$
$$\hat{\mathbf{q}}_{k|k} = \hat{\mathbf{q}}_{k|k-1} + K_k \left( \mathbf{d}_k - LI_k\, \hat{\mathbf{q}}_{k|k-1} \right) \qquad (24c)$$
where $\mathbf{d}_k = (\Delta x, \Delta y, \Delta l, \Delta h)^T$ and $K_k$ is the filter gain. The filter must be initialized by a vector $\mathbf{q}_0$ and a variance matrix $\Sigma^q_0$ that represents the accuracy of the initial estimate.
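A compact sketch of the prediction/update cycle of Eqs. (22)-(24) follows; the linearized 4×3 matrix LI_k, the noise covariances and the initial values are placeholders (in the real system LI_k contains the perspective coefficients derived in [29]).

```python
import numpy as np

class BlobTracker:
    """EKF over the quantities of interest q = (Zb, L, P), Eqs. (22)-(24)."""

    def __init__(self, q0, sigma0, Q, R):
        self.q = np.asarray(q0, dtype=float)        # state (Zb, L, P)
        self.sigma = np.asarray(sigma0, dtype=float)
        self.Q = Q                                  # process noise covariance
        self.R = R                                  # measurement noise covariance

    def predict(self, W, dt):
        """Eqs. (22a)-(22b): Zb advances with the translational speed W over dt."""
        self.q = self.q + np.array([W * dt, 0.0, 0.0])
        self.sigma = self.sigma + self.Q

    def update(self, d, LI):
        """Eqs. (24a)-(24c): d = (dx, dy, dl, dh) blob displacement and size variation."""
        K = self.sigma @ LI.T @ np.linalg.inv(LI @ self.sigma @ LI.T + self.R)
        self.q = self.q + K @ (np.asarray(d, dtype=float) - LI @ self.q)
        self.sigma = (np.eye(3) - K @ LI) @ self.sigma
        return self.q

if __name__ == "__main__":
    tracker = BlobTracker(q0=[20.0, 4.0, 2.0], sigma0=np.eye(3) * 5.0,
                          Q=np.eye(3) * 0.1, R=np.eye(4) * 2.0)
    LI = np.array([[0.5, 0.0, 0.0],     # placeholder linearization (4x3)
                   [0.2, 0.0, 0.0],
                   [0.0, 0.8, 0.0],
                   [0.0, 0.0, 0.8]])
    tracker.predict(W=-1.0, dt=0.5)
    print(tracker.update(d=[9.6, 3.8, 3.1, 1.7], LI=LI))
```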

B. High-level risk detection

Data coming from the virtual sensors are integrated to improve the driving capabilities of the

autonomous vehicle and to obtain a complete control of the whole working area in terms of

numbers of fixed and mobile obstacles, their estimated 3D positions and their estimated motion

parameters. The high-level data-fusion module is in charge of the vehicle supervision, and

produces risk messages that are sent to the vehicle motion control, the path planner and the

operator. Two different situations may occur.

First situation: One of the virtual sensors for intruder detection and location detects an intruder

vehicle moving in the sanitary landfill. Each intruder displacement is followed by the tracking

module, and estimates of its new position, speed and direction are computed (Fig. 9). On the

basis of the known vehicle position, speed and trajectory, a collision point computation function

is applied to verify if a collision between the intruder and the vehicle may occur. In the

affirmative, an alarm message containing information about the estimated collision point

(X, Y, Z) (i.e., Z=0) and about the estimated time before impact, t_e, is sent to the man-machine

interface to alert the operator and to the vehicle motion control, which evaluates the degree of


risk and decides on the best action to face the situation: (a) stopping the compactor vehicle, (b)

slowing down its speed, or (c) ignoring the alarm message. The degree of risk is computed on

the basis of the estimated time before impact. If action (b) is chosen, attention is also focused on

the output of the virtual sensor in charge of detecting the object position. If an obstacle is

detected in front of the vehicle, the vehicle is stopped immediately, and a new alarm message is

sent to the man-machine interface.
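A simplified sketch of the collision-point computation described above is given below, assuming constant velocities on the ground plane; the closed-form closest-approach test, the collision radius and the example numbers are our own illustrative simplification, not the function used in the real system (the 8-second threshold mirrors the value quoted in Section VI).

```python
import numpy as np

def collision_check(p_vehicle, v_vehicle, p_intruder, v_intruder,
                    collision_radius=3.0, t_stop=8.0):
    """Predict whether the two trajectories get dangerously close.

    Constant-velocity model on the ground plane (Z = 0). Returns the time of
    closest approach, the corresponding midpoint (a stand-in for the collision
    point) and the stop decision. Radius and threshold are illustrative.
    """
    dp = np.asarray(p_intruder, float) - np.asarray(p_vehicle, float)
    dv = np.asarray(v_intruder, float) - np.asarray(v_vehicle, float)
    if np.allclose(dv, 0.0):
        return None, None, False
    t_c = -np.dot(dp, dv) / np.dot(dv, dv)      # time of closest approach
    if t_c < 0:                                 # trajectories are diverging
        return None, None, False
    gap = np.linalg.norm(dp + dv * t_c)
    point = (np.asarray(p_vehicle, float) + np.asarray(v_vehicle, float) * t_c
             + 0.5 * (dp + dv * t_c))
    must_stop = gap < collision_radius and t_c < t_stop
    return t_c, point, must_stop

if __name__ == "__main__":
    # Positions (m) and speeds (m/s) in the spirit of the Section VI example.
    print(collision_check(p_vehicle=(45.0, 45.0), v_vehicle=(-2.0, 0.0),
                          p_intruder=(30.0, 15.0), v_intruder=(-1.5, 4.8)))
```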

Second situation: No intruder moving in the sanitary landfill is detected by the intruder detection and location virtual sensors, but the virtual sensor in charge of detecting the object position

finds a moving obstacle (e.g., a driver getting out of the lorry and moving in the sanitary landfill,

an animal, etc.). The vehicle is stopped immediately, and an alarm message containing the 3D

position of the detected obstacle is sent to the intruder detection and location virtual sensors and

to the man-machine interface. Such a message is repeated every 500 msec and provides the updated object position. The thresholds THRin and THRout of the change-detection algorithm are

relaxed in order to improve the detection capabilities of the system until the moving intruder is

located and then tracked.

The high-level data-fusion module is able to provide updated information about a risky

situation at every time instant (about 1 sec), thus making it possible to avoid useless

computations such as tracking of blobs not related to any interesting object and recognition of

objects to which semantic meanings have already been assigned. This data-reduction process,

combined with the one performed by the change detection module, sharply reduces the

computational load.

VI. EXPERIMENTAL RESULTS

Two different scenarios, representing different working situations of the autonomous vehicle,

have been considered. The first scenario shows the autonomous vehicle and an intruder (i.e., a

lorry) moving in the sanitary landfill and generating possible collision situations, and the second

concerns some tests in bad environmental conditions.

An experimental test site has been arranged in order to simulate the morphological

characteristics of a sanitary landfill for urban solid waste. Such a simulation has been realized by

reproducing the real geometry and by using piled materials. The test site has an area of 3500

square meters and a rectangular shape (50x70 meters); it is characterized by a ground with an

average slope of about 2%. An embankment has been realized to enclose the test site and to

uphold a Control Room [19]. Plinths, complete with anchoring bolts for positioning the video

cameras, have been installed around the site.

A. First scenario: collision situations


Figure 10a shows an image sequence acquired by the gate camera at a frequency of 2 frames per

second. The intruder detection virtual sensor detects an intruder and produces a risk message for the

operator and for the vehicle control module. The cameras are switched on and the intruder location

and tracking modules are activated in order to predict possible collision situations. Figure 10b

shows the scene acquired from an internal camera placed in the best position (i.e., the side opposite

to the gate). It is possible to see the autonomous vehicle stopped in the left corner of the test site,

and the intruder moving in a direction perpendicular to the gate side. The MBR of each detected

blob is displayed in the original images. The presence of shadows and noise can erroneously enlarge the dimensions of some MBRs. However, the constraints on the maximum rotation and translation speeds imposed by the EKF model and its prediction capabilities allow the system to

maintain a satisfactory object location accuracy (in the range [-2,+2] meters) and to avoid wrong

estimates of the intruder trajectory. The object’s positions computed by the location virtual sensor

and the vehicle trajectory estimated by the tracking module are displayed on a 2D top-view map of

the site (Fig. 11a). Figure 11b shows the same image sequence as in Fig. 10a after some time. Now

the autonomous vehicle is moving and collision situations may occur. Data coming from ultrasonic

sensors are continuously processed to detect the presence of a terrain slope or holes. In particular,

after a few frames, both vehicles are moving toward the same area: the risk of collision is very high

(frames 23 and 24). The collision point computation function estimates the collision point, i.e.,

(10,45,0) and the time to impact, i.e., about 20 sec, on the basis of (a) the known vehicle position

(i.e., (45,45,0)) and speed (i.e., 2 m/sec) and (b) the estimated intruder position (i.e., (30,15,0)). The

collision point is estimated on the basis of the measure of the vehicle speed, which is more accurate

than the estimated intruder speed (computed by the tracking module and equal to about 5 m/sec in

this case). If the estimated time-to-impact is shorter than a given threshold (e.g., 8 sec under the

hypothesis of constant acceleration), the autonomous vehicle is stopped immediately. Some time

instants later (frames 26 to 28), the autonomous vehicle and the intruder are moving in opposite

directions (e.g., the intruder is moving toward the gate), but in a neighborhood area. The risk of

collision is very low, but, due to perspective, an occlusion situation occurs. The vehicle is occluded

by the intruder vehicle: the related binary shapes are superimposed and fused into one MBR (frame

28) on the image plane. The information coming from the visual sensors is not sufficient to locate

the intruder to a satisfactory accuracy. In order to eliminate the ambiguity as to the intruder position,

the high-level data-fusion module uses also the data coming from the primitive object detector

virtual sensor. Figures 12a and 12b show the detected obstacle (frames 27 and 28) in the

reconstructed images coming from the ultrasonic sensors. In frame 28, an example of the utility of the adopted model is given: the fused position of the obstacle (whose real position is (25.2,13.4,0)) is estimated by the model with a higher accuracy (24.7,12.5,0) than the one obtained by the single laser sensor. As a big obstacle has been detected in the neighborhood of the vehicle, the latter is immediately stopped and a risk-alarm message is sent to the operator. The vehicle remains stopped as long as the obstacle is in its neighborhood area (frames 29 to 32).

Figure 13a shows another complex situation (frames 29 to 32): the intruder vehicle is coming

out of the camera field of view. The intruder detection and location virtual sensors are unable to

work. The tracking module continues to predict the intruder position and speed according to the average of the last five values (see Fig. 11a). After frame 32, the intruder is going out of the vehicle

neighborhood area, so the vehicle restarts. In frame 36, another risky situation occurs: the primitive

obstacle detector finds a hole in the front-right trajectory of the vehicle (Fig. 13b). The autonomous

vehicle is immediately stopped. Information about the position of the hole is sent to both the

operator and the path planner, which computes a new trajectory and restarts the vehicle.

Figure 13c shows another risky situation. The lorry is stationary in the middle area of the test

site. When the autonomous vehicle arrives near the lorry, the primitive obstacle detector virtual

sensor finds the obstacle and stops the vehicle. As the detected obstacle is stationary, the output of

the environment perception virtual sensor is analyzed to assign the obstacle to one of two classes:

static man-made object or trash heap to be compacted. A quasi-vertical plane with parameters

(φ*, θ*, z*) = (14, 33, 7) has been detected. The classification result (i.e., the text message “man-made

object, no compacting operation”) is sent to the operator.

B. Second scenario: bad environmental conditions

The proposed system must work 24 hours per day. The ultrasonic sensors and the laser range finder work well under bad environmental conditions (low visibility due to rain or fog, and uneven lighting), whereas the visual sensors show serious problems. Thanks to the cooperation

among the different sensors and to the related data-fusion strategies, the probability of false

object detection can be reduced.

B1. Low visibility

In order to simulate the situation of low visibility (due to heavy rain or fog), the original

images have been corrupted with salt and pepper noise. Up to 30% of the total number of pixels

per image have been modified in terms of gray level values (Figs. 14a and 14b). This kind of

noise, like a real rainy-weather situation, causes a remarkable degradation of the feature-

extraction process, as the size of the minimum bounding rectangle and the center of mass of a

blob change rapidly during the frame sequence. Consequently, the intruder trajectory estimated

by the EKF may not be very accurate. Figures 14c and 14d show the results of the change-


detection algorithm applied to the image in Figures 14a and 14b. Figures 14e and 14f show the

estimated trajectories of the detected intruder under different noisy conditions. These results

prove that the proposed approach is able to reduce the contributions of the vehicle shadow and

the corrupting noise, without greatly modifying the characteristics of the intruder trajectory. It is

important to note that, even if the system tracking performances are degraded, the high-level

data-fusion module is always able to detect and locate the intruder with acceptable errors.

B2. Uneven lighting conditions

We now examine the improvements obtained by considering the scene under uneven lighting

conditions: the gray levels of the original images contain insufficient quantities of information

(e.g., the gray-level mean is too low). A pre-processing phase is needed to simplify the

processing task of the visual-based virtual sensors.

Figure 15a represents a test image acquired towards sunset. In this case, we apply the window-

level contrast stretching function [30] (Fig. 15b). The result, shown in Fig. 15c, exhibits a better visual contrast without introducing noticeable clipping.

Figure 16a gives the example of an image acquired after sunset, towards night. In this case, we

apply two different nonlinear point-transformation functions to improve visual contrast: the

logarithmic function [30] (Fig. 16b) and the gamma function [30] (Fig. 16c). Results are

presented in Figs. 16d and 16e, respectively. It is worth noting that a simple contrast

manipulation function can improve the quality of the acquired images. For example, in the

image enhanced by the gamma function, the output is quite similar to an image acquired in daylight under normal conditions. On the other hand, it is worth observing that, in some cases, the logarithmic transformation can make the output image too light; therefore, this function is used only for images with many dark gray levels, for which it works quite well.
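The point transformations mentioned above can be sketched as follows; the window limits and the gamma value are illustrative, and the functions operate on 8-bit grayscale NumPy arrays.

```python
import numpy as np

def window_level_stretch(img, low=30, high=180):
    """Linear contrast stretch of the [low, high] window to the full 8-bit range."""
    out = (img.astype(float) - low) * 255.0 / (high - low)
    return np.clip(out, 0, 255).astype(np.uint8)

def log_transform(img):
    """Logarithmic point transformation: expands dark levels, compresses bright ones."""
    out = 255.0 * np.log1p(img.astype(float)) / np.log(256.0)
    return out.astype(np.uint8)

def gamma_transform(img, gamma=0.5):
    """Gamma (power-law) transformation; gamma < 1 brightens dark images."""
    out = 255.0 * (img.astype(float) / 255.0) ** gamma
    return out.astype(np.uint8)

if __name__ == "__main__":
    dark = (np.random.default_rng(0).random((120, 160)) * 60).astype(np.uint8)
    print(dark.mean(), gamma_transform(dark).mean(), log_transform(dark).mean())
```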

However, the visual-based sensors are not able to run at night; therefore, it is

necessary to install an artificial lighting system within the landfill.

C. System performances

System robustness has been evaluated mainly in terms of detection and location accuracy for a

limited set of objects. In normal conditions and on a large number of real tests (about 10^4

images), the proposed multisensor data fusion system has provided a percentage of correct

intruder detection of 95%, a mean location accuracy of about 2 meters and a percentage of

correct risk detection in 92% of normal situations and 60% of complex situations showing object

occlusions. In particular, the primitive object detector based on ultrasonic data has provided a

successful object detection rate of about 87% and a successful terrain slope or hole detection rate

of about 76% on a large set of about 10^3 ultrasonic data. The environment perception virtual sensor has provided a successful rate of detection of man-made static objects of about 85% and a successful rate of trash heap detection of about 69% on a set of 5×10^4 range data.

D. System limitations

The main limitations on the proposed system are summarized in the following points.

(a) In the current system, the recognition capabilities are restricted to distinguishing between

static man-made objects (e.g., stopped trucks, sanitary-landfill fixtures, etc.) and trash heaps

characterized by irregular shapes. Such a recognition process, performed by the environment

perception virtual sensor, is based only on the detection of planar surfaces in 3D-range data [33].

In some situations, it could be more effective to use also visual data (b/w or color images) and

the related features (e.g., edges, regions, straight segments, etc.) to improve the recognition task.

To this end, computer vision methods are currently being studied, where the object recognition

process is performed on both range and visual data.

(b) A further limitation related to object recognition is that the vehicle is not equipped with a

CCD camera that could be used to better discriminate between regular objects and trash heaps.

This solution is under study, but two main problems should first be solved: (1) the high

vibrations generated by the engine of the vehicle produce very noisy images; (2) acquired images

should be transmitted (in real time and in a robust way) to the remote control center. New

representations of the information content should be added to the system.

(c) Very bad environmental conditions (e.g., rain, fog, etc.), low lighting and the presence of

shadows mislead the visual-based virtual sensors, which may erroneously generate false alarms.

(d) Intruder vehicles moving in the sanitary landfill are assumed to travel at low speeds (maximum speed: 20 km/h).
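As referenced in point (a), a common way to test for planar, man-made surfaces in 3-D range data is robust plane fitting. The following Python sketch is a hypothetical RANSAC-style fit, not the method of [33]; it only illustrates the idea that man-made objects yield a large fraction of inliers to a single plane, whereas irregularly shaped trash heaps do not.

import numpy as np

def ransac_plane(points, n_iter=200, dist_thresh=0.05, rng=None):
    # Fit a plane a*x + b*y + c*z + d = 0 to an (N, 3) point cloud.
    # Returns (plane_params, inlier_mask); a large inlier fraction
    # suggests a planar, man-made surface.
    rng = rng or np.random.default_rng()
    best_inliers = np.zeros(len(points), dtype=bool)
    best_plane = None
    for _ in range(n_iter):
        sample = points[rng.choice(len(points), 3, replace=False)]
        # Plane normal from two edge vectors of the sampled triangle.
        normal = np.cross(sample[1] - sample[0], sample[2] - sample[0])
        norm = np.linalg.norm(normal)
        if norm < 1e-9:          # degenerate (collinear) sample
            continue
        normal /= norm
        d = -normal.dot(sample[0])
        distances = np.abs(points @ normal + d)
        inliers = distances < dist_thresh
        if inliers.sum() > best_inliers.sum():
            best_inliers, best_plane = inliers, (*normal, d)
    return best_plane, best_inliers

# Illustrative heuristic: label a surface "man-made" when
# best_inliers.mean() exceeds a chosen planarity threshold.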

VII. CONCLUSIONS

A multisensor data-fusion risk-detection system designed for an autonomous vehicle operating

in an outdoor environment (i.e., a sanitary landfill) has been presented. The core of the system is

a distributed sensory network organized into a hierarchical tree structure composed of as many

leaf nodes as the number of real sensors employed, e.g., several cameras, a laser range finder, and

sixteen ultrasonic sensors. Nodes at the second level are devoted to extracting features from

sensor data. The third level of the tree (low-level data fusion) is composed of a set of virtual

sensors, which integrate the informative content of one or more features to perform different

tasks: (a) detection of static obstacles, (b) detection of holes or down slopes, (c) recognition of

specific fixed objects (e.g., trash heaps), (d) detection and location of mobile objects (e.g., other


vehicles, pedestrians, etc.). The root node (high-level data fusion) integrates the evidence coming

from each virtual sensor in order to obtain fused information (in the general reference system)

about the number of objects, their 3D poses, and estimates of their motion parameters. Finally,

fused information is used to decide on the best action to take, e.g., stopping the vehicle, moving

the vehicle in a given direction, etc.
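To make the hierarchy summarized above more concrete, the following Python sketch models the tree as leaf sensor nodes, feature-extraction nodes, virtual-sensor (low-level fusion) nodes, and a root (high-level fusion) node; all class and node names are illustrative assumptions, not the authors' software.

from dataclasses import dataclass, field
from typing import Any, Callable, List

@dataclass
class Node:
    # Generic node of the sensory tree: children are evaluated first,
    # then this node's operation fuses/transforms their outputs.
    name: str
    operation: Callable[[List[Any]], Any]
    children: List["Node"] = field(default_factory=list)

    def evaluate(self) -> Any:
        child_outputs = [child.evaluate() for child in self.children]
        return self.operation(child_outputs)

# Leaf nodes wrap the real sensors (cameras, laser range finder, ultrasonic ring).
def sensor(name: str) -> Node:
    return Node(name, operation=lambda _: {"source": name, "data": None})

# Second level: feature extraction; third level: virtual sensors (low-level fusion).
camera_features = Node("edge_and_region_features", lambda outs: outs,
                       [sensor("ccd_camera_1")])
ultrasonic_features = Node("range_profiles", lambda outs: outs,
                           [sensor(f"ultrasonic_{i}") for i in range(16)])
obstacle_detector = Node("static_obstacle_virtual_sensor", lambda outs: outs,
                         [camera_features, ultrasonic_features])

# Root: high-level fusion of the evidence produced by all virtual sensors.
root = Node("high_level_fusion",
            lambda outs: {"virtual_sensor_outputs": outs},
            [obstacle_detector])

result = root.evaluate()  # traverses leaf -> feature -> virtual sensor -> root

The real system attaches actual acquisition and fusion routines to each node; the sketch only mirrors the leaf-to-root data flow described in the text.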

Results refer to real scenarios representing different working situations for the autonomous

vehicle, and demonstrate the effectiveness of the chosen solutions. The use of data-fusion strategies,

applied to complementary or redundant multisensor data, considerably helps in detecting risky

situations. Thanks to the capabilities and the robustness of the multisensor data-fusion system,

the autonomous vehicle can operate almost continuously, independently of changes in weather

conditions or differences in lighting over time.

Future work will be aimed at overcoming the main system limitations and at testing the

proposed system in different application domains (e.g., airports, nuclear environments, etc.).

REFERENCES

[1] E.D. Dickmanns, B. Mysliwetz, T. Christians, "An integrated spatial-temporal approach to

automatic visual guidance of autonomous vehicles," IEEE Trans. on Systems, Man and

Cybernetics, vol. 20, no. 6, pp. 1273-1284, 1990.

[2] N. Kehtarnavaz, N.C. Griswold, and J.S. Lee, "Visual control of an autonomous vehicle

(BART) - The vehicle following problem," IEEE Transactions on Vehicular Technology, vol. 40,

no. 3, pp. 654-661, 1991.

[3] D.Koller, J.Weber, and J.Malik, "Robust multiple car tracking with occlusion reasoning,"

Proc. of 4th European Conf. on Computer Vision, Stockholm, Sweden, pp. 189-196, 1994.

[4] V. Graefe, "Vision for intelligent road vehicles," IEEE Workshop on Intelligent Vehicles,

Tokyo, Japan, pp. 1-6, 1993.

[5] B.S. Rao, H.F. Durrant-Whyte and J.A. Shen, "A fully decentralized multisensor system for

tracking and surveillance," Int. Journal of Robotic Research, vol.12, no.1, pp.20-43, 1993.

[6] D. Corrall, "VIEW: Computer Vision for surveillance applications", IEE Colloquium on

Active and Passive Techniques for 3D Vision, IEE, London, 8/1-3, 1991.

[7] E. Jouseau and B. Dorizzi, “Neural networks and fuzzy data fusion. Application to an on-line

and real time vehicle detection system”, Pattern Recognition Letters, Vol. 20, no. 1, 1999, pp.

97-107.

[8] R. C. Luo, M. Lin, and R. S. Scherp, "Dynamic multi-sensor data fusion system for intelligent

robots", IEEE Journal of Robotic Automation, Vol. 3, No. 4, pp. 386-396, 1988.

[9] A. R. Hanson, E. M. Riseman, and T. D. Williams, "Sensor and information fusion from

knowledge-based constraints", in Proc. SPIE Conference on Sensor Fusion, Vol. 931, C. W.

Weaver, Ed., Orlando, FL, pp. 186-196, April 1988.

[10] L. Pau, "Knowledge representation approaches in sensor fusion", Automatica, Vol. 25, No.

2, pp. 207-214, 1989.

[11] R. C. Luo, "Multisensor integration and fusion in intelligent systems", IEEE Trans. on

Systems, Man, and Cybernetics, Vol. 19, No. 5, pp. 901-928, September/October 1989.

[12] J.L. Crowley and Y. Demazeau, "Principles and techniques for sensor data fusion", Signal Processing, Vol. 32, No. 1-2, 1993, pp. 5-27.

[13] J.K. Aggarwal and N. Nandhakumar, Special issue on Sensor Fusion, Journal of Applied

Intelligence, Vol. 5, No. 3, pp. 207-290, July 1995.


[14] L.A. Klein, Sensor and Data Fusion Concepts and Applications, SPIE, Bellingham, WA,

1993.

[15] S. S. Iyengar, R. L. Kashyap and R. N. Madan, "Distributed Sensor Networks - Introduction

to the Special Section", IEEE Transactions on System, Man, and Cybernetics, Vol. 21, No. 5, pp.

1027-1031, September/October 1991.

[16] R. Hartley, “Calibration of cameras using the essential matrix”, in Proceedings of DARPA

Workshop on Image Understanding, San Diego, CA, January 26-29, 1992 (Morgan Kaufmann,

San Mateo, CA), pp. 911-915.

[17] R.I. Hartley, “Kruppa's equations derived from the fundamental matrix”, IEEE Transactions

on Pattern Analysis and Machine Intelligence, Vol. 19, No. 2, 1997, pp. 133-135.

[18] T.N. Tan, G.D. Sullivan and K.D. Baker, "Structure from motion using the ground plane constraint", in Proc. of 2nd European Conf. on Computer Vision, S. Margherita Ligure, Italy, May 1992, pp. 254-262.

[19] P. Grossmann, “From 3D line segments to objects and spaces”, Proc. IEEE Conf. on

Computer Vision and Pattern Recognition, San Diego, CA, pp. 216-221, 1989.

[20] Z. Zhang and O. Faugeras, “Finding clusters and planes from 3D line segments with application to 3D motion determination”, Proc. of 2nd European Conf. on Computer Vision, Santa Margherita Ligure, Italy, May 1992, pp. 227-236.

[21] J. Illingworth and J. Kittler, "The adaptive Hough Transform", IEEE Trans. on Pattern

Analysis Machine Intell., Vol. 9, No. 5, pp.690-698, 1987.

[22] G.L. Foresti, V. Murino, and C.S. Regazzoni, "Vehicle Recognition and Tracking from

Road Image Sequences", IEEE Transaction on Vehicular Technology, Vol. 48, No. 1, 1999, pp.

301-318.

[23] P.T. Jackway and M. Deriche, “Scale-space properties of the multiscale morphological

dilation erosion”, IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 18, no.

1, 1996, pp. 38-51.

[24] G.L. Foresti, "Object Detection and Tracking in Time-Varying and Badly Illuminated

Outdoor Environments", Optical Engineering, Vol. 37, No. 9, 1998, pp. 2550-2564.

[25] N.H.C. Yung and C. Ye, “An intelligent mobile vehicle navigator based on Fuzzy logic and

reinforcement learning”, IEEE Transactions on Systems, Man, and Cybernetics – Part B: Cybernetics, Vol. 29, No. 2, 1999, pp. 314-321.

[26] J.D. Tardos, “Representing partial and uncertain sensorial information using the theory of

symmetries”, in Proc. of IEEE Intern. Conference on Robotics and Automation, Nice, France,

May 1992, pp. 1799-1804.

[26] C. Brown, H. Durrant-Whyte, J. Leonard, and B. Rao, "Centralized and decentralized

Kalman filter techniques for tracking, navigation, and control", American Control Conference,

Boston, MA, pp. 262-268, 1985.

[27] J.A. Castellanos, J.D. Tardós, J. Neira, "Constraint-Based mobile robot localization",

International Workshops on Advanced Robotics, Intelligent Machines, Salford, Manchester

(U.K.), April 1996.

[28] K.O. Arras and N. Tomatis, "Improving Robustness and Precision in Mobile Robot Localization by Using Laser Range Finding and Monocular Vision", in Proceedings of the Third European Workshop on Advanced Mobile Robots (Eurobot '99), Zurich, Switzerland, Sept. 6-9, 1999.

[29] G.L. Foresti, and C.S. Regazzoni, "Localization and Tracking of Multiple Unknown Objects

in Real Environments", Electronics Letters, Vol. 31, No. 5, 1995, pp.355-356.

[30] J.C. Russ, The Image Processing Handbook, CRC Press, Boston, 1995.

[31] V. Murino, G.L. Foresti and C.S. Regazzoni, "Adaptive Camera Regulation for

Investigation of Real Scenes", IEEE Transaction on Industrial Electronics, Vol. 43, No. 5, 1996,

pp. 588-600.