
J Intell Robot Syst (2011) 62:527–546. DOI 10.1007/s10846-010-9463-2

A Road Following Approach Using Artificial Neural Networks Combinations

Patrick Yuri Shinzato · Denis Fernando Wolf

Received: 18 March 2010 / Accepted: 13 August 2010 / Published online: 28 August 2010. © Springer Science+Business Media B.V. 2010

Abstract Navigation is a broad topic that has been receiving considerable attention from the mobile robotics community over the years. In order to perform autonomous driving in outdoor urban environments, it is necessary to identify the parts of the terrain that can be traversed and the parts that should be avoided. This paper describes an analysis of terrain identification based on different visual information, using MLP artificial neural networks and combining the responses of many classifiers. Experimental tests using a vehicle and a video camera have been conducted in real scenarios to evaluate the proposed approach.

Keywords Image processing · Navigation · Machine learning

1 Introduction

Autonomous navigation capability is a requirement for most mobile robots. In order to deal with this issue, robots must obtain information about the environment through sensors and thereby identify safe regions to travel [1]. Outdoor navigation in unknown terrain is certainly a complex problem. Beyond obstacle avoidance, the vehicle must be able to identify the surfaces on which it can navigate safely. The irregularity of the terrain and the dynamics of the environment are some of the factors that make robot navigation a challenging task [2].

Usually it is desirable that the mobile robot (vehicle) has the capacity to move along the road and avoid obstacles and non-navigable areas. Since these elements usually have differences in color and texture, cameras are a suitable option to identify

P. Y. Shinzato (B) · D. F. Wolf
Institute of Mathematics and Computer Science, University of Sao Paulo, Sao Carlos, SP, Brazil
e-mail: [email protected]

D. F. Wolf
e-mail: [email protected]


Fig. 1 Vehicle used for data collection

navigable regions. Several techniques for visual road following have been developed based on certain assumptions about the road scene. Detecting road boundaries through the use of gradient-based edge techniques is described in [3–5]. These algorithms assume that road edges are clear and fairly sharp. In [6], an approach is developed that extracts texture from road images and uses it as a feature for the segmentation of unmarked roads. The approach presented in [7] divides images into slices and tries to detect the path in each one.

A work related to artificial neural networks (ANNs) applied to road following is the Autonomous Land Vehicle In a Neural Network (ALVINN) [8], in which a network is used to classify the entire image and detect the road. Another work that uses an ANN is presented in [9]. Both works use a single ANN, which must be re-trained in order to be able to identify a long path. This work presents the use of a set of ANNs in order to improve road identification. More specifically, we present an image classification approach based on several ANNs that use different features obtained from images as input and combine all the outputs to produce one improved classification. Experimental tests using a vehicle (Fig. 1) and a video camera have been conducted in real scenarios.

The rest of this paper is organized as follows. Section 2 presents the techniques and features used to identify the navigable region in the image. Section 3 presents the concepts of artificial neural networks used in this work. In Section 4 the experimental results obtained from tests in a real environment are presented. Finally, Section 5 presents conclusions and future work.

2 Block-based Classification Method

Navigation in outdoor spaces is considerably more complex than in structured indoor spaces. The terrain is composed of a variety of elements such as grass, gardens, sidewalks, streets and gravel. These elements usually have different colors and textures, making it possible to use cameras to differentiate them. The first step in building a vision-based outdoor navigation system is to classify outdoor spaces into two classes: navigable regions and non-navigable regions. The navigable regions are the surfaces on which a mobile robot can travel safely. After the terrain classification, other algorithms available in the literature can perform path planning and obstacle avoidance [10].


Fig. 2 Blocks generated from a frame of a road scene

A block-based classification method consists of dividing the image into blocks of pixels and evaluating each block as a single unit. A value is generated to represent the group; this value can be the average of the RGB values, the entropy, or other features of the collection of pixels represented. In the grouping step, a frame of resolution (M × N) pixels is sliced into groups of (K × K) pixels, as shown in Fig. 2.

Suppose an image is represented by a matrix I of size (M × N). The element I(m, n) corresponds to the pixel in row m and column n of the image, where (0 ≤ m < M) and (0 ≤ n < N). Therefore, block B(i, j) contains all the pixels I(m, n) such that (i ∗ K) ≤ m < (i ∗ K) + K and (j ∗ K) ≤ n < (j ∗ K) + K. For each block, a feature value is calculated depending on the feature chosen. This strategy has been used to reduce the number of image elements, allowing faster processing.
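To make the block indexing concrete, the sketch below (Python with NumPy, not the authors' original code; the name block_feature_map and the mean-value feature are illustrative assumptions) slices a frame into K × K blocks and computes one feature value per block.

```python
import numpy as np

def block_feature_map(image, K=10):
    """Slice an (M x N) image into K x K blocks and return one feature
    value per block B(i, j); the per-block mean is used as an example."""
    M, N = image.shape[:2]
    rows, cols = M // K, N // K
    features = np.zeros((rows, cols))
    for i in range(rows):
        for j in range(cols):
            # Block B(i, j) holds pixels I(m, n) with
            # i*K <= m < i*K + K and j*K <= n < j*K + K.
            block = image[i * K:(i + 1) * K, j * K:(j + 1) * K]
            features[i, j] = block.mean()
    return features

# A 240 x 320 frame with K = 10 yields 24 x 32 = 768 blocks,
# matching the setup described in Section 4.
frame = np.random.randint(0, 256, (240, 320, 3), dtype=np.uint8)
print(block_feature_map(frame).shape)  # (24, 32)
```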

2.1 Statistical Measures as Image Features

We use statistical measurements as features, such as mean, entropy, energy and variance. Their definitions and equations are described below.

2.1.1 Shannon Entropy

In this work, texture analysis consists of calculating the entropy of the pixels. In a simple way, entropy can be defined as the degree of regularity of a data set [11]. Mathematically, Shannon entropy can be defined as follows:

$$E(X) = -\sum_{x \in X} p(x)\,\log p(x) \qquad (1)$$

where p(x) is the probability of pixel value x occurring in the collection. In this case, x corresponds to a pixel value and the block corresponds to the collection. The calculation depends on the color space and the number of channels used.
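For illustration, a minimal sketch (Python/NumPy, ours rather than the paper's) of Eq. 1 applied to one block, estimating p(x) as the relative frequency of each distinct pixel value; the logarithm base is a choice the paper does not fix.

```python
import numpy as np

def block_entropy(block):
    """Shannon entropy (Eq. 1) of one block.

    For a single-channel block, x ranges over scalar values; for an RGB
    or HSV block, each pixel tuple counts as one value, matching the
    equality rule in the text (x = y only if every channel matches).
    """
    pixels = block.reshape(-1, block.shape[-1]) if block.ndim == 3 else block.reshape(-1)
    _, counts = np.unique(pixels, axis=0, return_counts=True)
    p = counts / counts.sum()        # relative frequency of each distinct value
    return -np.sum(p * np.log2(p))   # base-2 logarithm chosen here
```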


2.1.2 Energy

The energy value measures the presence of high values in relation to the other values, and can be defined as:

$$\varepsilon = \sum_{x=0}^{C-1} \big(p(x)\big)^2 \qquad (2)$$

where p(x) is the probability of pixel value x occurring in the collection and C is the number of colors in the image (for example, the R channel of RGB has 256 colors).

2.1.3 Variance

Variance is a well-known concept in statistics; it represents the dispersion of values around the mean. It can be described as:

$$\sigma^2 = \sum_{x=0}^{C-1} (x - \mu)^2\, p(x) \qquad (3)$$

where p(x) is the probability of pixel value x occurring in the collection, μ is the mean of the collection and C is the number of colors in the image.
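A companion sketch for Eqs. 2 and 3 on a single-channel block with C possible colors (Python/NumPy; the helper names are hypothetical, not from the paper).

```python
import numpy as np

def color_probabilities(block, C=256):
    """p(x) for each of the C possible colors in a single-channel block."""
    counts = np.bincount(block.reshape(-1), minlength=C)
    return counts / counts.sum()

def block_energy(block, C=256):
    """Energy (Eq. 2): the sum of squared color probabilities."""
    p = color_probabilities(block, C)
    return np.sum(p ** 2)

def block_variance(block, C=256):
    """Variance (Eq. 3): dispersion of color values around the block mean."""
    p = color_probabilities(block, C)
    x = np.arange(C)
    mu = np.sum(x * p)               # mean of the collection
    return np.sum((x - mu) ** 2 * p)
```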

2.2 RGB Color Space

RGB is a color space in which each color is defined by the values of its R (red), G (green) and B (blue) components [12]. The classification based on this color space generates a feature in RGB pixel format. This feature is the weighted average of the pixel occurrences in the block.

We also use the RGB entropy and energy as features. In order to obtain the entropy and energy values, the frequency of each pixel in the block is calculated. For each pixel with value x, p(x) is calculated by dividing the frequency of x by the total number of pixels in the block. Note that x and y are pixels in RGB format, and x = y if and only if:

– red of x equals red of y, and
– green of x equals green of y, and
– blue of x equals blue of y.

2.3 HSV Color Space

The HSV color space is composed of hue (H), saturation (S) and value (V, brightness) [13]. If the saturation component is zero, then hue can be disregarded.

As in RGB, we generate the average, entropy and energy of HSV, where x and y are pixels in HSV format and x = y if and only if:

– hue of x equals hue of y, and
– saturation of x equals saturation of y, and
– value of x equals value of y.

However, for this color space, we also generate the entropy, energy and variance of each channel independently. In other words, we also generate attributes such as hue


Fig. 3 Network 1 × 5 × 1

entropy, saturation entropy and value entropy, in addition to the other measures previously mentioned. Another attribute generated was HAS, defined as (hue + saturation)/2. This attribute was created to take advantage of the consistency of these two channels for pixels that belong to the street. The entropy of HAS was also used in this work.
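The HAS attribute and its entropy can be sketched as follows (Python with OpenCV assumed for the color conversion; the helper names are ours). Whether hue and saturation should be rescaled to a common range before averaging is not specified in the text, so the raw 8-bit channels are averaged here.

```python
import cv2
import numpy as np

def has_values(bgr_block):
    """HAS attribute per pixel: (hue + saturation) / 2.

    Note: OpenCV's 8-bit HSV uses hue in 0-179 and saturation in 0-255;
    the raw channels are averaged here as one reading of the definition.
    """
    hsv = cv2.cvtColor(bgr_block, cv2.COLOR_BGR2HSV)
    h = hsv[:, :, 0].astype(np.float32)
    s = hsv[:, :, 1].astype(np.float32)
    return (h + s) / 2.0

def has_entropy(bgr_block, bins=256):
    """Entropy of the HAS values in the block (same form as Eq. 1)."""
    counts, _ = np.histogram(has_values(bgr_block).reshape(-1), bins=bins)
    p = counts[counts > 0] / counts.sum()
    return -np.sum(p * np.log2(p))
```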

3 Artificial Neural Networks as Classifiers

Artificial Neural Networks (ANNs) are known for characteristic properties such as adaptability, the ability to learn by example and the ability to generalize. In this work, we have used a multilayer perceptron (MLP) [14], which is a feedforward neural network model that maps sets of input data onto specific outputs. We used the backpropagation technique [15], which estimates the weights based on the amount of error in the output compared to the expected results.

In this work, we used one hidden layer with five neurons, as shown in Fig. 3. All networks tested have only one neuron in the output layer, which is enough to classify the block as navigable (returning 1) or non-navigable (returning 0). However, the networks provide responses as decimal values between 0 and 1. For this reason, we defined the responses as follows:

– if result ≤ 0.3 then the region is classified as non-navigable;
– if result ≥ 0.7 then the region is classified as navigable;
– if result > 0.3 and result < 0.7 then the region is classified as unknown.

Notice that the unknown classification is actually considered an error value.

The size of the input layer corresponds to the number of image attributes used. Therefore, the differences between the classifiers evaluated are the number of attributes used and their combination. The networks were evaluated every 100 training cycles until reaching 2,000 cycles, which was enough to guarantee their convergence.
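As an illustration of the classifier setup, the sketch below uses scikit-learn's MLPRegressor as a stand-in for the SNNS networks used in the paper (an assumption on our part): one hidden layer with five neurons, a single output neuron, and the 0.3/0.7 thresholds applied to its continuous output.

```python
import numpy as np
from sklearn.neural_network import MLPRegressor

def make_network():
    """One hidden layer with five neurons and a single output (Fig. 3).
    MLPRegressor merely stands in for the SNNS networks of the paper."""
    return MLPRegressor(hidden_layer_sizes=(5,), activation='logistic',
                        max_iter=2000, random_state=0)

def classify(output):
    """Map the continuous network output to the three classes of the text."""
    if output <= 0.3:
        return 'non-navigable'
    if output >= 0.7:
        return 'navigable'
    return 'unknown'  # treated as an error value in the evaluation

# Usage sketch: X has one row of block features per training pattern,
# y is 1 for navigable blocks and 0 for non-navigable ones.
X = np.random.rand(200, 5)
y = (X[:, 0] > 0.5).astype(float)
net = make_network().fit(X, y)
print([classify(o) for o in net.predict(X[:3])])
```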

4 Experiments and Results

In order to analyze the various attribute combinations, several experiments were carried out on the university campus. We collected data in realistic environments under different conditions. More specifically, we recorded the path traversed by a vehicle over diverse terrain, through streets flanked by sidewalks, parking lots or vegetation. In addition, portions of the street had adverse conditions such as sand and dirt (shown in Fig. 4).


Fig. 4 Example of dirt road used in the experiments

Our setup for the experiments was a car equipped with a Canon A610 digital camera. The image resolution was (320 × 240) pixels at 30 FPS. In order to execute the experiments with ANNs, we used the Stuttgart Neural Network Simulator (SNNS) [16]. The OpenCV library [17] was used for image acquisition and to visualize the processed results from SNNS. The block size used was K = 10, which resulted in 768 blocks per frame.

We performed the experiments in two phases. In the first phase, we trained the ANNs with one simple frame, in which the road is flanked by grass and a parking lot on each side. The frame used for evaluation was similar to the one used in the training step. The second phase was more complex: we used five frames with different conditions for the training step and evaluated with fifteen frames (the five from the training step plus ten other frames). We separated the work into two phases because the total number of classifiers (feature combinations) was very large, approximately 28,000. The first phase eliminated the attribute combinations that did not obtain satisfactory results, reducing the number of candidates for the second phase, which involves a more complex analysis.

4.1 Phase 1

In Phase 1, we tested combinations of 21 features: average R (red), average G (green), average B (blue), RGB entropy, average H (hue), average S (saturation), average V (value), HSV entropy, H entropy, S entropy, V entropy, H variance, S variance, V variance, RGB energy, HSV energy, H energy, S energy, V energy, HAS and HAS entropy. Each feature corresponds to a neuron in the input layer of the ANN. Therefore, we tested different combinations with one, two, three, four and five attributes, that is, networks with one to five neurons in the input layer, totaling 27,890 different classifiers evaluated.

The frames used in this phase are shown in Fig. 5. An important detail about this stage is that only the blocks below the horizon line were used both to train and to evaluate, so each frame generated only 480 blocks, as can be seen in the visualization of the


(a) Frame used for training (b) Frame used for evaluation

Fig. 5 Frames used in Phase 1

classifier response (Fig. 6). This is due to the fact that much of the top of the image represents sky, which can be eliminated with a pre-processing step [18].

Among the results obtained from all classifiers tested, 16,976 classifiers achieved a hit rate between 90% and 98%, with about one thousand classifiers reaching approximately 98%. It is important to notice that the hit rate is a percentage of the 480 blocks of the assessment frame, which means that a classifier with a 98% hit rate misclassified around 10 blocks. As the number of feature combinations was still high, we executed the second phase reviewing only the classifiers that achieved a hit rate of 90% or more.

4.2 Phase 2

Based on the results obtained in Phase 1, we evaluated 16,976 classifiers in different street conditions. The evaluation method and settings were the same as those used in Phase 1. For this experiment, we used patterns generated from five frames in the training

(a) Best result (b) Evaluation of best result

Fig. 6 Classification results: a shows the classifier responses, with blocks classified as non-navigable in magenta and navigable in cyan; b shows correct, false-negative, false-positive and unknown classifications in green, blue, red and yellow, respectively


(a) Frame 1 (b) Frame 2 (c) Frame 3

(d) Frame 4 (e) Frame 5

Fig. 7 Frames used for training of Phase 2

step; the frames can be seen in Fig. 7. For the evaluation stage, we used patterns generated from 15 frames. The frames used in the evaluation step are shown in Fig. 8. Among these frames, there are scenes of curves, dirt roads and streets with no defined edges.

Among the results obtained in Phase 2, 5,967 feature combinations achieved a hit rate of 90% or more. One important detail is that in Phase 1 there were classifiers that reached a 98% hit rate, while in Phase 2 the best result obtained was 93%. Among these 5,967 classifiers, we made two analyses: we analyzed the best five results, which reached approximately 93%, and we analyzed the number of times each subset of attributes appeared in these results.

Two analyses have been applied to the classification results:

– General analysis is the average error/hit rate obtained from all the test frames evaluated.

– Analysis per frame is the error/hit rate of a single frame compared to the error/hit rate and standard deviation of the complete set of test frames.

This allows us to know whether the errors of a given classifier are concentrated in a few frames or spread across all the evaluated frames.
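These two analyses can be sketched as follows (Python/NumPy; the function name and the exact summary statistics are our interpretation of the description above).

```python
import numpy as np

def summarize_classifier(error_rate_per_frame):
    """Summarize a classifier's errors over the set of test frames.

    The "general analysis" is the mean error rate over all frames; the
    "analysis per frame" relates each frame's error rate to that mean
    and its standard deviation, revealing whether the errors are
    concentrated in a few frames or spread across all of them.
    """
    rates = np.asarray(error_rate_per_frame, dtype=float)
    general, spread = rates.mean(), rates.std()
    per_frame = [(idx, rate, (rate - general) / spread if spread else 0.0)
                 for idx, rate in enumerate(rates)]
    return general, spread, per_frame
```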

4.3 Analysis of the Best Feature Combinations

Analyzing the results from all classifiers evaluated, 5,967 achieved a hit rate of 90% or more in the "general analysis". We discuss the best five results, which reached 93%; the other


(a) Frame 6 (b) Frame 7 (c) Frame 8

(d) Frame 9 (e) Frame 10 (f) Frame 11

(g) Frame 12 (h) Frame 13 (i) Frame 14

(j) Frame 15

Fig. 8 Frames used for evaluation of Phase 2

classifiers reached more than 90% and less than 93%. These classifiers have the following attributes:

– Classifier 1: B average, RGB entropy, V entropy, S variance, S energy.
– Classifier 2: B average, HSV entropy, S entropy, V entropy, S energy.
– Classifier 3: R average, B average, H average, V entropy, HSV energy.
– Classifier 4: R average, G average, H average, V entropy, HAS entropy.
– Classifier 5: R average, H average, H entropy, V entropy (this configuration has only four attributes).


Since the scenes used in the tests are considerably different, it is possible that the classification of some frames has bad results (a hit rate of less than 90%) while others have a hit rate near 100%. Therefore, it is convenient to do a visual analysis of the results of the classifiers for each scene used in the tests. A short description of the behavior of each classifier is presented below:

4.3.1 Classifier 1

This classifier misclassified almost all the blocks representing the parking space. It obtained significant errors on the sidewalk and traffic lane, but it got good results on the dirt road (Fig. 9a). Overall, the errors were well distributed over the frames except in the cases of parking and sidewalk (Fig. 9b).

4.3.2 Classifier 2

This classifier completely misclassified the blocks representing the parking space (Fig. 10a) and walkways (Fig. 10b). On the dirt road, very few blocks were classified as unknown (Fig. 10c). The classifier obtained a very good performance in the other scenes (Fig. 10d).

4.3.3 Classifier 3

Among all five classifiers analyzed, this one obtained the highest hit rates in the parking areas (Fig. 11a) and sidewalks. Unlike the others, it got reasonable results in the traffic lanes (Fig. 11b). This classifier also had a good performance on the dirt roads.

4.3.4 Classifier 4

This classifier obtained several errors in the parking area, but most of these errors correspond to unknown regions (Fig. 12a). It had as many mistakes as the previous classifier in the traffic lanes and achieved the second best mark on the dirt roads. The classifier had a good performance in the other scenes (Fig. 12b).

(a) Analysis of frame 14. (b) Analysis of frame 11.

Fig. 9 Results obtained from the Classifier 1 responses: correct, false-negative, false-positive and unknown classifications shown in green, blue, red and yellow, respectively


(a) Analysis of frame 1. (b) Analysis of frame 11.

(c) Analysis of frame 5. (d) Analysis of frame 2.

Fig. 10 Results obtained from the Classifier 2 responses: correct, false-negative, false-positive and unknown classifications shown in green, blue, red and yellow, respectively

4.3.5 Classifier 5

Like the second classifier, this classifier completely misclassified the blocks representing the parking spaces (Fig. 13a) and walkways, but it obtained good performance in the

(a) Analysis of frame 1. (b) Analysis of frame 6.

Fig. 11 Results obtained from the Classifier 3 responses: correct, false-negative, false-positive and unknown classifications shown in green, blue, red and yellow, respectively


(a) Analysis of frame 1. (b) Analysis of frame 8.

Fig. 12 Results obtained from the Classifier 4 responses: correct, false-negative, false-positive and unknown classifications shown in green, blue, red and yellow, respectively

traffic lanes and dirt roads (Fig. 13b). It is worth remembering that this classifier has only four image attributes as input.

All classifiers had problems, to different degrees, at the road edges and traffic lanes due to the loss of precision of the block-based method and the different colors of lanes, curbs and road. Also, many classifiers had problems with parking areas, whose color and texture are similar to those of streets. In general, all classifiers obtained a reasonable classification of the main portion of the street where the car can travel. The graph shown in Fig. 14 describes the error rate per frame for the best five classifiers. This analysis shows that classifiers with the same hit rate can have different responses.

Another conclusion to be drawn is that these five classifiers have some attributes in common. Since many classifiers achieved acceptable performance, we can analyze the frequency with which combinations of attributes were used. From this analysis we can assess the real contribution of each attribute to the classifier.

(a) Analysis of frame 7. (b) Analysis of frame 14.

Fig. 13 Results obtained from the Classifier 5 responses: correct, false-negative, false-positive and unknown classifications shown in green, blue, red and yellow, respectively


Fig. 14 Analysis per frame

4.4 Frequency Analysis

In this analysis, we considered acceptable the classifiers that achieved a hit rate of 90% or more, which were approximately five thousand. Instead of analyzing them one by one, we decided to determine the common features among them. The classifiers with these subsets were retrained and re-evaluated. Based on these new results, it was possible to see the contribution of each combination of attributes to the classifier's ANN.

Among all the 5,967 classifiers that were considered successful, there are classifiers that use five, four, three, two and even one attribute as input. So we counted how many times a given subset is used as input for acceptable classifiers.
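The frequency analysis can be sketched as a simple subset count (Python; names are illustrative): for every acceptable classifier, each attribute subset of a given size is counted.

```python
from collections import Counter
from itertools import combinations

def subset_frequencies(acceptable_classifiers, subset_size):
    """Count how often each attribute subset of `subset_size` elements
    appears among the input feature sets of the acceptable classifiers."""
    counter = Counter()
    for features in acceptable_classifiers:  # e.g. ('B average', 'V entropy', ...)
        for subset in combinations(sorted(features), subset_size):
            counter[subset] += 1
    return counter.most_common()

# Example: subsets of four attributes, as in Table 1.
# ranked = subset_frequencies(acceptable, 4)
```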

Evaluating subsets of one element, the attribute that obtained the best result was "V entropy", appearing 2,740 times, while all the others appeared fewer than two thousand times. We analyzed subsets of five, four and three attributes from all classifiers. Table 1 shows the three subsets of four elements that appeared most frequently. These subsets appeared 18 times each, which is the best possible result, since there are only 17 remaining attributes to serve as the fifth element, plus the option of having no fifth element. This means that these combinations yielded good performance independently of the fifth attribute used.

Table 1 Attributes that appeared most frequently

B average      Hue average    Hue entropy    Value entropy
Hue average    Hue entropy    Value entropy  HSV energy
Hue average    Hue entropy    Value entropy  HAS entropy


Note that these subsets have Hue average, Hue entropy and Value entropy in common, which is the most-used subset of three elements, appearing 146 times. Based on the attributes from Table 1, we reviewed all the classifiers that used these attributes. More specifically, we retrained the classifiers shown in Table 2 (average of ten executions), where the attribute columns list the input attributes, the AVE column is the average hit rate from the "general analysis", the SD column is the standard deviation of AVE, APF is the average error rate from the "analysis per frame" and the SDF column is the standard deviation of APF.

From the results presented in Table 2, we can notice that the top ten from the "general analysis" are also in the top ten from the "analysis per frame". Note that the best classifier has an error rate per frame of only 6.93% with a low standard deviation, so for all frames tested this classifier missed less than 10%, which can be considered a good performance. However, it is necessary to determine how these errors are distributed in the frame, because grouped misclassified blocks are more harmful than blocks scattered over the region of interest.

For example, the classification of Frame 1 did not obtain satisfactory results with (Blue average, H average, H entropy, V entropy) or with (HAS entropy, H average, H entropy, V entropy). Their error rates for this frame are similar, but the classifier with Blue average is better than the classifier with HAS entropy, because the classifier with Blue average classified the region of the parking lot as unknown (shown in Fig. 15a), while the classifier with HAS entropy classified it as navigable (see Fig. 15b).

Table 2 Evaluation of the most contributing attributes

Attribute 1    Attribute 2  Attribute 3  Attribute 4  AVE   SD    APF    SDF
Blue average   H average    H entropy    V entropy    92    0.94  6.93   3.32
HAS entropy    H average    H entropy    V entropy    91.7  0.82  7.74   3.68
Blue average   H entropy    V entropy                 91.6  0.7   7.28   4.49
Blue average   V entropy                              91.5  0.53  7.38   4.59
HSV energy     H average    H entropy    V entropy    91.2  0.63  7.74   3.62
H average      H entropy    V entropy                 91.0  0.47  8.18   3.21
HAS entropy    H average    V entropy                 91.0  0.0   8.54   3.59
HSV energy     H average    V entropy                 90.5  0.53  8.72   3.59
H average      V entropy                              90.3  0.48  8.56   3.73
Blue average   H average    V entropy                 90.2  1.32  7.39   4.3
Blue average   H average    H entropy                 89.8  0.63  9.04   3.8
H average      H entropy                              89.0  0.67  10.0   3.11
HAS entropy    H average                              89.0  0.0   11.93  4.08
HSV energy     H average    H entropy                 89.0  0.0   11.22  2.98
Blue average   H average                              88.9  0.99  10.17  3.74
Blue average   H entropy                              88.8  0.79  10.97  4.75
HAS entropy    H average    H entropy                 87.7  7.27  9.29   4.25
HSV energy     H average                              87.5  0.85  13.31  4.41
HAS entropy    H entropy    V entropy                 78.3  2.71  19.86  5.95
HAS entropy    H entropy                              74.5  0.85  23.82  6.94
HAS entropy    V entropy                              70.6  0.7   28.15  5.45
HSV energy     H entropy    V entropy                 67.5  0.53  31.47  8.59
H entropy      V entropy                              65.7  0.48  33.54  10.38
HSV energy     V entropy                              63.5  0.71  36.92  8.86
HSV energy     H entropy                              59.3  2.11  42.22  7.24


(a) Answer of classifier with blue average. (b) Answer of classifier with HAS entropy.

Fig. 15 Classification results: classifier responses with blocks classified as non-navigable in magenta and navigable in cyan; yellow represents the unknown classification

(a) Sidewalk as unknown. With blue average (b) Sidewalk as navigable. Without blue average

(c) Errors in dirt road. With blue average it does not classify the dirt road as navigable.

(d) Errors in dirt road. Without blue average.

Fig. 16 Classification results: classifier responses with blocks classified as non-navigable in magenta and navigable in cyan, and unknown classification in yellow


Analyzing the results of the top four classifiers, we concluded that the classifiers with blue average tend to classify the parking lot and the sidewalks as unknown (shown in Fig. 16a), while the classifiers without blue average classify them as navigable (shown in Fig. 16b). However, the classifiers with blue average obtained more errors on the dirt roads than the classifiers without this feature, as seen in Fig. 16c.

Another conclusion to be drawn is that the subset (hue average and value entropy) has good performance, appearing eight times in the top ten. It can also be seen that these two attributes, when combined with another, slightly improve the performance. Due to this fact, we reanalyzed all classifiers of up to three elements that have at least these two attributes as input. Table 3 presents the same columns as Table 2, but with other classifiers.

From the results shown in Table 3, we can notice that the top six from the "general analysis" are also in the top six from the "analysis per frame". In addition, all classifiers of Table 3 achieved good results, except classifier 20. This demonstrates that Hue average and Value entropy are adequate attributes to be used in classifying an image of a road scene.

In general, classifiers with blue average, hue entropy or saturation entropy obtained better results in parking lots and sidewalks; however, they missed a large proportion of the dirt roads. It can be concluded that classifiers incorporating the dirt as a navigable region also include the sidewalks and parking lots, because of the similar color and texture. The attribute blue average helps reduce the similarity between the dirt street and the sidewalks, but it is not enough to classify them as non-navigable. If we assume that all blocks classified as unknown are non-navigable, then the classifier can be used in a road following algorithm with good results.

A good overall classification performance can be seen in Fig. 17, with some errors in the traffic lanes and at the edges of sidewalks. The most significant error occurred in the

Table 3 Evaluation of Hue average and Value entropy

No.  AT 1       AT 2       AT 3         AVE   SD    APF    SDF
1    H average  V entropy  G average    91.3  1.1   7.54   4.0
2    H average  V entropy  B average    91.1  0.83  7.33   4.21
3    H average  V entropy  H entropy    91.1  0.3   7.54   3.37
4    H average  V entropy  V average    90.9  1.58  6.86   4.63
5    H average  V entropy  S entropy    90.8  0.4   7.63   3.87
6    H average  V entropy  S variance   90.8  0.6   7.82   4.43
7    H average  V entropy  HAS entropy  90.8  0.4   9.0    3.32
8    H average  V entropy  H variance   90.7  0.9   7.9    4.01
9    H average  V entropy  HSV entropy  90.7  0.46  8.14   3.93
10   H average  V entropy  S average    90.6  1.02  8.24   2.69
11   H average  V entropy  RGB entropy  90.6  0.49  8.38   3.96
12   H average  V entropy  HSV energy   90.6  0.49  8.58   3.85
13   H average  V entropy  RGB energy   90.4  0.49  8.51   3.8
14   H average  V entropy  H energy     90.4  0.49  8.64   3.74
15   H average  V entropy  R average    90.4  1.5   9.49   6.16
16   H average  V entropy               90.3  0.46  8.46   3.89
17   H average  V entropy  V energy     90.3  0.64  8.74   3.76
18   H average  V entropy  S energy     90.2  0.4   8.78   4.01
19   H average  V entropy  HAS average  89.8  0.6   11.01  6.29
20   H average  V entropy  V variance   88.0  7.01  7.86   4.38


(a) Frame 1 (b) Frame 6 (c) Frame 2

(d) Frame 7 (e) Frame 3 (f) Frame 8

(g) Frame 9 (h) Frame 10 (i) Frame 4

(j) Frame 11 (k) Frame 12 (l) Frame 13

(m) Frame 5 (n) Frame 14 (o) Frame 15

Fig. 17 Classification results: classifier responses with blocks classified as non-navigable in magenta and navigable in cyan, and unknown classification in yellow


Fig. 18 Combination of the five responses for frame 6. Blocks classified as navigable area in red

dirt road, where the region in the middle of the road is classified as non-navigable, which can be expected due to its similarity in color to the sidewalk and plants.

4.5 Combined Classification

Based on the analysis presented in the previous section, we can notice that several classifiers have almost the same hit rates, but with classification errors in different parts of the image and in different proportions along the path traveled by the vehicle. This is due to the use or not of certain attributes as input to the classifier, making it more or less sensitive to certain parts of the environment. Thus, we can build a more robust classifier if we combine the responses of the various classifiers.
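A minimal sketch of this combination step (Python/NumPy, ours rather than the paper's): the per-block responses of the individual classifiers are averaged and the result is thresholded with the same 0.3/0.7 rule used for a single classifier.

```python
import numpy as np

def combine_responses(responses):
    """Average per-block responses from several classifiers.

    `responses` is a list of (rows, cols) arrays with values in [0, 1],
    one array per classifier. The averaged map is thresholded with the
    same 0.3 / 0.7 rule, so a more confident classifier (output close
    to 0 or 1) pulls the combined response toward its own decision.
    """
    mean = np.mean(np.stack(responses), axis=0)
    labels = np.full(mean.shape, 'unknown', dtype=object)
    labels[mean <= 0.3] = 'non-navigable'
    labels[mean >= 0.7] = 'navigable'
    return mean, labels
```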

Results obtained from the average of the responses of the five best classifiers of Section 4.3 are shown in Fig. 18. This average is calculated for each block in the image. The top left image is the result of the average of the five best

Fig. 19 Combination of the five responses for frame 11. Blocks classified as navigable area in red


Fig. 20 Combination of the five responses for frame 12. Blocks classified as navigable area in red

responses, and the top center image shows the degree of certainty about navigability. In this image, the classification is represented in grayscale, where white represents the value 1 (navigable), black represents the value 0 (non-navigable), and intermediate values are shown in shades of gray. The top right image is the original image. The five other images in the second and third rows are the results obtained from each individual classifier.

We can see errors in the individual responses, for example, blocks considered navigable in the vegetation and in some places in the sky. However, by using the average, these errors are suppressed. This is because, in several cases, the different classifiers did not make wrong classifications in the same regions of the image.

To measure the efficiency of the combined classifier, we can compare the error rate per frame of Fig. 18, which was 4.8%, with the error rates of the individual classifiers shown in the graph of Fig. 14. We can notice that the result is very close to the best response obtained. The same happens for frame 11, with 10.6% error, shown in Fig. 19, and for frame 12, shown in Fig. 20, with an error rate equal to 4%.

The results obtained can be considered satisfactory. The reason why the combined response stays very close to the best responses in most cases is that the errors of the classifiers, for the most part, happen because of uncertainty in the response, that is, values very close to the limits of 0.3 and 0.7. When the average is calculated, the classifier that has more certainty tends to dominate the overall response.

5 Conclusions and Future Works

Autonomous navigation is one of the main capabilities of autonomous robots. This paper addresses the problem of identifying navigable areas in the environment using artificial neural networks and visual information. Different combinations of attributes have been evaluated in realistic environments.

In general, the results were satisfactory, since many classifiers obtained a good success rate. Furthermore, the classifiers obtained a good classification of the main portion of the street where the car can travel, the region of interest. The block-based


method ensures an increase in performance; however, it causes errors at the road edges and traffic lanes in different proportions. By combining the responses of multiple classifiers, many of these errors were reduced.

As future work, we plan to evaluate other image features and more complex environments. We also plan to integrate our approach with laser mapping, which provides depth information.

Acknowledgements The authors acknowledge the support granted by CNPq and FAPESP to the INCT-SEC (National Institute of Science and Technology—Critical Embedded Systems—Brazil), processes 573963/2008-9 and 08/57870-9.

References

1. Arkin, R.C.: Behavior-based Robotics. MIT Press, Cambridge (1998)
2. Wolf, D., Sukhatme, G., Fox, D., Burgard, W.: Autonomous terrain mapping and classification using hidden Markov models. In: Proceedings of the 2005 IEEE International Conference on Robotics and Automation (ICRA 2005), pp. 2026–2031 (2005)
3. He, Y., Wang, H., Zhang, B.: Color-based road detection in urban traffic scenes. IEEE Trans. Intell. Transp. Syst. 5(4), 309 (2004). doi:10.1109/TITS.2004.838221
4. Broggi, A., Bert, S.: Vision-based road detection in automotive systems: a real-time expectation-driven approach. J. Artif. Intell. Res. 3, 325 (1995)
5. Rotaru, C., Graf, T., Zhang, J.: Extracting road features from color images using a cognitive approach. In: Intelligent Vehicles Symposium, 2004, pp. 298–303. IEEE (2004). doi:10.1109/IVS.2004.1336398
6. Zhang, J., Nagel, H.H.: Texture-based segmentation of road images. In: Proceedings of the Intelligent Vehicles '94 Symposium, pp. 260–265 (1994). doi:10.1109/IVS.1994.639516
7. Ghurchian, R., Takahashi, T., Wang, Z., Nakano, E.: On robot self-navigation in outdoor environments by color image processing. In: 7th International Conference on Control, Automation, Robotics and Vision (ICARCV 2002), vol. 2, pp. 625–630 (2002)
8. Pomerleau, D.: Neural network vision for robot driving. In: Arbib, M. (ed.) The Handbook of Brain Theory and Neural Networks (1995)
9. Foedisch, M.: Adaptive real-time road detection using neural networks. In: Proc. 7th Int. Conf. on Intelligent Transportation Systems, Washington DC (2004)
10. Choset, H., Lynch, K.M., Hutchinson, S., Kantor, G.A., Burgard, W., Kavraki, L.E., Thrun, S.: Principles of Robot Motion: Theory, Algorithms, and Implementations. MIT Press, Cambridge (2005)
11. Shannon, C.E.: A mathematical theory of communication. Bell Syst. Tech. J. 27, 379–423, 623–656 (1948)
12. Joblove, G.H., Greenberg, D.: Color spaces for computer graphics. SIGGRAPH Comput. Graph. 12(3), 20 (1978). doi:10.1145/965139.807362
13. Reiter, C.: With J: image processing 2: color spaces. SIGAPL APL Quote Quad 34(3), 3 (2004). doi:10.1145/1127556.1127557
14. Churchland, P.S., Sejnowski, T.J.: The Computational Brain. MIT Press, Cambridge (1994)
15. Rumelhart, D.E., Hinton, G.E., Williams, R.J.: In: Neurocomputing: Foundations of Research, pp. 673–695 (1988)
16. University of Stuttgart: http://www.ra.cs.uni-tuebingen.de/SNNS/ (2010). Visited in March 2010
17. Bradski, G., Kaehler, A.: Learning OpenCV: Computer Vision with the OpenCV Library. O'Reilly, Cambridge (2008)
18. Lee, J., Crane III, C.D., Kim, S., Kim, J.: Road following in an unstructured desert environment using monocular color vision as applied to the DARPA Grand Challenge. In: International Conference on Control, Automation and Systems (2005)