A Model-Based Approach to the Automatic Extraction of Linear Features from Airborne Images, using Mathematical Morphology and MRF Theory

A. Katartzis, H. Sahli

Technical Report: IRIS-TR-0062, VUB-ETRO department, 2000.

IRIS: Information Retrieval and Interpretation Sciences

A member of: ETRO(Electronics and Information processing)

IMEC(Interuniversity Micro Electronics Centre)

Address: Antonis Katartzis

VUB(ETRO), Pleinlaan 2, 1050 Brussels, Belgium

Tel.: ++32-2-629 2858

Fax.: ++32-2-629 2883

Electronic mail: [email protected]

1 Introduction

The identification of linear features by means of digital image analysis is a generic task in the field of remote sensing. These linear features can be roads, bridges, vegetation alignments or different geological formations. Their detection is used in several applications, such as image coregistration, cartography and geomorphologic studies. Several approaches have been proposed in the literature, most of them dealing with road extraction from either synthetic aperture radar (SAR) images or optical (visible range) images. They are usually based on two criteria: a local criterion involving the use of local operators, and a global criterion incorporating additional knowledge about the structure of the objects to be detected. The methods based on local criteria evaluate local properties of the image by using either an edge or line detector [1] [2] [3] [4] or morphological operators [5]. The performance of these methods can be greatly increased by techniques that introduce global constraints into the image analysis process. These techniques lead to an optimal solution through the minimization of a cost function, using dynamic programming [3] [6], tracking methods [7] or the Bayesian framework [4] [8] [9].

We propose a method that combines local and global criteria for the identification of the medial axis of roads and paths in aerial images [10]. Our work is part of the European pilot project 'Airborne Minefield Detection in Mozambique', and its main objective is the identification of minefield indicators that correspond to linear structures, which designate safe passage areas [11]. The proposed method is model-based and follows certain assumptions concerning the geometry and radiometry of the linear structures of interest. It combines the concepts of two previously published methods for road detection in SAR images (Chanussot and Lambert [5], Tupin et al. [4]). Chanussot and Lambert proposed a simple and fast unsupervised method for road network extraction, based on a series of morphological processing steps that do not require any threshold. The only parameters that have to be set correspond to the model dimensions of the feature to be extracted. Unfortunately, the lack of contextual knowledge in [5] results in partial detection of the road network, together with several spurious detections. The work of Tupin et al. [4], on the other hand, uses both local and global techniques for linear feature extraction. The first part of their algorithm performs a local detection of linear structures based on the fusion of the results of two line detectors, both taking into account the statistical properties of speckle in SAR images. The masks of the line detectors have widths ranging from 1 to a maximum of 5 pixels [4]. The resulting candidate road segments, together with an additional set of segments that correspond to all possible connections between them, are then organized as a graph. Road identification is solved by extracting the best graph labeling, based on an MRF model for road-like structures and a maximum a posteriori (MAP) criterion.

Our approach consists of two steps. During a local analysis step, elongated structures are detected using a set of morphological operators, similar to the ones proposed in [5], and a dedicated algorithm for line segment extraction. We introduce some modifications that enhance the performance of the morphological filtering in heavily noisy environments and for partially disconnected roads. The local analysis scheme can be extended to a wide range of images and its computational complexity is relatively small, even when the widths of the roads vary from a few pixels up to more than 20 pixels. A segment linking process is then performed in the global analysis step of our method. This is based on the Bayesian framework of Tupin et al. [4]. We present some modifications that improve the road model and make the process more robust and flexible. These changes include the incorporation of a new observation measure that reflects more efficiently the likelihood of each segment belonging or not to a road, a different formulation of the potential functions that describe the probability distributions of our model, and a reduction in the number of potential parameters.

The report is organized as follows. Section 2 presents a short description of our data set of images. At the beginning of section 3 we present the main concepts of mathematical morphology. The remainder of that section describes the morphological approach of [5] for road detection and our local analysis scheme. In section 4 we present the MRF model-based formulation for road identification and describe the differences between our line grouping scheme and the approach of [4]. The validation of our method, which includes the parameter setting and some representative results, is presented in section 5. Finally, a discussion and directions for future research are given in section 6.

2 DATA DESCRIPTION

Our work is part of the European pilot project 'Airborne Minefield Detection in Mozambique' (REG/661-97/2, DG VIII). We have investigated the role of image processing in the delineation of minefields, using indirect methods based on several minefield indicators; these can be either man-made objects close to the minefield (sticks, wires, concrete blocks, fences, paths) or the absence of recent human activity inside the suspected areas [11].

In this report, our main objective is the identification of minefield indicators that correspond to linear structures, like roads and paths, which can designate safe passage areas. The airborne images were taken from two regions in Mozambique (Songo, Bandua) at two different scales (1:2000 and 1:3000). They were acquired using two Leica RC30 cameras, equipped with colour and colour IR films. For their digitization, a sampling distance of 15 µm and a 24-bit quantization were chosen. The images were then subsampled (coarse sampling) to form a 512 x 512 matrix, corresponding to a ground resolution of approximately 0.22 m/pixel for the scale 1:2000 and 0.31 m/pixel for the scale 1:3000. Our linear feature detection algorithm is applied to the 8-bit monochromatic version of each type of image. A representative image from each of the two Mozambique regions, Bandua and Songo, is shown in Fig. 1 and Fig. 2, respectively.

Figure 1: Original airborne image (Bandua region in Mozambique: scale 1:2000; ground resolution 0.22m)

3 LOCAL ANALYSIS

3.1 Mathematical morphology.

Mathematical morphology can be defined as a theory for the analysis of the shape and form of objects, using set theory, integral geometry and lattice algebra [12]. It is based on a number of non-linear operators that simplify images and preserve the main shape characteristics of objects. A morphological operator is given by the relation of the image (denoted by $f$) with another small point set $B$ called a structuring element (SE). There are two classes of structuring elements: flat SEs, which correspond to two-dimensional unweighted masks, and non-flat SEs, which are small gray tone images corresponding to weighted masks.

Figure 2: Original airborne image (Songo region in Mozambique: scale 1:3000; ground resolution 0.31m)

The shape and size of each SE must be adapted to the geometrical properties of the image objects to be extracted. For instance, linear structuring elements are suited to the extraction of linear objects. In the following, we briefly summarize the main morphological operators, based on flat SEs, that are used in our work. Since roads are assumed to be brighter than their surroundings, we describe the morphological operators with respect to their effect on bright image structures or regions.

• Dilation

The dilation of an image $f$ by a SE $B$ is denoted by $\delta_B(f)$ and is defined as the maximum (supremum) of the translations of $f$ by the vectors $-b$ of $B$: $\delta_B(f) = \bigvee_{b \in B} f_{-b}$. In other words, the dilated value at a given pixel $x$ is the maximum value of the image in the window defined by the SE, when its origin is at $x$: $[\delta_B(f)](x) = \max_{b \in B} f(x + b)$.

• Erosion

The erosion of an image $f$ by a SE $B$ is denoted by $\varepsilon_B(f)$ and is defined as the minimum (infimum) of the translations of $f$ by the vectors $-b$ of $B$: $\varepsilon_B(f) = \bigwedge_{b \in B} f_{-b}$. The erosion of an image removes all structures that cannot contain the SE and shrinks all the others.

• Opening

The opening of an image $f$ by a SE $B$ is denoted by $\gamma_B(f)$ and is defined as the erosion of $f$ by $B$ followed by dilation with the transposed SE $\check{B}$: $\gamma_B(f) = \delta_{\check{B}}[\varepsilon_B(f)]$. Opening generally smooths the contour of an image region, breaks narrow isthmuses, and eliminates thin protrusions.

• Closing

The closing of an image $f$ by a SE $B$ is denoted by $\phi_B(f)$ and is defined as the dilation of $f$ by $B$ followed by erosion with the transposed SE $\check{B}$: $\phi_B(f) = \varepsilon_{\check{B}}[\delta_B(f)]$. Closing connects objects that are close to each other, fills up small holes, and smooths the object outline by filling up narrow gulfs.

• Top-Hat

The top-hat operator extracts objects that follow a size and shape criterion. It isolates the structures that have been eliminated by the opening or by the closing.

- top-hat by opening (white top-hat): $WTH_B(f) = f - \gamma_B(f)$.

- top-hat by closing (black top-hat): $BTH_B(f) = \phi_B(f) - f$.

• Geodesic transformation (dilation/erosion)

The previously defined morphological operators involve the combination of one input image with a specific SE. The geodesic transformation considers two input images. A morphological operator is applied to the first image (marker). The pixel values of the resulting image are then forced to remain either greater than or smaller than the corresponding pixel values in the second image (mask). The geodesic dilation of size $n$ of a marker image $f$ with respect to a mask image $g$ ($f \le g$) is obtained by iterating $n$ elementary geodesic dilations $\delta^{(1)}_{g,B}(f) = \delta_B(f) \wedge g$, where $\wedge$ stands for the point-wise minimum.

• Morphological reconstruction

- Reconstruction by dilation/erosion.

The reconstruction by dilation, denoted by $R_{g,B}(f)$, of a mask image $g$ from a marker image $f$ is defined as the geodesic dilation of $f$ with respect to $g$, iterated until stability is reached (i.e. until the propagation of the marker image is totally impeded by the mask image): $R_{g,B}(f) = \delta^{(n)}_{g,B}(f)$, where $n$ is such that $\delta^{(n)}_{g,B}(f) = \delta^{(n+1)}_{g,B}(f)$. The reconstruction by erosion is equivalent to the complement of the reconstruction by dilation of $g^c$ from $f^c$.

- Opening/closing by reconstruction.

The opening by reconstruction of size $n$ of an image $f$ is defined as the reconstruction of $f$ from the erosion of size $n$ of $f$: $\gamma^{(n)}_{R,B}(f) = R_{f,B}(\varepsilon^{(n)}_B(f))$. Contrary to the morphological opening, opening by reconstruction preserves the shape of the components that are not removed by the erosion. The closing by reconstruction is the dual transformation of opening by reconstruction.

• Watershed transformation

The watershed transformation is a widely used tool in mathematical morphology for solving segmentation problems. Let us represent the image $f$ as a topographic surface and plunge this surface into a lake. The water first enters the holes located at the local image minima and gradually floods the whole surface. We erect a dam at any point where water coming from two minima merges. At the end of the plunging procedure, when the surface is completely immersed, the dams that have been built constitute the dividing lines of the water coming from different minima. These dividing lines are called the watersheds of $f$. The different pools separated by the watersheds cover particular zones of $f$. Each of these zones is called a catchment basin and is associated with a particular minimum. The watershed transformation can be very useful for extracting the skeleton of objects, which is made of the crest lines dividing two disconnected basins.
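The flat operators listed above can be illustrated with a minimal one-dimensional sketch. It is not the implementation used in this report: the function names, the representation of the SE as a list of integer offsets, and the border rule (offsets falling outside the signal are ignored) are assumptions made for illustration, and the SE is assumed to contain its origin.

```python
def dilate(f, offsets):
    """Flat dilation: maximum of f(x + b) over the SE offsets b."""
    n = len(f)
    return [max(f[x + b] for b in offsets if 0 <= x + b < n) for x in range(n)]


def erode(f, offsets):
    """Flat erosion: minimum of f(x + b) over the SE offsets b."""
    n = len(f)
    return [min(f[x + b] for b in offsets if 0 <= x + b < n) for x in range(n)]


def opening(f, offsets):
    """Erosion by B followed by dilation with the transposed SE."""
    transposed = [-b for b in offsets]
    return dilate(erode(f, offsets), transposed)


def white_top_hat(f, offsets):
    """Residue of the opening: f minus its opening."""
    return [a - b for a, b in zip(f, opening(f, offsets))]


def reconstruct_by_dilation(marker, mask, offsets):
    """Iterate elementary geodesic dilations (dilate, then point-wise
    minimum with the mask) until the result no longer changes."""
    current = [min(a, b) for a, b in zip(marker, mask)]
    while True:
        dilated = dilate(current, offsets)
        following = [min(a, b) for a, b in zip(dilated, mask)]
        if following == current:
            return current
        current = following


f = [0, 0, 5, 0, 0, 3, 3, 3, 0, 0]   # a 1-pixel peak and a 3-pixel plateau
B = [-1, 0, 1]                        # flat SE of width 3, origin included

print(opening(f, B))                  # the narrow peak is removed
print(white_top_hat(f, B))            # only the narrow peak remains
print(reconstruct_by_dilation(erode(f, B), f, B))  # opening by reconstruction
```

In this example the opening removes the peak that cannot contain the SE and keeps the plateau, while the white top-hat retains exactly what the opening removed.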

3.2 Soft mathematical morphology.

In soft mathematical morphology, the minimum and maximum operations of standard mathematical morphology are replaced by weighted order statistics. Moreover, the structuring elements $B$ are non-flat and are divided into two parts: the hard center ($B_1$) and the soft boundary ($B_2$). It has been proven [13][14] that soft morphological filters are less sensitive to additive noise and to small variations in the shape of the objects than standard morphological filters. They eliminate negative and positive noise and, at the same time, preserve the details in the image.

In the soft morphological operations of erosion and dilation, the pixels of the image are combined with the pixels of the structuring element in the same way as in standard morphology. The results related to the soft boundary and the results related to the hard center (each repeated $k$ times) are ordered in an increasing/decreasing sequence. The $k$-th element of this sequence is the result of the soft erosion/dilation. The number $k$ is called the order index. The soft dilation of a gray level image $f$, using a soft SE $[\alpha, \beta, k]$, is [15]:

$(f \oplus [\alpha, \beta, k])(x) = \max^{(k)}\left(\{k \diamond (f(x - y) + \alpha(y))\} \cup \{f(x - z) + \beta(z)\}\right)$, where $(x - y), (x - z) \in F$, $y \in B_1$, $z \in B_2$ (1)

The soft erosion of $f$, using $[\alpha, \beta, k]$, is:

$(f \ominus [\alpha, \beta, k])(x) = \min^{(k)}\left(\{k \diamond (f(x + y) - \alpha(y))\} \cup \{f(x + z) - \beta(z)\}\right)$, where $y \in B_1$, $z \in B_2$ (2)

The notations used in (1) and (2) are the following:

$\min^{(k)}$ and $\max^{(k)}$ are the $k$-th minimum and the $k$-th maximum element, respectively,

$x, y, z \in \mathbb{Z}^2$ are the spatial coordinates,

$f : F \to \mathbb{Z}$ is the gray level image,

$\alpha : B_1 \to \mathbb{Z}$ is the hard center of the soft SE,

$\beta : B_2 \to \mathbb{Z}$ is the soft boundary of the soft SE,

$F, B_1, B_2 \subseteq \mathbb{Z}^2$ are the supports of the gray level image, the hard center and the soft boundary of the SE, respectively.

The expression $k \diamond (\ldots)$ denotes the repetition of the value $(\ldots)$ $k$ times.

The results of the soft dilation and soft erosion become more 'abrupt' as the order index $k$ decreases. With high values of $k$, the soft morphological operations better retain the shape characteristics of the image. Nevertheless, if $k > \mathrm{Card}(B_2)$ the morphological operations are influenced only by the hard center of the SE and the nature of soft morphology is lost. Shin and Pu [15] have proposed the following restriction for the order index $k$: $k \le \min\{\mathrm{Card}(B)/2, \mathrm{Card}(B_2)\}$.
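As an illustration of equations (1) and (2), the following one-dimensional sketch implements soft dilation and soft erosion directly from their definitions. The function names and the representation of the hard center and soft boundary as dictionaries mapping offsets to weights are assumptions of this sketch; the hard center is assumed to contain the origin, so that at least $k$ values are always available.

```python
def soft_dilate(f, alpha, beta, k):
    """Soft dilation of a 1-D signal, following equation (1).
    alpha maps hard-center offsets to weights, beta does the same
    for the soft boundary; k is the order index."""
    n = len(f)
    out = []
    for x in range(n):
        values = []
        for y, weight in alpha.items():        # hard center, repeated k times
            if 0 <= x - y < n:
                values.extend([f[x - y] + weight] * k)
        for z, weight in beta.items():         # soft boundary, counted once
            if 0 <= x - z < n:
                values.append(f[x - z] + weight)
        values.sort(reverse=True)
        out.append(values[k - 1])              # k-th largest value
    return out


def soft_erode(f, alpha, beta, k):
    """Soft erosion of a 1-D signal, following equation (2)."""
    n = len(f)
    out = []
    for x in range(n):
        values = []
        for y, weight in alpha.items():
            if 0 <= x + y < n:
                values.extend([f[x + y] - weight] * k)
        for z, weight in beta.items():
            if 0 <= x + z < n:
                values.append(f[x + z] - weight)
        values.sort()
        out.append(values[k - 1])              # k-th smallest value
    return out


alpha = {0: 0}                       # hard center: the origin only
beta = {-2: 0, -1: 0, 1: 0, 2: 0}    # soft boundary: four neighbours
spike = [0, 0, 0, 9, 0, 0, 0]        # isolated positive noise
gap = [5, 5, 5, 0, 5, 5, 5]          # line with a one-pixel gap

print(soft_dilate(spike, alpha, beta, 1))  # k = 1: standard dilation
print(soft_dilate(spike, alpha, beta, 2))  # k = 2: the spike does not spread
print(soft_erode(gap, alpha, beta, 1))     # k = 1: the gap widens
print(soft_erode(gap, alpha, beta, 2))     # k = 2: the gap does not widen
```

With this SE, Card(B) = 5 and Card(B2) = 4, so k = 2 satisfies the restriction quoted above. With k = 1 both operators reduce to their standard counterparts on the full window.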

3.3 The morphological approach of Chanussot and Lambert for road network extraction [5]

The approach of Chanussot and Lambert for road detection in SAR images [5] is based on a geometrical and radiometrical road model. The model rests on three assumptions: 1) the roads appear in the image as thin, elongated structures with a maximum width $w_{max}$, 2) they are locally rectilinear, with each road pixel belonging to a line segment that is longer than a minimum length $\ell_0$, and 3) each road segment is considered as a dark structure with respect to its surroundings. All this information is integrated and extracted using mathematical morphology. A series of morphological operators is used in order to retain elongated structures with a specific width. The sequence of morphological filtering consists of: (a) an opening by reconstruction (removal of non-flat peaks), (b) a directional closing in 40 successive directions (removal of non-linear valleys), (c) an opening (removal of remaining peaks), and (d) a closing top-hat operator (removal of wide valleys). At every step flat SEs are used, and their size is specified according to the a priori information about the road's maximum width and curvature. At the last stage, the roads are extracted by a simple thresholding applied to the response of the morphological operators. Unfortunately, this approach yields incomplete detection of the road network and several spurious detections.

3.4 The proposed scheme for local analysis

3.4.1 Morphological filtering

We retain the advantages of the operators of [5], and at the same time introduce some modifications that, first, enhance their performance under noise and, secondly, produce additional outputs that are used in the next steps of our method. We suppose that the roads satisfy the first two assumptions of the road model of section 3.3. As far as the third assumption is concerned, because of the nature of our images we consider the roads to be bright structures with respect to their background. This simply implies that each morphological operation must be replaced by its dual counterpart (closing is replaced by opening and vice versa).

In our case, the structuring elements used in the morphological filtering steps (a), (c) and (d) are squares with sizes $w_{max}/4$, $w_{max}/4$ and $w_{max}$, respectively. One of the proposed modifications concerns the directional operator of step (b). According to [5], the elimination of non-linear structures can be obtained by using a directional opening in a specified number of successive directions. The resulting value at each pixel should be the supremum of all these directional openings. Unfortunately, the minimum and maximum operations of the standard morphological opening are very sensitive to noise and to small variations in the shape of the objects (such as small gaps in the road segments). In order to overcome this problem, we use a soft morphological filter based on weighted order statistics. This operator is a soft opening with an order index equal to 5 and a linear, non-flat SE of size $\ell_0$, successively oriented in 32 different directions. The SE consists of a hard center with a value equal to $\ell_0/2 + 1$ and a soft boundary with linearly decreasing values (from $\ell_0/2 + 1$ to 1), starting from the center and going towards both ends. These parameters have been chosen empirically, based on several experiments with a large set of test images. The result of the soft directional opening is an image where bright structures that do not belong to any line segment with minimum length $\ell_0$ have been eliminated, and small gaps between linear objects are bridged. During this processing step, an orientation image is also produced by assigning to each pixel the direction in which the soft directional opening gave the maximum value.

As an additional last step, we use a closing with a square SE of size $w_{max}/4$, in order to homogenize the regions inside the roads. The final result is considered as the response ($R$) of our morphological road detector. Fig. 3 and Fig. 4 show the morphological road detector's response (negative image) and the orientation image, respectively. In this particular example we have chosen $w_{max} = 30$ and $\ell_0 = 35$.
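The idea of taking the supremum of directional openings while recording the winning direction can be sketched as follows. This is a deliberately simplified illustration and not the operator described above: it uses standard flat openings instead of the soft opening, only four directions instead of 32, and ignores SE positions that fall outside the image. All names are hypothetical, and the image is assumed to hold non-negative values.

```python
# Unit steps (row, column) for 0, 45, 90 and 135 degrees.
STEPS = [(0, 1), (-1, 1), (-1, 0), (-1, -1)]


def line_offsets(length, step):
    """Offsets of a centred linear SE of odd length along a step."""
    half = length // 2
    return [(i * step[0], i * step[1]) for i in range(-half, half + 1)]


def apply_se(img, offsets, reduce_fn):
    """Apply min (erosion) or max (dilation) over the SE window."""
    rows, cols = len(img), len(img[0])
    out = []
    for r in range(rows):
        row = []
        for c in range(cols):
            vals = [img[r + dr][c + dc] for dr, dc in offsets
                    if 0 <= r + dr < rows and 0 <= c + dc < cols]
            row.append(reduce_fn(vals))
        out.append(row)
    return out


def directional_opening(img, length):
    """Supremum of linear openings plus the index of the best direction.
    Pixels with zero response keep the orientation index -1."""
    rows, cols = len(img), len(img[0])
    response = [[0] * cols for _ in range(rows)]
    orientation = [[-1] * cols for _ in range(rows)]
    for index, step in enumerate(STEPS):
        offsets = line_offsets(length, step)
        # The centred SE is symmetric, so it equals its transpose.
        opened = apply_se(apply_se(img, offsets, min), offsets, max)
        for r in range(rows):
            for c in range(cols):
                if opened[r][c] > response[r][c]:
                    response[r][c] = opened[r][c]
                    orientation[r][c] = index
    return response, orientation


image = [[0, 0, 7, 0, 0] for _ in range(5)]   # vertical bright line
image[2][4] = 9                               # isolated bright pixel
response, orientation = directional_opening(image, 3)
for row in response:
    print(row)
```

The vertical line survives with orientation index 2 (90 degrees), while the isolated pixel, which does not belong to any line segment of the required length, is removed.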

Figure 3: The negative image of the morphological road detection response R.

Figure 4: Orientation image.

3.4.2 Line segment extraction

One-pixel-wide line segments are produced by extracting from the response image ($R$) the pseudo-medial axis of the roads, to which we apply a dedicated line-following algorithm. The pseudo-medial axis is found by performing the watershed transformation on the response image. The line-following process initially creates a list of linear contours with a minimum predefined length $\ell_{min}$. Each medial-axis pixel is considered as the starting point of a contour. Pixels are added to a linear contour by a tracking algorithm that moves along each medial axis. Whenever a node point is reached, the tracking continues in the direction with the minimum angular deviation (as indicated by the orientation image) and the maximum morphological response. The line-following stops when an angular deviation greater than $d\theta$ is reached. Finally, the transition from linear contours to line segments is obtained by applying a line approximation process. Fig. 5 shows the final result of this process (detected line segments). The parameters $\ell_{min}$ and $d\theta$ are set to $\ell_{min} = \ell_0/2$ and $d\theta = 30^\circ$, respectively.

Figure 5: Set of detected line segments.

4 GLOBAL ANALYSIS

The second part of our work concerns the reconstruction of the roads from the previously detected line segments. Several methods have been developed for grouping lines using contextual constraints about the linear features of interest. Markov random field theory provides an efficient framework to model these constraints [8] [16]. Our line grouping approach is based on [4].

4.1 MRF labeling - Theoretical background.

Contextual constraints are ultimately necessary in the interpretation of visual information. A scene is understood through the spatial and visual context of the objects in it; the objects are recognized in the context of object features at a lower level of representation; the object features are identified based on the context of primitives at an even lower level; and the primitives are extracted in the context of image pixels at the lowest level of abstraction. The use of contextual constraints is indispensable for a complex vision system.

Markov random field theory provides a convenient and consistent way of modeling context-dependent entities [17] [18] [19]. This is achieved by characterizing the mutual influences among such entities using MRF probabilities. The theory tells us how to model the a priori probability of contextually dependent patterns. A particular MRF model favors its own class of patterns by associating them with larger probabilities than other pattern classes. In the following, we briefly review the concept of an MRF defined on graphs.

Let $G = \{S, A\}$ be a graph, where $S = \{S_1, S_2, \ldots, S_m\}$ is the set of nodes and $A$ is the set of arcs connecting them. We define a neighbourhood system on $G$, denoted by:

$N = \{N(S_1), N(S_2), \ldots, N(S_m)\}$

where $N(S_i)$, $i = 1, 2, \ldots, m$, is the set of all nodes in $S$ that are neighbors of $S_i$, such that:

i) $S_i \notin N(S_i)$, and

ii) if $S_j \in N(S_i)$ then $S_i \in N(S_j)$.

Let $L = \{L_1, L_2, \ldots, L_m\}$ be a family of random variables defined on $S$, in which each random variable $L_i$ takes a value $l_i$ in a given set (the random variables $L_i$ can be numerical as well as symbolic, e.g. interpretation labels). The family $L$ is called a random field. $L$ is an MRF on $G$, with respect to the neighbourhood system $N$, if and only if

1. $P(L = l) > 0$, for all realizations $l$ of $L$;

2. $P(l_i \mid l_j, \forall S_j \neq S_i) = P(l_i \mid l_j, S_j \in N(S_i))$

where $P(L = l) = P(L_1 = l_1, L_2 = l_2, \ldots, L_m = l_m)$ (abbreviated by $P(l)$) and $P(l_i \mid l_j)$ are the joint and conditional probability functions, respectively. Intuitively, an MRF is a random field with the property that the statistics at a particular node, given the rest of the field, depend only on those of its neighbors.

An important feature of the MRF model defined above is that its joint p.d.f. has a general functional form, known as the Gibbs distribution, which is defined based on the concept of cliques. A clique $c$, associated with the graph $G$, is a subset of $S$ that contains either a single node, or several nodes that are all neighbors of each other. If we denote the collection of all the cliques of $G$, with respect to the neighbourhood system $N$, as $C(G, N)$, the general form of a realization of $P(l)$ can be expressed as the following Gibbs distribution:

$P(l) = \frac{1}{Z} e^{-U(l)}$ (3)

where $U(l) = \sum_{c \in C} V_c(l)$ is called the Gibbs energy function and $V_c(l)$ are the clique potential functions defined on the corresponding cliques $c \in C(G, N)$. Finally, $Z = \sum_{l \in L} e^{-U(l)}$, where the sum runs over all realizations $l$ of $L$, is a normalizing constant called the partition function.

In a labeling problem where we have both prior information and knowledge about the distribution of our data, the optimal labeling of the graph $G$ can be obtained within the maximum a posteriori probability (MAP)-MRF framework. According to the Bayes rule, the posterior probability of our system can be computed using the following formulation:

$P(l \mid d) = \frac{p(d \mid l) P(l)}{p(d)}$ (4)

where $P(l)$ is the prior probability of labelings $l$, $p(d \mid l)$ is the conditional probability distribution function (p.d.f.) of the observations $d$, also called the likelihood function of $l$ for $d$ fixed, and $p(d)$ is the density of $d$, which is a constant when $d$ is given. By associating an energy function with $p(d \mid l)$ and $P(l)$ (denoted by $U(d \mid l)$ and $U(l)$ respectively), we can express the posterior probability as:

$P(l \mid d) \propto e^{-U(l \mid d)}$ (5)

where

$U(l \mid d) = U(d \mid l) + U(l)$ (6)

The optimal labeling, given the observation field $d$, can be found by minimizing the energy function $U(l \mid d)$.
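The relation between equations (3) to (6) and the MAP labeling can be made concrete on a toy graph. The sketch below enumerates all binary labelings of a three-node chain, computes the posterior energy of equation (6), and normalizes the Gibbs weights. The two potential functions, the observation values and the weight 0.3 are invented for illustration and are not the potentials used in this report; exhaustive enumeration is only feasible for very small graphs.

```python
from itertools import product
from math import exp


def posterior_energy(labels, observations, cliques, data_potential, clique_potential):
    """U(l|d) = U(d|l) + U(l), as in equation (6)."""
    data_term = sum(data_potential(d, l) for d, l in zip(observations, labels))
    prior_term = sum(clique_potential(tuple(labels[i] for i in c)) for c in cliques)
    return data_term + prior_term


def posterior(observations, cliques, data_potential, clique_potential):
    """Normalized Gibbs posterior over all binary labelings."""
    energies = {
        labels: posterior_energy(labels, observations, cliques,
                                 data_potential, clique_potential)
        for labels in product((0, 1), repeat=len(observations))
    }
    z = sum(exp(-u) for u in energies.values())   # partition function
    return {labels: exp(-u) / z for labels, u in energies.items()}


def map_labeling(observations, cliques, data_potential, clique_potential):
    """Labeling with maximum posterior probability (minimum energy)."""
    probs = posterior(observations, cliques, data_potential, clique_potential)
    return max(probs, key=probs.get)


def toy_data_potential(d, label):
    """Hypothetical: rejecting a strong observation is costly."""
    return d if label == 0 else 1.0 - d


def toy_clique_potential(clique_labels):
    """Hypothetical: neighbouring nodes prefer equal labels."""
    pairs = zip(clique_labels, clique_labels[1:])
    return 0.3 * sum(1 for a, b in pairs if a != b)


observations = [0.9, 0.45, 0.8]   # the middle node is weakly supported
cliques = [(0, 1), (1, 2)]        # a chain of three nodes
print(map_labeling(observations, cliques, toy_data_potential, toy_clique_potential))
```

Taken alone, the middle observation would favour label 0, but the prior term makes the fully connected labeling (1, 1, 1) the one with the lowest posterior energy, which is the kind of contextual correction the MAP-MRF framework is meant to provide.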

Due to its unique property of combining both global and local information, the MRF model-based

approach, applied to image interpretation, provides potential advantages in knowledge representation,

learning and optimization. This is the reason why we use this approach.

4.2 The line grouping approach of Tupin et al. [4]

The method of Tupin et al. for line grouping [4] falls within the scope of the Bayesian framework. It is applied to a set $S$ of line segments, consisting of the set of detected line segments ($S_{det}$) that result from a dedicated line detection algorithm for SAR images, and all the possible connections between them ($S_{con}$). These additional segments are produced by using certain connectivity criteria. The elements of $S$ are then organized as a graph $G = (S, A)$. Each node $i \in S$ is associated with a normalized segment length ($\ell_i \in [0, 1]$), an observation value $d_i$, which corresponds to the mean line detection response value along the segment, and a label $l_i = 1$ if $i$ belongs to a road, $l_i = 0$ otherwise. An arc $A_{ij} \in A$ between two nodes $i$ and $j$ corresponds to a shared extremity. Each arc $A_{ij}$ is associated with the angle $\theta_{ij} \bmod \pi$ between the two segments. A neighborhood system is defined on $G$, with its cliques being all the subsets of segments sharing an extremity. The neighborhood $N_i$ of each node $i$ is given by:

$N_i = \{ j \in S \mid \exists (k, p) \in \{1, 2\}^2,\ M_j^k = M_i^p,\ j \neq i \}$ (7)

where $M_j^k$, for $k \in \{1, 2\}$, denote the endpoints of a segment $j$.

The identification of the roads is carried out with an appropriate labeling of the graph, in accordance with the observation process $d = (d_1, d_2, \ldots, d_m)$ (where $m$ is the cardinality of $S$). A Markov random field is defined on the graph and the optimum configuration (labeling) $l = (l_1, l_2, \ldots, l_m)$ of the segments of $S$, given the observation process $d$, can be estimated based on the Bayes rule and a MAP criterion that maximizes the posterior probability $P(l \mid d)$. The conditional probability distribution $p(d \mid l)$ depends on the observation measurements, whereas the prior probability of labelings $P(l)$ is based on a Markovian model of road-like objects. From the equivalence between MRF and Gibbs fields, both of them can be described with a set of potentials that associate an energy function with the different configurations. The minimization of this energy function gives the optimal solution to the problem.

Under the assumption that each observation $d_i$ is only conditioned by the corresponding label $l_i$, the conditional probability distribution $p(d \mid l)$ is expressed by:

$p(d \mid l) = \prod_{i=1}^{m} p(d_i \mid l_i) \propto \exp\left(-\sum_{i=1}^{m} \left(V(d_i \mid l_i) + \log Z_0\right)\right)$ (8)

where $V(d_i \mid l_i)$ denotes the conditional potential of segment $i$ and $Z_0$ a normalization factor that ensures the condition $\int_0^1 p(d = x \mid l)\,dx = 1$. The potentials $V(d_i \mid l_i)$ were chosen experimentally after a manual segmentation of roads and depend on two parameters $t_1$ and $t_2$:

$V(d_i \mid 0) = \begin{cases} 0 & \text{if } d_i < t_1 \\ \frac{d_i - t_1}{t_2 - t_1} & \text{if } t_1 < d_i < t_2 \\ 1 & \text{otherwise} \end{cases}$ and $V(d_i \mid 1) = 0,\ \forall d_i$ (9)

The prior probability of labelings and the corresponding clique potentials reflect three main assumptions about the road structure: (i) roads are long structures, (ii) they have low curvature, and (iii) intersections between roads are rare. For every clique $c$, the defined clique potentials depend on the current configuration:

• Null situation: $\forall i \in c,\ l_i = 0$: $V_c(l) = 0$ (10)

• Assumption (i): $\exists!\, i \in c$ such that $l_i = 1$: $V_c(l) = K_e - K_l \ell_i$ (11)

• Assumption (ii): $\exists!\, (i, j) \in c^2$ such that $l_i = l_j = 1$: $V_c(l) = -K_l(\ell_i + \ell_j) + K_c \sin(\theta_{ij})$ (12)

• Assumption (iii): in all other cases, $V_c(l) = K \sum_{i \in c} l_i$ (13)

Positive values for the parameters $K_e$ and $K_l$ fulfill assumption (i) and favor long roads, whereas positive values for $K_c$ and $K$ fulfill assumptions (ii) and (iii) of the road model, respectively.
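Equations (9) to (13) translate directly into two small functions, sketched below. The function and argument names are hypothetical, the angle between two segments is supplied by a caller-provided function, and no parameter values are implied for [4]. The boundary cases d = t1 and d = t2, which equation (9) leaves to the "otherwise" branch, are treated here by continuity.

```python
from math import sin


def conditional_potential_tupin(d, label, t1, t2):
    """Conditional potential V(d|l) of equation (9)."""
    if label == 1:
        return 0.0
    if d <= t1:
        return 0.0
    if d < t2:
        return (d - t1) / (t2 - t1)
    return 1.0


def clique_potential_tupin(labels, lengths, angle, Ke, Kl, Kc, K):
    """Clique potential V_c(l) of equations (10) to (13).

    labels and lengths are listed per segment of the clique;
    angle(i, j) returns the angle between segments i and j in radians.
    """
    active = [i for i, label in enumerate(labels) if label == 1]
    if not active:                                   # null situation (10)
        return 0.0
    if len(active) == 1:                             # isolated segment (11)
        return Ke - Kl * lengths[active[0]]
    if len(active) == 2:                             # connected pair (12)
        i, j = active
        return -Kl * (lengths[i] + lengths[j]) + Kc * sin(angle(i, j))
    return K * len(active)                           # intersection (13)


def straight(i, j):
    """Hypothetical angle function: all segments are collinear."""
    return 0.0


print(conditional_potential_tupin(0.5, 0, 0.2, 0.8))
print(clique_potential_tupin([1, 0, 1], [0.5, 0.2, 0.3], straight, 1.0, 1.0, 1.0, 2.0))
```

Two long collinear segments yield a negative potential, which lowers the energy and therefore favours that configuration, while three active segments in one clique are penalized in proportion to K.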

4.3 The proposed scheme for line grouping

We propose a line grouping scheme that is an enhanced version of the method described in the previous section. The main modifications include the incorporation of a new observation measure, the reduction of the number of potential parameters and the improvement of the clique potential functions.

4.3.1 Graph creation

The graph structure $G$, associated with the augmented set of segments $S$, together with the attributes attached to it, is the same as in the scheme of [4] described in the previous section. The neighborhood $N_i$ of each node $i$ is given in (7). For each line segment $i \in S_{det}$ we define two cliques ($C_i^k$ with $k \in \{1, 2\}$) that correspond to its two end points. Each of these cliques contains all the segments that share the specific extremity. An example of such a neighborhood system is shown in Fig. 6.

Figure 6: Neighborhood system.

We introduce a new observation field $d$ that depends on both the morphological road detection response and the orientation information, and reflects more efficiently the likelihood of each segment belonging to a road. The observation $d_i$ of each segment $i$ is a function of a saliency measure $r_i$ defined as:

$r_i = R_i / (|\phi_i - \theta_i| + 1)$ (14)

where $R_i$ and $\phi_i$ are the mean values, along the line segment, of the morphological road detection and orientation responses (as described in section 3.4.1) respectively, and $\theta_i$ is the line segment orientation. A high value $r_i$ for a segment $i$, together with the presence of other segments with a high saliency in the neighborhood of $i$, is considered as cumulative evidence that this segment is part of a road. The observation values $d_i$ are defined as:

$d_i = \max_{j \in N_i} \{(r_i + r_j)/2\}$ (15)

The observation values $d_1, d_2, \ldots, d_m$, associated with $S$, are then normalized between 0 and 1.
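A minimal sketch of equations 14 and 15 is given below. Several details are assumptions, since the text does not fix them: orientations are taken in radians without wrap-around, a segment without neighbours receives half of its own saliency, and the final normalization is a min-max rescaling. The function names and the dictionary-based neighbourhood are likewise illustrative.

```python
def saliency(mean_response, mean_orientation, segment_orientation):
    """Saliency of equation (14): the mean detector response is reduced
    when the mean orientation along the segment disagrees with the
    orientation of the segment itself."""
    deviation = abs(mean_orientation - segment_orientation)
    return mean_response / (deviation + 1.0)


def observations(saliencies, neighbors):
    """Observation values of equation (15), rescaled to [0, 1].

    neighbors maps a segment index to the list of its neighbours.
    Assumptions: an isolated segment gets r_i / 2, and the rescaling
    is min-max (all zeros if every raw value is identical).
    """
    raw = []
    for i, r_i in enumerate(saliencies):
        pair_means = [(r_i + saliencies[j]) / 2.0 for j in neighbors.get(i, [])]
        raw.append(max(pair_means) if pair_means else r_i / 2.0)
    low, high = min(raw), max(raw)
    if high == low:
        return [0.0 for _ in raw]
    return [(d - low) / (high - low) for d in raw]


r = [saliency(4.0, 0.5, 0.5),    # response 4, orientations agree
     saliency(4.0, 1.5, 0.5),    # response 4, deviation of 1 rad
     saliency(0.0, 0.0, 0.0)]    # no response
chain = {0: [1], 1: [0, 2], 2: [1]}
print(r)
print(observations(r, chain))
```

The middle segment has a lower saliency of its own, but it inherits a high observation value from its salient neighbour, which is the cumulative-evidence effect described above.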

4.3.2 Conditional probability distribution

By assuming that the observation $d_i$ is only conditioned by the corresponding label $l_i$, and that the dependencies between the different observations are exclusively determined by the dependencies between the labels $l_i$ (as described by the MRF model), the conditional probability distribution $p(d \mid l)$ can be derived from equation 8. Using the proposed observations $d_i$ (equation 15), manual segmentation of road images showed that road segments may have almost any observation value $d$, while non-road segments have observations with values smaller than a threshold $t$. As opposed to the conditional potentials proposed in [4], the chosen potential functions that describe the conditional probability distributions $p(d_i \mid l_i)$ depend on only one parameter:

$V(d \mid 0) = \begin{cases} \frac{d}{t} & \text{if } d < t \\ 1 & \text{otherwise} \end{cases}$ and $V(d \mid 1) = 0,\ \forall d$ (16)

Based on these functions, the normalization factor $Z_0$ in equation 8 is found to be equal to $Z_0 = (1 - t)(1/e) - t(1/e - 1)$ with $e = \exp(1)$.
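The closed form of the normalization factor can be checked numerically. Reading Z0 as the integral of exp(-V(x|0)) over [0, 1], the two branches of equation 16 contribute t(1 - 1/e) and (1 - t)/e, whose sum is the expression given above. The sketch below compares this closed form with a midpoint-rule integration; the function names, the step count and the test value t = 0.3 are arbitrary choices for illustration.

```python
from math import exp


def conditional_potential(d, label, t):
    """Conditional potential V(d|l) of equation (16)."""
    if label == 1:
        return 0.0
    return d / t if d < t else 1.0


def z0_closed_form(t):
    """Normalization factor as stated in the text."""
    e = exp(1)
    return (1 - t) * (1 / e) - t * (1 / e - 1)


def z0_numeric(t, steps=100000):
    """Midpoint-rule estimate of the integral of exp(-V(x|0)) on [0, 1]."""
    h = 1.0 / steps
    total = sum(exp(-conditional_potential((i + 0.5) * h, 0, t))
                for i in range(steps))
    return total * h


t = 0.3   # hypothetical threshold, for illustration only
print(z0_closed_form(t), z0_numeric(t))
```

Both values agree to well within the integration error, which supports the interpretation of Z0 as the normalizer of the label-0 distribution.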

4.3.3 Prior probability of labelings

A priori knowledge is introduced with the creation of a geometrical model of road-like structures. In our case, this model is based on the three assumptions of [4] described in section 4.2. We have modified the form of the clique potentials proposed in [4] by using a reduced number of potential parameters and by making a clear distinction between the elements of $S_{det}$ and $S_{con}$, which provides additional a priori information about the nature of each segment. The optimal configurations have long and collinear detected line segments with short connections between them. Every clique contains one segment belonging to $S_{det}$ (with length $\ell^{det}$), along with segments of $S_{con}$ (with length $\ell^{con}$) that share the same extremity. For every clique $c$, the chosen clique potentials have the following form:

• Null situation: $\forall i \in c,\ l_i = 0$: $V_c(l) = 0$ (17)

• Assumption (i): $\exists!\, i \in c \wedge i \in S_{det}$ such that $l_i = 1$: $V_c(l) = K_1 + 1 - \ell_i^{det} + \log Z_0$ (18)

• Assumption (ii): $\exists!\, (i, j) \in c^2 \wedge i \in S_{det},\ j \in S_{con}$ such that $l_i = l_j = 1$: $V_c(l) = \sin(\theta_{ij}) + 1 - \ell_i^{det} + \ell_j^{con} + 2 \log Z_0$ (19)

• Assumption (iii): in all other cases, $V_c(l) = K_2 \sum_{i \in c} l_i$ (20)

By choosing K1 > 0 in equation 18 we penalize short roads (assumption (i) in section 4.2), i.e. the clique

potential is high for a clique with only one isolated segment, except when this isolated segment has a high

normalized length `deti (close to 1). High values of K1 favor more connected con�gurations. Equation 19

10

satis�es assumption (ii) of section 4.2 and at the same time penalizes con�gurations with short detected

and long connecting segments. Finally, K2 > 0, in equation 20, has the same properties as the parameter

K in equation 13. The additional parameters logZ0 and 2logZ0, in equations 18 and 19 respectively, are

factors that facilitate the comparison between the clique potential values and the conditional potentials

of the null con�gurations (where all the segments of the current clique are labeled as 0). In the case of a

clique with one segment labeled as 1, the factor K1 + 1 � `deti in equation 18 is directly compared with

the conditional potential component V (dij0) of the current segment i. In the case of a clique with two

segments i, j labeled as li = lj = 1, the factor sin(�ij) + 1� `deti + `

conj of equation 19 is compared with

the sum of the conditional potential components V (dij0), V (dj j0).
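The four cases of equations 17 to 20 can be written as a single function. The following is a minimal sketch: the function name, the argument layout and the convention that the angle between the two active segments is supplied in radians (0 for perfectly aligned segments) are assumptions made for illustration.

```python
import math

def clique_potential(labels, is_detected, lengths, K1, K2, log_Z0, angle=None):
    """Clique potential V_c(l) following equations 17-20.

    labels      -- 0/1 label of each segment in the clique
    is_detected -- True for a segment of S_det, False for a segment of S_con
    lengths     -- normalized segment lengths in [0, 1]
    angle       -- angle (radians) between the two active segments,
                   needed only for the situation of equation 19
    """
    active = [k for k, l in enumerate(labels) if l == 1]
    if not active:                                   # equation 17
        return 0.0
    if len(active) == 1 and is_detected[active[0]]:  # equation 18
        return K1 + 1.0 - lengths[active[0]] + log_Z0
    if len(active) == 2 and is_detected[active[0]] != is_detected[active[1]]:
        # Equation 19: order the pair so that i is the detected segment.
        i, j = sorted(active, key=lambda k: not is_detected[k])
        return math.sin(angle) + 1.0 - lengths[i] + lengths[j] + 2.0 * log_Z0
    return K2 * sum(labels)                          # equation 20
```

For example, with `K1 = 0.5`, `K2 = 0.1` and `log_Z0 = -0.9`, a clique made of a detected segment of length 0.8 and a connection of length 0.2 gets a lower potential when both are labeled 1 and aligned than when the detected segment stands alone, which is the behaviour the prior is designed to favor.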

4.3.4 Posterior probability - Energy minimization

The posterior probability $P(l|d)$ can also be expressed in terms of a global energy function $U(l|d)$ ($P(l|d) \propto \exp(-U(l|d))$), which can be deduced from the potential functions:

$$U(l|d) = \sum_{i=1}^{m} V(d_i|l_i) + \sum_{c \in C} V_c(l) \tag{21}$$

The MAP configuration of the line segments is estimated by minimizing the energy function $U(l|d)$. As minimization scheme, we use a simulated annealing procedure specially adapted to our problem. We chose an efficient label generation mechanism in order to speed up the evolution of the system towards the optimal solution (global minimum). Instead of sequentially updating the label of each node, we consider three adjacent segments and apply the Metropolis acceptance criterion [20] to each of their eight possible configurations. Based on the conjecture that it is not possible to have a connecting segment with label $l_i = 1$ if one of the adjacent detected segments has label $l_j = 0$, we reject beforehand configurations of this type.
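The block update described above might be sketched as follows. The text does not detail how the eight configurations are visited, so this example makes several assumptions: the three adjacent segments are taken to be a (detected, connecting, detected) triple, one admissible configuration is drawn uniformly at random per step, and the energy is supplied as a user-defined callable. All names are hypothetical.

```python
import itertools
import math
import random

def admissible_triples():
    """Label triples (l_det_a, l_con, l_det_b) that survive the pruning rule.

    A connecting segment may be labeled 1 only if both adjacent detected
    segments are labeled 1, so 3 of the 8 configurations are discarded.
    """
    return [cfg for cfg in itertools.product((0, 1), repeat=3)
            if not (cfg[1] == 1 and (cfg[0] == 0 or cfg[2] == 0))]

def metropolis_accept(delta_U, T, rng=random):
    """Metropolis criterion: accept downhill moves always, uphill moves
    with probability exp(-delta_U / T). Requires T > 0."""
    return delta_U <= 0 or rng.random() < math.exp(-delta_U / T)

def update_triple(labels, triple, energy, T, rng=random):
    """Propose a random admissible relabeling of three adjacent segments.

    labels -- dict mapping segment id to its 0/1 label
    triple -- ids of the (detected, connecting, detected) segments
    energy -- callable returning U(l|d) for a label dict
    """
    proposal = dict(labels)
    proposal.update(zip(triple, rng.choice(admissible_triples())))
    if metropolis_accept(energy(proposal) - energy(labels), T, rng):
        return proposal
    return labels
```

Recomputing the full energy for each proposal keeps the sketch short; a practical implementation would presumably evaluate only the potentials of the cliques touched by the three segments.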

For the annealing process, we used the polynomial-time cooling schedule proposed in [20]. This implies the generation of homogeneous Markov chains of finite length for a finite sequence of descending values of a control parameter $T$ that corresponds to the temperature of our system. We will denote by $\langle f \rangle_{T_k}$ and $\sigma_k$ the mean value and standard deviation of the distribution of the cost (energy) values of solutions during the generation of the $k$th Markov chain. The cooling schedule is based on the following parameters:

• Initial value of the control parameter ($T_0$).

The process starts from a high initial value of the control parameter $T_0$ (temperature of our system), so that virtually all proposed transitions are accepted. Assume that a sequence of trials is generated at a certain value $T$ of the control parameter. Let $m_1$ denote the number of proposed transitions from $i$ to $j$ for which the energy function $U(j) \le U(i)$, and $m_2$ the number of transitions for which $U(j) > U(i)$. Furthermore, let $(\Delta U)^{+}$ be the average difference in cost over the $m_2$ cost-increasing transitions. By fixing an initial acceptance ratio equal to $\chi_0$, and an initial value for $T_0$ equal to 0, we perform a number of trials ($m_1 + m_2$), each time updating the value of $T_0$ by:

$$T_0 = \frac{(\Delta U)^{+}}{\ln\left( \dfrac{m_2}{m_2 \chi_0 - m_1 (1 - \chi_0)} \right)}$$

Numerical experiments indicate that fast convergence to a final value of $T_0$ is obtained in this way.

• Decrement of the control parameter.

The decrement of the control parameter is determined by the value of a distance parameter $\delta$ and the standard deviation $\sigma_k$ at each Markov chain:

$$T_{k+1} = \frac{T_k}{1 + \dfrac{T_k \ln(1 + \delta)}{3 \sigma_k}}, \quad k = 0, 1, \ldots$$

• Final value of the control parameter.

Termination of the algorithm is based on an extrapolation of the expected cost $\langle f \rangle_{T_k}$ for $T_k \downarrow 0$. We may reliably terminate the algorithm if for some $k$ we have:

$$\frac{T_k}{\langle f \rangle_{\infty}} \left. \frac{\partial \langle f \rangle_T}{\partial T} \right|_{T = T_k} < \epsilon_s$$

where $\epsilon_s$ is some small positive number called stop parameter.

In our case, we chose the following values for the parameters $\chi_0$, $\delta$ and $\epsilon_s$: $\chi_0 = 0.95$; $\delta = 0.01$; $\epsilon_s = 10^{-5}$.
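The three rules of the cooling schedule translate into a few lines of code. The sketch below is only illustrative: the function names are hypothetical, the guard for counts at which the initial-temperature formula is undefined is an assumption, and the derivative of the mean cost with respect to the temperature is taken as an input because the text does not say how it is estimated. The chain standard deviation and the infinite-temperature mean cost are assumed to be nonzero.

```python
import math

def initial_temperature(m1, m2, mean_increase, chi0=0.95):
    """T0 update: mean cost increase / ln(m2 / (m2*chi0 - m1*(1 - chi0)))."""
    denom = m2 * chi0 - m1 * (1.0 - chi0)
    if m2 == 0 or denom <= 0:
        return None  # formula undefined for these counts (guard is an assumption)
    return mean_increase / math.log(m2 / denom)

def next_temperature(T_k, sigma_k, delta=0.01):
    """Decrement rule: T_{k+1} = T_k / (1 + T_k ln(1 + delta) / (3 sigma_k))."""
    return T_k / (1.0 + T_k * math.log(1.0 + delta) / (3.0 * sigma_k))

def should_stop(T_k, mean_cost_inf, dmean_dT, eps_s=1e-5):
    """Stop criterion: (T_k / <f>_inf) * d<f>_T/dT at T = T_k below eps_s."""
    return (T_k / mean_cost_inf) * dmean_dT < eps_s

# Example: 50 non-increasing and 50 increasing trials, mean increase of 2.0.
T0 = initial_temperature(m1=50, m2=50, mean_increase=2.0)
T1 = next_temperature(T0, sigma_k=1.0)
```

A quick sanity check on the initial-temperature rule is that the acceptance ratio it implies, (m1 + m2 exp(-mean_increase / T0)) / (m1 + m2), comes back as the requested initial acceptance ratio.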


5 VALIDATION

5.1 Parameter setting

In this section, we investigate the parameters that influence the probability distributions and their corresponding potential functions. Our scheme for parameter setting is inspired by the one presented in [4]. As a reference, we will use a set of three connected segments $s_1$, $s_2$, $s_3$ ($s_1, s_3 \in S_{det}$ and $s_2 \in S_{con}$). By comparing the energy components of two possible configurations of these segments, we can derive the accepted range of the parameter $K_1$.

• "Connected" configuration: $l_1 = l_2 = l_3 = 1$.

$$U_{con} = 2K_1 + \sin\theta_{12} + \sin\theta_{23} + 4 - 2\ell_1 - 2\ell_3 + 2\ell_2 + 6\log Z_0 \tag{22}$$

• "Unconnected" configuration: $l_1 = l_3 = 1$ and $l_2 = 0$.

$$U_{uncon} = V(d_2|0) + 4K_1 + 4 - 2\ell_1 - 2\ell_3 + 4\log Z_0 \tag{23}$$

The energetic variation $\Delta U = U_{con} - U_{uncon}$ should be positive in the case of a long connecting segment $s_2$ ($\ell_2 \to 1$) with a poor observation value ($V(d_2|0) \to \log Z_0$), even if the three segments are perfectly aligned. This condition limits the connecting power of the a priori model in poor observation areas. Based on the restriction $\Delta U > 0$, the following condition should be fulfilled:

$$K_1 < \frac{2 + \log Z_0}{2} \tag{24}$$

The parameter $K_2$ has been empirically set to a value equal to 0.1. Finally, the optimal value of the parameter $t$ (in equation 16), for the type of airborne images used in our application, is found to be around 0.15 ($\log Z_0 \approx -0.89$).
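The step from equations 22 and 23 to the bound of equation 24 can be spelled out as follows. This is a reconstruction from the quantities already defined, using perfect alignment (both sine terms equal to zero) and the limits stated in the text for the length and the observation potential of the connecting segment; the numerical values are rounded, and the angle symbol is a notational assumption.

```latex
\begin{align*}
\Delta U &= U_{con} - U_{uncon} \\
         &= -2K_1 + \sin\theta_{12} + \sin\theta_{23} + 2\ell_2 + 2\log Z_0 - V(d_2|0) \\
         &\to -2K_1 + 2 + \log Z_0
            \qquad (\sin\theta_{12} = \sin\theta_{23} = 0,\ \ell_2 \to 1,\ V(d_2|0) \to \log Z_0) \\
\Delta U > 0 &\iff K_1 < \frac{2 + \log Z_0}{2} \\
t = 0.15 &:\quad Z_0 = 0.85\,e^{-1} + 0.15\,(1 - e^{-1}) \approx 0.4075, \quad
           \log Z_0 \approx -0.898, \quad K_1 < 0.55
\end{align*}
```

The value of $K_1$ used for the results of section 5.2 lies inside this range.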

5.2 Results

This section demonstrates the performance of our method on three airborne images. The first image contains roads with a wide variety of widths, while the other two contain small paths in a heavily textured environment. In these examples, the road detection results have been produced using the following values for the potential parameters: $t = 0.15$, $K_1 = 0.5$ and $K_2 = 0.1$.

Figure 7: (a) Road detection result (Bandua region); (b) Road detection result, using the saliency measure $r_i = R_i$.

Fig. 7(a) shows the result of our method for the airborne image of Fig. 1. Most of the false-alarm detections of Fig. 5 have been suppressed, whereas the linear features corresponding to roads and paths have been successfully reconstructed, independently of their size. Fig. 7(b) illustrates the influence of the angular information in the definition of the saliency measure $r$ (equation 14). The choice of a saliency measure depending only on the mean response value along each segment ($r_i = R_i$) produces several spurious detections due to imperfections in the response image.

Figure 8: (a) Road detection result (Songo region); (b) Road detection result, using the morphological operators of [5] in the local analysis step.

A second example corresponds to the image of Fig. 2. The image has a size of 800 x 800 pixels and represents a heavily textured scene, with several small paths that are partially disconnected, mainly because of image degradation. The parameters used in the local analysis step are the following: $w_{max} = 5$, $\ell_0 = 20$, $\ell_{min} = \ell_0/2$ and $d\theta = 30^\circ$. The final result, presented in Fig. 8(a), contains most of the linear features of interest together with a small number of wrongly classified line segments. In Fig. 8(b), we demonstrate the importance of the introduced soft morphological operators during the local analysis phase. The figure shows the linear structures detected in Fig. 2 using only the morphological operators of Chanussot and Lambert [5], without the proposed modifications. The paths in the image have been only partially detected, mainly because the result of the morphological filtering contains several disconnected linear regions, which gradually vanish during the flooding process of the watershed transformation (extraction of the pseudo-medial axis of the roads). Finally, in Fig. 9 we present a third example of an airborne image, together with the detected linear features.

The computational cost of our method is rather high, mainly because of the optimization step (simulated annealing) of section 4.3.4. Nevertheless, the proposed optimization scheme is stable and converges, in most cases that were investigated, to a global minimum solution, independently of the initial realization of labelings. For an 800 x 800 image on a Pentium III at 500 MHz, the local analysis phase lasts around 5 min. Due to the efficiency of our line detector, for our test set of images, the total number of road segment candidates is not more than 1500. For this number of segments the labeling stage lasts approximately 10 min.

6 DISCUSSION - CONCLUSIONS

We describe a model-based technique for linear feature extraction in digitized airborne images, which combines both local and global criteria, and illustrate its application to the problem of road and path detection. Its main advantage is its good detection performance in heavily textured environments, along with its ability to identify elongated structures independently of their size. It can be considered as the combination and extension of two earlier approaches [5] [4].

As far as the local analysis step is concerned, the improvement of the morphological filtering scheme of [5], mainly due to the use of a soft operator, together with the proposed line-following algorithm, results in a better detection of roads with a large variety of sizes, and in the reduction of spurious line segments, even in the presence of heavy noise. Due to the good performance of our line detector, the produced candidate road segments are long and not too numerous. This significantly decreases the complexity of the labeling process. The proposed global analysis stage, although it has many similarities with the one in [4], contains some important modifications that make it more flexible and robust. These modifications include the incorporation of a new observation measure that reflects more efficiently the likelihood value of each segment, the reduction of the number of potential parameters, and finally the use of different potential functions that better represent the properties of the geometrical road model.

Figure 9: (a) Original airborne image (Songo region in Mozambique: scale 1:3000; ground resolution 0.31 m); (b) Road detection result.

One of the most important limitations of our method is that it is not entirely unsupervised, due to the setting of three parameters ($t$, $K_1$, $K_2$), all of them concerning the connection step. Nevertheless, the proposed ranges of these parameters give very good results for the class of environments illustrated in Fig. 1, Fig. 2 and Fig. 9, independently of the size of the linear features to be extracted. Further analysis should be carried out on the problem of identifying road segments with high curvature, especially when the curvature is high compared to the maximum road width found in the image, and on the choice of a more efficient skeletonization process for the extraction of the road medial axis. Another interesting aspect for investigation is the incorporation of color information as an extra attribute, in order to further decrease the number of false-alarm detections.

Acknowledgments

This research is part of the European Pilot Project: "Airborne Minefield Detection in Mozambique" (ITC(N), RMA(B), Geograph(P), EOS(UK), Eurosense(B), Aerodata(B), CAE Aviation(L), Recon Optical(UK), Satelitbild(S), Zeiss Eltro Optronic(G), NPA(N), CND(Moz), I.G.I LTD(G), VUB(B)). It is funded by the European Commission (DG8), the Governments of Belgium, Germany, Luxembourg, Portugal, the United Kingdom and the International Institute for Aerospace Survey and Earth Sciences (ITC), the Netherlands. Supplementary funding comes from the VUB research council and the Belgian Ministry of Foreign Affairs.

References

[1] J. Canny, "A computational approach to edge detection," IEEE Trans. Pattern Anal. Machine Intell., vol. 8, pp. 679-698, Nov. 1986.

[2] Y. T. Zhou, V. Venkateswar, and R. Chellappa, "Edge detection and linear feature extraction using a 2-d random field model," IEEE Trans. Pattern Anal. Machine Intell., vol. 11, pp. 84-95, Jan. 1989.

[3] M. A. Fischler, J. M. Tenenbaum, and H. C. Wolf, "Detection of roads and linear structures in low resolution aerial imagery using a multisource knowledge integration technique," Comput. Graph. Image Processing, vol. 15, no. 3, pp. 201-223, 1981.

[4] F. Tupin, H. Maitre, J. F. Mangin, J. M. Nicolas, and E. Pechersky, "Detection of linear features in SAR images: Application to road network extraction," IEEE Trans. on Geoscience and Remote Sensing, vol. 36, pp. 434-453, Mar. 1998.

[5] J. Chanussot and P. Lambert, "An application of mathematical morphology to road network extraction on SAR images," Proc. International Symposium on Mathematical Morphology, Amsterdam, pp. 399-406, 1998.

[6] N. Merlet and J. Zerubia, "New prospects in line detection by dynamic programming," IEEE Trans. Pattern Anal. Machine Intell., vol. 18, pp. 426-431, Apr. 1996.

[7] D. Geman and B. Jedynak, "An active testing model for tracking roads in satellite images," IEEE Trans. Pattern Anal. Machine Intell., vol. 18, pp. 1-14, Jan. 1996.

[8] J. L. Marroquin, "A Markovian random field of piecewise straight lines," Biological Cybern., vol. 61, pp. 457-465, 1989.

[9] M. Barzohar and D. B. Cooper, "Automatic finding of main roads in aerial images using geometric-stochastic models and estimation," IEEE Trans. Pattern Anal. Machine Intell., vol. 18, pp. 707-721, Jul. 1996.

[10] A. Katartzis, V. Pizurica, and H. Sahli, "Application of mathematical morphology and Markov random field theory to the automatic extraction of linear features in airborne images," Proc. International Symposium on Mathematical Morphology and its Applications to Image and Signal Processing V, California, USA, pp. 405-414, 2000.

[11] L. van Kempen, A. Katartzis, V. Pizurica, J. Cornelis, and H. Sahli, "Digital signal/image processing for mine detection part 1: Airborne approach," Proc. MINE '99, Euroconference on Sensor systems and signal processing techniques applied to the detection of mines and unexploded ordnance, Firenze, Italy, pp. 48-53, 1999.

[12] P. Soille, Morphological Image Analysis - Principles and Applications, Springer, Germany, 1999.

[13] P. Kuosmanen and J. Astola, "Soft morphological filtering," Journal of Mathematical Imaging and Vision, vol. 5, pp. 231-262, 1995.

[14] L. Koskinen, J. Astola, and Y. Neuvo, "Soft morphological filters," Proc. SPIE Symposium on Image Algebra and Morphological Image Processing, vol. 1568, pp. 262-270, 1991.

[15] F. Y. Shih and C. C. Pu, "Analysis of the properties of soft morphological filtering using threshold decomposition," IEEE Trans. on Signal Processing, vol. 43, pp. 539-544, 1995.

[16] S. Krishnamachari and R. Chellappa, "Delineating buildings by grouping lines with MRFs," IEEE Trans. on Image Processing, vol. 5, pp. 164-168, Jan. 1996.

[17] S. Z. Li, Markov random field modeling in computer vision, Computer Science Workbench. Springer Verlag, 1995.

[18] R. Kindermann and J. L. Snell, "Markov random fields and their applications," Amer. Math. Soc., 1980.

[19] J. Besag, "Spatial interaction and the statistical analysis of lattice systems," J. Roy. Statist. Soc., Series B, vol. 36, pp. 192-226, 1974.

[20] E. H. L. Aarts, Simulated annealing and Boltzmann machines, John Wiley & Sons Ltd., 1989.
