Soft Comput · DOI 10.1007/s00500-014-1575-3

FOCUS

A self-organizing map to improve vehicle detection in flow monitoring systems

R. M. Luque-Baena · Ezequiel López-Rubio · E. Domínguez · E. J. Palomo · J. M. Jerez

© Springer-Verlag Berlin Heidelberg 2015

Abstract Obtaining perfect foreground segmentation masks remains a challenging task in video surveillance systems, since errors in that initial stage could lead to mistakes in subsequent tasks such as object tracking and behavior analysis. This work presents a novel methodology based on self-organizing neural networks and Gaussian distributions to detect unusual objects in the scene, and to improve the foreground mask by handling occlusions between objects. After testing the proposed approach on several traffic sequences obtained from public repositories, the results demonstrate that this methodology is promising and suitable to correct segmentation errors in crowded scenes with rigid objects.

Keywords Self-organizing neural networks · Postprocessing techniques · Traffic monitoring · Surveillance systems · Object detection

Communicated by I. R. Ruiz.

R. M. Luque-Baena (B)
Department of Computer Systems and Telematics Engineering, University of Extremadura, University Centre of Mérida, 06800 Mérida, Spain
e-mail: [email protected]

E. López-Rubio · E. Domínguez · E. J. Palomo · J. M. Jerez
Department of Computer Languages and Computer Science, University of Málaga, Bulevar Louis Pasteur, 35, 29071 Málaga, Spain
e-mail: [email protected]

E. Domínguez
e-mail: [email protected]

E. J. Palomo
e-mail: [email protected]

J. M. Jerez
e-mail: [email protected]

1 Introduction

In recent years, research on traffic flow monitoring systems has attracted considerable attention. Automatic analysis of traffic activity has increased due, in part, to the growing number of cameras and other sensors, enhanced infrastructure, and the consequent accessibility of information. Moreover, the advance of analytical techniques for processing video data and the increase in computing power have allowed the development of new applications, where traffic congestion, incidents and violations are the great challenges in most cities.

The objective of flow monitoring systems is to set up a high-level description of the traffic scene comprising the position, speed and class of the vehicles. Thus, algorithms for detecting moving objects, tracking and classification have to be implemented in this kind of system, as well as being designed to run in real time on low-cost hardware.

Vehicle detection is the first stage in any flow monitoring system. In this first step of video processing, we suppose that vehicles are moving and the lighting conditions are relatively stable. There are numerous object detection methods based on background modeling (Stauffer and Grimson 1999; Ridder et al. 1995; Lai and Yung 1998), frame differencing (Lipton et al. 1998; Cucchiara et al. 2000) or optical flow (Nagai et al. 1996). However, a reliable and robust foreground vehicle detection is difficult to obtain due to several drawbacks, such as strong lighting conditions, incorrect merges in the segmentation mask, vehicle occlusions in dense traffic situations, etc. Thus, a good foreground detection is fundamental for the next stage (vehicle tracking) and the subsequent ones.

Self-organizing models (SOM) have shown promising performance in a wide variety of application areas, including hard video applications such as robotic operation, visual inspection, remote sensing, autonomous vehicle driving, automated surveillance, and many others (García-Rodríguez et al. 2011).

This paper aims to address the ability of self-organizing maps to detect poorly segmented objects in video sequences of traffic monitoring and to improve the output of the segmentation mask. Standard object detection algorithms (Stauffer and Grimson 2000; López-Rubio and Luque-Baena 2011; Luque et al. 2010) do not provide a perfect foreground mask for each sequence frame, and difficulties like annoying noise, spurious blobs and incorrect merges among objects can arise. To address these problems, we present an application of a neural architecture based on SOM that is able to adapt the topology of the network of neurons to the objects that appear in the images, and can represent the common features of the objects of interest.

The proposed methodology manages to detect anomalous objects, discriminate between less frequent objects and overlapped objects generated because of segmentation errors, and correct the latter ones using a mixture of Gaussians. These corrections have a two-fold objective: to improve the foreground mask, which will benefit the subsequent object tracking algorithms, and to cluster the data to determine different types of vehicles. This technique is especially focused on scenes with a high frequency of moving objects, where their shape is homogeneous and relatively constant throughout their trajectory, regardless of the perspective of the camera, which is considered in the model. In traffic monitoring applications, cars are assumed to be the usual objects of interest, whereas other vehicles such as trucks, motorcycles, etc. are unusual or anomalous objects due to the fewer occurrences of these kinds of objects in the video sequences.

Therefore, the substantial benefit of this approach is to improve the object tracking phase, by correcting some of the errors generated in the previous phase (object detection). This strategy is part of the post-processing techniques and can be applied in combination with other approaches such as morphological operators.

The rest of the paper is organized as follows: Sect. 2 presents the related works of this approach, while Sect. 3 sets out the methodology, describing a self-organizing neural network which is adapted to this issue (Sect. 3.1) and the merge handling procedure to discriminate and separate incorrect merges of objects (Sect. 3.2). Section 4 shows several experimental results over several well-known public traffic surveillance sequences, and Sect. 5 concludes the article.

2 Related works

Self-organizing models have been used for the representation and tracking of objects. Fritzke (1997) proposed a variation of the GNG (Fritzke 1995) to map non-stationary distributions that applies to the representation and tracking of people (Frezza-Buet 2008). Flórez et al. (2002) suggested the use of self-organized networks for human–machine interaction. In Cao and Suganthan (2003), amendments to self-organizing models for the characterization of movement are proposed.

In the 90s, several simple background modeling methods were developed (Lai and Yung 1998). Ridder et al. (1995) introduced a Kalman filter into an adaptive background estimation, which can handle the illumination changes of daylight and moving clouds. Stauffer and Grimson (1999) proposed a method based on an adaptive background mixture model. To filter out the noisy and foreground moving pixel values, the alpha-trimmed strategy can be employed (Lai et al. 2010). Lin et al. (2006) extracted the color background by exploiting the appearance probability (AP) of each pixel color over a sufficiently long time.

Another technique based on frame differencing was proposed by Lipton et al. (1998), where the moving objects were extracted using a two-frame differencing method. Cucchiara et al. (2000) improved this method using three frames. However, the main drawback is that these methods fail with slowly moving objects.

Moving objects can also be detected from the relative motion between the observer and the scene. This technique, named optical flow, was used by Nagai et al. (1996) to develop a smart video-based surveillance system. Furthermore, special scenarios such as night-time detection are also addressed by Gritsch et al. (2009) and Robert (2009).

A common task in image processing is to improve the initial segmentation by applying postprocessing techniques, such as morphological operators (Soille 2003). This improvement usually yields better results at a later stage (detection, classification, etc.) (Gonzalez-Castro et al. 2014; Fabrizio et al. 2013; Gangopadhyay et al. 2013). Recently, an evaluation of four different algorithms based on mathematical morphology was performed for automatic classification of individual micro-calcifications (Diaz-Huerta et al. 2014). Moreover, a morphological approach for distinguishing texture and individual features in images was proposed in Zingman et al. (2014). However, these techniques are generic and are not fully adapted to the scene analyzed. The appearance of the objects in motion is quite relevant, because improving the initial segmentation when these objects are small is not the same as when they are larger. Our SOM approach achieves a model of the geometry of the objects in each scene position, even taking into account the perspective of the camera, to avoid incorrect merges of objects in the initial segmentation.

These techniques have also been widely applied to traffic or road monitoring (Kastrinaki et al. 2003). In Na and Tao (2012), an automatic classification for pavement surface images was proposed, whereas the detection of the road network in very high resolution remote sensing images was addressed in Valero et al. (2010). Different approaches for traffic analysis can be found in Cinque et al. (1999), Fathy and Siyal (1995) and Yu et al. (1992).

3 Methodology

Initially, our approach performs an initial segmentation of moving objects, by detecting the foreground of the scene against the background. There are many techniques to address this problem. Some of the most outstanding techniques correspond to a model based on a mixture of Gaussians, which uses a set of normal distributions for modeling both the background and the foreground (Stauffer and Grimson 2000; Zivkovic and van der Heijden 2006); a stochastic approximation method for representing the background and foreground with a multivariate Gaussian and uniform distributions, respectively (López-Rubio and Luque-Baena 2011); and the application of clustering techniques to group pixels belonging to the background and pixels belonging to the foreground (Luque et al. 2010).

In this work, this initial segmentation is obtained from the method proposed in López-Rubio and Luque-Baena (2011) and then improved by applying some basic postprocessing operators, such as erosion and dilation, to reduce the noise and spurious objects present in the initial segmentation, as shown in Fig. 1. In this figure, a division of our methodology into two stages can be appreciated. The first stage consists of a detection of anomalous objects using a self-organizing neural network to model the common object shapes of the scene, whereas the second one contains the incorrect merge handling to deal with unexpected objects, which can be either merged objects or other kinds of objects with a lower occurrence in the scene (trucks, bicycles, …). Next, these stages are further described.
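As a hedged illustration of the basic postprocessing step, a binary opening (erosion followed by dilation) can be sketched as follows. This is a minimal sketch with a 3×3 square structuring element; the exact operators and parameters used by the authors are not specified at this level of detail, so the choices below are assumptions:

```python
import numpy as np

def erode(mask, k=3):
    """Binary erosion: a pixel survives only if its whole k x k
    neighborhood is foreground (zero padding outside the image)."""
    pad = k // 2
    padded = np.pad(mask, pad, mode="constant", constant_values=False)
    out = np.ones_like(mask)
    for dy in range(-pad, pad + 1):
        for dx in range(-pad, pad + 1):
            out &= padded[pad + dy: pad + dy + mask.shape[0],
                          pad + dx: pad + dx + mask.shape[1]]
    return out

def dilate(mask, k=3):
    """Binary dilation: a pixel becomes foreground if any pixel of
    its k x k neighborhood is foreground."""
    pad = k // 2
    padded = np.pad(mask, pad, mode="constant", constant_values=False)
    out = np.zeros_like(mask)
    for dy in range(-pad, pad + 1):
        for dx in range(-pad, pad + 1):
            out |= padded[pad + dy: pad + dy + mask.shape[0],
                          pad + dx: pad + dx + mask.shape[1]]
    return out

def clean_mask(mask):
    """Morphological opening (erosion then dilation): removes isolated
    noise pixels while roughly preserving large blobs."""
    return dilate(erode(mask))
```

An opening with a 3×3 element deletes spurious one-pixel detections but restores solid vehicle blobs to their original extent.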

3.1 Detection of anomalous objects

In this stage, a model of usual objects has been built to detect the anomalous objects. A feature vector y ∈ R^D is extracted from each object, where D is the number of chosen features. In this case, the geometric shape of the objects is modeled in each position of the scene to distinguish anomalous objects from the common ones. It should be noted that the scenes are required to have a high frequency of objects in motion to learn the common features at each position. Thus, geometric information such as the area of the object, the height and width of its bounding box and the orientation are obtained for each object using the segmentation mask. These values form the feature vector.
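As an illustration, the feature vector for one blob might be computed as follows. This is a sketch under our own assumptions: the paper does not give code, and the exact feature set, ordering and orientation convention (here, principal axis from central second moments) are ours:

```python
import numpy as np

def blob_features(mask):
    """Geometric feature vector y in R^4 for one binary blob:
    area, bounding-box height, bounding-box width, orientation."""
    ys, xs = np.nonzero(mask)
    area = float(len(ys))
    height = float(ys.max() - ys.min() + 1)
    width = float(xs.max() - xs.min() + 1)
    # Orientation of the principal axis from central second moments.
    yc, xc = ys.mean(), xs.mean()
    mu20 = np.mean((xs - xc) ** 2)
    mu02 = np.mean((ys - yc) ** 2)
    mu11 = np.mean((xs - xc) * (ys - yc))
    orientation = 0.5 * np.arctan2(2.0 * mu11, mu20 - mu02)
    return np.array([area, height, width, orientation])
```

For a horizontally elongated blob the orientation is close to zero, and it rotates with the blob, which is what lets the map capture the dominant travel direction at each position.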

Then, a reduced representation of the set of these vectors representing usual objects should be obtained. Note that typical object features vary smoothly from one position to another, as in the case of an approaching object that looks bigger as it gets closer to the camera. For this reason, the main task is to learn a smooth function

F : R^2 → R^D  (1)

y = F(x)  (2)

where x is a pixel location in the video frame. Also, to be able to detect anomalous data, the variability among typical data has to be stored, for example by binning the typical data according to their position in the scene. Therefore, different processing units spread across the scene that are able to cooperate among themselves are needed. The self-organizing map (SOM) can perform this task, since the units of this map become spatially, globally ordered (Kohonen 1982a, b, 1990; Kohonen et al. 2001). In fact, this neural model has been widely applied to clustering problems (Kohonen 2013), although the problem of the standard SOM is the lack of output vectors.

The SOMs belonging to the family of parametrized and interpolating maps, which are commonplace in robot inverse kinematics controllers (de Angulo and Torras 2008; Göppert and Rosenstiel 1997; Padoan Jr et al. 2003; Walter and Ritter 1996), can be used for this application. Unlike standard SOMs, these neural network models are suitable here since they have input vectors x ∈ R^2 (pixel coordinates) and output vectors y ∈ R^D (typical object features at pixel coordinates x), where each unit i has two prototypes, one for input vectors w_i ∈ R^2 and one for output vectors v_i ∈ R^D. The input prototype is used to compute the winning unit, whereas the output prototype is used to estimate the smooth function F.

[Fig. 1 diagram: Sequence of frames → Initial Motion Segmentation (López-Rubio et al. 2011) → Basic postprocessing (morphological operations, spurious objects removal, …) → Our methodology approach: Detection of anomalous objects (according to their shape) → Occlusion handling → Enhanced objects (blobs) → Object Tracking]

Fig. 1 Framework for a video surveillance system, where our methodology has been included in the object detection phase. Our approach results are the inputs of the object tracking phase


Fig. 2 Self-organizing neural network initially distributed in the scene as a grid (left image) and finally adapted to the most frequent positions of the vehicles in the scene (right image). The size of the neurons, which represents the size of the vehicles in their corresponding zones of the scene, is also learned from the initial state

Let M be the number of units of the self-organizing map, which are arranged in a rectangular lattice of size a × b, where M = ab (see the right image of Fig. 2). This topology does not need to have a specific shape (polygonal, circular, ellipsoidal, …), since the SOM adapts itself to the features of the scene (positions where traffic is more frequent). The topological distance between the units i and i′, located at lattice positions (p_1, p_2) and (p′_1, p′_2), is given by:

d(i, i′) = ‖(p_1, p_2) − (p′_1, p′_2)‖  (3)

where ‖·‖ stands for the L2 (Euclidean) norm.

When a new input sample x(n) and the corresponding output sample y(n) are presented to the network at time step n, a winning unit is computed (for every incoming frame, a new time step is taken for each detected foreground object):

Winner(x(n)) = arg min_{j ∈ {1,…,M}} ‖x(n) − w_j(n)‖  (4)

where again ‖·‖ stands for the L2 (Euclidean) norm. Since the learning strategy of the network is online, only samples from one frame are analyzed at each step, so the input and output prototypes of all the units are adjusted, for i ∈ {1, …, M}:

w_i(n + 1) = w_i(n) + η(n) Λ(i, Winner(x(n))) (x(n) − w_i(n))  (5)

v_i(n + 1) = v_i(n) + η(n) Λ(i, Winner(x(n))) (y(n) − v_i(n))  (6)

where η(n) is a decaying learning rate and the neighborhood function Λ varies depending on the time step n and on a decaying neighborhood radius Δ(n):

η(n + 1) ≤ η(n)  (7)

Λ(i, j) = exp(−(d(i, j) / Δ(n))²)  (8)

Δ(n + 1) ≤ Δ(n)  (9)

There are two phases in the learning process: first the ordering phase, where η and Δ experience a linear decay; and then the convergence phase, where both η and Δ remain constant at a small value. In our experiments, the ordering phase includes the first 100 frames of the video, and the convergence phase spans the rest of the frames, no matter how long the video is. This is because the ordering phase is required for the warm-up of the algorithm only, and after that the system runs for an indefinitely long time. The receptive field of unit i, i.e. the region of the input space that is represented by i, is defined as:

F_i = {x | i = Winner(x)}  (10)

If we are presented a test sample x and we wish to estimate F(x), then we compute the winning unit j = Winner(x) by Eq. (4), and the estimated output is taken as the output prototype of the winning neuron:

ŷ = F(x) = v_j  (11)

In the right image of Fig. 2, the topology of our SOM approach adapted to a specific scene is shown. Note that the sizes of the units are learned to represent the sizes of the moving objects.
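The winner rule (4), the updates (5)–(6) and the Gaussian neighborhood (8) can be sketched as follows. This is a minimal illustrative sketch, not the authors' code; the class name, grid size, initialization and learning schedules in the usage below are our assumptions:

```python
import numpy as np

class IOSOM:
    """SOM with paired input (w) and output (v) prototypes: the winner
    is selected by input distance, and both prototype sets are pulled
    toward the sample with a Gaussian lattice neighborhood."""

    def __init__(self, a, b, d_out, rng):
        # Lattice coordinates of each of the M = a*b units, Eq. (3).
        self.grid = np.array([(i, j) for i in range(a) for j in range(b)], float)
        self.w = rng.random((a * b, 2))       # input prototypes (pixel coords)
        self.v = rng.random((a * b, d_out))   # output prototypes (features)

    def winner(self, x):
        # Eq. (4): unit whose input prototype is closest to x.
        return int(np.argmin(np.linalg.norm(self.w - x, axis=1)))

    def train_step(self, x, y, eta, delta):
        j = self.winner(x)
        d = np.linalg.norm(self.grid - self.grid[j], axis=1)  # lattice distance
        h = np.exp(-(d / delta) ** 2)                         # Eq. (8)
        self.w += eta * h[:, None] * (x - self.w)             # Eq. (5)
        self.v += eta * h[:, None] * (y - self.v)             # Eq. (6)
```

After an ordering phase with decaying η and Δ and a convergence phase with small constant values, `som.v[som.winner(x)]` plays the role of the estimate ŷ = F(x) of Eq. (11).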

At this point, a procedure to detect anomalous samples is needed. First, in each unit i a log (diary) S_i of the distances ‖y − ŷ‖ is kept for all the anomalous test samples that belong to its receptive field F_i, where y is the observed feature vector of an anomalous object and ŷ is the closest typical feature vector computed by Eq. (11). This log is initialized with the inter-unit distances ‖v_i − v_j‖ for all the neighbors j of unit i. Then, whether an input sample belonging to F_i is anomalous or not is decided as follows:

(x, y) is anomalous ⇔ ‖y − ŷ‖ > γ P_{i,q}  (12)

where P_{i,q} is the q-th percentile of S_i, and q ∈ {1, …, 99}, γ > 0 are fitting parameters.
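Rule (12) admits a compact sketch. The helper below is hypothetical: the value γ = 2 follows the experimental section, but q = 90 and the policy of appending anomalous distances to the diary are our reading of the text:

```python
import numpy as np

def is_anomalous(y, y_hat, log_s, gamma=2.0, q=90):
    """Eq. (12): a sample is anomalous when its distance to the closest
    typical feature vector exceeds gamma times the q-th percentile of
    the unit's distance log S_i."""
    dist = float(np.linalg.norm(np.asarray(y) - np.asarray(y_hat)))
    threshold = gamma * np.percentile(log_s, q)
    if dist > threshold:
        log_s.append(dist)   # anomalous distances enter the diary S_i
        return True
    return False
```

Keeping the percentile per unit makes the threshold local: units covering zones with heterogeneous vehicle shapes tolerate larger deviations than units covering uniform lanes.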


3.2 Incorrect merge handling

The previous subsystem manages to model the normal shape of the objects in the scene by learning a set of prototypes v which represent typical feature vectors corresponding to normal objects, detecting those which are regarded as strange or unusual. Some of the abnormal objects may be due to overlapped objects caused by the segmentation algorithm. This kind of object will be analyzed to undo the overlap and extract the individual objects. The previous SOM approach considers as normal objects those whose occurrence is more frequent (cars), while trucks, motorcycles or vehicles whose shape differs slightly from normal are considered as unusual objects. However, according to the requirements of the traffic monitoring problem, these infrequent types of vehicles should be distinguished from overlapped objects. Therefore, in this subsection a subsystem is proposed to process each abnormal object to find out whether it is an overlapped object or a single vehicle that happens to differ from the normal ones (e.g. a truck or a motorcycle). This task can be seen as a decision process which must determine whether an object can be split into two, i.e. whether it is an overlapped object. Several approaches have been proposed in different stages, depending on the binary mask or using tracking information (Álvarez et al. 2014; Cancela et al. 2013; Zulkifley and Moran 2012). In this case, the input is a region of interest (ROI) which includes the unusual object, where each pixel x ∈ R^2 is labeled with the estimated probability that it belongs to the object (foreground), P(Fore | x). Next, a probabilistic model is developed to carry out the above-mentioned decision.

Let us assume that two vehicles V_1 and V_2 are present in the input. Then, we can model each of them with a multivariate Gaussian mixture component, so that the probability that a foreground pixel exists at position x is given by:

p(x | Fore) = P(V_1) p(x | Fore, V_1) + P(V_2) p(x | Fore, V_2)  (13)

∀i ∈ {1, 2}, p(x | Fore, V_i) = (2π)^(−1) (det(C_i))^(−1/2) exp(−(1/2) (x − μ_i)^T C_i^(−1) (x − μ_i))  (14)

where μ_i is the mean vector (centroid) of vehicle i, and C_i is the associated covariance matrix. It is worth noting that the principal axis of vehicle i can be computed as the eigenvector corresponding to the largest eigenvalue of C_i, while det(C_i) is proportional to the area of the vehicle.

To train the mixture of Gaussians model of (13), the expectation maximization (EM) algorithm might be used. However, the standard version of the algorithm is not applicable here, since there is an inherent uncertainty about the training samples x to be used, because it is not known for sure which of them belong to the foreground object. Under these conditions, the estimated foreground probabilities P(Fore | x) can be used as importance weights for an importance sampling weighted EM algorithm (Cappé et al. 2008; Hoogerheide et al. 2012; Van Deusen and Irwin 2012). The resulting parameter update equations are as follows:

R_ik = P(V_i | Fore, x_k) = P(V_i) p(x_k | Fore, V_i) / [P(V_1) p(x_k | Fore, V_1) + P(V_2) p(x_k | Fore, V_2)]  (15)

P̂(V_i) = [Σ_{k=1}^{K} R_ik P(Fore | x_k)] / [Σ_{k=1}^{K} P(Fore | x_k)]  (16)

μ̂_i = [Σ_{k=1}^{K} R_ik P(Fore | x_k) x_k] / [Σ_{k=1}^{K} R_ik P(Fore | x_k)]  (17)

Ĉ_i = [Σ_{k=1}^{K} R_ik P(Fore | x_k) (x_k − μ̂_i)(x_k − μ̂_i)^T] / [Σ_{k=1}^{K} R_ik P(Fore | x_k)]  (18)

where P̂(V_i), μ̂_i, Ĉ_i are the updated estimations of P(V_i), μ_i, C_i, respectively; and K is the number of pixels in the ROI with P(Fore | x) > 0.2. It has been found that including the pixels in the ROI with P(Fore | x) ≤ 0.2 in the learning process leads to poor fits, i.e. the pixels which most likely belong to the background must be excluded. It must be noted that R_ik is the posterior probability that foreground pixel x_k belongs to vehicle i.

After the EM algorithm has converged, the boundary between the two vehicles is given by:

P(V_1 | Fore, x) = P(V_2 | Fore, x)  (19)

To obtain a computationally amenable form of (19), log likelihoods are employed:

log P(V_1 | Fore, x) = log P(V_2 | Fore, x)  (20)

log P(V_1) + log p(x | Fore, V_1) = log P(V_2) + log p(x | Fore, V_2)  (21)

log P(V_1) − (1/2) log det(C_1) − (1/2) (x − μ_1)^T C_1^(−1) (x − μ_1) = log P(V_2) − (1/2) log det(C_2) − (1/2) (x − μ_2)^T C_2^(−1) (x − μ_2)  (22)

Please note that (22) is a second order equation in the components of x, i.e. the boundary is a plane algebraic curve of degree 2, also called a conic section.
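The weighted EM updates (15)–(18) can be sketched as follows. This is an illustrative sketch, not the authors' implementation: the initialization, the iteration count and the small diagonal safeguard on the covariances are our assumptions:

```python
import numpy as np

def gauss_pdf(X, m, C):
    """Bivariate Gaussian density of Eq. (14) evaluated at each row of X."""
    d = X - m
    q = np.einsum("ni,ij,nj->n", d, np.linalg.inv(C), d)
    return np.exp(-0.5 * q) / (2.0 * np.pi * np.sqrt(np.linalg.det(C)))

def weighted_em(X, p_fore, n_iter=50):
    """Importance-weighted EM, Eqs. (15)-(18): each pixel x_k contributes
    with weight P(Fore | x_k); pixels with weight <= 0.2 are discarded."""
    keep = p_fore > 0.2
    X, w = X[keep].astype(float), p_fore[keep]
    pi = np.array([0.5, 0.5])
    mu = np.stack([X[0], X[-1]])                  # crude initialization (assumption)
    C = np.stack([np.eye(2), np.eye(2)]) * np.var(X)
    for _ in range(n_iter):
        # E-step, Eq. (15): responsibilities of the two components.
        lik = np.stack([gauss_pdf(X, mu[i], C[i]) for i in range(2)], axis=1)
        num = pi * lik
        R = num / num.sum(axis=1, keepdims=True)
        # M-step, Eqs. (16)-(18), with importance weights P(Fore | x_k).
        wr = R * w[:, None]
        pi = wr.sum(axis=0) / w.sum()
        mu = (wr.T @ X) / wr.sum(axis=0)[:, None]
        for i in range(2):
            d = X - mu[i]
            C[i] = (wr[:, i, None, None] *
                    (d[:, :, None] * d[:, None, :])).sum(axis=0) / wr[:, i].sum()
            C[i] += 1e-6 * np.eye(2)              # numerical safeguard
    return pi, mu, C
```

Once fitted, the sign of the difference between the two sides of (22) classifies any pixel to one side of the conic boundary or the other.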

When two vehicles exist in the ROI, the pixels close to their boundary should belong to the background. To check this condition in a robust way, the set of pixels S to be checked is found this way:

S = {x_k | abs(P(V_1 | Fore, x_k) − P(V_2 | Fore, x_k)) < ϕ}  (23)

where ϕ ∈ [0, 0.5] is a tunable parameter which controls the width of the boundary region, and abs stands for the absolute value function. The larger ϕ, the broader the boundary region; in the experiments ϕ = 0.35 has been used. Finally, the system decides that two vehicles are present if and only if the following condition holds:

card({x_k ∈ S | P(Fore | x_k) > 0.5}) / card(S) < θ  (24)

where θ ∈ [0, 1] is another tunable parameter, and card stands for the cardinality of a set. In the experiments, it has been found that the best results are attained with θ = 0.6. Condition (24) is fulfilled whenever only a small fraction (lower than θ) of the pixels in the boundary region belong to the foreground.

Fig. 3 Incorrect merge handling, which analyzes the unusual objects detected in Sect. 3.1. The overlapped objects are split by the procedure described in Sect. 3.2 (color figure online)

The following steps summarize the overlapped objects detection and splitting subsystem:

1. From the initial image segmentation, a ROI of the unusual object is extracted.
2. A probabilistic mixture is trained with the pixels in the ROI, according to Eqs. (16)–(18).
3. A possible boundary region between two vehicles is determined by (23).
4. The ROI is declared to contain two vehicles if and only if (24) is fulfilled.
5. If two vehicles are present, then (22) is used to split one from the other.

In principle, this process could be repeated several times to split an object into more than two vehicles. However, it has been found in practice that this does not lead to any improvement of the system performance.
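The decision of steps 3 and 4, i.e. Eqs. (23)–(24), can be sketched as follows. The posterior maps are assumed to be precomputed from the fitted mixture; the function name and the vectorized formulation are ours:

```python
import numpy as np

def two_vehicles_present(p_fore, post1, post2, phi=0.35, theta=0.6):
    """Eqs. (23)-(24): gather the boundary band S where the two posterior
    maps P(V1 | Fore, x) and P(V2 | Fore, x) are nearly equal, then declare
    an incorrect merge (two vehicles) iff only a small fraction of S is
    foreground."""
    band = np.abs(post1 - post2) < phi        # Eq. (23): boundary region S
    if not band.any():
        return False                          # no boundary region found
    frac = np.mean(p_fore[band] > 0.5)        # Eq. (24): foreground fraction
    return bool(frac < theta)
```

The intuition is that a genuine gap between two vehicles shows up as a mostly-background band between the two Gaussian components, whereas a single large vehicle keeps the band filled with foreground pixels.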

Figure 3 shows a representation of the incorrect merge handling. An unusual object is detected and colored in white (middle image on the left) because two vehicles are very close to each other (color image on the top left). The Gaussian mixture model over the raw ROI which covers the two vehicles separates the overlapped object into two vehicles, represented in blue and green colors, respectively. The final result is shown in red in the bottom image on the right. It must be noted that one of the divided vehicles is still considered by the SOM model as an uncommon object, since it is displayed in white. This happens because this vehicle is a truck, with slightly larger dimensions than the normal cars (see the raw frame at the top left of Fig. 3), so it is considered anomalous because of its size. The topology map at this point of the sequence can be found in the bottom image on the left, where the color of the vehicles in the rest of the images represents their association to a specific neuron of the map. The gray color indicates that the vehicles are leaving or entering the scene.

4 Experimental results

Our approach has been tested over a set of traffic sequences provided by a video surveillance online repository, generated by the Federal Highway Administration (FHWA) under the Next Generation Simulation (NGSIM) program.¹ The objective is to efficiently detect the vehicles in motion in each frame of the scene, which are considered as rigid objects. We have used several scenes related to two key places in American vehicular traffic, the Lankershim and US-101 highways. Four sequences are analyzed, namely: the Lankershim scene from a perspective view and from a top view (size of 640×480 and 11,220 frames for both sequences), and the US-101 scene also recorded from a perspective view and from a top view (size of 640×480 and 9,551 frames, both).

¹ Datasets of NGSIM are available at http://ngsim-community.org/.

Fig. 4 Two different topologies of our SOM model which adapt automatically to the analyzed scenes. The left image shows the US-101 Highway sequence whereas the right one displays the Lankershim scene from a top view. The camera perspective is also compensated by our approach

Some difficulties such as occlusion handling, aerial perspective or overlapping objects caused by errors in the initial object segmentation phase should be handled. Additionally, it is possible to find other kinds of objects, like trucks or motorcycles, which should not be detected as overlapped vehicles.

According to the first phase of our methodology, the SOM model is adjusted with the following empirically selected parameters: a map topology of 4 × 4 neurons, which can be modified without any degradation of the model; the neighborhood radius and the learning rate at the convergence phase are Δ = 0.5 and η = 0.01, respectively; finally, the factor γ of Eq. (12) is set to 2. The map topology is flexible for any type of traffic application, taking into account that the greater the number of neurons in the SOM network, the greater the computation time required. In the Gaussian mixture model for the incorrect merge module, we have ϕ = 0.35 (see Eq. 23) to control the width of the boundary region and θ = 0.6 (see Eq. 24), which is the threshold to decide the presence of two overlapped objects. It should be emphasized that the model works in real time, since it follows an online learning strategy.

It is possible to observe in Fig. 4 how the neurons of the SOM approach are distributed on the US-101 Highway and Lankershim scenes. Note that they manage to settle in areas where the movement of the vehicles is more frequent. Additionally, the perspective of the camera is captured successfully, since the areas of two neurons in different zones of the scene represent the size of the objects flowing through those regions. Thus, in the left image of Fig. 4, vehicles in the bottom part of the image are larger than in the central part, a feature which is learned by the proposed neural model. The same also occurs in the right image of Fig. 4, where the size of the objects in the left part of the scene is greater than in the right one.

Besides recognizing when objects are anomalous with respect to their shape, it is necessary to determine whether these objects correspond to incorrect merges caused by errors in the initial segmentation stage. Figure 5 shows several examples of incorrect merges handled by the Gaussian mixture model. Thus, images on the left part of the figure represent merges correctly solved in different positions, while on the right part it is possible to observe vehicles considered as uncommon objects, which correspond to vehicles with larger dimensions or trucks. They are outside the usual shape in that region of the scene, according to the SOM model.

In Fig. 6 we can observe several frames where various possibilities are presented. In the first row, it is possible to notice how the system detects multiple anomalous objects (middle image). Using the incorrect merge handling module, the overlap is avoided and the vehicles are extracted properly (red objects in the right image). In the second row, there are two unusual objects belonging to different categories: an overlapped object which integrates two cars, and a truck whose shape differs from the normal one in the analyzed scene (there is a higher frequency of light vehicles). In this case, our approach correctly detects the overlapped object and manages to split it, while the truck is not changed because it does not qualify as an incorrect merge (right image). On some occasions (bottom row), the incorrect merge module fails to determine the overlap because an object (a truck) covers, to some extent, another one (a vehicle).

Table 1 presents a quantitative comparison according to the number of vehicles for several sequences over a period of time. The aim is to measure the usefulness of our approach. To that end, the real number of vehicles in different frames has been manually obtained for periods ranging from one to four seconds in each sequence. Only these


Fig. 5 Several examples of the handling of incorrect merges. On the left part, several vehicles correctly divided. On the right part, vehicles considered as anomalous (trucks or objects with larger dimensions) which the incorrect merge module does not detect as overlapped objects

Fig. 6 Several examples of anomalous objects detected. The first row of images shows the US-101 Highway sequence from a perspective view. Three anomalous objects are correctly handled. The second row displays the Lankershim scene from a perspective view. One truck and two overlapped objects are detected as unusual, and are discriminated accurately. The last row represents the US-101 Highway sequence from a top view. Two of the three overlapped objects are correctly handled (color figure online)


Table 1 Quantitative comparison of the application of our approach according to the number of detected vehicles in different sequences

Sequence                     # real objs            Without SOM approach             With SOM approach
                             Total  Frame           Total  Frame         Error (%)   Total  Frame         Error (%)
US-101 highway—top view      1,278  28.40 ± 4.85    812    18.04 ± 3.30  36.46       834    18.53 ± 3.45  34.74
US-101 highway—perspective   2,254  19.43 ± 3.30    1,909  16.46 ± 2.99  15.31       1,987  17.13 ± 3.08  11.85
Lankershim—perspective       659    11.17 ± 7.69    602    10.20 ± 6.40  8.65        654    11.08 ± 7.20  0.76

The total number of vehicles in the evaluated period and the mean and standard deviation for each frame are indicated in the table. The error of the estimation of the number of vehicles is also presented.

Fig. 7 Qualitative comparison of the trajectories obtained after applying the tracking algorithm. The left column shows the version without our approach, while the right column shows the influence of the proposed methodology

frames, which include a counter of the number of vehicles, are selected in the comparison. The total number of vehicles, as the sum of all the objects in the evaluation frames, and the mean and standard deviation of the number of vehicles for each frame are represented in the second and third columns. For an application without this approach, the fourth, fifth and sixth columns show the estimated total number of vehicles, its mean and standard deviation per frame, and the percentage of error of the estimate with regard to the real number. This measure is obtained by subtracting the ratio of the estimated number of vehicles to the real one from one. The seventh, eighth and ninth columns represent the same measures when our approach is included in the object detection model. It can be noticed that the SOM model improves the detection of the vehicles by reducing the overlapping between them (error columns in the table). Thus, subsequent phases in the behavior analysis will have more reliable and detailed information to track vehicles and determine suspicious events.
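For concreteness, the error column can be reproduced directly from the counts in Table 1; `count_error_pct` is just a hypothetical helper name for the ratio described above.

```python
def count_error_pct(estimated_total, real_total):
    # Error (%) = (1 - estimated / real) * 100, as described in the text.
    return (1 - estimated_total / real_total) * 100

# First row of Table 1 (US-101 highway, top view, without the SOM approach):
print(round(count_error_pct(812, 1278), 2))   # → 36.46
```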

In the tracking phase, some improvements from using this approach are also observed. A version of the Kalman filter for multiple objects has been implemented to carry out the tracking process. This method takes as input the output obtained by this methodology. Thus, Fig. 7 shows a qualitative comparison between several trajectories obtained with our approach (right column) and without it (left column) for the US-101 sequence from a perspective view. The use of this methodology produces straighter trajectory lines, without excessive variation in the centroids of the objects within the driving lane paths. However, without this proposal, the initial segmentation causes many overlaps between objects, which sometimes makes the tracking algorithm generate only one path for multiple objects (object with ID 115), or even causes an interchange of trajectories between objects (IDs 161 and 168), which is totally inadvisable.
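The smoothing effect behind the straighter trajectories can be illustrated with a single-coordinate constant-velocity Kalman filter. This is a minimal sketch, not the paper's multi-object implementation; the class name and the noise values q and r are assumptions.

```python
class KalmanCV1D:
    """Constant-velocity Kalman filter for one centroid coordinate."""

    def __init__(self, x0, q=1e-2, r=1.0):
        self.x, self.v = x0, 0.0          # state: position and velocity
        self.p = [1.0, 0.0, 0.0, 1.0]     # covariance [p00, p01, p10, p11]
        self.q, self.r = q, r             # process / measurement noise

    def step(self, z, dt=1.0):
        # Predict: x' = x + v*dt, P' = F P F^T + Q, with F = [[1, dt], [0, 1]].
        self.x += self.v * dt
        p00, p01, p10, p11 = self.p
        p00 = p00 + dt * (p01 + p10) + dt * dt * p11 + self.q
        p01 = p01 + dt * p11
        p10 = p10 + dt * p11
        p11 = p11 + self.q
        # Update with the measured centroid coordinate z (H = [1, 0]).
        y = z - self.x
        s = p00 + self.r
        k0, k1 = p00 / s, p10 / s
        self.x += k0 * y
        self.v += k1 * y
        self.p = [(1 - k0) * p00, (1 - k0) * p01,
                  p10 - k1 * p00, p11 - k1 * p01]
        return self.x
```

Running one such filter per centroid coordinate, and reassociating detections to tracks frame by frame, is the usual way this kind of filter is extended to multiple objects; the better the foreground masks, the fewer mismatched associations like those of IDs 161 and 168.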

5 Conclusions

In this work, a methodology to improve the results of object detection algorithms is presented. It consists of a postprocessing method based on a self-organizing neural network (SOM) to detect anomalous objects according to their shape, and an occlusion handling procedure which evaluates and identifies simple objects integrated in an overlapped one. This proposal is applied on sequences with rigid objects and high movement frequency.

Several traffic scenes have been tested to check the feasibility of the system, obtaining suitable and successful results. The SOM approach manages to adapt to the movement of objects and captures the variability of the object shape in every zone of the sequence. Furthermore, the occlusion handling


correctly discriminates in most cases between overlapped objects and other types of objects with fewer occurrences in the scene (trucks, motorcycles). This overlapping is corrected by extracting the simple objects after applying a Gaussian mixture model to the binary image.

It is remarkable that improving the segmentation of the foreground of the scene and, accordingly, detecting moving objects as accurately as possible, makes the subsequent stages, object tracking and behavior analysis, much more reliable and robust. Therefore, it is possible to infer more plausible facts from the scene.

Acknowledgments This work is partially supported by the Projects TIN2011-24141 from MEC-SPAIN and TIC-6213 and TIC-657 (Junta de Andalucía). Additionally, the authors acknowledge support through Grants TIN2010-16556 from MICINN-SPAIN and P08-TIC-04026 (Junta de Andalucía). All of them include FEDER funds.

