Unsupervised Video Analysis for Counting of Wood in River during Floods



Imtiaz Ali and Laure Tougne

Université de Lyon, CNRS, Université Lyon 2, LIRIS, UMR5205, F-69676, France

Abstract. This paper presents a framework for counting the fallen trees, bushes and debris passing in a river using monocular vision. Automatic segmentation and recognition of wood in rivers is a relatively new field of research. We develop an unsupervised segmentation of wooden objects moving in the river, together with a novel method for separating wood from water waves. Fallen trees are counted by tracking them across consecutive frames. The algorithm is tested on multiple flood videos, and the results are evaluated both qualitatively and quantitatively.

1 Introduction

Automatic video surveillance addresses the challenge of performing real-time analysis and constant monitoring of activity [1]. This automation helps improve the safety of our surroundings. Remote surveillance of unattended environments is common in places such as airports, highways, railway infrastructure, parking lots and roads. In most cases the surveillance systems detect potentially threatening incidents. Rivers have been monitored with cameras for many years. During floods, large numbers of fallen trees, debris, branches and roots are carried by the water. These fallen trees and bushes block the flow of water in mountains. Moreover, they threaten bridges and dams as they accumulate over time during floods. The monitoring systems installed over rivers are usually supervised manually. Automatic detection of these trees will help in taking preventive measures during floods. Statistics on the fallen trees carried by floods every year will help determine the maximum amount of wood passing through the river each year and the time of year during which flooding can be expected. Counting the fallen trees and wood in the river requires image segmentation and motion tracking of the fallen trees in the water. The detection of wood in the river is an example of detecting object motion within a moving background.

The videos we study in this paper come from a camera installed on the river Ain (France). The first row of Figure 2 gives some examples of images extracted from these videos.

Complex natural environments often impose many constraints. These constraints can be classified into two groups: constraints on the detection and recognition of moving wood in the river, and constraints on its tracking. The detection and recognition of wood depend on the luminosity difference between wood and water. The flow of water in rivers contains turbulence and waves that are more prominent during floods. In addition, cloud movement in the sky changes the brightness over the surface of the river. The difference in luminosity between the waves and the wood is not very large. Moreover, the shadows of surrounding trees and buildings make correct foreground/background extraction more difficult. Image segmentation is not easy when tree branches move in front of the surveillance camera. The bridges in the monitored scene also cast strong shadows over the water surface. Consequently, against the moving background, objects can only be detected by virtue of their presence in multiple consecutive frames. Furthermore, counting the fallen trees that pass through strategically important places during a flood requires that the waves present in the river be separated from the fallen trees or wood.

The tracking of foreground objects has constraints of its own. Water waves and wood move at the same speed, which makes it difficult to distinguish between the two. The motions of wood and water waves in the river are not linear. Good tracking requires that the objects be present in multiple consecutive frames. The water waves during floods are so large that they submerge the fallen tree branches, so the size of an object does not remain the same in consecutive frames. Small wood pieces or debris can be totally submerged: they appear in one frame and remain submerged in the next two or three frames. Finally, due to the remote location of the monitored scene and the limited transfer rate of data networks, the frame rate of the video is very low (∼4 fps). Consequently, the object motion between consecutive frames is large.

G. Bebis et al. (Eds.): ISVC 2009, Part II, LNCS 5876, pp. 578–587, 2009. © Springer-Verlag Berlin Heidelberg 2009

This paper is organized as follows. Section 2 reviews relevant work in similar situations and highlights the constraints and technical difficulties of our case. Section 3 describes the proposed methodology for detection. Section 4 presents the experimental results and compares them with statistical data obtained manually. Section 5 gives concluding remarks.

2 Related Works

Automatic segmentation and recognition of wood in rivers is a relatively new field of research, and few articles in the literature address this type of application. In this section we review previous work on the detection of foreground objects against non-stationary backgrounds.

For foreground detection, an adaptive background model has been proposed for non-stationary backgrounds [2]. The background model plays the role of the reference image in background subtraction techniques. It is constructed by adapting to the changes observed during a training period. The construction of the background model is based on different image features (spectral, spatial and temporal). For background models based on spectral characteristics, most researchers use the Gaussian Mixture Model (GMM) method [3,4,5], where one or more Gaussians represent the spectral features at each background pixel. All these methods are used in situations with very small dynamic background movements. The GMM method leads to misclassification when the background scene is complex [6], [7].

In [1], spatio-temporal filtering is proposed to compensate for the limitations of region-based image blocks. The method is applied to the detection of swimmers in a swimming pool. Spatial features are extracted by gradient analysis, which gives information about movements in the images. A mixture of spatial and spectral features extracted from the image is used for foreground extraction in [8]. Our method is inspired by these works, but gradient analysis alone is not sufficient in our case because water waves and wood both have strong gradients.

Temporal characteristics are very important for detecting moving objects in video. The optical flow technique proposed in [9] is widely used for this purpose. In [10,11], the consistency of optical flow is estimated over a short duration of time. However, the consistency of local optical flow requires small displacements from one frame to the next. In our case the videos have a very low frame rate (∼4 fps), so the displacement of wood from one frame to the next is large, and the motion of the wood is not linear.

Hence, for object segmentation and recognition, spectral and spatial features must be combined with temporal features. Note that, due to the dynamic nature of our application, we cannot construct a background model: the background is dynamic, with water waves and wood moving at the same speed. The next section proposes a framework that uses spectral and spatial features for detection and segmentation, and temporal features for tracking the objects in the video in order to count the fallen trees, branches or stems according to their appearance in the videos.

3 Proposed Methodology

The detection of wood in the river involves two steps: image segmentation and recognition of wood. Outdoor environments are subject to the sudden appearance and disappearance of sunshine, as shown in Figure 2. The first row shows original images from a flood video. The presence of a bridge (top left corner of the images), moving tree branches in front of the camera (right middle portion of the images) and the shadows of surrounding trees over the river are evident in these images. The proposed methodology is composed of two major parts: 1) detection and recognition of wood in the river, and 2) separation of wood and water waves by tracking them in consecutive images. The architecture is presented in Figure 1. The following two subsections describe the proposed methodology in detail.

[Figure 1: flow chart. Frames 1, 2 and 3 are each segmented into an intensity mask (MI) and a gradient mask (MG). The masks are combined with the temporal differences df1 and df2 to give the resulting contours, the barycenter of the centers of mass is found for each set of contours, and the barycenters are recorded in the resumed image.]

Fig. 1. Outline of proposed methodology for detection and segmentation of wood in the river water

3.1 Detection and Recognition of Wood

The automatic detection of wood begins with automatic segmentation of the image. The flow chart in Figure 1 shows that each frame goes through two segmentation processes. One produces the intensity mask (MI) and the other the gradient mask (MG). They result from image segmentation based on intensity histogram thresholding and on an edge-based gradient technique, respectively.

Intensity Mask (MI). Gray-level histograms of image intensity are calculated for every incoming frame. Histogram thresholding is among the most popular techniques for segmenting gray-level images, and several strategies have been proposed to implement it [12], [13]. The peaks and valleys of the 1D brightness histogram can be readily associated with the objects and backgrounds of gray-level images. In the absence of sunshine, the water in the river and the wood have different intensity levels. However, the intensities of water waves and wood resemble one another both in gray level and in RGB color values. This fact is shown in Figure 2.

The Fisher linear discriminant technique is used for histogram thresholding. This technique produces very good segmentation of images in the absence of sunshine. In Figure 2, the first two images in the second row are the results of our algorithm in the presence of sunlight. The last two images in the second row are the results of intensity-based segmentation in the absence of sunshine, which show the efficiency of this technique.
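As an illustration of histogram thresholding with a Fisher-type criterion, the sketch below picks the gray level that maximizes the between-class separation of a 1D histogram (this is the criterion used by Otsu's method, which is one common reading of a Fisher discriminant on a histogram). The paper does not give its exact formulation, so the function names `fisher_threshold` and `intensity_mask`, and the choice that pixels brighter than the threshold are foreground, are assumptions made here for illustration only.

```python
def fisher_threshold(histogram):
    """Return the gray level t that best separates the histogram into two
    classes (levels <= t and levels > t) under a Fisher-type criterion:
    the between-class scatter n_low * n_high * (mean_low - mean_high)^2."""
    total = sum(histogram)
    weighted_total = sum(level * count for level, count in enumerate(histogram))
    best_t, best_score = 0, -1.0
    count_low = 0
    sum_low = 0.0
    for t, count in enumerate(histogram):
        count_low += count
        sum_low += t * count
        count_high = total - count_low
        if count_low == 0 or count_high == 0:
            continue  # one class is empty, no separation to measure
        mean_low = sum_low / count_low
        mean_high = (weighted_total - sum_low) / count_high
        score = count_low * count_high * (mean_low - mean_high) ** 2
        if score > best_score:
            best_t, best_score = t, score
    return best_t


def intensity_mask(image, threshold):
    """Binary mask MI: 1 where the pixel is brighter than the threshold.
    The polarity (bright = foreground) is an assumption of this sketch."""
    return [[1 if pixel > threshold else 0 for pixel in row] for row in image]


# Toy bimodal histogram: 100 dark pixels at level 50, 100 bright at level 200.
hist = [0] * 256
hist[50] = 100
hist[200] = 100
t = fisher_threshold(hist)
mask = intensity_mask([[50, 200], [200, 50]], t)
```

In a real frame the histogram would be computed per incoming image, as the text describes, and the threshold recomputed each time.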

Fig. 2. The various steps involved in the segmentation. The images in the first row are original images of moving wood in water, the images in the second row are intensity masks, the images in the third row show the gradient masks of the corresponding images, and the resulting combinations of all segmentations are shown in the last row

Gradient Mask (MG). The spectral analysis described above works well in the absence of sunshine. In the presence of sunshine, the shadows of surrounding trees and buildings over the river make segmentation based on histogram thresholding very difficult. It is therefore necessary to integrate spatial features of the image with spectral features to obtain a meaningful segmentation, that is, one in which wood is separated from the water. Branches and debris moving under the shadows of the surrounding trees cannot be separated by histogram thresholding alone. For this reason, segmentation by detecting the edges between regions is applied alongside intensity histogram thresholding. This approach has been extensively investigated for gray-level images [12], [13]. Algorithms have also been proposed for the detection of discontinuities within color images [14]. This technique segments the image based on spatial features. The resulting image is called the gradient mask (MG) in Figure 1. The results of this method are shown in the third row of Figure 2.
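The edge-based step can be sketched as a thresholded gradient magnitude. The paper does not name the edge operator, so the Sobel kernels, the function name `gradient_mask` and the threshold value used below are assumptions for illustration, not the authors' implementation.

```python
def gradient_mask(image, threshold):
    """Binary mask MG: 1 where the Sobel gradient magnitude exceeds the
    threshold. Border pixels are left at 0 because the 3x3 kernel does
    not fit there."""
    height, width = len(image), len(image[0])
    mask = [[0] * width for _ in range(height)]
    for y in range(1, height - 1):
        for x in range(1, width - 1):
            # Horizontal derivative (right column minus left column).
            gx = (image[y - 1][x + 1] + 2 * image[y][x + 1] + image[y + 1][x + 1]
                  - image[y - 1][x - 1] - 2 * image[y][x - 1] - image[y + 1][x - 1])
            # Vertical derivative (bottom row minus top row).
            gy = (image[y + 1][x - 1] + 2 * image[y + 1][x] + image[y + 1][x + 1]
                  - image[y - 1][x - 1] - 2 * image[y - 1][x] - image[y - 1][x + 1])
            if (gx * gx + gy * gy) ** 0.5 > threshold:
                mask[y][x] = 1
    return mask


# Toy image with a vertical step edge between columns 1 and 2.
step_image = [[0, 0, 100, 100, 100] for _ in range(5)]
edges = gradient_mask(step_image, threshold=100)
```

Because a shadow changes the absolute brightness of a region more than its local contrast, an edge response of this kind can still fire on objects lying inside a shadow, which is the motivation given in the text.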

Temporal Difference (df). Image segmentation is thus done by two different methods. Histogram thresholding, based on spectral analysis, separates the wood from the water in the absence of sunshine but fails to detect wood under the shadows of surrounding trees in the presence of sunshine. Gradient analysis separates the objects in motion from the rest of the scene. Since in our case both water waves and moving wood have strong gradients, the resulting image contains both. The advantage of the gradient mask is that it detects objects under the shadows of surrounding trees and buildings. Wood and water waves can only be separated from one another by virtue of their presence in consecutive frames of the video. The majority of the water waves, which are dispersed between two consecutive frames, are automatically suppressed by taking inter-frame differences.
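A minimal sketch of an inter-frame difference is given below. The paper does not state how the difference is computed or thresholded, so the absolute-difference form, the function name `temporal_difference` and the threshold are all assumptions.

```python
def temporal_difference(previous_frame, current_frame, threshold):
    """Binary mask df: 1 where the gray level changed by more than the
    threshold between two consecutive frames of equal size."""
    return [
        [1 if abs(cur - prev) > threshold else 0
         for prev, cur in zip(prev_row, cur_row)]
        for prev_row, cur_row in zip(previous_frame, current_frame)
    ]


frame1 = [[10, 10], [10, 10]]
frame2 = [[10, 60], [12, 10]]
df1 = temporal_difference(frame1, frame2, threshold=20)
```

With three frames, as in Figure 1, the same function would be called twice to obtain df1 (frames 1 and 2) and df2 (frames 2 and 3).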

The Resulting Combination. The spectral-segmentation-based intensity mask, the edge-based gradient mask and the temporal inter-frame difference are intersected to give a resulting image. This is a binary image that represents the detected wood along with some water waves. The combination images are shown in the last row of Figure 2.
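Read literally, the combination is a pixel-wise intersection of the three binary masks. The sketch below shows that reading; the paper only says the masks are combined "in a manner" to give the result, so a plain logical AND and the function name `combine_masks` are assumptions.

```python
def combine_masks(*masks):
    """Pixel-wise intersection (logical AND) of equally sized binary masks."""
    combined = []
    for rows in zip(*masks):
        combined.append([1 if all(values) else 0 for values in zip(*rows)])
    return combined


mi = [[1, 1], [0, 1]]   # intensity mask
mg = [[1, 0], [1, 1]]   # gradient mask
df = [[1, 1], [1, 0]]   # temporal difference
result = combine_masks(mi, mg, df)
```

Only the top-left pixel survives in this toy case, since it is the only one set in all three masks.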

Fig. 3. A moving fallen tree in the original video, and the combination image showing the detected contours of the tree

3.2 The Separation of Wood and Water Waves

Here the main goal is to detect and count the fallen trees and debris that pass through the river during a flood. In the absence of sunshine, the water waves in the flood resemble the wood, so the counting decision cannot be made on the segmentation of a single image. The contours formed by water waves and wood must be tracked across consecutive frames of the video. The first constraint on tracking the wood is that floating fallen trees do not keep the same length from one frame to the next, because the water waves in the flood often submerge the wood. Second, the movements of fallen trees are not linear, and water waves sometimes persist for a long time in the videos and are therefore detected as wood in many consecutive frames. To avoid miscounting the wood, it is important to find a mechanism that minimizes false detections. The method of counting the fallen trees is explained in this section. Figure 3 shows a fallen tree in the river with the corresponding combination image.

The Barycentre of Mass Centers. Fallen trees in the river have many branches and appear in the video as several distinct closed contours, as shown in Figure 3. To cope with the first constraint, the multiple contours of the same object must be grouped together to avoid false detections. Every resulting contour has an area and a center of mass. The centers of mass of contours that lie close to one another in the image are grouped to give a barycenter of mass centers. These barycenters are stored in the summary image.
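The grouping step can be sketched as follows. The paper does not say how closeness is measured or how the grouped centers are averaged, so the Euclidean distance threshold `max_distance`, the area weighting and the function name `group_barycenters` are hypothetical choices made for this example.

```python
from math import hypot


def group_barycenters(contours, max_distance):
    """Group contour centers of mass that lie within max_distance of each
    other (transitively) and return one area-weighted barycenter per group.

    contours: list of (cx, cy, area) tuples with area > 0.
    """
    parent = list(range(len(contours)))

    def find(i):
        # Union-find root lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i in range(len(contours)):
        for j in range(i + 1, len(contours)):
            dx = contours[i][0] - contours[j][0]
            dy = contours[i][1] - contours[j][1]
            if hypot(dx, dy) <= max_distance:
                parent[find(i)] = find(j)

    groups = {}
    for i, contour in enumerate(contours):
        groups.setdefault(find(i), []).append(contour)

    barycenters = []
    for members in groups.values():
        total_area = sum(area for _, _, area in members)
        bx = sum(cx * area for cx, _, area in members) / total_area
        by = sum(cy * area for _, cy, area in members) / total_area
        barycenters.append((bx, by))
    return sorted(barycenters)


# Two nearby branch contours of one tree, plus one distant contour.
contours = [(10, 10, 4), (14, 10, 4), (100, 50, 10)]
centers = group_barycenters(contours, max_distance=10)
```

Here the two contours four pixels apart merge into a single barycenter at (12, 10), while the distant contour keeps its own center.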


Counting the Number of Woods. In order to count the fallen trees, bushes, stems, roots and debris that pass through the river, we propose to represent the presence of barycenters in a “summary image”. The barycenters of an object (wave or wood) that is present in two consecutive frames make a pair in the summary image, and a trace is formed on the summary image. If the object is wood, these barycenters must be present continuously from the left to the right of the screen (as the river flows from left to right). This means that if the object is not totally submerged in the water, it will be present in more than four consecutive frames. The wood is detected and counted on this basis (see an example of such an image in Figure 4).
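The counting rule can be illustrated with a simple nearest-neighbour tracker over per-frame barycenters. This is only a sketch of the idea: the linking rule, the `max_step` distance limit and the function name `count_wood` are assumptions, while the left-to-right motion and the five-frame persistence come from the text.

```python
from math import hypot


def count_wood(frames, max_step, min_frames=5):
    """Count traces that persist for at least min_frames consecutive frames
    while moving left to right (non-decreasing x).

    frames: list of lists of (x, y) barycenters, one inner list per frame.
    max_step: largest displacement allowed between consecutive frames.
    """
    if min_frames < 2:
        raise ValueError("min_frames must be at least 2")
    active = []  # list of (last_point, trace_length)
    count = 0
    for points in frames:
        unused = list(points)
        next_active = []
        for last, length in active:
            candidates = [
                p for p in unused
                if p[0] >= last[0]
                and hypot(p[0] - last[0], p[1] - last[1]) <= max_step
            ]
            if not candidates:
                continue  # the trace is broken and dropped
            nearest = min(
                candidates,
                key=lambda p: hypot(p[0] - last[0], p[1] - last[1]),
            )
            unused.remove(nearest)
            length += 1
            if length == min_frames:
                count += 1  # counted once, when the trace reaches min_frames
            next_active.append((nearest, length))
        # Every unmatched barycenter starts a new trace of length 1.
        next_active.extend((p, 1) for p in unused)
        active = next_active
    return count


# One wood piece drifting right for six frames, one wave seen for two frames.
frames = [
    [(0, 50)],
    [(10, 50), (200, 20)],
    [(20, 50), (205, 20)],
    [(30, 50)],
    [(40, 50)],
    [(50, 50)],
]
wood_count = count_wood(frames, max_step=15)
```

A trace that is interrupted for even one frame is dropped in this sketch, which mirrors the limitation reported in the experiments for wood that submerges temporarily.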

Fig. 4. Example of resumed image

4 Experimental Results

A monitoring system has been set up on the river Ain (France), and videos of floods in recent years have been recorded. The fallen trees, bushes, branches and roots were counted manually by geographers. The results are evaluated qualitatively by visual inspection. The quantitative evaluation is computed as the true positives, false negatives and false positives of the wood detected in the videos. Figure 5 gives a glimpse of some difficult situations. The first scenario presents two very small wood pieces moving at the same time. These two pieces are segmented and counted as two different objects. The second one shows that detection succeeds even in the presence of shadow.

In addition to the qualitative evaluation, Table 1 shows the quantitative evaluation in terms of the wood pieces actually present and counted as wood, the wood pieces not counted by our algorithm, and the waves detected as wood pieces. The separation of wood and water waves depends on the presence of wood in consecutive frames. The parameters were tested for different types of situations and different lengths of wooden objects. To count the wood pieces present in the videos, the required number of consecutive frames was tuned to five. Geographers obtained the ground truth through visual inspection. They manually went through 5400 frames to derive the reported detection rate.


Fig. 5. The segmentation of wood on sample frames captured from different challenging scenarios at different time intervals, in the absence and presence of sunlight. Odd rows: sample frames captured. Even rows: corresponding segmentation results.

The algorithm is applied to seven flood videos with a total duration of thirty-six minutes. The percentage of wood detected is clearly higher than the percentage of waves falsely detected as wood. The brightness of the waves is very close to that of the wood pieces and the waves have strong gradients; moreover, the water waves last for more than five frames in some cases. If a water wave is continuously present in five frames, a false detection occurs, but the percentage of such false detections is not very large. Moreover, wood pieces sometimes appear in a number of consecutive frames and then disappear for one or two frames. Such wood pieces cannot be detected. The detection rate is nearly 98% while the successful counting rate is 90%. The number of detected wood pieces (Nd), the number of non-detected wood pieces (Npd) and the number of water waves detected as wood (Nw) for the seven videos are summarized in Table 1. The non-detected wood pieces (Npd) are those wooden objects that appeared in the videos for fewer than five and more than two consecutive frames. The results are shown in Figure 6, which clearly indicates that the algorithm counts the wooden objects in difficult scenarios with a high success rate.


Table 1. Quantitative evaluation of the proposed algorithm in terms of the number of true detections Nd, the number of non-detected wood pieces Npd and the number of waves detected as wood Nw

          Total frames   Duration (min)   Nd (%)   Npd (%)   Nw (%)
Video 1   650            4’00             95       5         6
Video 2   900            5’23             91       9         13
Video 3   860            5’36             81       19        19
Video 4   750            5’11             90       10        7
Video 5   550            4’02             76       24        2
Video 6   800            5’52             93       7         14
Video 7   880            6’05             91       9         19
Total     5390           36’05            90       10        15

Fig. 6. Results of counting the wood in the videos: white bars represent the number of true wood pieces, black bars the number of non-detected wood pieces, and grey bars the number of waves detected as wood

5 Concluding Remarks

In this paper, the problem of automated monitoring based on video surveillance in the highly dynamic environment of a river has been discussed. The nature of the problem is such that a background model cannot be created. An algorithm is needed that detects the wood using different image features. In particular, this paper has addressed two fundamental issues: 1) unsupervised segmentation of wood in the river, and 2) a method to count the wooden material in the river during floods. The first issue has been addressed by combining the spectral features of images with spatial features. The two types of features help a great deal in the unsupervised segmentation of wood and water waves from the rest of the water. As water waves and wooden objects are both present in the segmented image, separating wood from water waves requires tracking the wooden objects in consecutive frames. A fallen tree or bush can only be detected if some part of it remains above the water level. If the wood submerges in some frames and reappears in a later frame, it cannot be detected. Moreover, in heavily clouded conditions the water waves resemble the wood in color. Water waves that persist for a long time during a flood produce false detections as wood. The experimental results indicate that the proposed algorithm detects and counts the wood with a reasonably good success rate.

References

1. Eng, H.L., Wang, J., Wah, A.H.K., Yau, W.: Robust human detection within a highly dynamic aquatic environment in real time. IEEE Trans. Image Process. 15, 1583–1600 (2006)

2. Li, L., Huang, W.M., Gu, I.H., Tian, Q.: Statistical modeling of complex background for foreground object detection. IEEE Trans. Image Process. 13, 1459–1472 (2004)

3. Wren, C., Azarbayejani, A., Darrell, T., Pentland, A.: Pfinder: realtime tracking of the human body. IEEE Trans. Pattern Anal. Machine Intell. 19, 780–785 (1997)

4. Vacavant, A., Chateau, T.: Realtime head and hands tracking by monocular vision. In: IEEE International Conference on Image Processing 2005, ICIP 2005 (2005)

5. Stauffer, C., Grimson, W.: Learning patterns of activity using real-time tracking. IEEE Trans. Pattern Anal. Machine Intell. 22, 747–757 (2000)

6. Boult, T.: Frame-rate multi-body tracking for surveillance. In: DARPA Image Understanding Workshop (1998)

7. Gao, X., Boult, T., Coetzee, F., Ramesh, V.: Error analysis of background adoption. In: IEEE Conf. Computer Vision and Pattern Recognition, June 2000, pp. 503–510 (2000)

8. Mittal, A., Paragios, N.: Motion-based background subtraction using adaptive kernel density estimation. In: CVPR, pp. 302–309 (2004)

9. Horn, B.K.P., Schunck, B.G.: Determining optical flow. Artificial Intelligence 17, 185–203 (1981)

10. Iketani, A., Nagai, A., Kuno, Y., Shirai, Y.: Detecting persons on changing background. In: Int. Conf. Pattern Recognition, vol. 1, pp. 74–76 (1998)

11. Wixson, L.: Detecting salient motion by accumulating directionally-consistent flow. IEEE Trans. Pattern Anal. Machine Intell. 22, 774–780 (2000)

12. Fu, K., Mui, J.: A survey on image segmentation. Pattern Recognition 13, 3–16 (1981)

13. Rosenfeld, A., Kak, A.: Digital picture processing, 2nd edn., vol. 2. Academic Press, New York (1982)

14. Zhao, A.: Robust histogram-based object tracking in image sequences. Digital Image Computing Techniques and Applications, 45–52 (2008)