
A Bayesian Framework for Online Interaction Classification

S. Maludrottu, M. Beoldo, M. Soto Alvarez and C. Regazzoni
Department of Biophysical and Electronic Engineering
University of Genova
Via All’Opera Pia 11A - Genova, Italy
{gandalf,mbeoldo,soto,carlo}@dibe.unige.it

Abstract

Real-time automatic human behavior recognition is one of the most challenging tasks for intelligent surveillance systems. Its importance lies in the possibility of robust detection of suspicious behaviors in order to prevent possible threats. The widespread integration of tracking algorithms into modern surveillance systems makes it possible to acquire descriptive motion patterns of different human activities. In this work, a statistical framework for human interaction recognition based on Dynamic Bayesian Networks (DBNs) is presented: the environment is partitioned by a topological algorithm into a set of zones that are used to define the states of the DBNs. Interactive and non-interactive behaviors are described in terms of sequences of significant motion events in the topological map of the environment. Finally, by means of an incremental classification measure, a scenario can be classified while it is still evolving. In this way an autonomous surveillance system can detect and cope with potential threats in real time.

1. Introduction

Modern surveillance systems can be defined as "intelligent" if they are capable of integrating the ambient intelligence paradigm into safety or security applications [11]. In the last decade, much work has been devoted to linking traditional computer vision tasks to high-level context-aware functionalities such as scene understanding, behavior analysis, interaction classification, and recognition of possible threats.

The problem of automatic human behavior recognition is one of the most important and widely investigated areas in modern surveillance systems. Real-time detection of suspicious behaviors can be useful to prevent possible dangerous situations.

Thus, a model of the different human activities has to be built taking into account the natural variability of human behaviors. Since modern tracking applications can provide robust and reliable information about moving entities in the environment, the use of motion patterns for behavior analysis and anomaly detection is a common and effective approach: usually the trajectories are grouped according to specific clustering techniques, upon which different human behavior models can be constructed.

In [4] a K-means fuzzy trajectory clustering method is proposed. Building on the clustered trajectories, an autonomous learning algorithm can be instantiated in order to represent specific motion patterns. The work presented in [12] uses connected global trajectories in a multi-camera scenario to detect abnormal behaviors according to the likelihood of input tracks with respect to the path regions associated with different human activities.

This work focuses on the recognition of possible interactions between pairs of persons in an outdoor environment. An interaction between entities is established if the behavior of each involved entity is constrained and affected by those of the other entities in the environment. Therefore, in order to achieve a reliable recognition of human behaviors, explicit or implicit correlations between the activities of interacting persons must be correctly learned and identified. In [5] a probabilistic syntactic approach is described for the detection and recognition of interactions between multiple agents, using a stochastic activity parsing framework capable of handling parallel input streams. In [10] the task of labeling interactions is treated as a problem of inferring dependency structures within a model constituted by multiple vector time series. In [9] multiple body parts of moving persons are analyzed in a bottom-up fashion in order to build an attribute relational graph as a recognition system for human interactions. In [3] multi-agent behaviors are decomposed into single-agent action threads and classified using a Bayesian framework that encodes shape and trajectory features for moving objects.

In [2] it has been shown that a Dynamic Bayesian Network model can effectively be adopted to recognize people interactions by analyzing their trajectories in the environment.

This paper employs a probabilistic human interaction model as in [2], but, unlike that work, an incremental classification measure is implemented in order to produce a real-time online classification of evolving interactions. Pairs of interacting trajectories are processed by the Instantaneous Topological Map algorithm to automatically create a partitioning of the environment. This partitioning is subsequently used to define the states of the DBN. The trajectories of interacting persons, along with time and speed information, are encoded into the model, where conditional probability densities are learned in order to obtain a statistical description of human interactions. The addition of speed information to the model has been found useful to achieve higher classification rates. The proposed framework is capable of correctly recognizing trajectories of variable length.

The paper is organized as follows: in Section 2 the topological partitioning algorithm is briefly introduced; in Section 3 the proposed Bayesian framework for human behavior learning and recognition is presented; in Section 4 the proposed accumulative metric for interaction recognition is described; the experimental results are shown in Section 5. Finally, the conclusions are drawn in Section 6.

2. Topological Partitioning

In this work a context generator (described in Sect. 5) has been used to automatically produce realistic tracks of interacting entities in a parking lot scenario. In order to provide a manageable description of the virtual environment, it has been decomposed into a set of subparts or zones Z1...Zn (see Fig. 1); a partitioning of the environment using a regular grid would be easy to implement, but it does not take into account specific topological elements such as walls, gates or obstacles of unknown shape in the environment. A topological algorithm is instead used to represent spatial information in the abstract domain of a map. In this way dimensionality reduction can be achieved while the topology of the data distribution remains unaltered. Various techniques can be adopted to produce topology-representing structures, such as Self-Organizing Maps [7] or Growing Neural Gas [8].

The Instantaneous Topological Map (ITM) algorithm has been selected for this task (as in [2]), since it has been demonstrated that ITM is the most suitable approach for the formation of maps of state spaces whose exploration occurs along regular trajectories [6]. Given as input a set of simulated trajectories t1...tn, the ITM algorithm produces a description of the virtual environment as a graph G = (N, E), where N is the set of nodes describing different parts of the environment and E is the corresponding set of links between nodes (edges) that takes into account the spatial connections of the different subparts of the environment.

From this graph a map is produced as a set of polygonal zones Z1...Zn whose average dimension is related to the parameter τ, which defines the minimum possible distance value used by the ITM algorithm for the creation of new nodes. Thus, a relatively low value of τ (τ = 400) produces a fine-grained output map, whose nodes can be considered as corresponding to 2m × 2m cells in the real world, while a higher value of τ (τ = 2000) produces a coarser map (see Figure 1), roughly corresponding to 6m × 6m cells.
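The role of τ as a minimum node-creation distance can be sketched as follows. This is a much-simplified stand-in for the full ITM of [6] (which also adapts node positions and prunes edges via a Thales-sphere test); all names are hypothetical:

```python
import math

def build_itm_map(trajectories, tau):
    """Much-simplified sketch of ITM-style map building: a new node is
    created whenever a trajectory point lies farther than tau from every
    existing node; each new node is linked to its nearest predecessor."""
    nodes, edges = [], set()
    for track in trajectories:
        for point in track:
            if not nodes:
                nodes.append(point)
                continue
            # index of the nearest existing node
            dists = [math.dist(point, n) for n in nodes]
            nearest = min(range(len(nodes)), key=dists.__getitem__)
            if dists[nearest] > tau:
                nodes.append(point)
                edges.add((nearest, len(nodes) - 1))
    return nodes, edges
```

A smaller `tau` thus yields more nodes (a finer map), matching the τ = 400 vs. τ = 2000 behavior described above.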

Figure 1. Topological partitioning of the environment at different coarseness levels using the ITM algorithm: (a) τ = 400; (b) τ = 2000

3. Probabilistic Framework

Exploiting the partitioning of the environment into a set of n zones Z1...Zn, the movement of each entity perceived in the scene can be described in terms of events, defined as zone transitions. Events are defined by a time label t that registers the time instant at which the zone change has been detected, by the local labels i, j = {1...n} of the two neighboring zones involved in the movement of the entity (origin and destination zone), and by a global label L = {P, C} that records which specific entity performed the zone change. Adopting the formalism of [1], the first entity entering the scene is labeled as the proto-entity (P) and the second entity is labeled as the core-entity (C). Therefore, if the proto-entity moves from zone Zi to zone Zj at time tP, the corresponding event will be recorded as ε^{P(i,j)}_{tP}. Similarly, if the core-entity moves from zone Zm to zone Zp at time tC, the corresponding event will be recorded as ε^{C(m,p)}_{tC}.
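The per-entity event stream described above can be extracted with a helper like the following; this is a sketch with an assumed data layout (a per-frame zone-id sequence), which the paper does not specify:

```python
def extract_events(zone_sequence, label):
    """Turn a per-frame zone-id sequence into zone-transition events.
    Each event is (t, label, origin_zone, destination_zone), where t is
    the frame index at which the zone change is detected and label is
    'P' (proto-entity) or 'C' (core-entity)."""
    events = []
    for t in range(1, len(zone_sequence)):
        i, j = zone_sequence[t - 1], zone_sequence[t]
        if i != j:  # a zone change has occurred
            events.append((t, label, i, j))
    return events
```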

The states of the DBN are two random variables E^P_t and E^C_t whose values are, respectively, proto-events ε^{P(i,j)}_{tP} and core-events ε^{C(m,p)}_{tC}. Therefore E^P_t and E^C_t are discrete random variables which can assume N_E = Σ_{i=1}^{n} card_i values, where card_i is the number of nodes connected to the i-th zone.

The proposed DBN model aims to provide a compact statistical description of interactions by storing an internal representation of cause-effect relationships between events produced by moving entities in the environment. These relationships, according to the neurophysiological model of second-order neural patterns [1], are collected as triplets of events {ε^P_{tP}, ε^C_{tC}, ε^P_t} or {ε^C_{tC}, ε^P_{tP}, ε^C_t}: in this way a causal relationship between two entities takes into account the initial state of one entity (proto ε^P_{tP} or core ε^C_{tC}) and how this state changes (ε^P_t, ε^C_t) as a consequence of the behavior of the other interacting entity (ε^C_{tC}, ε^P_{tP}). Since the triplets store information about causal relationships between interacting entities, the events must be ordered according to the time label: it must hold that t ≥ tC ≥ tP for the proto-entity and t ≥ tP ≥ tC for the core-entity. This collection of past causal relationships between the system and the environment can be used to obtain non-parametric estimates of the probability density functions:

p(ε^P_t | ε^C_{t−tC}, ε^P_{t−tP})  (1)

p(ε^C_t | ε^P_{t−tP}, ε^C_{t−tC})  (2)
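A minimal sketch of the non-parametric estimate: triplet counts are normalized over each conditioning context. The encoding of events (here opaque hashable values) is an assumption:

```python
from collections import Counter, defaultdict

def estimate_triplet_cpd(triplets):
    """Non-parametric CPD estimate from observed event triplets.
    Each triplet is (prev_own_event, other_event, new_own_event); the
    conditional probability p(new | other, prev) is obtained by
    normalizing triplet counts within each (prev, other) context."""
    counts = Counter(triplets)
    context_totals = defaultdict(int)
    for (prev, other, new), c in counts.items():
        context_totals[(prev, other)] += c
    return {
        trip: c / context_totals[(trip[0], trip[1])]
        for trip, c in counts.items()
    }
```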

Those causal relationships between interacting entities are encoded into the DBN in the following way: each training sequence is composed of a vector of proto and core events E = {ε^P_{tP1}, ε^C_{tC1}, ε^P_{tP2}, ε^C_{tC2}, ..., ε^P_{tPN}, ε^C_{tCN}}, where tP1 ≤ tC1 ≤ tP2 ≤ tC2 ≤ ... and ε^P_{tPN}, ε^C_{tCN} are the last proto and core events before one of the objects leaves the scene. Every time three consecutive events {ε^{P(i,j)}_{t−tP}, ε^{C(m,n)}_{t−tC}, ε^{P(j,k)}_t} are observed, the frequency of occurrence of the triplet is updated.

Then, the estimate of the probability of occurrence of an event ε^{P(j,k)} given that it has been preceded by the two events ε^{C(m,n)} and ε^{P(i,j)}, i.e. p(ε^{P(j,k)} | ε^{C(m,n)}, ε^{P(i,j)}), is obtained by normalizing the frequency of occurrence of each triplet over the last event ε^{P(i,j)}. Time and speed information are extracted during the training process. The temporal occurrence of the events is stored for each triplet of events to derive the temporal conditional probability distributions (CPDs). In particular, the time from the latest event performed by the other person is memorized into a time histogram (i.e. ∆tC = t − tC if the triplet is {ε^P, ε^C, ε^P} and ∆tP = t − tP if the triplet is {ε^C, ε^P, ε^C}) for each group of events observed. These two temporal distributions are necessary to represent the reaction time of one entity to an action of the other interacting entity, and they are modeled with Gaussian Mixture Models (GMMs), i.e.:

p(∆tC | ε^{P(i,j)}, ε^{C(m,n)}, ε^{P(j,k)}) = Σ_{i=1}^{Nm} π_i N(∆tC | µ_i, Σ_i)  (3)

p(∆tP | ε^{C(m,n)}, ε^{P(i,j)}, ε^{C(n,p)}) = Σ_{i=1}^{Nm} π_i N(∆tP | µ_i, Σ_i)  (4)

where Nm is the number of modes of the GMM. The GMM is estimated from observed data using the algorithm proposed by Figueiredo and Jain.
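As an illustration of fitting such a mixture to the collected ∆t samples, here is a minimal one-dimensional EM fit. Note this plain EM with a fixed number of modes is a stand-in for the Figueiredo-Jain algorithm the paper uses (which also selects the number of modes automatically); all parameter names are hypothetical:

```python
import math
import random

def _gauss(x, m, v):
    """1-D Gaussian density."""
    return math.exp(-((x - m) ** 2) / (2 * v)) / math.sqrt(2 * math.pi * v)

def _variance(xs):
    m = sum(xs) / len(xs)
    return sum((x - m) ** 2 for x in xs) / len(xs)

def fit_gmm_1d(samples, n_modes, iters=100, seed=0):
    """Minimal 1-D EM fit of a Gaussian mixture to time (or speed)
    interval samples. Returns per-mode (weight, mean, variance) triples."""
    rng = random.Random(seed)
    mu = rng.sample(list(samples), n_modes)          # init means from data
    var = [max(1e-6, _variance(samples))] * n_modes  # broad initial variance
    pi = [1.0 / n_modes] * n_modes
    for _ in range(iters):
        # E-step: responsibilities of each mode for each sample
        resp = []
        for x in samples:
            dens = [pi[k] * _gauss(x, mu[k], var[k]) for k in range(n_modes)]
            s = sum(dens) or 1e-300
            resp.append([d / s for d in dens])
        # M-step: update weights, means, variances
        for k in range(n_modes):
            nk = sum(r[k] for r in resp) or 1e-300
            pi[k] = nk / len(samples)
            mu[k] = sum(r[k] * x for r, x in zip(resp, samples)) / nk
            var[k] = max(1e-6, sum(r[k] * (x - mu[k]) ** 2
                                   for r, x in zip(resp, samples)) / nk)
    return list(zip(pi, mu, var))
```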

In a similar way, the velocity of each entity during the interaction with other entities in the environment is also stored into a speed histogram to produce speed CPDs. This can be done by considering the time ∆tP = t − tP for the {ε^P, ε^C, ε^P} triplet and ∆tC = t − tC for {ε^C, ε^P, ε^C}. These temporal intervals can be used respectively to represent the speed of the proto-entity and the core-entity, and they can be described (similarly to Equations 3 and 4) with two GMMs as:

p(∆tP | ε^{P(i,j)}, ε^{C(m,n)}, ε^{P(j,k)}) = Σ_{i=1}^{Nm} π_i N(∆tP | µ_i, Σ_i)  (5)

p(∆tC | ε^{C(m,n)}, ε^{P(i,j)}, ε^{C(n,p)}) = Σ_{i=1}^{Nm} π_i N(∆tC | µ_i, Σ_i)  (6)

Figure 2 shows some examples of time and speed histograms derived from training data and the corresponding GMM models.

4. Interaction Recognition

An accumulative measure is proposed for the online interaction classification task. This measure exploits the information encoded in the proposed DBN model. The model parameters Θ_i of each interaction i, i = 1...N_I, where N_I is the number of considered interactions, are derived using a set of pairs of trajectories. After that, the measure µ^i_j(t) is computed every time a new event ε^P_t or ε^C_t is detected at time t:

µ^i_1(t) = µ^i_1(t−tC) + p(ε^P_{t−tC}, ε^C_{t−tC}, ε^P_t | Θ_i)  (7)

Figure 2. Temporal distributions ((a) motion, (b) run/walk) and speed distributions ((c) Meet/Leave behavior, (d) Run/Walk behavior) for different behaviors described in Section 5

µ^i_1(t) = µ^i_1(t−tP) + p(ε^C_{t−tP}, ε^P_{t−tP}, ε^C_t | Θ_i)  (8)

where µ^i_1(t−tC) and µ^i_1(t−tP) are the measures computed at the time of occurrence of the previous proto and core events, and the expression p(ε^P_{t−tC}, ε^C_{t−tC}, ε^P_t | Θ_i) indicates the probability that the observed triplet of events belongs to the i-th interaction model.
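A minimal sketch of how this accumulative measure could be maintained online, assuming the learned model is represented as a dictionary mapping event triplets to probabilities (the paper does not specify a data structure). The same loop also yields the arg-max classification of Eq. (13), and could equally be run against the null models Θ^0_i:

```python
def update_measure(mu_prev, triplet, model):
    """One incremental update of the accumulative measure of Eq. (7)/(8):
    the score for an interaction model grows by the model probability of
    the newly observed event triplet (assumed dict representation)."""
    return mu_prev + model.get(triplet, 0.0)

def classify_online(triplet_stream, models):
    """Run the accumulative measure for every interaction model; the
    arg-max over the scores at any time t is the current classification."""
    scores = {name: 0.0 for name in models}
    history = []
    for triplet in triplet_stream:
        for name, model in models.items():
            scores[name] = update_measure(scores[name], triplet, model)
        history.append(max(scores, key=scores.get))  # Eq. (13)
    return scores, history
```

Because the measure only accumulates, a classification is available after every event, which is what makes the scheme usable while an interaction is still evolving.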

The metric µ1 takes into account only information about zone transitions Zi → Zj. In order to include an evaluation of temporal information in the classification step, a temporal CPD can be estimated by multiplying p(ε^{P(j,k)}, ε^{C(m,n)}, ε^{P(i,j)}) with the probability density of (3), which accounts for the temporal confidence between events ε^{P(j,k)} and ε^{C(m,n)}. A specular procedure is applied for core-entities, for which p(ε^{C(n,p)}, ε^{P(i,j)}, ε^{C(m,n)}) and (4) are used. In this way a new accumulative measure µ2 can be defined as follows:

µ^i_2(t) = µ^i_2(t−tC) + p(ε^P_{t−tC}, ε^C_{t−tC}, ε^P_t, ∆tC | Θ_i)  (9)

µ^i_2(t) = µ^i_2(t−tP) + p(ε^C_{t−tP}, ε^P_{t−tP}, ε^C_t, ∆tP | Θ_i)  (10)

Similarly, to encode velocity information, a speed CPD can be estimated by multiplying p(ε^{P(j,k)}, ε^{C(m,n)}, ε^{P(i,j)}) and p(ε^{C(n,p)}, ε^{P(i,j)}, ε^{C(m,n)}) with the densities of (5) and (6), which represent the speed confidence for the given events {ε^{P(j,k)}, ε^{P(i,j)}} and {ε^{C(n,p)}, ε^{C(m,n)}} respectively. Therefore, a third accumulative measure µ3 can be defined as follows:

µ^i_3(t) = µ^i_3(t−tC) + p(ε^P_{t−tC}, ε^C_{t−tC}, ε^P_t, ∆tP | Θ_i)  (11)

µ^i_3(t) = µ^i_3(t−tP) + p(ε^C_{t−tP}, ε^P_{t−tP}, ε^C_t, ∆tC | Θ_i)  (12)
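The "multiply the triplet probability by a GMM density" construction used for µ2 and µ3 can be sketched as a single score term; the (weight, mean, variance) mode representation is an assumption carried over from the training step:

```python
import math

def triplet_score_with_confidence(trip_prob, delta_t, gmm_modes):
    """Score term for Eqs. (9)-(12): the learned triplet probability is
    multiplied by the GMM density of the observed interval delta_t, so
    triplets with plausible timing (or speed) weigh more in the measure.
    gmm_modes is a list of (weight, mean, variance) triples."""
    density = sum(
        w * math.exp(-((delta_t - m) ** 2) / (2 * v))
        / math.sqrt(2 * math.pi * v)
        for w, m, v in gmm_modes
    )
    return trip_prob * density
```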

The accumulative metrics µj previously described will be compared in terms of correct recognition of specific behaviors in Section 5.

Given an accumulative metric µj, at each time t a classification result for the current interaction can be obtained according to:

i*_t = argmax_i µ^i_j(t)  (13)

In order to detect, using the proposed online measure, whether an interaction is taking place between moving entities in the environment, the interactive models Θ_i (representative of specific behaviors) can be compared with 'null models' Θ^0_i corresponding to non-interactive behaviors.

5. Experimental results

A typical scenario for an outdoor surveillance application is composed of people that walk or run in the environment; sometimes an interaction between two or more people can take place. We define a behavior as interactive (IB) when a relationship can be established between changes in the speed/trajectory of an entity and the presence or movements of another entity. Similarly, we define behaviors as non-interactive (NIBs) if each entity moves independently from the other entities. Interactions between entities can be further subdivided into dangerous behaviors and safe behaviors, according to whether or not they involve a potential security threat.

Therefore, the goal of an autonomous framework for human interaction analysis is to discriminate between different kinds of interactions and, specifically, to detect dangerous behaviors.

In order to validate the proposed framework, six different behaviors have been taken into account: five safe behaviors (Motion, Running, Run-walk, Meeting, Meet-and-leave) and one dangerous behavior (Guard-intruder). The considered behaviors can be subdivided according to the presence or absence of interactions between the entities in the environment:

NIBs can be classified according to the speed of the moving objects in the environment: Motion (W) if both entities walk in the environment, Run-walk (R/W) if one entity is walking and the other one is running, Running (R) if both entities are running.

IBs are defined as Meeting (M) if two people walk towards each other and, as they meet, they either stay together or leave the scene together; Meet-and-leave (M/L) if two people meet as before but, after spending some time together, they part from each other; Guard-intruder (G/I) if a guard and a suspect are present within the scene: as soon as they see each other, both start to run, the suspect trying to escape and the guard trying to catch the suspect.

Since it can be difficult to obtain a sufficient number of interactions from real-world surveillance videos, the DBNs have been trained using realistic data from a context generator (based on data collected from real videos) that is intended to simulate the complete context evolution through a statistical engine in order to create realistic tracks descriptive of the different behaviors listed above. The motion of the entities in the scene is simulated according to a linear Gaussian dynamic model such that x_t = x_{t−1} + v_{t−1}·T + a_{t−1}·T²/2, where x_t is the map position and v_t and a_t are respectively the Gaussian random speed and acceleration. A Gaussian error has been added to the simulated trajectories in order to produce a realistic tracker output. The entrance of entities into the scene is randomly determined according to a Poisson distribution.
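The motion model above can be sketched as follows; all parameter values are illustrative defaults, not the paper's, and the additive noise stands in for the simulated tracker error:

```python
import random

def simulate_track(x0, n_steps, dt=0.4, speed_sigma=0.3,
                   accel_sigma=0.1, noise_sigma=0.2, seed=0):
    """Sketch of the context generator's linear-Gaussian motion model:
    x_t = x_{t-1} + v_{t-1}*T + a_{t-1}*T^2/2, with Gaussian speed and
    acceleration, plus additive Gaussian measurement noise to mimic a
    tracker output. Returns a list of noisy positions."""
    rng = random.Random(seed)
    track = []
    x = list(x0)
    v = [rng.gauss(0.0, speed_sigma) for _ in x0]
    for _ in range(n_steps):
        a = [rng.gauss(0.0, accel_sigma) for _ in x0]
        x = [xi + vi * dt + ai * dt * dt / 2 for xi, vi, ai in zip(x, v, a)]
        v = [vi + ai * dt for vi, ai in zip(v, a)]
        track.append(tuple(xi + rng.gauss(0.0, noise_sigma) for xi in x))
    return track
```

Entrance times for new entities could analogously be drawn from an exponential inter-arrival distribution (`random.expovariate`), consistent with the Poisson assumption.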

Several aspects of the proposed framework have been investigated in order to assess its performance in terms of correct classification of interactions:

• topological coarseness: the parameter τ has been modified in order to investigate which level of coarseness of the topological partitioning of the environment is the most suitable for the interaction classification task in the considered outdoor scenario. In Table 1 an example of how a different partitioning can affect the recognition capability of the proposed method is presented. A fine partitioning (τ = 400) has been chosen for the proposed scenario.

• learning: the training of the DBNs has been performed using different training sets composed of an increasing number of trajectories nT (nT = {200, 500, 1000, 2500}). If the training time of the DBNs (which grows proportionally to the size of the training set) is not a critical issue, the largest training set (nT = 2500) should be chosen, since it produces, on average, better classification results (as can be seen in Tables 2(a)-(c)).

• recognition measure: three recognition measures, as defined in Section 4, have been compared. These measures take into account, respectively, only the zone transition Zi → Zj (µ1), the zone transition and a time confidence measure (µ2), or the zone transition and a speed confidence measure (µ3).

In this experiment at least 300 interactions per type have been used to test the proposed framework. The classification results, presented in Tables 2(a)-(c), show the capability of the proposed framework to recognize different interactions between entities in the environment. In Table 3 an example of a confusion matrix of classification results is shown.

The recognition of the previously defined behaviors considering only the zone transition Zi → Zj (µ1) produces good results for the classification of IBs, but performs poorly on NIBs because those behaviors produce similar trajectories.

           µ1      µ2      µ3
τ = 400    91.1%   86.3%   99.7%
τ = 2000   87.2%   79%     99.6%

Table 1. Correct recognition of the guard-intruder interaction using topological maps at different coarseness levels (nT = 1000)

Adding the temporal confidence (µ2) to the recognition metric lowers the recognition performance on IBs, while the correct classification rate for NIBs correspondingly increases. The most discriminative measure appears to be µ3, since it achieves high recognition rates for both IBs and NIBs.

Fig. 3 shows an example of the trend of the proposed classification measure µ3 for samples of the Guard-intruder and Meeting interactions. In Fig. 3(a) (Guard-intruder behavior) the interaction is correctly recognized from the beginning, since it is quite distinctive compared to the other considered behaviors. The Run-walk and Meet-and-leave models produce low classification values, while the other interaction models have near-zero values. In Fig. 3(b) a notable case is shown: a meeting interaction is given as input to the proposed classification framework; since in the first part of the interaction (the two persons are approaching each other) two of the considered behaviors are quite similar (Meeting and Meet-and-leave), the system cannot fully discriminate between them. Only from t = 29, in the second part of the interaction (the two entities are leaving the scene together), is the Meet-and-leave interaction model discarded and the correct behavior identified.

6. Conclusions

In this paper a Bayesian framework for human interaction recognition has been presented. The proposed approach correctly recognizes different types of interactions by analyzing the trajectories of entities perceived in the environment. The impact of the coarseness of the topological partitioning on the correct classification of interactions has been investigated. Different accumulative measures for real-time classification of evolving interactions have been defined and evaluated. Future work will be devoted to adding more interaction models to the proposed system.

(a)
µ1         200     500     1000    2500
DBN W      53.1%   50.7%   52.9%   60.4%
DBN G/I    88.8%   90.1%   91.1%   92.5%
DBN M      72.7%   83.3%   87.7%   92.7%
DBN M/L    72.9%   82.1%   88%     93.4%
DBN R      37.5%   56.5%   62.3%   71.7%
DBN R/W    54.8%   71.5%   75%     88.1%
Average    63.3%   72.4%   76.17%  83.13%

(b)
µ2         200     500     1000    2500
DBN W      54.3%   41.9%   58.7%   64.5%
DBN G/I    85.8%   86.7%   86.3%   87.1%
DBN M      70.9%   72.6%   72.8%   80.9%
DBN M/L    63.8%   80.4%   82.5%   87.4%
DBN R      48.5%   41.3%   46%     55.6%
DBN R/W    51.9%   63.9%   65.4%   83.2%
Average    62.5%   64.5%   68.6%   76.5%

(c)
µ3         200     500     1000    2500
DBN W      69.4%   78.7%   80.7%   85.2%
DBN G/I    95.9%   98.7%   99.7%   99.3%
DBN M      75.2%   87.8%   83.1%   86.3%
DBN M/L    72.4%   84.9%   91.8%   91.7%
DBN R      74.7%   81.8%   86.6%   83.9%
DBN R/W    75.1%   89.9%   83.8%   85%
Average    77.1%   87%     87.6%   88.6%

Table 2. Correct recognition rate of the considered behaviors using different accumulative metrics (µ1 (a), µ2 (b) and µ3 (c)) and different training sets (nT = 200, 500, 1000, 2500)

        W       G/I     M       M/L     R       R/W
W       85.2%   0.4%    6.2%    2%      0.6%    0.6%
G/I     6.9%    99.3%   1.8%    1.3%    10.9%   11.3%
M       1.6%    0.3%    86.3%   5%      0%      0%
M/L     4.4%    0%      6%      91.7%   0%      0%
R       0.9%    0%      0%      0%      83.9%   3.1%
R/W     1.9%    0%      0%      0%      4.6%    85%
Total   100%    100%    100%    100%    100%    100%

Table 3. Confusion matrix of classification results (µ3, τ = 400, nT = 2500)

Figure 3. Example of the proposed accumulative classification measure µ3 for different input interactions: (a) Guard-intruder interaction; (b) Meeting interaction

References

[1] A. Damasio. The Feeling of What Happens: Body and Emotion in the Making of Consciousness. Harvest Books, 2000.
[2] A. Dore and C. Regazzoni. Bayesian bio-inspired model for learning interactive trajectories. Proceedings of the International Conference on Advanced Video and Signal-Based Surveillance (AVSS09), September 2009.
[3] S. Hongeng, R. Nevatia, and F. Bremond. Video-based event recognition: Activity representation and probabilistic recognition methods. Computer Vision and Image Understanding, 96(2):129–162, 2004.
[4] W. Hu, X. Xiao, Z. Fu, D. Xie, T. Tan, and S. Maybank. A system for learning statistical motion patterns. IEEE Transactions on Pattern Analysis and Machine Intelligence, 28:1450–1464, 2006.
[5] Y. Ivanov and A. Bobick. Recognition of multi-agent interaction in video surveillance. Proceedings of the Seventh IEEE International Conference on Computer Vision, 1:169–176, 1999.
[6] J. Jockhusch and H. Ritter. An instantaneous topological map for correlated stimuli. Proceedings of the International Joint Conference on Neural Networks, pages 529–534, 1999.
[7] T. Kohonen. Self-organized formation of topologically correct feature maps. Biological Cybernetics, 3, Jan 1982.
[8] T. Martinetz and K. J. Schulten. Topology representing networks. Neural Networks, 7:507–522, 1994.
[9] S. Park and J. K. Aggarwal. Simultaneous tracking of multiple body parts of interacting persons. Computer Vision and Image Understanding, 102:1–12, 2006.
[10] M. Siracusa and J. Fisher. Interaction analysis using switching structured autoregressive models. 42nd Asilomar Conference on Signals, Systems and Computers, pages 827–832, 2008.
[11] M. Valera and S. Velastin. Intelligent distributed surveillance systems: a review. IEE Proceedings - Vision, Image and Signal Processing, 152:192–204, April 2005.
[12] E. Zelniker, S. Gong, and T. Xiang. Global abnormal behaviour detection using a network of CCTV cameras. The 8th International Workshop on Visual Surveillance (VS2008), 2008.