Real-time reconfiguration of PTZ camera networks using motion field entropy and visual coverage


Krishna Reddy Konda
Department of Engineering and Computer Science - DISI
University of Trento, Trento, Italy
[email protected]

Nicola Conci
Department of Engineering and Computer Science - DISI
University of Trento, Trento, Italy
[email protected]

ABSTRACT
In this paper we propose a novel dynamic camera reconfiguration algorithm for multi-camera networks. The algorithm relies on the analysis of the entropy of the acquired scene. Based on the obtained values it assigns the cameras to act as global or target sensors. While global sensors aim at guaranteeing the overall coverage of the observed space, target sensors focus on moving objects. To this aim, the entropy information is used differently according to the operation mode of each camera. The visual entropy of the captured image is used to control the configuration of the global sensors, whereas the disorder of the motion field is adopted as a metric to drive the reconfiguration of the target sensors. In order to achieve real-time operational capabilities, the algorithm works in the compressed video domain, rather than the traditional pixel domain. The algorithm has been tested in vitro, and also by deploying a camera network in a real indoor surveillance environment. In particular, the reconfiguration capability of the system has been tested with respect to the presence of people moving in the observed environment.

Categories and Subject Descriptors
Distributed camera networks [compressed domain]: reconfiguration

General Terms
Algorithms, Systems

Keywords
Self-reconfiguring camera networks, Distributed computer vision, Smart camera and network architectures

1. INTRODUCTION
The advancement in camera production technology and the increased sophistication of the devices have significantly contributed to the diffusion of pan-tilt-zoom (PTZ) cameras, which often replace, or complement, camera networks usually consisting of ordinary static cameras. In fact, the capability of repositioning the sensors to satisfy specific coverage requirements greatly increases the flexibility of the network. The freedom to change the camera pan and tilt after the physical deployment also helps in simplifying the topology of the network: a reduced number of reconfigurable devices can satisfy coverage requirements that would otherwise imply the use of a large number of static cameras. PTZ cameras can also be utilized to design an intelligent distributed smart camera network, in which information is shared between the cameras in order to perform collective tasks, including reconfiguration, but also detection and tracking. These aspects not only increase the monitoring capabilities of the network, but also contribute to a better observation of the events that take place in the area, since a reconfigurable camera system can be utilized to track moving objects by continuously changing the camera parameters according to the rules defined by the system architecture.

For the reasons mentioned above, camera reconfiguration is an active and relevant area of research. In this paper we present a cooperative and distributed camera reconfiguration algorithm for PTZ camera networks, using motion field entropy and visual coverage as the metrics to be optimized. While motion field entropy can be used to represent the status variations of the monitored environment, it is also important to guarantee the maximum visual coverage of the space. Furthermore, camera reconfiguration algorithms need to be as reactive as possible, reducing the computational burden of analyzing the video and satisfying real-time operation. To achieve this, we base our algorithm on the H.264 [1] stream directly provided by the camera, and carry out the analysis in the compressed stream instead of acting in the traditional pixel domain. To reduce the computational load required by the algorithm, we utilize features already available in the H.264 bitstream, so as to skip video decoding and explicit feature extraction.

2. RELATED WORK
Research in camera reconfiguration is in a nascent stage; Micheloni et al. summarized the current state of the research in [2]. In general, camera reconfiguration is performed with respect to a specific task. One of the earliest works to consider PTZ cameras is [3], in which PTZ cameras were specifically used for tracking. In another instance, Quaritsch et al. [4] adopt reconfiguration to achieve better tracking over multiple cameras. Scotti et al. [5] utilize PTZ cameras along with omnidirectional cameras in order to track objects at higher resolution. Another work combining omnidirectional and PTZ cameras for tracking is [6]; here the authors approach the problem by analyzing the spatial correlation, in order to map the targets across the two types of cameras. Similarly, Piciarelli et al. [7] apply reconfiguration to avoid occlusions, which may occur in the presence of changes in the environment. Karuppiah et al. [8] propose a smart camera reconfiguration algorithm exploiting a priori knowledge of floor plans, and drive the reconfiguration process based on the changes that take place over time. Another work that specifically deals with the reconfiguration of PTZ cameras is presented in [9], but in this case the camera model is fixed and does not apply well to PTZ cameras. One of the few works to approach the problem of PTZ camera reconfiguration in a general sense, rather than in an application-specific scenario, is [10], where the authors propose a decentralized algorithm for reconfiguration based on game theory. A similar work is presented in [11].

We have observed that a common deficiency of the above algorithms is that they often do not consider zoom as a reconfigurable parameter. An early work that also considers zoom as a parameter for reconfiguration is presented by Konda and Conci [12], where the authors use a virtual 2D model to perform reconfiguration. Although the algorithm has limited complexity and is suitable for real-time reconfiguration, it requires a prior set-up to detect changes in the observed targets. The authors have further extended their model to 3D [13], so as to match the virtual and real environments with maximum accuracy. Also in this case, the method suffers from the same disadvantages as its predecessor.

It is also worth noting that another important limitation of the state of the art is that most algorithms address reconfiguration as a problem separate from change detection. Ideally, a reconfiguration algorithm should also include a methodology for change detection, to trigger the reconfiguration when needed. This is one of the major aspects addressed in the current paper: we propose a low-complexity reconfiguration algorithm that also includes the capability of detecting changes in the environment. Another very important aspect, often overlooked in smart camera systems, is the overall complexity of the system.

3. CONTRIBUTION
In order to optimize the reconfiguration of the camera system, we define a metric capable of measuring the amount and extent of information present in the video. To this aim, we base our analysis on the H.264 bitstream directly provided by the cameras.

The motivation for carrying out the analysis in the compressed domain lies mainly in the requirement of speeding up the reconfiguration time. By operating in the compressed domain, we remove the overhead related to the decoding of the video, as well as the traditional feature extraction methods necessary to compute the motion field in the acquired stream, which may be computationally demanding. Our algorithm ensures that the reconfiguration process is completed with minimum delay, thereby providing maximum visibility on the events that take place in the environment. Furthermore, the proposed solution requires only modest bandwidth, achieving good performance at bit rates ranging from 500 Kbps to 1 Mbps.

Owing to the low complexity of the algorithm, it can be directly embedded at the camera node using any low-power processor. Such an arrangement gives us the flexibility to treat each camera as an active element of a cooperative camera network and to propose a distributed algorithm for reconfiguration, unlike most state-of-the-art solutions, which use centralized information collection and decision making. This setup is highly immune to systemic failures, as the system can continue monitoring the area even in the presence of malfunctions. Distributed processing also saves bandwidth, paving the way for a fully customizable and configurable deployment using low-bandwidth wireless protocols for data exchange.

4. MOTION DESCRIPTORS

Figure 1: Motion vectors extracted from a frame of the standard video sequence Stefan. The red arrows highlight the regions in which the motion field exhibits strong disorder.

In order to measure and monitor the movement of the objects in the camera view, we propose a descriptor based on the disorder, or entropy, of the motion vectors of the video. The H.264 video coding standard, like most of its predecessors, achieves compression through a block-based algorithm, where blocks have variable size from 4x4 to 16x16 pixels [1]. Motion vectors are calculated for individual blocks in order to remove the temporal redundancy of the video. The distribution of the motion vectors throughout the frames gives a very accurate insight into the dynamics of the video: it tends to exhibit more disorder whenever there is a moving object in the frame, while the disorder is not noticeable, for example, in the presence of uniform global motion associated with camera movement, or when the camera is static with no moving object in the frame. An example is shown in Figure 1, which represents a frame of a video sequence acquired by a moving camera, with the motion vectors overlaid on the picture. As can be seen, the motion vectors show a coherent behaviour along most of the video frame, as expected in the case of a moving camera. However, the motion vector distribution at the edges of the moving object tends to have higher disorder. We propose to utilize this aspect in order to measure the amount of information in the video frame.

4.1 Motion entropy measure
As mentioned in the previous section, we choose to operate in the compressed domain to achieve real-time operational capabilities. Motion vectors are chosen as the main features for analysis, as they are immune to changes in the bitrate and quantization parameter (QP) of the encoded H.264 video stream. The disorder in the motion field represents the information content of the video. In H.264, motion vectors are computed for blocks as small as 4x4 pixels, with the block size chosen according to the observed variance. Each motion vector consists of two components, representing the displacement in pixels along the X and Y directions from the best match found in the reference frame. In this context we represent the pixel displacements along X and Y as MV_x(i,j) and MV_y(i,j), respectively, where i and j identify the location of a 4x4 block in the video frame. After reading the motion vectors from the H.264 stream, we group MV_x(i,j) and MV_y(i,j) into 8x8 matrices, so that each of these super-blocks represents the motion vectors of a region covering an area of 32x32 pixels. On these super-blocks the 8x8 DCT transform is performed according to Eqs. (1) and (2). After the transform, each block describes the motion pattern of the 32x32 pixel region in the X and Y directions, respectively, and becomes our motion descriptor. In the equations, (c,d) represents the location of the 32x32 block in the frame, (a,b) indexes the resulting DCT coefficients, and (m,n) ranges over the 4x4 motion-vector blocks within the 32x32 block.

The choice of a block size of 32x32 pixels is made to ensure minimum variability of the motion vectors, which occurs in the case of the 16x16 mode of the H.264 bit stream. The obtained result is a 2D DCT transform of 8x8 blocks of motion vectors. From the properties of the DCT transform, the DC values $MD_x^{(c,d)}(0,0)$ and $MD_y^{(c,d)}(0,0)$ represent the localized global motion, while the AC coefficients represent the variation of the motion vectors, with the frequency of variation increasing towards the bottom-right corner. We propose to accumulate the AC coefficients to arrive at a measure of motion disorder. However, since higher frequencies represent more disorder than lower ones, the accumulation has to be done in a weighted manner. This is exactly the opposite of what happens in image and video compression, where lower frequencies are usually more important. Therefore, we calculate the entropy values along X and Y, E_X and E_Y, from Eqs. (3) and (4), respectively. The unified entropy measure is given by Eq. (5).

E_X(c,d) = \sum_{a=0}^{7} \sum_{b=0}^{7} MD_x^{(c,d)}(a,b) \cdot [2^{a-8} + 2^{b-8}]   (3)

E_Y(c,d) = \sum_{a=0}^{7} \sum_{b=0}^{7} MD_y^{(c,d)}(a,b) \cdot [2^{a-8} + 2^{b-8}]   (4)

E_U(c,d) = \sqrt{E_X(c,d)^2 + E_Y(c,d)^2}   (5)

The aggregated entropy gives us a generalized measure of the information present in the video frame. This measure is independent of the number of objects and patterns of motion in the scene.
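To make the computation concrete, the following Python sketch (our illustration, not the authors' released code) evaluates Eqs. (1)-(5) on a per-4x4-block motion field. The function name motion_entropy and the use of SciPy's orthonormal 2D DCT in place of the paper's explicitly scaled transform are our assumptions; the orthonormal scaling only rescales the entropy values uniformly.

import numpy as np
from scipy.fftpack import dct

def motion_entropy(mv_x, mv_y):
    """Per-region motion entropy following Eqs. (1)-(5) (illustrative).

    mv_x, mv_y: 2D arrays with one motion-vector component per 4x4
    block (dimensions assumed divisible by 8, i.e. frame dimensions
    divisible by 32).
    """
    rows, cols = mv_x.shape
    e_x = np.zeros((rows // 8, cols // 8))
    e_y = np.zeros_like(e_x)
    # Weights [2^(a-8) + 2^(b-8)] from Eqs. (3)-(4): higher-frequency
    # coefficients (more disorder) weigh more, the opposite of the
    # usual priority in compression.
    idx = np.arange(8)
    w = 2.0 ** (idx[:, None] - 8) + 2.0 ** (idx[None, :] - 8)
    for c in range(rows // 8):
        for d in range(cols // 8):
            blk_x = mv_x[c * 8:(c + 1) * 8, d * 8:(d + 1) * 8]
            blk_y = mv_y[c * 8:(c + 1) * 8, d * 8:(d + 1) * 8]
            # 8x8 2D DCT of each super-block (Eqs. (1)-(2))
            md_x = dct(dct(blk_x, axis=0, norm='ortho'), axis=1, norm='ortho')
            md_y = dct(dct(blk_y, axis=0, norm='ortho'), axis=1, norm='ortho')
            e_x[c, d] = np.sum(md_x * w)   # Eq. (3)
            e_y[c, d] = np.sum(md_y * w)   # Eq. (4)
    e_u = np.sqrt(e_x ** 2 + e_y ** 2)     # Eq. (5), unified entropy
    return e_x, e_y, e_u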

4.2 Object detection and segmentation

Figure 2: Moving object segmentation on video frames taken from the iLids dataset.

The expressions for E_X and E_Y described in the previous paragraph correspond to the extent of disorder of MV_x(i,j) and MV_y(i,j), respectively. Since we have defined a quantitative measure for the disorder, we now have to identify the blocks which exhibit high E_X and E_Y. Initially, both E_X and E_Y are accumulated over the frame to obtain a frame-level metric for disorder, as shown in Eq. (6). Then, Algorithm 1 is applied. The algorithm iteratively checks the values of E_X and E_Y in each block against a threshold, which varies from 100% to 50% of their respective mean values. If both conditions are met, the block is identified as a contour block and its contribution (E_X + E_Y) is accumulated. The algorithm terminates when the ratio between the disorder of the contour region and G_XY reaches the value K, or when the adaptive threshold goes below 0.5. In this way, only the blocks with significant motion along X and Y are identified. The value of K is a user-defined parameter and determines the extent of the contour around the moving object: high values will result in extended contours around the moving objects, while low values will shrink the thickness of the contour around the object. This parameter is data dependent and should be adjusted to fit the scenario requirements. Results of the segmentation achieved by the algorithm are shown in Figure 2. The videos used for testing are selected from the iLids dataset [14], which represents a typical surveillance scenario.

G_{XY} = \sum_{c=0}^{Width/32} \sum_{d=0}^{Height/32} [E_X(c,d) + E_Y(c,d)]   (6)

To further refine the extracted information, a 3x3 majority filter is applied across the whole frame; all the blocks having at least 4 neighbouring blocks labelled as showing a high level of disorder are classified as part of the moving objects.
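A minimal sketch of this refinement step, assuming the high-disorder labels are stored in a boolean per-block mask:

import numpy as np
from scipy.ndimage import convolve

def majority_filter(high_disorder):
    """3x3 majority filtering of the per-block disorder labels: a block
    is kept as part of a moving object when at least 4 of its 8
    neighbours are labelled as high-disorder."""
    kernel = np.ones((3, 3))
    kernel[1, 1] = 0  # count the 8 neighbours, not the block itself
    counts = convolve(high_disorder.astype(int), kernel, mode='constant')
    return counts >= 4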

MD_x^{(c,d)}(a,b) = \frac{1}{4} \sum_{m=0}^{7} \sum_{n=0}^{7} MV_x[(c-1) \cdot 8 + m, (d-1) \cdot 8 + n] \cdot \cos\frac{(2m+1) a \pi}{16} \cdot \cos\frac{(2n+1) b \pi}{16}   (1)

MD_y^{(c,d)}(a,b) = \frac{1}{4} \sum_{m=0}^{7} \sum_{n=0}^{7} MV_y[(c-1) \cdot 8 + m, (d-1) \cdot 8 + n] \cdot \cos\frac{(2m+1) a \pi}{16} \cdot \cos\frac{(2n+1) b \pi}{16}   (2)

input : Entropy measures E_X and E_Y
input : Global disorder measure G_XY
input : Segmentation parameter K
output: Contour region CR

CR = ∅ ;                      % union of contour blocks
C = 1 ;                       % adaptive threshold factor
Buffer = 0 ;                  % accumulated contour disorder
while Buffer <= K * G_XY do
    for i ← 1 to Width/32 do
        for j ← 1 to Height/32 do
            if E_X(i,j) > C * mean(E_X) && E_Y(i,j) > C * mean(E_Y) && Buffer <= K * G_XY then
                CR = CR ∪ Region(i,j) ;
                Buffer = Buffer + E_X(i,j) + E_Y(i,j) ;
            end
        end
    end
    C = C − 0.1 ;
    if C <= 0.5 then
        Break ;
    end
end
Return CR ;

Algorithm 1: Motion segmentation.
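For readers who prefer executable form, a possible Python rendering of Algorithm 1 follows (a sketch under the same definitions; the guard against re-accumulating blocks already in the contour set is our addition):

import numpy as np

def motion_segmentation(e_x, e_y, k):
    """Sketch of Algorithm 1: iterative contour-block selection.

    e_x, e_y: per-block entropy maps from Eqs. (3)-(4);
    k: user-defined parameter controlling the contour extent.
    Returns a boolean map of the contour blocks (the region CR).
    """
    g_xy = float(np.sum(e_x + e_y))        # global disorder, Eq. (6)
    contour = np.zeros(e_x.shape, dtype=bool)
    mean_x, mean_y = e_x.mean(), e_y.mean()
    c, buf = 1.0, 0.0                      # adaptive threshold and buffer
    while buf <= k * g_xy and c > 0.5:
        hits = (e_x > c * mean_x) & (e_y > c * mean_y) & ~contour
        for i, j in zip(*np.nonzero(hits)):
            if buf > k * g_xy:
                break
            contour[i, j] = True
            buf += e_x[i, j] + e_y[i, j]   # contribution E_X + E_Y
        c -= 0.1                           # relax threshold, 100% -> 50%
    return contour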

4.3 Area metric
The absolute value of the motion entropy described above works very well for target monitoring when the targets are far from the camera. However, if the target is close to the camera, it is likely to cover a large portion of the image plane, thus decreasing the automatic observability of the event, mainly due to over-segmentation, and making the image less suitable for feature extraction and analysis. Furthermore, the entropy metrics obtained in such situations are considerably high, leading to improper handling of camera reconfiguration. In order to avoid such situations, we propose to combine the motion entropy metric with information about the area occupied by the moving object (or set of objects), in order to obtain a more balanced metric as a basis for reconfiguration. The objects observed by a camera are represented by the largest bounding box in the video frame that includes all the objects detected and segmented using the method proposed in the previous section. This rectangle is then projected onto the real environment using the camera model and its current parameters. After the area has been obtained, it is normalized and combined with the entropy metric, which is used for camera reconfiguration in target mode (see Eq. (7)).

Figure 3: Segmented set of objects projected onto the environment using the camera model and current parameters.

T_A = W \cdot B   (7)

W and B in the equation are calculated as shown in Figure 3.

5. FORMULATION
In this section we describe the core of the algorithm. In order to efficiently handle and monitor the given surveillance scenario, we propose to operate the cameras in two modes, namely target and global mode. During operation, cameras in the network switch between the two modes based on the information coverage metrics. A detailed description of each mode is given in the following subsections.

5.1 Camera modes

Global Mode.
The primary function of a camera in this mode is to ensure maximum coverage and visibility of the scene. All the cameras in this mode are utilized to maintain at least the minimum amount of coverage specified by the user; the total number of cameras in this mode can never be zero. These cameras also perform the role of scouting for moving objects. A given camera can switch from global mode to target mode only if certain conditions specified by the algorithm are met; one of these conditions is that the coverage provided by the residual cameras in the network is greater than or equal to the global coverage requirements.

Figure 4: Various stages in transition of cameras from global to target mode.

Target Mode.
In this mode the primary function of the camera is to extract as much information as possible about the object, or set of objects, to which it has been assigned. While the objects are moving, the target-mode camera changes its PTZ parameters to attain the best possible view of the targets. The camera switches back to global mode once the target moves out of range of its field of view; at this point, the global camera in the network with the highest information metric for that target switches to target mode to continue monitoring the moving object.

Figure 5: Target mode operation of the camera.

Monitoring is achieved by continuously adjusting the pan, tilt, and zoom of the camera. This process continues until the camera is either assigned to another object or switches back to global mode. The principal criterion for reconfiguration is to keep the midpoint of the defined rectangle on the camera axis (through pan and tilt). Another objective is to ensure that the height of the rectangle occupies around 60-80 percent of the video frame height (through zooming).

Since the rectangle may contain a set of objects potentially showing rapid transitions in size, the reconfiguration is carried out according to the average width and height of the rectangle over a time window. This ensures that the camera does not track random motion in short bursts. The target mode operation is illustrated in Figure 5.
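As an illustration of this control rule, the sketch below computes one reconfiguration step from the time-averaged bounding box. The camera object with relative pan, tilt and zoom commands, and the proportional gain, are hypothetical placeholders, not part of the paper:

def target_mode_step(bbox, frame_w, frame_h, camera, fill=0.7, gain=0.5):
    """One target-mode update (illustrative): centre the time-averaged
    bounding box on the camera axis via pan/tilt, and zoom so that the
    box height occupies about 60-80% of the frame (here the 70% value
    used in Sec. 6)."""
    x, y, w, h = bbox                      # averaged over a time window
    cx, cy = x + w / 2.0, y + h / 2.0
    # proportional corrections toward the image centre (hypothetical API)
    camera.pan(gain * (cx - frame_w / 2.0) / frame_w)
    camera.tilt(gain * (cy - frame_h / 2.0) / frame_h)
    # zoom in (positive) when the target is smaller than the desired fill
    camera.zoom(gain * (fill - h / float(frame_h)))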

5.2 Camera network and operation

5.2.1 Architecture
Let us assume there are N cameras in the camera network to be deployed in the given environment. Each camera acts as an independent node in a mesh network and can communicate with any other camera using a predefined protocol. Initially, the best configurations for the various combinations of cameras, varying from K to N, are calculated according to the positioning algorithm proposed in [13]. The total number of configurations is given by:

N_c = \binom{N}{K} + \binom{N}{K+1} + \binom{N}{K+2} + \cdots + \binom{N}{N}   (8)

where K is the minimum number of cameras to be maintained in global mode, either owing to visual coverage requirements or by user specification. All these configurations are stored in each camera node. When the system is turned on, all the cameras are initially set to global mode. Now, let there be a moving object i detected by a camera; its physical world location is given by

O_{X,Y,Z} = P(x,y) \cdot T(\theta, \phi, f)   (9)

where (X,Y,Z) represents the physical world location of the object and (x,y) is the pixel location as seen by the camera. The transformation T(θ,φ,f) is based on the pan, tilt, and zoom of the camera at that instant and on the pinhole camera model presented in [13]. If there are more objects in view, the largest rectangle encompassing all the objects is taken. From the spread of objects in the environment, the area metric in Eq. (7) is calculated and combined with the frame-level disorder measure G_XY defined by Eq. (6) according to:

M_i = 1 - \exp[-T_A^i \cdot G_{XY}]   (10)

Successively, the camera transmits the object location and the combined metric to all the other cameras. In order to remove noise, and also due to the moving nature of the objects, location and entropy are calculated as a moving average over a period of time; the transmission interval is application dependent. Each camera then compares its combined metric M_i with those of the other cameras observing the same object or set of objects. The camera with the highest metric switches to target mode and is assigned to that particular target. The switching only happens if the total number of global-mode cameras in the network is greater than K. After the switching, the remaining cameras in global mode move to the configuration corresponding to that particular combination. An example is shown in Figure 4: all three cameras are initially in global mode (represented in blue); once the objects appear in the scene, cameras are classified into groups based on the visibility of the objects. In the second stage, the combined measure defined earlier is compared across the two cameras with respect to object visibility; this transition stage is represented in yellow. Finally, in the third stage, the cameras with the highest metric are assigned to the respective targets, while the others revert to global mode. The transition stage is repeated at equal time intervals in order to verify the assignments. The technical aspects of this transition are shown in Algorithm 2. While assigning the targets, the algorithm assumes equal importance for all targets, irrespective of their location, unless otherwise specified by the user. The algorithm is repeated at the end of the time interval set by the user.

input : Object locations O_L across the camera network
input : Combined camera metrics CM_C for all cameras
input : Minimum number of global cameras GC_min
output: Camera modes C_mode

CM_C ;                        % combined camera metric using area T_A and E_U
O_L ;                         % object locations using the pinhole camera model
N_obj ;                       % number of object sets across the camera network
CS_obj = ∅ ;                  % set of candidate cameras per object
C_mode = Global ∀ N_cameras ; % all cameras are initially in global mode
TC = 0 ;                      % number of cameras in target mode
for i ← 1 to N_obj do
    for j ← 1 to N_cameras do
        if Visibility(i,j) == 1 then
            CS_obj(i) = CS_obj(i) ∪ j ;
        end
    end
end
for i ← 1 to N_obj do
    DescendingSort(CS_obj(i,:), CM_C) ;
end
for i ← 1 to N_obj do
    if N_cameras − TC >= GC_min then
        C_mode(CS_obj(i,1)) = Target ;
        TC = TC + 1 ;
    end
end
Return C_mode ;

Algorithm 2: Stage transition of cameras from global to target mode and vice versa.

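A compact Python sketch of this assignment logic (ours; the data-structure choices are assumptions) is given below. metric[j] is camera j's combined measure from Eq. (10) and visible[i] is the set of cameras that see object set i:

import math

def combined_metric(t_a, g_xy):
    """Eq. (10): saturating combination of the area and disorder terms."""
    return 1.0 - math.exp(-t_a * g_xy)

def assign_camera_modes(objects, cameras, metric, visible, gc_min):
    """Sketch of Algorithm 2: per object set, the visible global-mode
    camera with the highest combined metric switches to target mode,
    provided more than gc_min cameras would remain in global mode."""
    mode = {j: 'global' for j in cameras}  # all start in global mode
    in_target = 0
    for i in objects:
        # candidate cameras still in global mode, best metric first
        ranked = sorted((j for j in visible[i] if mode[j] == 'global'),
                        key=lambda j: metric[j], reverse=True)
        # strict inequality keeps at least gc_min global cameras
        if ranked and len(cameras) - in_target > gc_min:
            mode[ranked[0]] = 'target'
            in_target += 1
    return mode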

6. TESTING AND RESULTS

Figure 6: Global mode configurations of Camera 1 and Camera 2.

6.1 Implementation and testing scenario

Figure 7: Path followed by the people with respect to the images shown in Figures 8 and 9.

In order to test the algorithm, we have deployed a camera network composed of two PTZ cameras. The cameras we have selected are Sony EP521, because of their wide optical zoom (36x). The cameras have been deployed in the university wired network and accessed via IP address. The video stream obtained from each camera has a resolution of 720x576 pixels and a frame rate of 25 frames per second. The H.264 bit stream obtained from the camera is encoded in the baseline profile. In order to access the NAL packets from the camera, we have used the functions available in the ffmpeg library [15]. Motion field entropy calculation and object segmentation are accomplished using the motion vectors extracted with the H.264 reference decoder (JM, version 18.6) [16]. In order to control the camera automatically, the curl library functions [17] are adopted. The whole setup is implemented on an Intel i5 processor running at 3.10 GHz. In terms of complexity, the algorithm requires 5.2K, 16K, 48K, and 106K computations per frame for CIF, VGA, HD, and full HD resolutions, respectively, which is negligible when compared against the complexity of the video encoder. Hence, the proposed algorithm can be seamlessly deployed in a video encoder embedded in the camera. The cameras are deployed in a corridor of the university building, about 15 m long and 3 m wide, in the best positions according to the algorithm proposed in [13]; this constitutes the global mode configuration of the cameras. Evaluation is performed by observing the change in configuration of the cameras with respect to the movement of people in the corridor. The observed changes in configuration are compared with the expected behaviour as defined by the proposed algorithm. In this particular deployment, the minimum number of global cameras is fixed at one, and the target camera tracks the objects in such a manner that they occupy 70 percent of the video frame height.

6.2 Evaluation of entropy metric
In order to evaluate the metric based on motion field disorder, we plot the variation of the entropy for both cameras in the presence of people moving across the monitored corridor from one end to the other (see Figure 10). As can be seen, whenever there is no motion the entropy metric oscillates randomly in the range 20-40. However, as a person moves by a camera, a noticeable increase in entropy is visible. For example, as the person moves by camera 1 (marked in red), there is a sudden increase in entropy and camera 1 switches to target mode. After the target has moved away from the camera, the entropy gradually decreases and the camera switches back to global mode. We can notice a similar behaviour for camera 2 (in blue), which takes over from camera 1 when the target comes closer, thus switching to target mode.

Figure 8: Camera 1 switches to target mode; as the target moves out of range, it transits back to global mode.

Figure 9: Camera 2 switches to target mode; as the target moves out of range, it transits back to global mode.

Figure 10: Variation of the entropy metric (entropy vs. frame number) with the movement of people, for Camera 1 and Camera 2.

6.3 Algorithm evaluation
In order to evaluate the algorithm, we observe the camera behaviour in light of a predefined movement of a moving object: a person walks from one end of the corridor to the other and then back to the initial starting point. The path followed and the points at which the reconfigurations happen are shown in Figure 7.

We observe the state transitions during this process. Initially, in Figure 6, the global mode of both cameras is presented. The series of images in which camera 1 leaves global mode and then follows the moving object in target mode is shown in Figure 8; as the target moves out of range of the camera, it reconfigures itself back to global mode. Camera 2 follows the same routine as the target approaches it and then moves out of range (Figure 9). On the whole, the setup performs satisfactorily with respect to reconfiguration. The only limitation we have experienced is in the robustness to rapid changes in the reconfiguration, since the total time required for repositioning the sensors is about 3 s due to network delays. This could be overcome by deploying the algorithm in the H.264 encoder embedded in the camera.

7. CONCLUSIONS
In this paper we have proposed a novel distributed reconfiguration algorithm for video camera networks, which works in real time and with limited bandwidth requirements. To drive the reconfiguration of the sensors, we have proposed a metric based on the motion field entropy, which can be entirely derived from the compressed video bit stream. The algorithm is based on a paradigm in which cameras can operate in two modes depending on the environmental conditions and requirements, thereby achieving a smart allocation of resources across the network. The experimental validation has been carried out through an actual deployment of cameras in a real scenario, validating the response of the cameras to the natural movement of people. The real-time operational capability of the algorithm has also been demonstrated.

8. REFERENCES
[1] Thomas Wiegand, Gary J. Sullivan, Gisle Bjontegaard, and Ajay Luthra, "Overview of the H.264/AVC video coding standard," IEEE Transactions on Circuits and Systems for Video Technology, vol. 13, no. 7, pp. 560-576, 2003.
[2] Christian Micheloni, Bernhard Rinner, and Gian Luca Foresti, "Video analysis in pan-tilt-zoom camera networks," IEEE Signal Processing Magazine, vol. 27, no. 5, pp. 78-90, 2010.
[3] Don Murray and Anup Basu, "Motion tracking with an active camera," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 16, no. 5, pp. 449-459, 1994.
[4] Markus Quaritsch, Markus Kreuzthaler, Bernhard Rinner, Horst Bischof, and Bernhard Strobl, "Autonomous multicamera tracking on embedded smart cameras," EURASIP Journal on Embedded Systems, vol. 2007, no. 1, pp. 35-35, 2007.
[5] G. Scotti, L. Marcenaro, C. Coelho, F. Selvaggi, and C.S. Regazzoni, "Dual camera intelligent sensor for high definition 360 degrees surveillance," IEE Proceedings - Vision, Image and Signal Processing, vol. 152, no. 2, pp. 250-257, 2005.
[6] Chung-Hao Chen, Yi Yao, David Page, Besma Abidi, Andreas Koschan, and Mongi Abidi, "Heterogeneous fusion of omnidirectional and PTZ cameras for multiple object tracking," IEEE Transactions on Circuits and Systems for Video Technology, vol. 18, no. 8, pp. 1052-1063, 2008.
[7] Claudio Piciarelli, Christian Micheloni, and Gian Luca Foresti, "Occlusion-aware multiple camera reconfiguration," in Proceedings of the Fourth ACM/IEEE International Conference on Distributed Smart Cameras. ACM, 2010, pp. 88-94.
[8] Deepak Karuppiah, Roderic Grupen, Allen Hanson, and Edward Riseman, "Smart resource reconfiguration by exploiting dynamics in perceptual tasks," in Intelligent Robots and Systems, 2005 (IROS 2005), IEEE/RSJ International Conference on. IEEE, 2005, pp. 1513-1519.
[9] Claudio Piciarelli, Christian Micheloni, and Gian Luca Foresti, "PTZ camera network reconfiguration," in Distributed Smart Cameras, 2009 (ICDSC 2009), Third ACM/IEEE International Conference on. IEEE, 2009, pp. 1-7.
[10] Bi Song, Cristian Soto, Amit K. Roy-Chowdhury, and Jay A. Farrell, "Decentralized camera network control using game theory," in Distributed Smart Cameras, 2008 (ICDSC 2008), Second ACM/IEEE International Conference on. IEEE, 2008, pp. 1-8.
[11] Chong Ding, Bi Song, Akshay Morye, Jay A. Farrell, and Amit K. Roy-Chowdhury, "Collaborative sensing in a distributed PTZ camera network," IEEE Transactions on Image Processing, vol. 21, no. 7, pp. 3282-3295, 2012.
[12] Krishna Reddy Konda and Nicola Conci, "Global and local coverage maximization in multi-camera networks by stochastic optimization," Infocommunications Journal, vol. 5, no. 1, pp. 1-8, 2013.
[13] Krishna Reddy Konda and Nicola Conci, "Optimal configuration of PTZ camera networks based on visual quality assessment and coverage maximization," in Distributed Smart Cameras (ICDSC), 2013 Seventh International Conference on. IEEE, 2013.
[14] i-LIDS Team, "Imagery Library for Intelligent Detection Systems (i-LIDS): a standard for testing video based detection systems," in Carnahan Conferences on Security Technology, Proceedings 2006 40th Annual IEEE International, Oct. 2006, pp. 75-80.
[15] FFmpeg (open source, multiple contributors), multimedia framework for media manipulation, March 2014, http://www.ffmpeg.org/.
[16] HHI, H.264 reference decoder from the Heinrich Hertz Institute, January 2014, http://iphome.hhi.de/suehring/tml/.
[17] curl (open source, multiple contributors), command line tool for transferring data with URL syntax, March 2014, http://curl.haxx.se/.