Privacy Protection in a Video Surveillance System



170 IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS FOR VIDEO TECHNOLOGY, VOL. 21, NO. 2, FEBRUARY 2011

Privacy Protection in Video Surveillance Systems: Analysis of Subband-Adaptive Scrambling in JPEG XR

Hosik Sohn, Wesley De Neve, and Yong Man Ro, Senior Member, IEEE

Abstract: This paper discusses a privacy-protected video surveillance system that makes use of JPEG extended range (JPEG XR). JPEG XR offers a low-complexity solution for the scalable coding of high-resolution images. To address privacy concerns, face regions are detected and scrambled in the transform domain, taking into account the quality and spatial scalability features of JPEG XR. Experiments were conducted to investigate the performance of our surveillance system, considering visual distortion, bit stream overhead, and security aspects. Our results demonstrate that subband-adaptive scrambling is able to conceal privacy-sensitive face regions with a feasible level of protection. In addition, our results show that subband-adaptive scrambling of face regions outperforms subband-adaptive scrambling of frames in terms of coding efficiency, except when low video bit rates are in use.

Index Terms—JPEG XR, ROI, scrambling, video surveillance.

I. Introduction

PRESENT-DAY video surveillance systems are often required not to intrude upon the privacy of the general public. This paper proposes a privacy-protected video surveillance system using the JPEG extended range (JPEG XR) coding format [1]. JPEG XR allows for the scalable coding of still images; it was approved as an International Standard (formally designated as ISO/IEC 29199-2) and also as an ITU-T recommendation (formally designated as T.832) in the course of 2009. To mitigate privacy concerns [2], [3], our video surveillance system is able to detect and protect face regions. Face regions, which are considered privacy sensitive, are protected using different scrambling techniques operating in the transform domain. The proposed scrambling strategy takes into account the scalability provisions of JPEG XR and preserves format compliance.

The research presented in this paper is an extension and an improvement of the research discussed in [4]. In [4], we provide an initial analysis of region of interest-based (ROI-based) scrambling for scalable surveillance video content coded using JPEG XR. In particular, two issues were discussed in [4]: first, a subband-adaptive scrambling strategy that takes into account the scalability provisions of JPEG XR, and second, the file size overhead caused by the use of tiling. In this paper, we provide a more detailed analysis of these issues, providing additional background information that better motivates our design decisions. We also propose an improved scrambling technique for DC subbands that offers a higher level of protection. Further, we address the need for scrambling of the high pass subbands, something not considered in [4]. Finally, the analysis presented in this paper uses representative surveillance video content.

Manuscript received November 8, 2009; revised June 1, 2010; accepted September 19, 2010. Date of publication January 17, 2011; date of current version March 2, 2011. This work was supported by the National Research Foundation (NRF) of Korea, under Grant NRF-D00070. This paper was recommended by Associate Editor Q. Sun.

The authors are with the Image and Video Systems Laboratory, Department of Electrical Engineering, Korea Advanced Institute of Science and Technology, Daejeon 305-701, Korea (e-mail: [email protected]; [email protected]; [email protected]).

Color versions of one or more of the figures in this paper are available online at http://ieeexplore.ieee.org.

Digital Object Identifier 10.1109/TCSVT.2011.2106250

The rest of this paper is organized as follows. In Section II, we briefly review JPEG XR. Section III outlines the proposed video surveillance system. In addition, we explain our subband-adaptive technique for scrambling surveillance video content encoded with JPEG XR. Experimental results are presented in Section IV, illustrating the performance of subband-adaptive scrambling for both complete frames and ROIs. Finally, Section V concludes this paper.

II. Image Coding Using JPEG XR

Intra coding is often used in video surveillance systems as it allows for low-complexity and low-delay implementations. Also, intra coding is beneficial when CPU power is more expensive than bandwidth and storage. These considerations are important when designing cost-effective video surveillance systems that are required to process a high number of simultaneous video streams with a high spatial resolution [5].

The cameras in our surveillance system encode each video frame using JPEG XR. The main technical benefit of using JPEG XR as an intra video codec is its low computational complexity, combined with image quality and scalability provisions that are, from a practical point of view, similar to those of Motion JPEG 2000 and the scalable high intra profile of H.264/AVC scalable video coding (SVC) [6].

Due to space constraints, we refer the reader to the references for further information regarding the coding tools of JPEG XR, and in particular its transform design [1], provisions for ROI coding [4], and scalability tools [7], [8]. ROI coding allows for selective scrambling, implying that non-authorized users are still able to recognize the background and event context. Scalability enables video surveillance systems to monitor public and private places anytime and anywhere, using a wide range of network technologies and devices.

1051-8215/$26.00 © 2011 IEEE

Fig. 1. Overall architecture of the proposed video surveillance system.

III. Enabling Privacy Protection in JPEG XR

In this section, we first describe the architectural details of our surveillance system. Next, we discuss subband-adaptive scrambling for video content encoded with JPEG XR.

A. Proposed Video Surveillance System

1) System Architecture and Application Scenarios: As shown in Fig. 1, our surveillance system consists of multiple IP cameras that are connected to a central management and storage server (CMSS). The IP cameras and the CMSS are connected by means of a wired LAN that has a speed of 1 Gb/s. In addition, video analysis and encoding are performed by the IP cameras. Specifically, video analysis is responsible for detecting face regions, while encoding is responsible for compressing the video data using our privacy-enabled JPEG XR encoder (see Section III-B).

Our surveillance system is able to accommodate clients that use diverse network connections and devices. Three application scenarios can be defined to reflect this diversity. In a high-complexity scenario, we assume that a desktop PC communicates with the CMSS by means of a wired LAN that has a speed of 100 Mb/s. In a medium-complexity scenario, we assume that a laptop communicates with the CMSS by means of a 70 Mb/s WiMAX network. Finally, in a low-complexity scenario, we assume that a smartphone communicates with the CMSS by means of a 3GPP network that has a speed of 7.2 Mb/s. The scalability tools of JPEG XR can be used to create adaptive video content.

2) Modified JPEG XR Encoder and Decoder: Fig. 2(a) illustrates the design of our modified JPEG XR encoder. Before performing entropy coding, different scrambling techniques are applied to the transform coefficients in the DC, low pass (LP), and high pass (HP) subbands, respectively (see the gray-shaded boxes). A detailed explanation of the scrambling techniques employed is provided in Section III-B.

When ROI-based scrambling is in use, the location of each face region is communicated by the face detection module to our modified JPEG XR encoder (we assume face detection is accurate). The modified JPEG XR encoder is then able to construct an appropriate tile layout, making it possible to only scramble macroblocks (MBs) located in face regions. Note that pseudo-random numbers are generated by relying on a secret key that is used as a seed value. At the side of the decoder, only an authorized user (i.e., a user who knows the secret key) is able to revert the scrambling process during decoding. The design of our modified JPEG XR decoder is shown in Fig. 2(b). A more detailed description of key management can be found in [9].

Fig. 2. Modified JPEG XR encoder and decoder. (a) Encoder. (b) Decoder.

B. Subband-Adaptive Scrambling

We use a subband-adaptive approach to scramble surveillance video content encoded with JPEG XR. This approach is motivated by the following observation: when scrambling a particular subband, a tradeoff exists between the visual importance of the subband, the available amount of coded data in the subband, the visual distortion and the level of protection offered by the scrambling technique used, and the effect on coding efficiency. Similar to [10], we only focus on scrambling luminance channels in order to keep the impact on the coding efficiency and the computational complexity low. However, it should be straightforward to extend the proposed scrambling strategy to chrominance channels.

1) DC Subbands: Random Level Shift: In a DC subband, a limited amount of data is available for the purpose of scrambling. Indeed, each MB in a tile only contributes a single DC coefficient to the DC subband of that particular tile. Therefore, we propose to make use of random level shift (RLS) in order to scramble DC subbands with a sufficient level of protection. RLS pseudo-randomly shifts the level of a DC coefficient value as follows:

$$\mathrm{DCcoeff}^{e} = \mathrm{DCcoeff} + R(L) \tag{1}$$

where $\mathrm{DCcoeff}$ denotes the data to be scrambled and where $\mathrm{DCcoeff}^{e}$ denotes the pseudo-randomly level-shifted data. $R(L)$ represents a pseudo-randomly generated number whose range is from $-2^{L-1}$ to $2^{L-1}$. RLS comes with a decrease in coding efficiency due to the pseudo-random offset added to each DC coefficient value. Hence, to avoid a significant loss of coding efficiency, it is necessary to select a proper value for L. This selection process is discussed in more detail in Section IV-A1. In [4], we applied RSI (random sign inversion, see Section III-B3) to the sign of DC coefficients while we pseudo-randomly flipped the refinement bits of DC coefficients. Although this approach does not affect the coding efficiency, it provides a lower level of security due to the limited amount of coded data that is available for scrambling.
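The level shift in (1) and its inverse can be sketched as follows. This is a minimal illustration, not the encoder implementation used in the paper: the function names `rls_scramble` and `rls_descramble` are hypothetical, and Python's `random.Random` merely stands in for whatever keyed pseudo-random generator a real system would use (it is not cryptographically secure).

```python
import random

def rls_scramble(dc_coeffs, key, L):
    """Add a key-dependent offset R(L) in [-2**(L-1), 2**(L-1)] to each DC coefficient."""
    rng = random.Random(key)          # the secret key acts as the seed (assumption: stand-in PRNG)
    bound = 2 ** (L - 1)
    return [c + rng.randint(-bound, bound) for c in dc_coeffs]

def rls_descramble(scrambled, key, L):
    """Regenerate the same offsets from the key and subtract them again."""
    rng = random.Random(key)
    bound = 2 ** (L - 1)
    return [c - rng.randint(-bound, bound) for c in scrambled]

dc = [120, -37, 0, 512]               # illustrative DC values, one per MB
protected = rls_scramble(dc, key=1234, L=6)
assert rls_descramble(protected, key=1234, L=6) == dc
```

Because the offsets are regenerated from the seed in the same order, a decoder holding the key recovers the original values exactly, while a larger L widens the offset range and therefore both the distortion and the entropy of the scrambled DC values.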

2) LP Subbands: Random Permutation: An LP subband is visually less important than a DC subband, but visually more important than an HP subband. Also, an LP subband contains more transform coefficients than a DC subband, but fewer transform coefficients than an HP subband. Therefore, we propose to apply random permutation (RP [11], [12]) to the transform coefficients stored in an LP subband (RP is applied at the level of a MB). RP pseudo-randomly permutes the ordering of LP coefficients in a MB as follows:

$$\mathrm{LPcoeff}^{e}_{i} = \mathrm{LPcoeff}_{j}, \quad \text{where } i = 1, \ldots, C \text{ and } j = x_1, \ldots, x_C \tag{2}$$

where $\mathrm{LPcoeff}_{j}$ denotes the jth LP coefficient in a MB and where $\mathrm{LPcoeff}^{e}_{i}$ denotes the ith LP coefficient in the pseudo-randomly permuted macroblock. In (2), C represents the number of LP coefficients in a MB (i.e., C is equal to 15) and $x_1, \ldots, x_C$ represent non-overlapping random numbers, ranging from 1 to C. RP offers a higher level of protection than RLS as RP allows for a higher number of possible combinations. However, RP comes with a higher decrease in coding efficiency as this approach affects the effectiveness of entropy coding more significantly (see Section IV-A2).
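A keyed permutation in the spirit of (2) might look like the sketch below. The names `rp_scramble` and `rp_descramble` are hypothetical, indices are 0-based rather than 1-based, and `random.Random` is again only an assumed stand-in for the keyed generator.

```python
import random

def rp_scramble(lp_coeffs, key):
    """Permute the LP coefficients of one MB: output[i] = input[order[i]]."""
    rng = random.Random(key)
    order = list(range(len(lp_coeffs)))   # order[i] plays the role of x_i (0-based here)
    rng.shuffle(order)
    return [lp_coeffs[j] for j in order]

def rp_descramble(scrambled, key):
    """Rebuild the same permutation from the key and undo it."""
    rng = random.Random(key)
    order = list(range(len(scrambled)))
    rng.shuffle(order)
    restored = [0] * len(scrambled)
    for i, j in enumerate(order):
        restored[j] = scrambled[i]
    return restored

lp = [14, -3, 0, 0, 7, 0, -1, 0, 0, 2, 0, 0, 0, 5, 0]   # C = 15 illustrative values
assert rp_descramble(rp_scramble(lp, key=99), key=99) == lp
```

Note that the permutation keeps the multiset of coefficient values intact, which is why repeated values (zeros in particular) reduce the number of distinguishable arrangements.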

3) HP Subbands: Random Sign Inversion: An HP subband is visually less important than a DC and LP subband. However, this subband still contains visually important information that could for instance be exploited by face recognition techniques. The image shown in Fig. 8(b) was decoded by only using luminance information from the HP subbands of the “ATM” image shown in Fig. 8(a), illustrating that the high frequency information in the HP subbands describes edges that may reveal the silhouette of a face. Consequently, we propose to apply random sign inversion (RSI [11], [12]) to the HP transform coefficients. RSI pseudo-randomly flips the sign of each coefficient as follows:

$$\mathrm{HPcoeff}^{e} = \begin{cases} -\mathrm{HPcoeff}, & \text{if } r = 1 \\ +\mathrm{HPcoeff}, & \text{otherwise} \end{cases} \tag{3}$$

where $\mathrm{HPcoeff}$ denotes the coefficients to be scrambled and where $\mathrm{HPcoeff}^{e}$ denotes the pseudo-randomly sign-flipped coefficients. In (3), r denotes a 1-bit pseudo-random number. Since the sign information of HP coefficients is signaled using a Boolean flag, the coding efficiency is not affected.
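The sign inversion in (3) can be illustrated with a few lines of code. This is a sketch under stated assumptions: `rsi` is a hypothetical helper name and `random.Random` is a placeholder for the keyed pseudo-random source.

```python
import random

def rsi(hp_coeffs, key):
    """Flip the sign of each HP coefficient when the 1-bit random number r equals 1."""
    rng = random.Random(key)
    return [-c if rng.getrandbits(1) == 1 else c for c in hp_coeffs]

hp = [3, 0, -1, 0, 0, 2, -4, 0]       # illustrative HP values
scrambled = rsi(hp, key=2011)
assert [abs(c) for c in scrambled] == [abs(c) for c in hp]   # magnitudes are untouched
assert rsi(scrambled, key=2011) == hp                        # applying it twice restores the input
```

Since only signs change and magnitudes are preserved, the operation is its own inverse for a given key, and zero-valued coefficients contribute nothing to the protection.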

4) Flexbits Subbands: No Scrambling: We propose not to scramble the Flexbits subbands. These subbands contain the lower order bits of the HP transform coefficients. Hence, the information provided by the Flexbits subbands is only of limited visual importance. Moreover, the amount of coded data in the Flexbits subband significantly decreases as the bit rate decreases (see Table I).

Fig. 3. Bit stream overhead when using RLS for scrambling DC subbands (video sequence used: “ATM”).

IV. Experimental Results

We have implemented the proposed scrambling approach in the HD Photo Device Porting Kit 1.0 [13]. As no surveillance video content is publicly available with a sufficiently high resolution, we have manually generated three representative video sequences: “ATM,” “Stairs,” and “Hall way” [14]. All video sequences contain face images (see Fig. 10), and have 4CIF resolution, a frame rate of 30 frames/s, and a length of 200 frames. Overhead figures were obtained by averaging the overhead over all 200 video frames.

This section discusses four experiments. Section IV-A investigates the bit stream overhead and the visual effectiveness of subband-adaptive scrambling in a frame-based setting, for each individual subband (as discussed in Section IV-D, an adversary may separately attack each subband). Section IV-B describes the effect of tiling on the coding efficiency. Section IV-C analyzes subband-adaptive scrambling in an ROI-based setting. Finally, Section IV-D discusses the level of protection offered by subband-adaptive scrambling.

A. Analysis of Frame-Based, Subband-Adaptive Scrambling

1) Scrambled DC Subbands: Fig. 3 shows the impact of RLS on the coding efficiency of the DC subbands for a varying bit rate. The selection of an appropriate value for L (see Section III-B1) was done as follows: we first measured the average bit stream overhead for all video sequences for varying values of L and then selected the maximum value of L that produces less than 30% of bit stream overhead. Although the same value of L can be utilized for the whole range of bit rates, the amount of overhead significantly increases as the bit rate decreases (see the bit stream overhead results when L = 8 in Fig. 3). Hence, this observation motivated us to make the value of L dependent on the bit rate. Although the use of a small value of L results in a lower level of protection at lower bit rates, the DC video frames at lower bit rates are already severely distorted due to significant quantization.
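The selection rule for L described above (largest L whose average overhead stays below 30%) can be written down as a small helper. The function name `select_L` and the overhead figures in the example are purely illustrative assumptions; they are not measurements from the paper.

```python
def select_L(overhead_by_L, max_overhead_pct=30.0):
    """Return the largest L whose measured average overhead is below the limit, or None."""
    admissible = [L for L, pct in overhead_by_L.items() if pct < max_overhead_pct]
    return max(admissible) if admissible else None

# Hypothetical average DC-subband overhead (in %) per candidate L at one bit rate.
measured = {4: 9.0, 5: 14.0, 6: 21.0, 7: 27.0, 8: 36.0}
print(select_L(measured))  # 7
```

Running this selection once per target bit rate would yield a bit-rate-dependent L, in line with the observation that a fixed L becomes too costly at low bit rates.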

Fig. 3 shows that RLS causes a significant decrease in coding efficiency. However, the amount of coded data in the DC subbands is relatively small compared to the amount of coded data in the other subbands. As such, the overhead at the level of the DC subband does not significantly decrease the overall coding efficiency. For instance, scrambling DC subbands results in an increase of the overall bit rate of 0.6% at the highest bit rate and in an increase of the overall bit rate of 11.7% at the lowest bit rate (the increase in overhead for the DC subbands only is, respectively, equal to 12.8% and 24.5% in Fig. 3). Fig. 4(b) and (c) show DC images scrambled with RLS when L = 6 and L = 8, respectively. The images illustrate that a larger value of L leads to more severe distortion.

Fig. 4. Visual effect of RLS when scrambling DC subbands. (a) Original DC image of “ATM.” (b) RLS (L = 6). (c) RLS (L = 8) (images were cropped and magnified for visualization purposes).

Fig. 5. Bit stream overhead when using RP for scrambling LP subbands (video sequence used: “ATM”).

Fig. 6. Visual effect of scrambling LP subbands. (a) Original DC+LP image of “ATM.” (b) Original LP subband of “ATM” (luminance only). (c) RSI on (b). (d) RP on (b) (images were cropped and magnified for visualization purposes).

2) Scrambled LP Subbands: Fig. 5 shows the bit stream overhead when RP is applied to LP subbands. The overhead becomes higher as the bit rate of the bit streams decreases. This implies that the effect of scrambling on the coding efficiency is more significant at lower bit rates. The signaling overhead also increases as the bit rate decreases. RSI (see Section III-B3) is also a feasible candidate technique for scrambling LP subbands since this technique does not produce any bit stream overhead. However, RSI comes with a lower level of protection compared to RP. Since a MB only contributes 15 coefficients to an LP subband, the use of RSI results in $2^{15}$ combinations per MB, while the use of RP results in approximately $2^{40}$ (15!) combinations per MB. Fig. 6 illustrates the visual effect of RSI and RP on the LP subbands, showing that the use of RP leads to more visual distortion than the use of RSI.
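The two per-MB combination counts quoted above for a 15-coefficient LP block can be checked with a short calculation. This is only an arithmetic illustration of the figures in the text.

```python
import math

rsi_combinations = 2 ** 15                 # one sign bit per LP coefficient
rp_combinations = math.factorial(15)       # orderings of 15 distinct coefficients

print(rsi_combinations)                    # 32768
print(rp_combinations)                     # 1307674368000
print(math.log2(rp_combinations))          # about 40.25, i.e. roughly 2**40
```

The gap of about 25 bits per MB is the reason RP is preferred over RSI for LP subbands despite its overhead.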

3) Scrambled HP Subbands: Fig. 7 shows the bit stream overhead when RP is applied to HP subbands. The HP subbands contain significantly more coefficient information than the other subbands (see, for example, the bit rates of the DC and LP subbands in Figs. 3 and 5, respectively). Hence, even a slight change of the coefficients may significantly affect the effectiveness of entropy coding, resulting in a significant decrease of the coding efficiency. Although the use of RP results in a significant number of combinations (240!) for an HP subband, it produces a prohibitive amount of overhead (as shown in Fig. 7). Further, the overhead becomes higher as the bit rate decreases.

Fig. 7. Bit stream overhead when using RP for scrambling HP subbands (video sequence used: “ATM”).

Fig. 8. Visual effect of scrambling HP subbands. (a) Original DC+LP+HP image of “ATM.” (b) Original HP subband of “ATM” (luminance only). (c) RP on (b). (d) RSI on (b) (images were cropped and magnified for visualization purposes, contrast and brightness were enhanced).

Fig. 8 illustrates the visual effect of RP and RSI on the HP subbands. The use of RP results in more visual distortion than the use of RSI. However, as RP causes a significant amount of bit stream overhead, it is more appropriate to scramble HP subbands using RSI.

4) Unified Scrambling Strategy: The scrambling techniques discussed in the previous sections can be combined into a single, subband-adaptive scrambling strategy: RLS for scrambling DC subbands, RP for scrambling LP subbands, and RSI for scrambling HP subbands. Table I shows the bit stream overhead when applying subband-adaptive scrambling to complete frames. Results are only shown for a few selected quantization parameter (QP) values, resulting in video bit rates that are in line with the bandwidth capabilities of a conventional IP camera: 1 Mb/s, 5 Mb/s, and 10 Mb/s [5]. In Table I, the labels “I” and “IS” represent the bit rates of the original and the scrambled video sequences, respectively. Further, the label “O” denotes the overall increase in bit stream overhead caused by scrambling the original video sequences (hereinafter, the same labels are used in all other tables).
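The relation between the labels I, IS, and O can be made concrete with a small helper. The formula below, O = 100 · (IS − I) / I, is an assumption about how the overhead percentage is defined; it reproduces the two ATM entries used in the example, although other entries may differ slightly because the tabulated bit rates are rounded. The function name `overhead_pct` is hypothetical.

```python
def overhead_pct(original_kbps, scrambled_kbps):
    """Relative bit rate increase caused by scrambling, in percent (assumed definition of O)."""
    return 100.0 * (scrambled_kbps - original_kbps) / original_kbps

# ATM, all subbands, using the I and IS values listed in Table I.
print(round(overhead_pct(8837, 9070), 1))  # 2.6  (QP 20)
print(round(overhead_pct(4656, 4948), 1))  # 6.3  (QP 35)
```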

The overhead caused by frame-based, subband-adaptive scrambling can be attributed to the use of RLS and RP. Specifically, the bit stream overhead produced by RLS and RP becomes higher as the bit rate decreases since the effect of scrambling on the coding efficiency is more significant at lower bit rates. Fig. 9 allows for a subjective assessment of the effectiveness of our scrambling strategy, visualizing scrambled versions of a representative image from “ATM,” each time decoded using a different number of subbands. The visual distortion caused by scrambling is sufficiently strong to conceal the identity of the subject shown in the original images. Similar observations were also made for “Stairs” and “Hall way.”

TABLE I

Overall Bit Stream Overhead Caused By Frame-Based, Subband-Adaptive Scrambling

| Sequence | QP | All subbands: I (kb/s) | IS (kb/s) | O (%) | DC+LP+HP: I (kb/s) | IS (kb/s) | O (%) | DC+LP: I (kb/s) | IS (kb/s) | O (%) | DC: I (kb/s) | IS (kb/s) | O (%) |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| ATM | 20 | 8837 | 9070 | 2.6 | 8814 | 9047 | 2.6 | 3396 | 3628 | 6.9 | 684 | 828 | 20.9 |
| ATM | 35 | 4656 | 4948 | 6.3 | 4656 | 4948 | 6.3 | 2330 | 2622 | 12.5 | 569 | 751 | 32.1 |
| ATM | 80 | 819 | 961 | 17.2 | 819 | 961 | 17.2 | 599 | 740 | 23.6 | 263 | 346 | 31.3 |
| Stairs | 35 | 8452 | 8673 | 2.6 | 8378 | 8599 | 2.6 | 2794 | 3015 | 7.9 | 559 | 692 | 23.7 |
| Stairs | 55 | 4602 | 4777 | 3.8 | 4602 | 4777 | 3.8 | 1873 | 2049 | 9.4 | 432 | 532 | 23.1 |
| Stairs | 100 | 855 | 890 | 4.0 | 855 | 890 | 4.0 | 592 | 627 | 5.8 | 228 | 269 | 18.4 |
| Hall way | 25 | 8743 | 9015 | 3.1 | 8609 | 8881 | 3.2 | 3236 | 3508 | 8.4 | 665 | 805 | 21.0 |
| Hall way | 45 | 4632 | 4913 | 6.1 | 4632 | 4913 | 6.1 | 1995 | 2276 | 14.1 | 517 | 667 | 29.1 |
| Hall way | 95 | 760 | 876 | 15.3 | 760 | 876 | 15.3 | 531 | 647 | 22.0 | 221 | 284 | 28.9 |

Fig. 9. Visual effect of frame-based, subband-adaptive scrambling. (a) Scrambled version of Fig. 4(a). (b) Scrambled version of Fig. 6(a). (c) Scrambled version of Fig. 8(a) (images were used without cropping).

B. Analysis of Tiling

This section investigates the effect of tiling on the coding efficiency. Table II shows the bit stream overhead for uniform tile layouts with a varying tile size. The label “1 × 1 MB” for example refers to the use of a uniform tile layout consisting of 44 × 36 tiles at 4CIF resolution. As shown in Table II, the combined use of a small tile size and a uniform tile layout may significantly decrease the coding efficiency. This can be attributed to broken entropy coding and an increasing number of tile headers and index table entries. Also, the bit stream overhead becomes higher as the bandwidth decreases since the same syntax structures are used to signal a lower amount of coded image data. The effect of tiling on the coding efficiency is also lower in the higher frequency subbands.

Table III shows the bit stream overhead for several non-uniform tile layouts. The size of each tile is interactively determined according to the location of the facial regions in the surveillance video content. In summary, our results demonstrate that it is necessary to make use of a non-uniform tile layout to avoid the significant amount of overhead that comes with the use of fine-grained and uniform tiling.
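The 44 × 36 tile grid quoted for the “1 × 1 MB” layout follows from the frame size. The sketch below assumes the usual 4CIF dimensions of 704 × 576 pixels and 16 × 16-pixel macroblocks; the helper name `uniform_tile_grid` is hypothetical.

```python
def uniform_tile_grid(width, height, tile_mbs, mb_size=16):
    """Number of tiles per row and per column for square tiles of tile_mbs x tile_mbs MBs."""
    mbs_x = -(-width // mb_size)       # ceiling division: MBs per row
    mbs_y = -(-height // mb_size)      # MBs per column
    return -(-mbs_x // tile_mbs), -(-mbs_y // tile_mbs)

print(uniform_tile_grid(704, 576, 1))  # (44, 36)
print(uniform_tile_grid(704, 576, 2))  # (22, 18)
```

With one tile per MB, 44 × 36 = 1584 tile headers and index table entries have to be signaled per frame, which gives a sense of why the finest uniform layout is so costly at low bit rates.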

C. Analysis of ROI-Based, Subband-Adaptive Scrambling

Fig. 10 illustrates that the face regions in “ATM,” “Stairs,” and “Hall way” are sufficiently concealed by our scrambling strategy, protecting the identity of the subjects shown. Table IV shows the bit stream overhead when using ROI-based, subband-adaptive scrambling (where the ROI is represented by a non-uniform tile layout). Compared to frame-based scrambling (see Table I), ROI-based scrambling mostly allows for a lower bit stream overhead. However, at the lowest bit rate, the bit stream overhead of ROI-based scrambling is higher than the bit stream overhead of frame-based scrambling for “Stairs” and “Hall way.” The results in Tables III and IV show that this can be attributed to the use of tiling.

Fig. 10. Visual effect of ROI-based, subband-adaptive scrambling. (a) “ATM” (all subbands). (b) “ATM” (DC+LP+HP). (c) “ATM” (DC+LP). (d) “Stairs” (all subbands). (e) “Stairs” (DC+LP+HP). (f) “Stairs” (DC+LP). (g) “Hall way” (all subbands). (h) “Hall way” (DC+LP+HP). (i) “Hall way” (DC+LP).

D. Security Considerations

This section analyzes the level of protection offered by theproposed scrambling technique against a brute force attack.For one MB, the use of RLS at the level of the DC coefficientsresults in 2L+1 possible combinations, the use of RP at thelevel of the LP coefficients results in 15!/(15 − K)! possiblecombinations, and the use of RSI at the level of the HPcoefficients results in 2M possible combinations. Note that L

SOHAN et al.: PRIVACY PROTECTION IN VIDEO SURVEILLANCE SYSTEMS 175

TABLE II

Bit Stream Overhead Caused by Uniform Tiling (Video Sequence Used: ‘‘ATM’’)

All Subbands DC+LP+HP DC+LP DCQP I (kb/s) IS (kb/s) O (%) I (kb/s) IS(kb/s) O (%) I (kb/s) IS (kb/s) O (%) I (kb/s) IS (kb/s) O (%)

1 × 1 MB20 8837 14 612 65.4 8814 14 620 65.9 3396 8593 153.1 684 3856 463.535 4656 11 163 139.8 4656 11 163 139.8 2330 8309 256.7 569 3856 578.180 819 8667 957.7 819 8667 957.7 599 8052 1244.5 263 3856 1365.0

2 × 2 MB20 8837 11 226 27.0 8814 11 161 26.6 3396 5493 61.8 684 1810 164.535 4656 7606 63.4 4656 7595 63.1 2330 5001 114.7 569 1779 212.980 819 4744 478.9 819 4744 478.9 599 4353 1553.7 263 1724 554.9

6 × 6 MB20 8837 9169 3.8 8814 9139 3.7 3396 3704 9.1 684 816 19.335 4656 5111 9.8 4656 5102 9.6 2330 2691 15.5 569 713 25.480 819 1325 61.7 819 1325 61.7 599 1055 76.1 263 448 70.2

TABLE III

Bit Stream Overhead Caused by Non-Uniform Tiling

All Subbands DC+LP+HP DC+LP DCQP I (kb/s) IS (kb/s) O (%) I (kb/s) IS (kb/s) O (%) I (kb/s) IS (kb/s) O (%) I (kb/s) IS (kb/s) O (%)

ATM: 9 tiles (3 × 3 non-uniform tile layout)20 8837 8846 0.1 8814 8824 0.1 3396 3449 1.6 684 708 3.535 4656 4734 1.7 4656 4734 1.7 2330 2394 2.8 569 596 4.880 819 931 13.6 819 931 13.6 599 690 15.2 263 300 14.0

Stairs: 9 tiles (3 × 3 non-uniform tile layout)35 8452 8499 0.6 8378 8428 0.6 2794 2850 2.0 559 585.85 4.755 4602 4692 2.0 4602 4692 2.0 1873 1945 3.9 432 463.11 7.2100 855 963 12.6 855 963 12.6 592 686 15.8 228 264.75 16.4

Hall way: 15 tiles (5 × 3 non-uniform tile layout)25 8743 8868 1.4 8609 8760 1.8 3236 3367 4.1 665 714 7.445 4632 4793 3.5 4632 4793 3.5 1995 2146 7.6 517 571 10.695 760 948 24.8 760 948 24.8 531 699 31.7 221 288 30.6

TABLE IV

Bit Stream Overhead Caused by ROI-Based, Subband-Adaptive Scrambling

All Subbands DC+LP+HP DC+LP DCQP I (kb/s) IS (kb/s) O (%) I (kb/s) IS (kb/s) O (%) I (kb/s) IS (kb/s) O (%) I (kb/s) IS (kb/s) O (%)

ATM20 8837 8867 0.3 8814 8844 0.3 3396 3469 2.2 684 714 4.335 4656 4757 2.2 4656 4757 2.2 2330 2417 3.7 569 605 6.380 819 945 15.3 819 945 15.3 599 704 17.5 263 302 14.8

Stairs35 8452 8500 0.6 8378 8428 0.6 2794 2851 2.0 559 586 4.755 4602 4693 2.0 4602 4693 2.0 1873 1946 3.9 432 463 7.2100 855 998 16.6 855 998 16.6 592 720 21.6 228 306 34.4

Hall Way25 8743 8870 1.5 8609 8762 1.8 3236 3370 4.1 665 715 7.545 4632 4796 3.5 4632 4796 3.5 1995 2148 7.7 517 572 10.795 760 950 25.0 760 950 25.0 531 700 32.0 221 288 30.6

TABLE V

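Parameters Used for Scrambling ROIs (N, K, and M Were Obtained by Averaging Over 200 Frames)

| Sequence (face region) | N | QP | L | K | M |
|---|---|---|---|---|---|
| ATM | 132.0 | 20 | 8 | 10.9 | 35.1 |
| ATM | 132.0 | 35 | 8 | 8.6 | 19.1 |
| ATM | 132.0 | 80 | 3 | 2.5 | 4.2 |
| Stairs | 9.1 | 35 | 7 | 5.2 | 28.8 |
| Stairs | 9.1 | 55 | 5 | 3.6 | 16.3 |
| Stairs | 9.1 | 100 | 1 | 1.0 | 2.2 |
| Hall way (ROI1) | 12.9 | 25 | 8 | 13.5 | 61.7 |
| Hall way (ROI1) | 12.9 | 45 | 7 | 11.9 | 35.1 |
| Hall way (ROI1) | 12.9 | 95 | 2 | 3.7 | 5.4 |
| Hall way (ROI2) | 17.4 | 25 | 8 | 11.9 | 41.2 |
| Hall way (ROI2) | 17.4 | 45 | 7 | 9.7 | 23.2 |
| Hall way (ROI2) | 17.4 | 95 | 2 | 2.3 | 3.7 |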


TABLE VI

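Statistical Properties of the LP Coefficients in the Face Regions Used (ME and VAR Denote the Mean and the Variance of the LP Coefficients, Respectively)

| Sequence (face region) | QP | ME | VAR |
|---|---|---|---|
| ATM | 20 | −0.3 | 96.5 |
| ATM | 35 | −0.2 | 26.5 |
| ATM | 80 | 0.0 | 0.5 |
| Stairs | 35 | 0.0 | 54.0 |
| Stairs | 55 | 0.0 | 9.1 |
| Stairs | 100 | 0.0 | 0.2 |
| Hall Way (ROI1) | 25 | 0.2 | 245.7 |
| Hall Way (ROI1) | 45 | 0.1 | 45.1 |
| Hall Way (ROI1) | 95 | 0.0 | 0.6 |
| Hall Way (ROI2) | 25 | −0.6 | 156.0 |
| Hall Way (ROI2) | 45 | −0.3 | 28.7 |
| Hall Way (ROI2) | 95 | 0.0 | 0.4 |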

denotes the parameter used by RLS in the DC subbands, K denotes the number of non-zero LP coefficients in a single MB, and M denotes the number of non-zero HP coefficients in a single MB. As such, when subbands are incrementally attacked in the order of DC, LP, and HP, the total number of combinations required to break the complete protection of N MBs is equal to $(2^{L+1})^N + (15!/(15-K)!)^N + (2^M)^N$.

Note that, when offering a maximum level of security, the use of RP at the level of the LP coefficients results in 15! possible combinations. However, our computation of the number of possible combinations of LP coefficients reflects the following observations. First, when low bit rates are in use, the major factor that affects the security level is the number of zero-valued LP coefficients (as most coefficients become zero then). Indeed, the presence of repeated zero coefficients in an LP subband reduces the number of combinations that needs to be tested since an adversary can exploit knowledge about the number of the zero coefficients in an LP subband during decoding. Second, when high bit rates are in use, the probability of the presence of repeated identical non-zero coefficients is low. Hence, for simplicity of computation, we assume that all non-zero LP coefficient values in a single MB are different. Similar observations hold true for the HP coefficients.
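The LP count 15!/(15 − K)! used in the text follows from these two observations: with K distinct non-zero values and 15 − K indistinguishable zeros, only the placement of the non-zero values matters. The following minimal sketch checks the closed form against brute-force enumeration on a toy block; the function names and the 6-position toy block are illustrative assumptions, not part of the described system.

```python
from itertools import permutations
from math import factorial


def distinct_arrangements(coeffs):
    """Count distinct orderings of a coefficient list by enumeration."""
    return len(set(permutations(coeffs)))


def lp_combinations(k, n=15):
    """Closed form n!/(n-k)! for k distinct non-zero values in n positions."""
    return factorial(n) // factorial(n - k)


# Toy block: 6 positions, 2 distinct non-zero values, 4 repeated zeros.
toy_block = [5, -3, 0, 0, 0, 0]
print(distinct_arrangements(toy_block), lp_combinations(2, n=6))

# Per-MB counts for an LP subband with 15 coefficients.
print(lp_combinations(3))   # three non-zero coefficients
print(lp_combinations(15))  # no zero coefficients: the full 15!
```

Enumeration is only practical for the toy block; for a full 15-coefficient LP subband, the closed form is used directly.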

Table V shows the parameters used by RLS, RP, and RSI for the face regions used in our experiments. At the lowest bit rate (when the QP value of “ATM,” “Stairs,” and “Hall way” is 80, 100, and 95, respectively), for each face region consisting of N MBs, the total number of combinations that needs to be tested is approximately equal to $3.7 \times 10^{453}$, $5.0 \times 10^{10}$, $1.8 \times 10^{58}$, and $2.6 \times 10^{40}$ for “ATM,” “Stairs,” and the two face regions of “Hall way,” respectively (for simplicity of calculation, the K values in Table V were rounded off). Further, we observed that decoding and descrambling of DC subbands requires about 1.9 ms on a quad-core 2.0 GHz processor. Consequently, the time needed to generate all possible face regions is approximately equal to $2.3 \times 10^{443}$, 3.0, $1.1 \times 10^{48}$, and $1.5 \times 10^{30}$ years for each case. These numbers show that ROI-based, subband-adaptive scrambling provides a feasible level of protection against a brute force attack.

In practice, attacking the DC and LP subbands may already be sufficient to reveal the identity of a subject [see, for example, Fig. 6(a)]. In this context, the total number of combinations required to break the protection of N MBs is reduced to $(2^{L+1})^N + (15!/(15-K)!)^N$. As such, at the lowest bit rate, the number of combinations required to break the protection of the DC and LP subbands of the face regions described in Table V is equal to $3.7 \times 10^{453}$, $5.0 \times 10^{10}$, $1.8 \times 10^{58}$, and $2.6 \times 10^{40}$, corresponding to approximately $2.3 \times 10^{443}$, 3.0, $1.1 \times 10^{48}$, and $1.5 \times 10^{30}$ years needed to generate all possible face images. It should be clear that the absence of HP subbands does not affect the overall security level since the number of combinations required to break the protection of the HP subbands is significantly smaller than the number of combinations required to break the protection of both the DC and LP subbands.
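The combination counts and attack times quoted above can be reproduced in log space from the lowest-bit-rate rows of Table V. The sketch below makes several assumptions that should be read as such: the DC term is taken as $(2^{L+1})^N$ (it is negligible next to the LP term either way), K is passed already rounded (2.5 is rounded up to 3), a year is 365.25 days, and every trial costs the quoted 1.9 ms. All function and variable names are illustrative.

```python
import math

SECONDS_PER_YEAR = 365.25 * 24 * 3600
TIME_PER_TRIAL_S = 1.9e-3  # decoding + descrambling time quoted in the text


def log10_combinations(n_mbs, l_param, k, m):
    """log10 of (2^(L+1))^N + (15!/(15-K)!)^N + (2^M)^N, computed in log space."""
    lp_per_mb = math.factorial(15) // math.factorial(15 - k)
    terms = [
        n_mbs * (l_param + 1) * math.log10(2),  # DC term (RLS), assumed form
        n_mbs * math.log10(lp_per_mb),          # LP term (RP)
        n_mbs * m * math.log10(2),              # HP term (RSI)
    ]
    top = max(terms)
    # Add the smaller terms relative to the largest one to avoid overflow.
    return top + math.log10(sum(10 ** (t - top) for t in terms))


def log10_years(log10_trials):
    """log10 of the number of years needed to run the given number of trials."""
    return log10_trials + math.log10(TIME_PER_TRIAL_S / SECONDS_PER_YEAR)


def sci(log10_value):
    """Format 10**log10_value as 'mantissa x 10^exponent' with one decimal."""
    exponent = math.floor(log10_value)
    return f"{10 ** (log10_value - exponent):.1f} x 10^{exponent}"


# (N, L, rounded K, M) at the lowest bit rate, taken from Table V.
REGIONS = {
    "ATM": (132.0, 3, 3, 4.2),
    "Stairs": (9.1, 1, 1, 2.2),
    "Hall way ROI1": (12.9, 2, 4, 5.4),
    "Hall way ROI2": (17.4, 2, 2, 3.7),
}

for name, params in REGIONS.items():
    trials = log10_combinations(*params)
    print(f"{name}: {sci(trials)} combinations, {sci(log10_years(trials))} years")
```

Working with logarithms avoids handling numbers with hundreds of digits, and it makes visible that the LP term dominates the sum in all four cases.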

Further, as the bit rate decreases, the security level of the LP subbands decreases. Indeed, most LP coefficients become zero then, thus lowering the effectiveness of RP (see Table VI). In this case, an adversary may attack the LP subbands by making use of an error-concealment attack, simply setting the scrambled LP coefficients to zero [10]. However, as this error-concealment attack comes down to only decoding the DC subbands, we believe that such an approach will, in most cases, be unsuccessful in revealing the identity of a subject [see Fig. 4(a)]. Moreover, the use of lower bit rates also results in DC images with a lower visual quality, thus further hampering face recognition efforts. Therefore, we believe that, in general, an adversary needs to conduct a brute-force attack on at least both the DC and LP subbands in order to achieve visual results that are meaningful for the purpose of face recognition. This could, for instance, be investigated in more detail in a user study.
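The error-concealment attack described above can be illustrated on a single MB. This is a toy sketch only: the dictionary layout (one DC value, 15 LP values, 240 HP values) and the function name are hypothetical and do not correspond to any JPEG XR codec API. The HP coefficients are also discarded by default so that only DC information remains, which mirrors the statement that the attack comes down to decoding the DC subbands.

```python
from copy import deepcopy


def conceal_lp(macroblock, drop_hp=True):
    """Zero the scrambled LP (and optionally HP) coefficients of one MB.

    `macroblock` is a hypothetical dict with keys "dc" (int), "lp" (15 ints),
    and "hp" (240 ints); the input is left unmodified.
    """
    attacked = deepcopy(macroblock)
    attacked["lp"] = [0] * len(attacked["lp"])
    if drop_hp:
        attacked["hp"] = [0] * len(attacked["hp"])
    return attacked


# A scrambled MB at a low bit rate: most LP and HP coefficients are zero.
scrambled_mb = {"dc": 37, "lp": [4, 0, -2] + [0] * 12, "hp": [0] * 240}
attacked_mb = conceal_lp(scrambled_mb)
print(attacked_mb["dc"], sum(abs(c) for c in attacked_mb["lp"]))
```

Note that the DC value is passed through unchanged, so any scrambling applied to it by RLS is still in place after this attack.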

V. Conclusion

We discussed a subband-adaptive approach for scrambling privacy-sensitive face regions in JPEG XR-encoded surveillance video content. Our subband-adaptive approach is the result of a tradeoff between the visual importance of subbands, the amount of coded data in the subbands, the level of security offered by a particular scrambling technique, the effect of scrambling on the coding efficiency, and the computational complexity of the scrambling technique used.

Experimental results were obtained for representative surveillance video content, having 4CIF resolution and a frame rate of 30 frames/s. Our results show that subband-adaptive scrambling is able to conceal privacy-sensitive face regions with a feasible level of protection. However, the combined use of scrambling and tiling lowers the coding efficiency as the video bit rate decreases. Therefore, instead of only scrambling privacy-sensitive face regions, scrambling the whole image region may be more efficient from the point of view of coding efficiency. In particular, for two out of the three video surveillance sequences used in our experiments (“Stairs” and “Hall way”), we observed that frame-based, subband-adaptive scrambling resulted in a lower bit stream overhead than ROI-based, subband-adaptive scrambling when a video bit rate of 1 Mb/s was in use (for all spatial resolutions used). In all other cases, subband-adaptive scrambling for ROIs outperformed subband-adaptive scrambling for complete frames in terms of coding efficiency.

References

[1] S. Srinivasan, C. Tu, S. L. Regunathan, and G. J. Sullivan, “HD photo: A new image coding technology for digital photography,” Proc. SPIE, vol. 6696, pp. 66960A.1–66960A.19, Aug. 2007.


[2] H. Sohn, E. T. Anzaku, W. De Neve, Y. M. Ro, and K. N. Plataniotis, “Privacy protection in video surveillance systems using scalable video coding,” in Proc. IEEE Int. Conf. Adv. Video Signal Based Surveillance, Sep. 2009, pp. 424–429.

[3] A. Cavallaro, “Privacy in video surveillance,” IEEE Signal Process. Mag., vol. 24, no. 2, pp. 166–168, Mar. 2007.

[4] H. Sohn, W. De Neve, and Y. M. Ro, “Region-of-interest scrambling for scalable surveillance video using JPEG XR,” in Proc. ACM Int. Conf. Multimedia, Oct. 2009, pp. 861–864.

[5] J. Honovich. (2009, Aug.). Security Manager’s Guide to Video Surveillance, Version 3.0, pp. 26–27 [Online]. Available: http://ipvideomarket.info

[6] T. D. Tran, L. Liu, and P. Topiwala, “Performance comparison of leading image codecs: H.264/AVC intra, JPEG2000, and Microsoft HD photo,” Proc. SPIE, vol. 6696, pp. 66960B.1–66960B.14, Oct. 2007.

[7] C. Perra and D. Giusto, “An image browsing application based on JPEG XR,” in Proc. Int. Workshop CBMI, Jun. 2008, pp. 396–401.

[8] W. De Neve, D. Van Deursen, W. Van Lancker, Y. M. Ro, and R. Van de Walle, “Improved BSDL-based content adaptation for JPEG 2000 and HD photo (JPEG XR),” Signal Process. Image Commun., vol. 24, no. 6, pp. 452–467, Jul. 2009.

[9] Y. G. Kim, S. H. Jin, and Y. M. Ro, “Scalable security and conditional access control for multiple regions of interest in scalable video coding,” in Proc. Int. Workshop Digital Watermarking, LNCS 5041. Dec. 2008, pp. 71–86.

[10] F. Dufaux and T. Ebrahimi, “Scrambling for privacy protection in video surveillance systems,” IEEE Trans. Circuits Syst. Video Technol., vol. 18, no. 8, pp. 1168–1174, Aug. 2008.

[11] F. Dufaux and T. Ebrahimi, “H.264/AVC video scrambling for privacy protection,” in Proc. IEEE ICIP, Oct. 2008, pp. 1688–1691.

[12] W. Zeng and S. Lei, “Efficient frequency domain video scrambling for content access control,” in Proc. ACM Int. Conf. Multimedia, Oct. 1999, pp. 285–294.

[13] HD Photo Device Porting Kit 1.0 [Online]. Available: http://www.microsoft.com/whdc/xps/hdphotodpk.mspx

[14] IVY Lab Surveillance Video Dataset [Online]. Available: http://ivylab.kaist.ac.kr/demo/vs/dataset.htm

Hosik Sohn received the B.S. degree from Korea Aerospace University, Goyang, Korea, in 2007, and the M.S. degree from the Korea Advanced Institute of Science and Technology (KAIST), Daejeon, Korea, in 2009. He is currently working toward the Ph.D. degree from the Image and Video Systems Laboratory, Department of Electrical Engineering, KAIST.

His current research interests include visual quality measurement, video adaptation, multimedia security, bio-cryptography, scalable video coding, and JPEG extended range.

Wesley De Neve received the M.S. degree in computer science and the Ph.D. degree in computer science engineering from Ghent University, Ghent, Belgium, in 2002 and 2007, respectively.

He is currently a Senior Researcher with the Image and Video Systems Laboratory (IVY Lab), in the position of an Assistant Research Professor. IVY Lab is part of the Department of Electrical Engineering, Korea Advanced Institute of Science and Technology (KAIST), Daejeon, Korea. Prior to joining KAIST, he was a Post-Doctoral Researcher with both Ghent University-IBBT, Ghent, and the Information and Communications University, Daejeon. His current research interests and areas of publication include the coding, annotation, and adaptation of image and video content, graphics processing unit based video processing, efficient XML processing, and the semantic and social web.

Yong Man Ro (M’92–SM’98) received the B.S. degree from Yonsei University, Seoul, Korea, and the M.S. and Ph.D. degrees from the Korea Advanced Institute of Science and Technology (KAIST), Daejeon, Korea.

In 1987, he was a Visiting Researcher with Columbia University, New York, NY, and from 1992 to 1995 he was a Visiting Researcher with the University of California, Irvine, and KAIST. He was a Research Fellow with the University of California, Berkeley, and a Visiting Professor with the University of Toronto, Toronto, ON, Canada, in 1996 and 2007, respectively. He is currently a Full Professor with KAIST, where he is directing the Image and Video Systems Laboratory, Department of Electrical Engineering. He participated in the MPEG-7 and MPEG-21 international standardization efforts, contributing to the definition of the MPEG-7 texture descriptor, the MPEG-21 digital item adaptation visual impairment descriptors, and modality conversion. His current research interests include image/video processing, multimedia adaptation, visual data mining, image/video indexing, and multimedia security.

Dr. Ro received the Young Investigator Finalist Award of ISMRM in 1992 and the Scientist Award in Korea in 2003. He served as a TPC member of international conferences such as IWDW, WIAMIS, AIRS, and CCNC, and was the Program Co-Chair of IWDW in 2004.