Transform domain distributed video coding with spatial correlations


M. B. Badem & W. A. C. Fernando & A. M. Kondoz

Published online: 7 July 2009
© Springer Science + Business Media, LLC 2009

Abstract Distributed Video Coding (DVC) is a new coding technique that has been evolving rapidly. However, the rate-distortion performance of current solutions is below expectations, especially for high motion sequences, even at a group of pictures (GOP) size of 2. The main reason for this problem is the temporal prediction of the Wyner-Ziv (WZ) frames at the decoder. In this paper we propose a novel transform domain DVC codec architecture that splits each frame into two sub-frames, which are encoded separately as a key sub-frame and a WZ sub-frame. Pixel interpolation or median prediction is used to generate the side information at the decoder. Simulation results show that the proposed algorithm achieves a significant rate-distortion improvement over current DVC solutions.

Keywords Distributed video coding · Wyner-Ziv coding

1 Introduction

During the last decade video coding technologies have evolved significantly. Conventional approaches are designed mainly for one-to-many topologies such as TV broadcasting. In conventional video coding, encoder complexity can be very high while decoder complexity is low, which suits one-to-many topologies. However, this concept is challenged by recently emerging applications which require more encoders than decoders. Wireless sensor networks, security surveillance cameras and remote monitoring applications use widely spread networks of encoders sharing a limited number of decoders.

Multimed Tools Appl (2010) 48:369–379
DOI 10.1007/s11042-009-0317-5

M. B. Badem (*) · W. A. C. Fernando · A. M. Kondoz
Centre for Communication Systems Research, University of Surrey, Guildford GU2 7XH, UK
e-mail: [email protected]

W. A. C. Fernando
e-mail: [email protected]

A. M. Kondoz
e-mail: [email protected]

Distributed Video Coding (DVC) has been proposed as an attractive solution for the applications above, which require low complexity encoders. In DVC, the traditional balance of encoder-decoder complexity is reversed by moving the majority of the redundancy exploitation to the decoder side [1]. Therefore, DVC significantly reduces the complexity, power consumption and cost of the encoders.

Currently, DVC codec designs generate the side information (described in the next section) using temporal interpolation at the decoder. For high motion sequences, the rate-distortion (RD) performance of current DVC solutions is therefore far below that of conventional video coding techniques. The main reason for this problem is the reduced correlation between the frames of the sequence and their temporal prediction.

In this paper we propose a novel sub-frame DVC technique that separates each frame of the sequence into two sub-frames, encodes them separately, and generates the side information at the decoder by spatial interpolation from the reference sub-frame. The rest of the paper is organized as follows: background and related work are given in Section 2, the proposed DVC technique is explained in Section 3, simulation results are presented in Section 4, and Section 5 concludes the paper.

2 Background and related work

2.1 Distributed source coding

DVC is the adaptation to video coding of the theoretical framework of distributed source coding (DSC) set by the Slepian-Wolf theorem [17] and the Wyner-Ziv theorem [22]. The DSC concept deviates from the conventional source coding paradigm in how the encoding of statistically correlated sources is coupled. In the conventional approach, statistically correlated sources are jointly encoded and jointly decoded for perfect reconstruction of the information stream at the decoder. DSC, in contrast, proposes independent encoding of statistically dependent sources, followed by joint decoding. The information theoretic limits of this independent encoding were reported by Slepian and Wolf, as discussed in the next section.

2.1.1 Slepian-Wolf theorem

Assume that X and Y are two statistically dependent discrete random sequences, each independently and identically distributed (i.i.d.). Consider the case that these sequences are separately encoded with rates $R_X$ and $R_Y$, respectively, but are jointly decoded, exploiting the correlation between them as illustrated in Fig. 1. Slepian and Wolf presented an analysis of the possible rate combinations of $R_X$ and $R_Y$ for the reconstruction of X and Y with an arbitrarily small error probability [17], as shown below, which is widely known as the Slepian-Wolf theorem:

$$R_X \geq H(X \mid Y) \tag{1}$$

$$R_Y \geq H(Y \mid X) \tag{2}$$

$$R_X + R_Y \geq H(X, Y) \tag{3}$$

where $H(X \mid Y)$ and $H(Y \mid X)$ are the conditional entropies and $H(X, Y)$ is the joint entropy of X and Y.

Fig. 1 Distributed coding of two statistically dependent discrete random sequences

Thus, independent encoding of statistically dependent sources imposes no theoretical loss of compression efficiency compared to the more established approach of joint encoding, which is practiced in conventional video coding techniques.
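As a minimal sketch of inequalities (1) to (3), the following Python code computes the three entropy bounds for a small joint distribution and checks whether a rate pair is admissible. The function names and the example distribution (a fair bit X and a copy Y flipped with probability 0.1) are illustrative assumptions, not taken from the paper.

```python
import math

def entropy(probs):
    """Shannon entropy in bits of a list of probabilities (zero terms skipped)."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

def slepian_wolf_bounds(joint):
    """Return (H(X|Y), H(Y|X), H(X,Y)) for a joint pmf given as joint[x][y]."""
    h_xy = entropy([p for row in joint for p in row])
    h_x = entropy([sum(row) for row in joint])        # marginal of X
    h_y = entropy([sum(col) for col in zip(*joint)])  # marginal of Y
    return h_xy - h_y, h_xy - h_x, h_xy

def in_slepian_wolf_region(rx, ry, joint):
    """Check a rate pair (rx, ry) against inequalities (1), (2) and (3)."""
    h_x_given_y, h_y_given_x, h_xy = slepian_wolf_bounds(joint)
    return rx >= h_x_given_y and ry >= h_y_given_x and rx + ry >= h_xy

# Hypothetical example: X is a fair bit, Y equals X flipped with probability 0.1.
joint = [[0.45, 0.05],
         [0.05, 0.45]]
h_x_given_y, h_y_given_x, h_xy = slepian_wolf_bounds(joint)
print(round(h_x_given_y, 3), round(h_y_given_x, 3), round(h_xy, 3))
```

In this example X can be sent at well under 1 bit per symbol when Y is sent at its full entropy, even though the X encoder never sees Y.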

2.1.2 Wyner-Ziv theorem

The Wyner-Ziv theorem [22] extends the Slepian-Wolf result [17] to what is commonly known as lossy compression with side information at the decoder. This concept assumes a finite acceptable distortion level d between the source information X and the corresponding decoded output X′; hence the term lossy compression. Assuming that information Y, statistically dependent on X, is available at the decoder, Wyner and Ziv quantified the minimum bit rate that must be passed between the encoder and decoder, termed $R_{WZ}(d)$, to achieve the distortion d between input and output. The Wyner-Ziv theorem states that when the statistical dependency between X and Y is exploited only at the decoder, the required transmission rate is at least as high as in the case where the correlation is exploited at both the encoder and the decoder, for the same average distortion d. The theorem reads:

$$R_{WZ}(d) \geq R_{X|Y}(d), \quad d \geq 0 \tag{4}$$

where $R_{WZ}(d)$ is the Wyner-Ziv minimum encoding rate (for X) and $R_{X|Y}(d)$ represents the minimum rate necessary to encode X when Y is simultaneously available at the encoder and decoder, for the same average distortion d. When d approaches zero, i.e. when no distortion exists, the Wyner-Ziv theorem falls back to the Slepian-Wolf result, i.e. $R_{WZ}(0) = R_{X|Y}(0)$. This means that it is possible to reconstruct the sequence X with an arbitrarily small error probability even when the correlation between X and the side information Y is only exploited at the decoder.
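To make the rate saving from side information concrete, the sketch below evaluates the conditional rate-distortion function for one standard special case that is not discussed in the paper: jointly Gaussian X and Y under mean squared error distortion, where equality is known to hold in (4). The function names, the correlation coefficient and the distortion values are illustrative assumptions.

```python
import math

def conditional_rate_gaussian(var_x, rho, d):
    """Rate in bits per sample to code Gaussian X at MSE distortion d,
    given side information Y with correlation coefficient rho.
    Assumes the quadratic Gaussian case, where R_WZ(d) = R_X|Y(d)."""
    cond_var = var_x * (1.0 - rho ** 2)  # variance of X given Y
    if d >= cond_var:
        return 0.0                       # side information alone meets d
    return 0.5 * math.log2(cond_var / d)

def rate_without_side_info(var_x, d):
    """Rate-distortion function of Gaussian X with no side information."""
    return conditional_rate_gaussian(var_x, 0.0, d)

# Hypothetical numbers: unit variance source, strongly correlated side information.
with_si = conditional_rate_gaussian(1.0, 0.9, 0.01)
without_si = rate_without_side_info(1.0, 0.01)
print(round(with_si, 3), round(without_si, 3))
```

The better the side information (the larger the correlation), the fewer bits the encoder has to send, which is why the quality of the side information dominates DVC performance.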

2.2 Distributed video coding

2.2.1 Background work

There have been several approaches to building DSC techniques into video coding. The codec design criteria include a flexible, extremely low complexity encoder, optimum rate-distortion performance, and operational efficiency suitable for diverse applications including real-time video communications and video storage. Several channel coding schemes, including Turbo coding [8], Turbo Trellis Coded Modulation (TTCM) [19, 20] and Low-Density Parity-Check (LDPC) codes [16, 23], have been proposed for DVC implementations. Other distinct architectures have also been proposed for the same purpose, including the PRISM codec [15]. Another key design choice is whether the codec operates in the pixel domain or the transform domain. Pixel domain DVC yields the lowest encoder complexity. However, it does not exploit any spatial correlation within each frame. To address this, transform domain DVC was proposed, which trades some of the encoder's simplicity for improved rate-distortion performance. The discrete cosine transform (DCT) has been widely proposed for this purpose [9]. The wavelet transform and integer transform are further variants proposed in the literature for transform domain DVC codecs [12].

Using the distributed source coding concept described above as a guideline, several pixel domain [1, 2, 4, 6, 7, 10, 11, 13, 14, 18, 21] and transform domain [3, 9, 12] DVC algorithms for the coding of Wyner-Ziv frames have been proposed in the research literature. In this paper, the widely used turbo coding based transform domain (DCT) DVC implementation is considered.

2.2.2 DVC codec architecture

Figure 2 shows a general block diagram of the turbo coding based transform domain DVC codec. The main blocks are the DCT, quantizer, turbo encoder, parity buffer and key frame encoder on the encoder side, and the side information generator, DCT, turbo decoder, reconstruction block, key frame decoder and IDCT on the decoder side. The transmission medium between the encoder and the decoder includes a feedback channel for communicating the parity request messages dynamically.

At the encoder, frames of a video sequence are divided into key frames and WZ frames. The period of the key frames (the group of pictures, GOP, size) is generally set to 2. Key frames are conventionally intra-coded. WZ frames are discrete cosine transformed (DCT) in order to exploit spatial redundancies at the encoder [3, 9]. Then, the WZ frames are quantized and turbo encoded. Parity bits are stored in a buffer and transmitted in small chunks upon request from the turbo decoder. At the decoder, side information is generated from previously decoded key frames. The side information is DCT transformed and quantized to obtain a bitstream similar to that at the encoder, and is fed into the turbo decoder and the reconstruction block. The turbo decoder requests more parity bits via the feedback channel until the frame is successfully decoded, as determined by a request stopping criterion. Finally, the decoded WZ frame is obtained after the inverse discrete cosine transform (IDCT).

Fig. 2 Transform domain DVC codec architecture using turbo coding
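The parity request mechanism described above can be sketched as a simple loop. This is only an outline of the control flow under stated assumptions: `decode_with_feedback`, `try_decode` and `toy_decoder` are hypothetical names, and the toy decoder merely stands in for a real turbo decoder and its request stopping criterion.

```python
def decode_with_feedback(parity_chunks, try_decode):
    """Request parity chunks one at a time until try_decode reports success.

    parity_chunks: chunks held in the encoder's parity buffer (hypothetical).
    try_decode: callable taking the chunks received so far and returning
        (success, decoded); it stands in for the turbo decoder together with
        the request stopping criterion.
    Returns (decoded, number_of_chunks_requested).
    """
    received = []
    for chunk in parity_chunks:
        received.append(chunk)  # one request over the feedback channel
        success, decoded = try_decode(received)
        if success:
            return decoded, len(received)
    raise RuntimeError("parity buffer exhausted before successful decoding")

def toy_decoder(received):
    """Toy stand-in: decoding succeeds once three chunks have arrived."""
    return len(received) >= 3, "decoded WZ frame"

decoded, requests = decode_with_feedback(["p0", "p1", "p2", "p3"], toy_decoder)
print(decoded, requests)
```

The number of requests is what determines the WZ bitrate: better side information lets the decoder stop earlier, so fewer parity bits are transmitted.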

3 Proposed technique

The architecture of the proposed codec is depicted in Fig. 3. In the proposed technique all frames are coded independently, so it is an intra-coding technique. At the encoder each frame is split into two sub-frames, which are coded separately. One of the sub-frames, the key sub-frame, is conventionally intra-coded. The other, the WZ sub-frame, is Wyner-Ziv coded as in conventional DVC, i.e. it is DCT transformed, quantized and turbo encoded, and the parity bits are stored in a buffer. At the decoder, the decoded key sub-frame is used to obtain a prediction of the WZ sub-frame, and the predicted sub-frame is also DCT transformed and fed into the turbo decoder and reconstruction block. The turbo decoder decodes the WZ sub-frame by requesting parity bits from the encoder until it is successfully decoded. The turbo decoded sub-frame is then reconstructed and inverse DCT transformed. Finally, the decoded WZ sub-frames and key sub-frames are merged to obtain the decoded frames $X'_n$.

3.1 Frame splitting

In the proposed technique, input frames are divided into two sub-frames at the encoder. Odd lines are grouped together to form the key sub-frame and even lines are grouped together to form the WZ sub-frame (Fig. 4). The key sub-frame and WZ sub-frame are strongly correlated; they are encoded separately and decoded jointly.
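The line-based split and the corresponding merge at the decoder can be sketched as follows. The sketch assumes a frame is a list of rows and that lines are counted from 1, so that the first row is an odd line; the function names are illustrative.

```python
def split_frame(frame):
    """Split a frame (list of rows) into key and WZ sub-frames.

    Lines are counted from 1 (an assumed convention), so the odd lines are
    rows 0, 2, 4, ... and the even lines are rows 1, 3, 5, ...
    """
    key_sub = frame[0::2]  # odd lines form the key sub-frame
    wz_sub = frame[1::2]   # even lines form the WZ sub-frame
    return key_sub, wz_sub

def merge_subframes(key_sub, wz_sub):
    """Interleave decoded key and WZ sub-frames back into a full frame."""
    frame = []
    for i, key_row in enumerate(key_sub):
        frame.append(key_row)
        if i < len(wz_sub):
            frame.append(wz_sub[i])
    return frame

# Small 6x4 test frame whose pixel values encode their row and column.
frame = [[r * 10 + c for c in range(4)] for r in range(6)]
key_sub, wz_sub = split_frame(frame)
restored = merge_subframes(key_sub, wz_sub)
```

Splitting is a pure reindexing of rows, so it adds almost nothing to the encoder's complexity.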

3.2 Side information generation

The proposed technique uses only spatial prediction to obtain the side information. The main reason for this approach is the strong correlation between the two sub-frames. Once the key sub-frame is decoded, the side information is generated from it using one of two spatial prediction techniques: pixel interpolation or median prediction.

Fig. 3 Proposed DVC codec architecture


3.2.1 Median prediction

All neighboring pixels in odd lines and previously predicted pixels in even lines are considered in median prediction. Let a, b, c, ..., g and x be the values of the pixels A, B, C, ..., G and X shown in Fig. 5. Where all of a, b, c, ..., g are available, x is their median value (case ii). The full set of cases is defined as follows:

$$x = \begin{cases} (b + f)/2 & \text{case i} \\ \operatorname{median}(a, b, c, d, e, f, g) & \text{case ii} \\ \operatorname{median}(a, b, d, e, f) & \text{case iii} \\ b & \text{cases iv and vi} \\ \operatorname{median}(a, b, c) & \text{case v} \end{cases} \tag{5}$$
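Eq. (5) can be written directly as a small function. Because the pixel positions behind each case are defined by Fig. 5, which is not reproduced here, this sketch takes the case label as an input rather than deriving it from the pixel coordinates; the function name and the sample neighbor values are illustrative assumptions.

```python
from statistics import median

def median_predict(case, nb):
    """Evaluate Eq. (5) for one pixel.

    case: one of "i", "ii", "iii", "iv", "v", "vi" (the cases of Fig. 5).
    nb: dict mapping neighbor names "a".."g" to pixel values; only the
        neighbors used by the selected case need to be present.
    """
    if case == "i":
        return (nb["b"] + nb["f"]) / 2
    if case == "ii":
        return median([nb[k] for k in "abcdefg"])
    if case == "iii":
        return median([nb[k] for k in "abdef"])
    if case in ("iv", "vi"):
        return nb["b"]
    if case == "v":
        return median([nb[k] for k in "abc"])
    raise ValueError(f"unknown case: {case}")

# Hypothetical neighborhood with one outlier (d = 40).
nb = {"a": 10, "b": 12, "c": 11, "d": 40, "e": 9, "f": 14, "g": 13}
x_interior = median_predict("ii", nb)
```

The median ignores the outlier in this neighborhood, which is the usual motivation for preferring it over a plain average near edges.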

3.2.2 Pixel interpolation

In this technique, each pixel in the even lines is predicted as the average of its two neighboring pixels in the same column, one from the previous odd line and one from the next odd line (Fig. 6-i). However, the last line is copied directly from the previous odd line, since no odd line follows it (Fig. 6-ii).

$$x = \begin{cases} (b + f)/2 & \text{case i} \\ b & \text{case ii} \end{cases} \tag{6}$$
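Eq. (6) translates into a short routine that builds the side information for the WZ sub-frame from the decoded key sub-frame. This is a sketch under assumptions: sub-frames are lists of rows, the average is kept as a floating point value because the paper does not state a rounding rule, and the function name is illustrative.

```python
def interpolate_wz_subframe(key_sub, num_wz_lines):
    """Predict the even (WZ) lines from the decoded odd (key) lines, Eq. (6).

    key_sub: decoded key sub-frame, one row per odd line.
    num_wz_lines: number of even lines to predict.
    """
    side_info = []
    for i in range(num_wz_lines):
        above = key_sub[i]  # previous odd line, pixel b
        if i + 1 < len(key_sub):
            below = key_sub[i + 1]  # next odd line, pixel f
            row = [(b + f) / 2 for b, f in zip(above, below)]  # case i
        else:
            row = list(above)  # case ii: no odd line follows, copy b
        side_info.append(row)
    return side_info

# Key sub-frame of a 6-line frame: lines 1, 3 and 5.
key_sub = [[10, 20], [30, 40], [50, 60]]
side_info = interpolate_wz_subframe(key_sub, 3)
```

For a frame with an even number of lines, as in QCIF (144 lines), only the very last line falls under case ii.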

Fig. 5 Neighboring pixels used to predict x (median prediction)

Fig. 4 Splitting frame into two subframes


4 Experimental results

The performance of the proposed DVC codec is tested on a number of QCIF (176×144) and 256×192 video sequences at 15 fps. In the experiments only the luminance component is considered and the DCT size is 4×4. Key sub-frames are intra coded using H.264/AVC, and quantization parameters are selected to match the average PSNR of the WZ sub-frames and key sub-frames. We have considered eight different quantization matrices, which are widely used in transform domain DVC codecs, to obtain different rate-distortion points [9].
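Since quality is reported as luminance PSNR, the standard definition can be sketched as below. The peak value of 255 assumes 8-bit samples, which the paper does not state explicitly; the function name and the test frames are illustrative.

```python
import math

def psnr(reference, decoded, peak=255):
    """PSNR in dB between two equally sized luminance frames (lists of rows).

    peak=255 assumes 8-bit samples (an assumption, not stated in the paper).
    """
    squared_error = 0
    count = 0
    for ref_row, dec_row in zip(reference, decoded):
        for r, d in zip(ref_row, dec_row):
            squared_error += (r - d) ** 2
            count += 1
    mse = squared_error / count
    if mse == 0:
        return math.inf  # identical frames
    return 10 * math.log10(peak ** 2 / mse)

# Toy 2x2 frames that differ by one gray level everywhere (MSE = 1).
reference = [[100, 100], [100, 100]]
decoded = [[101, 101], [101, 101]]
quality_db = psnr(reference, decoded)
```

Matching the average PSNR of the key and WZ sub-frames, as described above, keeps the quality of odd and even lines comparable after merging.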

Figure 7 shows frames 2, 10 and 46 of the Soccer test sequence for the proposed codec with pixel interpolation and for the transform domain WZ (TDWZ) codec used in [5], at the same bitrate (220 kbps). The proposed codec clearly improves the picture quality.

Figures 8 and 9 show the overall rate-distortion performance compared to TDWZ [5] and H.264/AVC with IBI coding structure for the Soccer and Breakdancers (view 0) sequences. The proposed technique increases the PSNR and requires less information from the encoder. As seen in Fig. 8, the proposed pixel interpolation technique achieves a gain of up to 2 dB over TDWZ, narrowing the performance gap between conventional video coding techniques and DVC. Comparing the proposed pixel interpolation and median prediction techniques, the former performs slightly better in terms of rate-distortion performance because the side information generated by pixel interpolation is more accurate.

Fig. 6 Neighboring pixels used to predict x (pixel interpolation)

Fig. 7 Images for the Soccer test sequence: (a) TDWZ and (b) proposed pixel interpolation (frame numbers 2, 10 and 46 from left to right)

5 Conclusions

In this paper we have proposed a novel DVC codec exploiting spatial correlations at the decoder. The proposed technique is an improved solution especially for high motion sequences, which occur in applications such as security surveillance and mobile camera phones. In the proposed technique, frames are divided into two sub-frames; one is intra-coded using conventional intra coding and the other is WZ coded. At the decoder, the prediction of the WZ sub-frame is generated by one of two techniques: median prediction or pixel interpolation. Since no temporal dependency is considered in the proposed algorithm, it is an intra-coding technique; therefore, the memory requirement at the encoder and decoder is significantly reduced, and the overall latency is also reduced. Simulation results clearly show that the proposed codec outperforms TDWZ by a significant margin. In future work we will aim to improve the performance of the proposed technique for larger group of pictures sizes and consider temporal information to further improve the side information.

Fig. 8 Rate-distortion performance comparison of the proposed techniques against TDWZ and H.264/AVC IBIB for the Soccer sequence (Soccer QCIF @ 15 fps; PSNR [dB] versus rate [kbps]; curves: TDWZ, H.264 IBIB, median prediction, pixel interpolation)

Fig. 9 Rate-distortion performance comparison of the proposed techniques against TDWZ and H.264/AVC IBIB for the Breakdancers sequence (Breakdancers 256x192 @ 15 fps; PSNR [dB] versus rate [kbps]; curves: TDWZ, H.264 IBIB, median prediction, pixel interpolation)

References

1. Aaron A, Zhang R, Girod B (2002) Wyner-Ziv coding for motion video. Proc. Asilomar Conf. on Signals, Systems and Computers

2. Aaron A, Setton E, Girod B (2003) Towards practical Wyner-Ziv coding of video. In Proc. IEEE Int. Conf. on Image Process., ICIP-2003, Barcelona, Spain

3. Aaron A, Rane S, Setton E, Girod B (2004) Transform-domain Wyner-Ziv codec for video. In Proc. VCIP, San Jose, USA

4. Adikari ABB, Fernando WAC, Weerakkody WARJ, Arachchi HK (2006) Sequential motion estimation using luminance and chrominance information for distributed video coding of Wyner-Ziv frames. IEE Electron Lett 42(7):396–397

5. Areia JD, Pereira F, Fernando WAC (2008) Impact of the key frames quality on the overall Wyner-Ziv video coding performance. ELMAR Symposium, Zadar, Croatia

6. Ascenso J, Brites C, Pereira F (2005) Motion compensated refinement for low complexity pixel based distributed video coding. In Proc. IEEE AVSS 2005, pp. 593–598

7. Ascenso J, Brites C, Pereira F Improving frame interpolation with spatial motion smoothing for pixel domain distributed video codec. VISNET Report. [online]. Available: www.visnet-noe.org

8. Berrou C, Glavieux A, Thitimajshima P (1993) Near Shannon limit error-correcting coding and decoding: turbo-codes. In Proc. IEEE Int. Conf. on Communications, Geneva, Switzerland, 1064–1070

9. Brites C, Ascenso J, Pereira F (2006) Improving transform domain Wyner-Ziv video coding performance. In Proc. IEEE ICASSP, vol. 2, pp. II-525–II-528

10. Dalai M, Leonardi R, Pereira F (2006) Improving turbo codec integration in pixel-domain distributed video coding. In Proc. IEEE ICASSP, vol. 3, pp. III-257–III-260

11. Girod B, Aaron A, Rane S, Monedero DR (2003) Distributed video coding. In Proc. IEEE Special Issue on Advances in Video Coding and Delivery 93(1):1–12, 2003

12. Guo X, Lu Y, Wu F, Gao W (2006) Distributed video coding using wavelet. In Proc. IEEE ISCAS

13. Ishwar P, Prabhakaran V, Ramchandran K (2003) Towards a theory for video coding using distributed compression principles. In Proc. Int. Conf. on Image Processing 3:687–690

14. Natario L, Brites C, Ascenso J, Pereira F Extrapolating side information for low-delay pixel-domain distributed video coding. VISNET Report. [online]. Available: www.visnet-noe.org

15. Puri R, Ramchandran K (2002) PRISM: A new robust video coding architecture based on distributed compression principles. In Proc. 40th Allerton Conf. on Comm., Control, and Computing, Allerton, IL

16. Schonberg D, Pradhan SS, Ramchandran K (2002) LDPC codes can approach the Slepian-Wolf bound for general binary sources. In Proc. Allerton Conf. on Communication, Control, and Computing, Champaign, IL

17. Slepian D, Wolf JK (1973) Noiseless coding of correlated information sources. IEEE Trans Inform Theory IT-19:471–480

18. Tagliasacchi M, Tubaro S (2005) A MCTF video coding scheme based on distributed source coding principles. Visual Communication and image process, Beijing

19. Ungerboeck G (1982) Channel coding with multilevel phase signals. IEEE Trans Inf Theory IT-28(1):55–67

20. Weerakkody WARJ, Fernando WAC, Adikari ABB, Rajatheva RMAP (2006) Distributed video coding of Wyner-Ziv frames using Turbo Trellis Coded Modulation. In Proc. IEEE ICIP, Atlanta, USA

21. Wu B, Guo X, Zhao D, Gao W, Wu F (2006) An optimal non-uniform scalar quantizer for distributed video coding. In Proc. IEEE Int. Conf. on Multimedia and Expo, pp. 165–168

22. Wyner AD, Ziv J (1976) The rate-distortion function for source coding with side information at the decoder. IEEE Trans Inform Theory IT-22:1–10

23. Zhong H, Zhang T (2005) Block-LDPC: a practical LDPC coding system design approach. Circuits Syst I: Fundam Theory Appl 52(4):766–775

M. B. Badem received his B.Sc. (Hons.) degree in electrical and electronics engineering from Middle East Technical University, Ankara, Turkey, in 2006. Currently he is a research student completing his PhD in the Centre for Communication Systems Research at the University of Surrey, UK. His research interests include distributed video coding and video compression.

W. A. C. Fernando received the B.Sc. Engineering degree (First class) in Electronic and Telecommunications Engineering from the University of Moratuwa, Sri Lanka in 1995 and the MEng degree (Distinction) in Telecommunications from Asian Institute of Technology (AIT), Bangkok, Thailand in 1997. He completed his PhD at the Department of Electrical and Electronic Engineering, University of Bristol, UK in February 2001. Currently, he is a senior lecturer in signal processing at the University of Surrey, UK. Prior to that, he was a senior lecturer at Brunel University, UK and an assistant professor at AIT. His current research interests include Distributed Video Coding (DVC), 3D video coding, intelligent video encoding for wireless communications, OFDM and CDMA for wireless channels, and channel coding and modulation schemes for wireless channels. He has published more than 145 international papers in these areas. He is a senior member of IEEE and a fellow of the HEA, UK. He is also a member of the EPSRC College.


A. M. Kondoz received the B.Sc. (Hons.) degree in engineering, the M.Sc. degree in telematics, and the Ph.D. in communication in 1983, 1984, and 1986, respectively. He became a Lecturer in 1988, a Reader in 1995, and then in 1996, a Professor in Multimedia Communication Systems and deputy director of the Centre for Communication Systems Research (CCSR), University of Surrey, Guildford, U.K. He has over 250 publications, including a book on low bit-rate speech coding and several book chapters. He has graduated more than 40 Ph.D. students in the areas of speech/image and signal processing and wireless multimedia communications, and has been a consultant for major wireless media terminal developers and manufacturers. Prof Kondoz has been awarded several prizes, the most significant of which are the Royal Television Society's Communications Innovation Award and the IEE Benefactors Premium Award. He has been on the Refereeing College for EPSRC and on the Canadian Research Councils. He is a member of the IEEE and the IET.
