IEEE TRANSACTIONS ON INFORMATION THEORY, VOL. 49, NO. 5, MAY 2003 1181

Duality Between Source Coding and Channel Coding and its Extension to the Side Information Case

S. Sandeep Pradhan, Member, IEEE, Jim Chou, Member, IEEE, and Kannan Ramchandran, Senior Member, IEEE

Abstract—We explore the information-theoretic duality between source coding with side information at the decoder and channel coding with side information at the encoder. We begin with a mathematical characterization of the functional duality between classical source and channel coding, formulating the precise conditions under which the optimal encoder for one problem is functionally identical to the optimal decoder for the other problem. We then extend this functional duality to the case of coding with side information. By invoking this duality, we are able to generalize the result of Wyner and Ziv [1] relating to no rate loss for source coding with side information from Gaussian to more arbitrary distributions. We consider several examples corresponding to both discrete- and continuous-valued cases to illustrate our formulation. For the Gaussian cases of coding with side information, we invoke geometric arguments to provide further insights into their duality. Our geometric treatment inspires the construction and dual use of practical coset codes for a large class of emerging applications for coding with side information, such as distributed sensor networks, watermarking, and information-hiding communication systems.

Index Terms—Capacity–cost function, channel coding with side information, duality, rate–distortion function, source coding with side information.

I. INTRODUCTION

CLASSICAL source coding under a distortion constraint and channel coding under a channel cost constraint have long been considered as information-theoretic duals¹ of each other starting from Shannon's landmark paper in 1959 [3]. In the source coding (rate–distortion) problem, the goal is to find, for a given source distribution, the conditional distribution between the input source alphabet and the output reconstruction alphabet that minimizes the mutual information [4] between the input source and the output reconstruction subject to a target distortion constraint corresponding to a given distortion measure. In the channel coding (capacity–cost) problem, the goal is to find, for a given channel input–output conditional distribution, the channel input distribution that maximizes the mutual information between input and output subject to a target channel cost (or power) constraint corresponding to a channel cost measure.

Manuscript received October 11, 2000; revised January 5, 2003. This work was supported in part by the NSF ITR under Grant CCR-0219735, and by DARPA Sensor Information Technology Office under the Sensorwebs Project F30602-00-2-0538. The material in this work was presented in part at the 33rd Asilomar Conference on Signals, Systems and Computers, Pacific Grove, CA, 1999, at the IEEE International Symposium on Information Theory, Sorrento, Italy, June 2000, and at the IEEE International Symposium on Information Theory, Lausanne, Switzerland, June/July 2002. This work was performed while S. S. Pradhan was at the University of California, Berkeley.

S. S. Pradhan is with the Electrical Engineering and Computer Science Department, University of Michigan, Ann Arbor, MI 48109 USA (e-mail: [email protected]).

J. Chou and K. Ramchandran are with the Electrical Engineering and Computer Science Department, University of California, Berkeley, Berkeley, CA 94720 USA (e-mail: [email protected]; [email protected]).

Communicated by P. A. Chou, Associate Editor for Source Coding.
Digital Object Identifier 10.1109/TIT.2003.810622

¹ This should not be confused with Lagrangian duality in optimization theory [2].


Cover and Thomas [4] point out that these two problems are information-theoretic duals of each other. This duality has also been exploited in the formulation of numerical optimization algorithms for the channel capacity and rate–distortion problems using the Blahut–Arimoto algorithm [5]. In [4] (see [4, Sec. 13.5]), a packing versus covering interpretation of duality in Gaussian source and channel coding has been formulated to illustrate this duality. Indeed, some of the solutions to practical source coding (respectively, channel coding) have been inspired by solutions to channel coding (source coding): e.g., trellis-coded quantization [6], [7] in source coding has been inspired by trellis-coded modulation [8], [7] in channel coding, and shaping of constellation points [9] in channel coding has been inspired by constraining of entropy of quantization points [10] in source coding. In [7], a duality between quantization and modulation has been analyzed.
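As an aside on the computational face of this duality, the following is a minimal sketch of the Blahut–Arimoto capacity iteration for a discrete memoryless channel (the rate–distortion version alternates two analogous steps); the function and variable names are ours, not taken from [5].

    import numpy as np

    def blahut_arimoto_capacity(P, tol=1e-10, max_iter=10000):
        """Capacity (in nats) of a discrete memoryless channel via Blahut-Arimoto.

        P: array of shape (|X|, |Y|); row x is the conditional pmf p(y|x).
        Returns (capacity, capacity-achieving input distribution).
        """
        nx, _ = P.shape
        p = np.full(nx, 1.0 / nx)                 # start from the uniform input
        for _ in range(max_iter):
            q = p @ P                             # induced output distribution
            # per-letter divergence D(p(y|x) || q(y)), skipping zero entries
            ratio = np.divide(P, q, out=np.ones_like(P), where=P > 0)
            d = np.sum(P * np.log(ratio), axis=1)
            p_new = p * np.exp(d)
            p_new /= p_new.sum()
            if np.max(np.abs(p_new - p)) < tol:
                p = p_new
                break
            p = p_new
        q = p @ P
        ratio = np.divide(P, q, out=np.ones_like(P), where=P > 0)
        d = np.sum(P * np.log(ratio), axis=1)
        return float(np.sum(p * d)), p

    # e.g., a binary-symmetric channel with crossover 0.1:
    # blahut_arimoto_capacity(np.array([[0.9, 0.1], [0.1, 0.9]]))[0] / np.log(2)  ~ 0.531 bits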

In this work, we address the problem of duality between source [1] and channel [11] coding in the presence of side information at the decoder and encoder, respectively. As a first step toward this goal, we first address the more classical duality between source and channel coding with the objective of providing a mathematical characterization of their functional duality. To be specific, given an optimal source (respectively, channel) coding scheme, given by the encoder and decoder, we detail the conditions under which such a scheme is a functional dual to a channel (source) coding scheme, given by the encoder and decoder, in the sense that the optimal encoder mapping for one problem is functionally identical to the optimal decoder mapping for the other problem. We show in the sequel that this functional duality is enabled through appropriate choices of measures and constraints for the dual source (or channel) coding problem. Our inspiration for this formulation is derived from recent work by Gastpar et al. on measure-matching for uncoded communications [12], [13].

We then extend this duality to the case of coding with side information, where we study the duality between source coding with side information at the decoder, and channel coding with side information at the encoder. These problems have attracted considerable attention in recent times due to a large class of relevant applications related to both problems. A partial list of applications for source coding with side information includes



distributed sensor networks [14], [33], sensor arrays, digital upgrade of analog television signals [15], and communication in ad hoc networks. The application set for the problem of channel coding with side information includes data hiding, watermarking [16], [17], broadcast [18], intersymbol interference (ISI) precoding [19], multiantenna communication systems, and steganography. In this paper, we explore the information-theoretic duality between these two problems, and provide a mathematical characterization of the conditions under which they become functional duals, i.e., the encoder for one problem is exchangeable for the decoder of the other.

As an example, we will show that the problem of encoding a memoryless Gaussian source with correlated Gaussian side information available at the decoder studied by Wyner and Ziv in [1] is a functional dual to the Gaussian watermarking problem studied in [16], [17], related to the problem of "writing on dirty paper" considered by Costa in [20]. Recently, Costa's result [20] has been generalized from the case of Gaussian side information to more arbitrary distributions in [21], [22]. By invoking our duality concepts, we are able to find a similar generalization of the result of Wyner–Ziv [1] relating to no rate loss for source coding with side information from Gaussian to more arbitrary distributions. Rate loss refers to the performance loss due to the presence of side information at only one end (decoder only in source coding and encoder only in channel coding) rather than at both ends. A goal of this paper is to inspire the practical dual use of functional blocks for the two problems, building on our preliminary work in [23]. It is illuminating to note that both problems have recently inspired promising constructive frameworks based on coset codes that are underpinned by the relevant information-theoretic concepts, e.g., [24], [17], [23], [25], [22] for the channel coding problem, and [26], [27] for the source coding problem. Finally, as a way of providing additional insights into this duality, and how it is useful in directly applying the tools of one problem to the other, we provide a geometric proof of the Gaussian source coding problem with side information, and show how it can be directly applied as a geometric proof of the Gaussian watermarking problem.

To summarize, the main contributions of this paper are 1) a mathematical characterization of the functional duality between conventional source and channel coding, 2) extension of this to the case of source and channel coding with side information at the decoder and encoder, respectively, and 3) additional insights into this duality through a geometric interpretation for the Gaussian case of coding with side information. We note that the functional duality characterized through distributional relationships presented in this paper can be further extended to involve duality in codebooks similar to that considered in [34, Sec. 2.2]. In other words, for channel coding, we need codebooks with probability of error close to 0, and for source coding we need codebooks with probability of error close to 1. The paper is organized as follows. In the next section, we review the problems of source coding with side information and channel coding with side information. Section III formulates the precise notion of duality between general source and channel coding problems. We give a detailed geometric treatment of Gaussian source coding with side information in Section IV and give a brief geometric treatment of channel coding with side information to get

Fig. 1. Source encoding with side information at the decoder (SCSI). The encoder transmits at a rate greater than or equal to R bits per source sample, where X is the source and S is the side information.

intuition, and to summarize the conceptual details in Section V. Section VI concludes the paper.

II. CODING WITH SIDE INFORMATION

In this section, we give a brief review of source coding and channel coding with side information.

A. Source Coding With Side Information at Decoder (SCSI)

Consider the problem [1] of rate–distortion optimal lossy encoding of a source X with the side information S available (losslessly) at the decoder, as shown in Fig. 1. X and S are correlated random variables with joint distribution p(x, s), such that the sequence pair {(X_i, S_i)} denotes n independent realizations of the given random variables. Let the alphabets of X and S be 𝒳 and 𝒮, respectively. The encoding and decoding are done in blocks of length n. The encoder is a mapping f: 𝒳^n → {1, 2, ..., 2^{nR}}, and the decoder is a mapping g: {1, 2, ..., 2^{nR}} × 𝒮^n → 𝒳̂^n, where 𝒳̂ is the reconstruction alphabet and the distortion criterion is given by E[(1/n) Σ_{i=1}^n d(X_i, X̂_i, S_i)] ≤ D, where d: 𝒳 × 𝒳̂ × 𝒮 → ℝ⁺ is the per-letter additive distortion measure,² E is the expectation operator, and R is the rate of transmission.

Fact 1 [1]: The rate–distortion function for this setup, R*_{X|S}(D), is given by

    R*_{X|S}(D) = min [I(X; U) - I(S; U)]    (1)

where the minimization is over p(u|x) and p(x̂|u, s), such that E[d(X, X̂, S)] ≤ D, and U → X → S and X → (U, S) → X̂ form Markov chains, where U is an auxiliary random variable with alphabet 𝒰. These are the natural Markov chains associated with the definition of the problem, where the first chain characterizes the fact that only the decoder has access to the side information. In other words, the dependency between the side information and the auxiliary variable is completely captured through the source. The second chain implies that the decoder does not have access to the source. In other words, the reconstruction is completely³ characterized by U and S. Using the first Markov chain it can be seen that

² Although d: 𝒳 × 𝒳̂ → ℝ is typically considered in the literature, the extension to a more general d which includes S is straightforward. It can be interpreted as the distortion between x and x̂ when the outcome of the side information is s. This is necessary for a precise formulation of duality between source and channel coding with side information. A similar generalization of cost measure is considered in Section II-B for the case of channel coding with side information.

³ It has been shown in [1] that there is no loss of performance in restricting p(x̂|u, s) to a deterministic function.


Fig. 2. Source coding with side information present at both encoder and decoder: an intermediate stage is introduced with U.

I(X; U) - I(S; U) = I(X; U | S). From this identity, the above optimization problem can be rewritten as follows:

    R*_{X|S}(D) = min I(X; U | S)    (2)

where the minimization is over p(u|x) and p(x̂|u, s), such that U → X → S and X → (U, S) → X̂ form Markov chains, and E[d(X, X̂, S)] ≤ D.

Now the rate–distortion function R_{X|S}(D) when both the encoder and the decoder have access to the side information is given by

    R_{X|S}(D) = min_{p(x̂|x,s)} I(X; X̂ | S)    (3)

such that E[d(X, X̂, S)] ≤ D. As a way of measuring the performance loss due to the presence of side information at the decoder only (see Fig. 1), we note that the SCSI problem (2) differs from (3) in two regards: the inclusion of the auxiliary random variable U, and the presence of the Markov chains. In order to address this, it is insightful to recast the rate–distortion problem in (3) as in Fig. 2 by introducing an intermediate processing stage between X and X̂ that includes U, and imposing the Markov constraints as follows:

    R_{X|S}(D) = min I(X; X̂ | S)    (4)

where the minimization is over p(u|x, s) and p(x̂|u, s), such that X → (U, S) → X̂ forms a Markov chain, and E[d(X, X̂, S)] ≤ D.

Note that this formulation subsumes that of (3) when U = X̂, i.e., the first processing stage is an identity operation. The only reason for this expanded version is to exactly match the constraints of the SCSI problem of (2) for a fair comparison.

In general, R*_{X|S}(D) ≥ R_{X|S}(D) under the given set of constraints. Thus, R*_{X|S}(D) = R_{X|S}(D) if and only if at optimality in (4), I(X; X̂ | S) = I(X; U | S), which means that, conditioned on the side information, the information conveyed about the source by observing X̂ is the same as that by observing U. This implies the Markov constraint X → (X̂, S) → U. Thus, given the side information, the auxiliary variable U and the reconstruction X̂ are information equivalent. So, the encoder can as well generate U (which depends only on the source X) instead of X̂ (which depends on both X and the side information S), thus obviating the need for the side information at the encoder.

More formally, there is no rate loss, i.e., R*_{X|S}(D) = R_{X|S}(D),

if and only if [1] the distribution achieving the minimization in (3) (say, p*(x̂|x, s)) can be represented in a form having the auxiliary random variable U such that

Fig. 3. Channel coding with side information (CCSI). The encoder has accessto side informationS about the channel;m represents the message, andX andY are the input and the output of the channel, respectively.

a) ;

b) the reconstruction can be represented as X̂ = f(U, S), which is obtained by the SCSI problem (1); and

c) the following three Markov chains are satisfied: U → X → S, X → (U, S) → X̂, and X → (X̂, S) → U.

It is important to note that these constraints are associated with the optimizing distribution for the problem of source coding with side information at both encoder and decoder as given by (3), (4). It is not in general possible to guarantee no rate loss if we start the other way around, i.e., with the SCSI problem of (1), (2) and impose these constraints.
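As a quick sanity check of the identity I(X; U) - I(S; U) = I(X; U | S), which holds whenever U → X → S is a Markov chain and is used to pass from (1) to (2), the following sketch (ours) evaluates both sides numerically for a randomly drawn discrete joint distribution with that structure.

    import numpy as np

    rng = np.random.default_rng(0)

    # Draw a joint pmf p(u, x, s) = p(x) p(u|x) p(s|x), so that U -> X -> S holds.
    nU, nX, nS = 3, 4, 2
    p_x = rng.dirichlet(np.ones(nX))
    p_u_given_x = rng.dirichlet(np.ones(nU), size=nX)      # shape (nX, nU)
    p_s_given_x = rng.dirichlet(np.ones(nS), size=nX)      # shape (nX, nS)
    p_uxs = np.einsum('x,xu,xs->uxs', p_x, p_u_given_x, p_s_given_x)

    def mi(p_ab):
        """Mutual information (nats) of a two-dimensional joint pmf."""
        pa = p_ab.sum(1, keepdims=True)
        pb = p_ab.sum(0, keepdims=True)
        mask = p_ab > 0
        return float(np.sum(p_ab[mask] * np.log(p_ab[mask] / (pa @ pb)[mask])))

    def cond_mi_xu_given_s(p_uxs):
        """I(X; U | S) for a joint pmf with axes ordered (u, x, s)."""
        p_s = p_uxs.sum(axis=(0, 1))
        total = 0.0
        for s in range(p_uxs.shape[2]):
            if p_s[s] > 0:
                total += p_s[s] * mi(p_uxs[:, :, s].T / p_s[s])
        return total

    I_XU = mi(p_uxs.sum(axis=2).T)      # joint pmf of (x, u)
    I_SU = mi(p_uxs.sum(axis=1).T)      # joint pmf of (s, u)
    print(I_XU - I_SU, cond_mi_xu_given_s(p_uxs))   # the two numbers agree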

B. Channel Coding With Side Information at Encoder (CCSI)

Channel coding with side information (CCSI) [11], [24], [20] about the channel at the encoder is shown schematically in Fig. 3. The encoder wishes to communicate over a channel given by the conditional distribution p(y|x, s), where Y is the channel output, X is the channel input which the encoder can control, and S, the side information with a given distribution p(s), is the second channel input which the encoder cannot control but can only observe; S has alphabet 𝒮, and there is a cost measure w: 𝒳 × 𝒮 → ℝ⁺. This can be interpreted as the cost of transmitting x when the side information outcome is s. The encoder is a mapping f: {1, 2, ..., 2^{nR}} × 𝒮^n → 𝒳^n, and the decoder is a mapping g: 𝒴^n → {1, 2, ..., 2^{nR}}, where R is the rate of transmission.

Fact 2: The capacity C*(W), for a cost constraint W, of this channel coding setup is given by

    C*(W) = max [I(U; Y) - I(U; S)]    (5)

where the maximization is over p(u|s) and p(x|u, s), such that E[w(X, S)] ≤ W, and U → (X, S) → Y and Y → (U, S) → X form Markov chains,⁴ where U is an auxiliary random variable with alphabet 𝒰. These are the natural Markov chains associated with the definition of the problem: the first chain means that the dependency between the auxiliary variable U and the channel output Y is completely captured through the channel input X and the side information S. In other words, given X and S, the channel does not care about U. The second chain implies that since the encoder does not have access to the channel output, the channel input is completely⁵ characterized by U and S. The encoder can influence the channel output Y only through the input X. The original formulation of [11] was extended to the case of cost measures in [35].

⁴ See [11, eq. (2.4)].
⁵ It has been shown in [11] that there is no loss in performance in restricting p(x|u, s) to a deterministic function.


Now the capacity of the system C(W), when both the encoder and the decoder have access to the side information, is given by

    C(W) = max_{p(x|s)} I(X; Y | S)    (6)

such that E[w(X, S)] ≤ W. As was done in SCSI, we note that the CCSI problem (5) differs from (6) in two regards: the inclusion of the auxiliary random variable U, and the presence of the Markov chains. To address this, it is insightful to recast the capacity–cost problem in (6) as in Fig. 4 by introducing an intermediate processing stage that includes U, and imposing the Markov chains as follows:

    C(W) = max I(U; Y | S)    (7)

where the maximization is over p(u|s) and p(x|u, s), such that U → (X, S) → Y and Y → (U, S) → X form Markov chains, and E[w(X, S)] ≤ W. This is because in (7), using the Markov chains, we have I(U; Y | S) = I(X; Y | S). Note that this formulation subsumes that of (6) when U = X, i.e., the first processing stage is an identity operation.

In general, C*(W) ≤ C(W) under the given set of constraints. Thus, C*(W) = C(W) if and only if at optimality in (7), I(U; Y) - I(U; S) = I(U; Y | S), which means that the information conveyed about the auxiliary variable U by observing Y is the same as that by observing (Y, S). This implies that U → Y → S. Thus, given the channel output Y, the side information S does not give any more information about U (which contains the message sent by the encoder), thus obviating the need for the side information at the decoder.

More formally, there is no rate loss, i.e., C*(W) = C(W), if and only if the distribution achieving the maximization in (6) (say, p*(x|s)) can be represented in a form having the auxiliary variable U such that

a) ;

b) the channel input can be represented as X = f(U, S), which is obtained by the CCSI problem (5); and

c) the following three Markov chains are satisfied: U → (X, S) → Y, Y → (U, S) → X, and U → Y → S.

It is important to note that these constraints are associated with the optimizing distribution for the problem of channel coding with side information at both encoder and decoder as given by (6), (7). It is not, in general, possible to guarantee no rate loss if we start the other way around, i.e., with the CCSI problem of (5) and impose these constraints.
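A familiar instance where these no-rate-loss conditions hold is Costa's "writing on dirty paper" setting [20]; the following is the standard Gaussian result, summarized here for orientation rather than quoted from this section. For the channel Y = X + S + Z with S ~ N(0, Q) known at the encoder, Z ~ N(0, N), and cost w(x, s) = x² with constraint W = P, choosing

    U = X + αS,   X ~ N(0, P) independent of S,   α = P / (P + N)

gives I(U; Y) - I(U; S) = (1/2) log(1 + P/N), which coincides with the capacity when the decoder also knows S, so the one-sided side information incurs no rate loss.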

III. DUALITY

Traditionally, the conventional source and channel coding [4] problems have been considered as information-theoretic duals

Fig. 4. Channel coding with side information present at both encoder and decoder. An intermediate stage is introduced with U.

of each other. Shannon in his 1959 paper [3] on rate–distortion theory stated (italicized for emphasis) "…duality between the properties of a source with a distortion measure and those of a channel. This duality is enhanced if we consider channels in which there is a cost associated with the different input letters… ." In this section, we obtain a precise characterization of functional duality between conventional source and channel coding and then extend this notion for source and channel coding with side information. In other words, for a given source coding problem, we obtain a channel coding problem, where the roles of encoder and decoder are functionally exchangeable and the input–output joint distribution is the same with some renaming of variables. For a given source, the rate–distortion function [4] denoted by R(D) is the minimum rate of information required to represent it with a distortion constraint D. Similarly, for a given channel, the capacity–cost function [4] denoted by C(W) is the maximum rate of information that can be reliably transmitted with a cost constraint W.

A. Duality in Conventional Source and Channel Coding

We would like to emphasize that the notion of duality pervades the literature starting with Shannon in 1959. To study these concepts in one framework, let us state a correspondence between the variables involved in the two coding problems as shown in (8) at the bottom of the page. To avoid confusion, we stick to the conventional notation of source coding and use the same for channel coding. Let us now recall two facts which have been recently studied in [12], [13] in a totally different context of optimality of uncoded transmission. It was shown in [12], [13] that for a given source and a channel, there exist distortion and cost measures such that uncoded transmission is optimal. The essential concepts are presented in the following form for completeness. We denote the given distribution by p₀(·) and the distribution which optimizes the given objective function by p*(·).

Fact 3 [12]: For a conventional channel coding problem with a channel p(x|x̂), input and output alphabets 𝒳̂ and 𝒳, respectively, given a channel input distribution p₀(x̂), there exist a cost measure w(x̂) and a cost constraint W such that

    p₀(x̂) = arg max I(X̂; X)    (9)

where the maximization is over all p(x̂) satisfying E[w(X̂)] ≤ W, and where the cost measure and cost constraint are given, respectively, by

    source coding                       channel coding
    source input X            <-->     channel output X
    source reconstruction X̂   <-->     channel input X̂            (8)


    w(x̂) = c₁ D( p(x|x̂) ‖ p₀(x) ) + c₀   and   W = E_{p₀}[w(X̂)]    (10)

where D(·‖·) is the information divergence [4], c₁ > 0 and c₀ are arbitrary constants, and p₀(x) = Σ_{x̂} p₀(x̂) p(x|x̂) is the induced output distribution.

Fact 4 [12]: For a conventional source coding problem, with a source p₀(x), input and reconstruction alphabets 𝒳 and 𝒳̂, respectively, given a conditional distribution p₀(x̂|x), there exist a distortion measure d(x, x̂) and a distortion constraint D such that

    p₀(x̂|x) = arg min I(X; X̂)    (11)

where the minimization is over all p(x̂|x) satisfying E[d(X, X̂)] ≤ D, and where the distortion measure and distortion constraint are given, respectively, by

    d(x, x̂) = -c₁ log p₀(x|x̂) + d₀(x)   and   D = E_{p₀}[d(X, X̂)]    (12)

where c₁ > 0 is arbitrary and d₀(·) is an arbitrary function of x.

It should be noted that (as suggested in [12]), in Fact 3, the cost measure given in (10) is unique (modulo affine transformation) if p₀(x̂) > 0 for all x̂ ∈ 𝒳̂, whereas in Fact 4, the distortion measure as given in (12) is unique (modulo affine transformation). See [12] for details and other examples.

Remark 1: In Fact 3, for a range of cost measures, the channel input distribution p₀(x̂) maximizes I(X̂; X). This cost measure has a nice interpretation in some interesting cases such as Gaussian and binary-symmetric channels. It can be shown [12] that for an additive memoryless Gaussian noise channel and a Gaussian channel input distribution, this cost measure is given by w(x̂) = c₁ x̂² + c₀, for an appropriate choice of c₁ and c₀. In general, this cost measure penalizes those input distributions which do not result in the output distribution given by p₀(x). Similarly, in Fact 4, the conditional distribution p₀(x̂|x) minimizes I(X; X̂) for a range of distortion measures. As in Fact 3, it has nice closed-form solutions for the Gaussian and binary memoryless sources. For example, for the Gaussian memoryless source and an additive Gaussian conditional distribution, this distortion measure is given by d(x, x̂) = c₁ (x - x̂)² + c₀ for an appropriate choice of c₁ and c₀. In general, this distortion measure somewhat corresponds to the conditional length of description of the source value x, given a reconstruction value x̂.
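To see where the quadratic forms come from, one can evaluate the divergence-based cost of (10) for an AWGN channel (a short calculation under the Gaussian assumptions above; the normalization is ours):

    D( N(x̂, σ_Z²) ‖ N(0, σ_Y²) ) = (1/2) ln( σ_Y² / σ_Z² ) + ( σ_Z² + x̂² ) / ( 2 σ_Y² ) - 1/2

which is affine in x̂², i.e., equivalent modulo affine transformation to the usual power cost. The analogous evaluation of -c₁ log p₀(x|x̂) + d₀(x) for a Gaussian test channel p₀(x|x̂) = N(x̂, D) is affine in (x - x̂)², the squared-error distortion.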

Using this, we have the following theorem which connects the source and channel coding problems.

Theorem 1a: For a source coding problem, with a given source p(x), input and reconstruction alphabets 𝒳 and 𝒳̂, respectively, a distortion measure d(x, x̂), and a distortion constraint D, let the conditional distribution achieving rate–distortion optimality be given by

    p*(x̂|x) = arg min_{p(x̂|x): E[d(X, X̂)] ≤ D} I(X; X̂)    (13)

inducing the following distributions:

    p*(x̂) = Σ_x p(x) p*(x̂|x)   and   p*(x|x̂) = p(x) p*(x̂|x) / p*(x̂).    (14)

Then there exists a dual channel coding problem for the channel p*(x|x̂), having input and output alphabets 𝒳̂ and 𝒳, respectively, a cost measure w(x̂), and a cost constraint W, such that

• the rate–distortion bound R(D) is equal to the capacity–cost bound C(W), i.e.,

    R(D) = C(W)    (15)

• the channel input distribution given by p*(x̂) achieves capacity–cost optimality, i.e.,

    p*(x̂) = arg max_{p(x̂): E[w(X̂)] ≤ W} I(X̂; X)    (16)

where the cost measure and cost constraint are given, respectively, by

    w(x̂) = c₁ D( p*(x|x̂) ‖ p(x) ) + c₀   and   W = E_{p*}[w(X̂)]    (17)

for arbitrary c₁ > 0 and c₀.
Proof: Omitted (follows from Facts 3 and 4).

Theorem 1b: For a channel coding problem, with a given channel p(x|x̂), input and output alphabets 𝒳̂ and 𝒳, respectively, a cost measure w(x̂), and a cost constraint W, let the channel input distribution achieving the capacity–cost optimality be given by

    p*(x̂) = arg max_{p(x̂): E[w(X̂)] ≤ W} I(X̂; X)    (18)

inducing the following distributions:

    p*(x) = Σ_{x̂} p*(x̂) p(x|x̂)   and   p*(x̂|x) = p*(x̂) p(x|x̂) / p*(x).    (19)

Then there exists a dual source coding problem for the source p*(x), having input and reconstruction alphabets 𝒳 and 𝒳̂, respectively, a distortion measure d(x, x̂), and a distortion constraint D, such that

• the capacity–cost bound C(W) is equal to the rate–distortion bound R(D), i.e.,

    C(W) = R(D)    (20)

• the conditional distribution given by p*(x̂|x) achieves rate–distortion optimality, i.e.,

    p*(x̂|x) = arg min_{p(x̂|x): E[d(X, X̂)] ≤ D} I(X; X̂)    (21)

where the distortion measure and distortion constraint are given, respectively, by

    d(x, x̂) = -c₁ log p(x|x̂) + d₀(x)   and   D = E_{p*}[d(X, X̂)]    (22)

for arbitrary c₁ > 0 and d₀(·).


Fig. 5. (a) For a given source coding problem with source p(x), the dual channel coding problem is defined in a reverse order. (b) For a given channel coding problem with channel p(x|x̂), the dual source coding problem is defined in a reverse order.

Remark 2: The first property of the dual problem as given by (15) implies that the minimum value of I(X; X̂) with respect to p(x̂|x) under one set of constraints is exactly equal to the maximum value of I(X̂; X) with respect to p(x̂) under another set of constraints. The second property as given by (16) implies that the solution to the dual channel coding problem is exactly the distribution p*(x̂) induced from the solution to the given source coding problem, provided we are careful about our choice of cost measure as given by (17). Similar interpretations can be given to the properties given by (20)–(22). In other words, for a given source coding problem, we can associate a dual channel coding problem with the same input–output joint distribution, and the rate–distortion bound R(D) is equal to the capacity–cost bound C(W). Note that for a given source coding problem, although the conditional distribution is p*(x̂|x) with X as the source input, the dual channel coding problem is defined for the channel p*(x|x̂) with input distribution p*(x̂) in the reverse order, as shown in Fig. 5. For a given source coding problem, the test channel [4] characterized by p*(x|x̂) becomes the channel for the dual channel coding problem and vice versa.

Uniqueness of Duals: It can be seen that (under the condition that p*(x̂) > 0 for all x̂ ∈ 𝒳̂), modulo affine transformations, we have a unique association of a distortion measure and a cost measure for a given joint distribution p(x, x̂). This is because of the condition of uniqueness of cost and distortion measures as given by Facts 3 and 4. In other words, for a given joint distribution p(x, x̂), there is a unique (modulo affine transformation of the corresponding cost and distortion measures) pair of source and channel coding problems which are associated using duality.

Using the forward part of Shannon source and channel coding theorems, for a pair of dual problems, the random codebooks are constructed using the same distribution p*(x̂), and the jointly typical encoding and decoding (done, respectively, in source and channel coding) are the same in the sense of finding a codeword from the codebook which is jointly typical (in the sense of the joint distribution p*(x, x̂)) with the observed n-length sequence (coming from the same distribution p(x)) of source input and channel output, respectively. Thus, the distribution p*(x̂|x) characterizing the function of the encoder of source coding also characterizes the function of the decoder in the dual channel coding. Thus, the encoder of the one is functionally identical to the decoder of the other and vice versa. Summarizing, we have the following.

Fig. 6. (a) Gaussian source coding: p(x) = N(0, σ²), α = (σ² - D)/σ², p(q) = N(0, αD), p(Z) = N(0, D). The effect of encoding on X is characterized as αX + q. The test channel gives the relation between X̂ and X. (b) Channel coding dual with the same joint distribution. The decoding is depicted as one of recovering X̂ from X.

• Source encoder and channel decoder are characterized by a mapping which has the same domain and range, 𝒳^n → {1, 2, ..., 2^{nR}}, and similarly, the source decoder and channel encoder are characterized by a mapping which has the same domain and range, {1, 2, ..., 2^{nR}} → 𝒳̂^n, where R is the rate of transmission. Thus, the roles of encoder and decoder are reversed in these two problems.

• Using the forward part [4] of Shannon⁶ source and channel coding theorems, in the limit of large block length, a rate–distortion optimal encoder of the former is functionally identical to a capacity–cost optimal decoder of the latter and vice versa in the sense of Theorems 1a and 1b.

Let us consider the following examples as illustrations of duality.

Example 1—Source Coding⁷ (see Fig. 6(a)): The source is Gaussian with mean 0 and variance σ², i.e., X ~ N(0, σ²), with a quadratic (squared-error) distortion measure which is given by d(x, x̂) = (x - x̂)², and a distortion constraint D. The optimal conditional distribution [4] satisfying (13) is of the form X̂ = αX + q, where α = (σ² - D)/σ² and q ~ N(0, αD), i.e.,

    p*(x̂|x) = N(αx, αD).

The induced distributions are given by X = X̂ + Z, where Z ~ N(0, D) is independent of X̂, i.e.,

    p*(x|x̂) = N(x̂, D)   and   p*(x̂) = N(0, σ² - D).

Note that R(D) = (1/2) log(σ²/D).

Channel Coding Dual (see Fig. 6(b)): Given the additive memoryless Gaussian noise channel as given above, p*(x|x̂) = N(x̂, D), i.e., X = X̂ + Z with Z ~ N(0, D), and a cost measure

⁶ Recall that in the limit of large block length, in source coding, the rate–distortion bound R(D) is approached from above, while in channel coding the capacity–cost bound C(W) is approached from below.

⁷ With a slight abuse of notation, we denote a Gaussian random variable X with mean μ and variance σ² as either X ~ N(μ, σ²) or p(x) = N(μ, σ²).


w(x̂), and a cost constraint W; the channel input distribution obtained from the above source coding problem, given by p*(x̂) = N(0, σ² - D), maximizes I(X̂; X) over all p(x̂) such that E[w(X̂)] ≤ W, where using Theorem 1a

    w(x̂) = x̂²    (23)

for an appropriate choice of the parameters c₁ and c₀, and the cost constraint is W = σ² - D. This cost measure corresponds to the traditional average power constraint on the channel input. Note from Fig. 6 that the roles of the encoder and the decoder of these dual problems are exactly reversed. The test channel in the source coding problem becomes the channel in the dual channel coding problem. It is also interesting to note that we could have started with this channel coding problem and obtained the above source coding problem as its dual by noting that the quadratic distortion (x - x̂)² can be put in the form given in (12), d(x, x̂) = -c₁ log p*(x|x̂) + d₀(x), where c₁ = 2D and d₀(x) is an appropriate constant.
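A quick numeric check of this dual pair (our own illustration): for any σ² > D > 0, the rate–distortion value (1/2) log(σ²/D) of the source problem coincides with the capacity (1/2) log(1 + W/D) of the dual AWGN channel with noise variance D under the power constraint W = σ² - D.

    import numpy as np

    sigma2, D = 4.0, 0.5                      # source variance and target distortion
    W = sigma2 - D                            # dual channel power constraint (Theorem 1a)
    R = 0.5 * np.log2(sigma2 / D)             # Gaussian rate-distortion function R(D)
    C = 0.5 * np.log2(1.0 + W / D)            # AWGN capacity with noise variance D
    print(R, C)                               # both equal 1.5 bits per sample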

Example 2—Channel Coding: Consider a binary-symmetric channel with transition probability p, characterized as X = X̂ ⊕ Z, where Z ~ Bernoulli(p), X, X̂ ∈ {0, 1}, and ⊕ denotes binary modulo-2 addition. The cost measure is given by w(x̂) = x̂ for x̂ ∈ {0, 1}, which corresponds to a constraint on the duty cycle of the channel input. The capacity of this channel with expected cost E[w(X̂)] ≤ W, for some W ≤ 1/2, can be shown [4] to be⁸ C(W) = h(W * p) - h(p), where W * p = W(1 - p) + p(1 - W) denotes binary convolution and h(·) is the binary entropy function. The optimal channel input distribution is X̂ ~ Bernoulli(W). Note that the induced channel output distribution is X ~ Bernoulli(W * p).

Source Coding Dual: Given the binary source with distribution induced from the above, i.e., X ~ Bernoulli(W * p), with a distortion measure d(x, x̂), and a distortion constraint D; the conditional distribution p*(x̂|x) (induced from the above channel coding problem) minimizes I(X; X̂) over all p(x̂|x) such that E[d(X, X̂)] ≤ D, where using Theorem 1b

    d(x, x̂) = -c₁ log p(x|x̂) + d₀(x)    (24)

is the Hamming distortion measure if we choose c₁ and d₀(x) appropriately, and the distortion constraint is D = p. Thus, in this binary example, the duty-cycle cost measure for the channel coding problem corresponds to the Hamming distortion measure for the dual source coding problem.
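As a numeric illustration (our own check, not from the paper), the closed form h(W * p) - h(p) can be verified by brute-force maximization of I(X̂; X) over Bernoulli inputs meeting the duty-cycle constraint:

    import numpy as np

    def h(q):
        """Binary entropy in bits."""
        q = np.clip(q, 1e-12, 1 - 1e-12)
        return -q * np.log2(q) - (1 - q) * np.log2(1 - q)

    p, W = 0.1, 0.2                                   # crossover probability, duty-cycle limit
    conv = lambda a, b: a * (1 - b) + b * (1 - a)     # binary convolution a * b

    # Input X_hat ~ Bernoulli(q) with q <= W gives I(X_hat; X) = h(q*p) - h(p).
    qs = np.linspace(0.0, W, 2001)
    capacity_bruteforce = np.max(h(conv(qs, p)) - h(p))
    capacity_closed_form = h(conv(W, p)) - h(p)
    print(capacity_bruteforce, capacity_closed_form)  # agree; optimum at q = W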

B. Duality Between Source and Channel Coding With Common Side Information

Let us consider a similar formulation of duality for source and channel coding with common side information present at both the encoder and the decoder (denoted by SCSIB and CCSIB, respectively, where the appended "B" stands for "both"). This is a straightforward extension of the treatment of the previous

⁸ Note that when W = 0.5, we get the classical capacity of the binary-symmetric channel, 1 - h(p), where h(·) is the binary entropy function.

section to the case where we condition on the common side information. The motivation for addressing this extension is that this is a necessary step (as will be evident in Section III-C) that will serve as a bridge to establishing the duality between SCSI and CCSI. Another motivation is to establish the need for the generalized notion of cost and distortion measures introduced in Section II. The following lemmas extend the ideas given in Facts 3 and 4 to this case in a straightforward way and are presented here for completeness.

Lemma 1a: For CCSIB, with a given channel p(y|x, s), side information p(s), input, side information, and output alphabets 𝒳, 𝒮, and 𝒴, respectively, given a conditional channel input distribution p₀(x|s), there exist a cost measure w(x, s) and a cost constraint W such that

    p₀(x|s) = arg max I(X; Y | S)    (25)

where the maximization is over all p(x|s) satisfying E[w(X, S)] ≤ W, and where

    w(x, s) = c₁ D( p(y|x, s) ‖ p₀(y|s) ) + c₀(s)   and   W = E_{p₀}[w(X, S)]    (26)

with c₁ > 0 and c₀(s) being arbitrary, and where p₀(y|s) = Σ_x p₀(x|s) p(y|x, s) follows from standard Bayes' rule.
Proof: Omitted.

Lemma 1b: For SCSIB, with a given source p(x, s), side information S, input, side information, and reconstruction alphabets 𝒳, 𝒮, and 𝒳̂, respectively, given a conditional distribution p₀(x̂|x, s), there exist a distortion measure d(x, x̂, s) and a distortion constraint D such that

    p₀(x̂|x, s) = arg min I(X; X̂ | S)    (27)

where the minimization is over all p(x̂|x, s) satisfying E[d(X, X̂, S)] ≤ D, and where

    d(x, x̂, s) = -c₁ log p₀(x|x̂, s) + d₀(x, s)   and   D = E_{p₀}[d(X, X̂, S)]    (28)

with c₁ > 0 and d₀(x, s) being arbitrary, and where p₀(x|x̂, s) is obtained from p(x, s) and p₀(x̂|x, s) using standard Bayes' rule.

As noted earlier, with the more general distortion and cost measures, we are able to extend the concepts of Facts 3 and 4 to the case when a common side information is present at both the encoder and the decoder. Let . We use this notation for the rest of this paper. The cost measure, as given by (26), can be shown to be unique modulo affine transformation if p₀(x|s) > 0 for all x ∈ 𝒳, s ∈ 𝒮. The distortion measure as given by (28) is unique modulo affine transformation. We will


give an example of a Gaussian source to illustrate the behavior of these measures. We now connect these two coding problems using the following theorem (which is an extension of Theorem 1a), which says that for a given SCSIB, we can associate a dual CCSIB having the same joint distribution.

Theorem 2a: For an SCSIB, with a given source p(x, s), side information S, input, side information, and reconstruction alphabets 𝒳, 𝒮, and 𝒳̂, respectively, a distortion measure d(x, x̂, s), and a distortion constraint D, let the conditional distribution achieving the rate–distortion optimality be given by

    p*(x̂|x, s) = arg min_{p(x̂|x,s): E[d(X, X̂, S)] ≤ D} I(X; X̂ | S)    (29)

inducing the following distributions:

    p*(x̂|s) = Σ_x p(x|s) p*(x̂|x, s)   and   p*(x|x̂, s) = p(x|s) p*(x̂|x, s) / p*(x̂|s).    (30)

Then there exists a dual CCSIB for the channel p*(x|x̂, s), having side information p(s), input, side information, and output alphabets 𝒳̂, 𝒮, and 𝒳, respectively, a cost measure w(x̂, s), and a cost constraint W, such that

• the rate–distortion bound R_{X|S}(D) with side information S is equal to the capacity–cost bound⁹ C_{X|S}(W) with side information S, i.e.,

    R_{X|S}(D) = C_{X|S}(W)    (31)

• the channel input distribution given by p*(x̂|s) achieves capacity–cost optimality, i.e.,

    p*(x̂|s) = arg max_{p(x̂|s): E[w(X̂, S)] ≤ W} I(X̂; X | S)    (32)

where the cost measure and cost constraint are given, respectively, by

    w(x̂, s) = c₁ D( p*(x|x̂, s) ‖ p(x|s) ) + c₀(s)   and   W = E_{p*}[w(X̂, S)]    (33)

for arbitrary c₁ > 0 and c₀(s).
Proof: Omitted.

Remark 3: The first property of the dual problem as given by (31) implies that the minimum of I(X; X̂ | S) with respect to p(x̂|x, s) under one set of source-coding constraints is equal to the maximum of the same objective function with respect to p(x̂|s) under another set of channel-coding

⁹ C_{X|S}(W) denotes the capacity of CCSIB using (6) and (8).

constraints. The second property as given by (32) implies that the solution to the dual CCSIB problem is exactly the distribution p*(x̂|s) induced from the solution to the given SCSIB, provided we are careful about our choice of cost measure as in (33). We emphasize that for a given SCSIB, we can associate a dual CCSIB with the same joint distribution.

A similar result holds true for the case of CCSIB which is an extension of Theorem 1b. It essentially says that for a given CCSIB problem, there exists a dual SCSIB problem with an optimal distortion measure. We are not presenting this here. In other words, for a given SCSIB, there exists a dual CCSIB such that, in the limit of large block length, a rate–distortion optimal encoder of the former is functionally identical to a capacity–cost optimal decoder of the latter and vice versa. Further, using the forward part of the coding theorems [28], we see that the random conditional codebooks are constructed with the conditional distribution p*(x̂|s) corresponding to the given outcome of the side information. The jointly typical encoding and decoding in SCSIB and CCSIB, respectively, are identical.

Uniqueness of Duals: Similar to the conventional source coding and channel coding problems, it can be seen that (under the condition that p*(x̂|s) > 0 for all x̂ ∈ 𝒳̂, s ∈ 𝒮), modulo affine transformations, we have a unique association of a distortion measure and a cost measure for a given joint distribution p(x, x̂, s), induced from the uniqueness condition stated following Lemmas 1a and 1b. In other words, under this condition, we have a unique pair of SCSIB and CCSIB problems which are associated under duality for a given joint distribution p(x, x̂, s). Let us consider an example to illustrate the dual problems.

Example 3—SCSIB (see Fig. 7(a)): Consider a Gaussian source X and Gaussian side information S, given by X = S + V, where p(v) = N(0, N) and p(s) = N(0, Q) for some positive reals N and Q, with V independent of S; the distortion measure is quadratic, d(x, x̂, s) = (x - x̂ - s)², and the distortion constraint is D. This corresponds to the case of reconstruction of V = X - S, the difference between X and S (instead of X), using a mean-squared error (MSE) distortion measure. The optimal conditional distribution¹⁰ [28] satisfying (29) is given by X̂ = αV + q, with α = (N - D)/N and q ~ N(0, αD), i.e.,

    p*(x̂|x, s) = N( α(x - s), αD ).

The induced conditional test channel distribution is given by V = X̂ + Z, where Z ~ N(0, D), i.e.,

    p*(x|x̂, s) = N( s + x̂, D ),

and X̂ is independent of S, i.e.,

    p*(x̂|s) = p*(x̂) = N(0, N - D).

Note that Var(X̂) = N - D. Further, α is equal to the coefficient in the minimum MSE (MMSE) estimate of X̂ from V.

Dual CCSIB (see Fig. 7(b)): Given the additive memoryless Gaussian channel obtained from the above SCSIB, characterized by p*(x|x̂, s) = N(x̂ + s, D), i.e.,

¹⁰ Given by the rate–distortion bound on V = X - S.


Fig. 7. Duality: α = (N - D)/N, p(q) = N(0, αD), p(Z) = N(0, D). (a) SCSIB: As given in Example 3, with distortion d(x, x̂, s) = (x - x̂ - s)². (b) CCSIB: Dual of (a) with cost w(x̂, s) = x̂². Note the roles of encoder and decoder are exchanged in (a) and (b).

X = X̂ + S + Z with Z ~ N(0, D), with a cost measure w(x̂, s), and a cost constraint W; the channel input distribution induced from the above SCSIB, p*(x̂|s) = N(0, N - D), maximizes I(X̂; X | S) over all p(x̂|s) such that E[w(X̂, S)] ≤ W, where using Theorem 2a, we have the cost measure

    w(x̂, s) = x̂²    (34)

if we choose c₁ and c₀(s) appropriately, and the cost constraint is W = N - D. This corresponds to a simple average power constraint on the channel input, independent of the side information S. Thus, the encoder completely ignores the presence of the side information, and the decoder subtracts the side information from the channel output before further processing. Note that the functional roles of the encoder and decoder of these dual problems are exactly reversed: the test channel of SCSIB becomes the channel of the dual CCSIB, and the source encoder for SCSIB becomes the channel decoder for CCSIB. We could just as easily have started with the CCSIB problem and found the dual SCSIB problem.
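A small numeric check of this dual pair (our own illustration): the conditional rate–distortion value (1/2) log(N/D) of the SCSIB problem matches the capacity of the dual CCSIB channel once the decoder subtracts the known S, i.e., an AWGN channel with noise variance D under power W = N - D.

    import numpy as np

    N, D = 2.0, 0.25                          # Var(X - S) and target distortion
    W = N - D                                 # power constraint of the dual CCSIB
    R_scsib = 0.5 * np.log2(N / D)            # conditional rate-distortion bound
    C_ccsib = 0.5 * np.log2(1.0 + W / D)      # capacity after subtracting S
    print(R_scsib, C_ccsib)                   # both equal 1.5 bits per sample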

Remark 4: If in this example, for SCSIB with the same source X and side information S, we choose the distortion measure to be d(x, x̂, s) = (x - x̂)² (i.e., if we are interested in the conventional MSE reconstruction of the source X rather than that of the difference between the source and the side information, V = X - S), then the optimal conditional distribution is given by X̂ = αX + (1 - α)S + q, with α = (N - D)/N and q ~ N(0, αD), i.e.,

    p*(x̂|x, s) = N( αx + (1 - α)s, αD ).

This induces the test channel given by p*(x|x̂, s) = N(x̂, D), i.e., X = X̂ + Z with Z ~ N(0, D), and the correlation between X̂ and S is given by E[X̂S] = Q, so that, unlike in Example 3, X̂ is no longer independent of S. The dual CCSIB will have the channel p*(x|x̂, s) = N(x̂, D), under the channel cost measure w(x̂, s) = (x̂ - s)². This corresponds to a constraint on the average power of the difference between the channel input X̂ and the side information S. (In data hiding and watermarking applications [16], [17] such a cost measure is popular as it represents the amount of distortion induced on the original host signal.) Summarizing, we see that the distortion measure (x - x̂)² in SCSIB corresponds to the channel cost measure (x̂ - s)² in the dual CCSIB, where we saw earlier that the distortion measure (x - x̂ - s)² in SCSIB corresponds to the cost measure x̂² in the dual CCSIB.

C. Duality of SCSI and CCSI

This subsection contains the main result of this paper. First let us state a correspondence between the variables used in the definition of SCSI and CCSI as shown in (35) at the bottom of the page. We follow the notation of SCSI and use the same for CCSI. The duality between SCSI and CCSI has been studied in [23], [29]. In this subsection, we formulate the notion of functional duality for SCSI and CCSI similar to the one developed in the previous subsections.

1) Duality Formulation: Recall that in the Wyner–Ziv formulation of SCSI (Fig. 1), the problem is to rate–distortion optimally compress X with side information S present only at the decoder. The objective function is I(X; U) - I(S; U), which is minimized over p(u|x) and p(x̂|u, s), where U is an auxiliary random variable. It is important to note that there are two natural Markov chains associated with the definition of the problem: U → X → S and X → (U, S) → X̂. Suppose the objective function is optimized by p*(u|x) and p*(x̂|u, s). This,¹¹ with the Markov chains, completely determines the joint distribution

    p*(x, s, u, x̂) = p(x, s) p*(u|x) p*(x̂|u, s)

thus fixing all conditional distributions including p*(x|x̂, s). Similarly, in the Gelfand–Pinsker formulation of CCSI (Fig. 3), the problem is to reliably transmit the maximal amount of information across a channel in the presence of side information only at the encoder. The objective function is I(U; Y) - I(U; S), which is maximized over p(u|s) and p(x|u, s), where U is an auxiliary random variable. The natural Markov chains associated with the definition of this problem are

¹¹ Recall the convention relating p₀(·) and p*(·) given in Section III-A.

    SCSI                                CCSI
    source input X            <-->     channel output Y
    side information S        <-->     side information S
    auxiliary variable U      <-->     auxiliary variable U
    source reconstruction X̂   <-->     channel input X            (35)


U → (X, S) → Y and Y → (U, S) → X. Suppose the objective function is optimized by p*(u|s) and p*(x|u, s). This, with the Markov chains, completely determines the joint distribution

    p*(s, u, x, y) = p(s) p*(u|s) p*(x|u, s) p(y|x, s)

thus fixing all the conditional distributions including p*(u|y). In order for general duality between the CCSI and SCSI problems, we have to reconcile the fact that these two problems come with different Markov chain conditions in their very definition. Thus, for a general SCSI problem, the distribution minimizing I(X; U) - I(S; U) cannot be used for any CCSI problem unless U → (X̂, S) → X is satisfied, which is a necessary initial condition for the latter. Therefore, unless the optimizing joint distribution p*(x, s, u, x̂) satisfies all three Markov chains, there can be no duality. However, there exists a theoretically significant subset of these problems (including the important Gaussian case), where these Markov chains are satisfied, and we do have a functional duality. We have the following theorems characterizing this.

Theorem 3a: For an SCSI, with a given source p(x, s), side information S, input, side information, and reconstruction alphabets 𝒳, 𝒮, and 𝒳̂, respectively, a distortion measure d(x, x̂, s), and a distortion constraint D, let the conditional distributions achieving the rate–distortion optimality be given by

    {p*(u|x), p*(x̂|u, s)} = arg min [I(X; U) - I(S; U)]    (36)

where the minimization is as in Fact 1 with E[d(X, X̂, S)] ≤ D, inducing the following distributions:

    p*(u|s)   and   p*(x|x̂, s).    (37)

If p*(x, s, u, x̂) is such that U → (X̂, S) → X forms a Markov chain, then there exists a dual CCSI, for the channel p*(x|x̂, s), having side information p(s), input, side information, and output alphabets 𝒳̂, 𝒮, and 𝒳, respectively, a cost measure w(x̂, s), and a cost constraint W, such that

• the rate–distortion bound R*_{X|S}(D) with side information S is equal to the capacity–cost bound C*(W) with side information S, i.e.,

    R*_{X|S}(D) = C*(W)    (38)

• the conditional distributions p*(u|s), p*(x̂|u, s) achieve capacity–cost optimality, i.e.,

    {p*(u|s), p*(x̂|u, s)} = arg max [I(U; X) - I(U; S)]    (39)

where the cost measure and cost constraint are given, respectively, by

    w(x̂, s) = c₁ D( p*(x|x̂, s) ‖ p(x|s) ) + c₀(s)   and   W = E_{p*}[w(X̂, S)]    (40)

for arbitrary c₁ > 0 and c₀(s).
Proof: See the Appendix.

Theorem 3b: For a CCSI, with a given channel p(y|x, s), side information p(s), input, side information, and output alphabets 𝒳, 𝒮, and 𝒴, respectively, a cost measure w(x, s), and a cost constraint W, let the conditional distributions achieving the capacity–cost optimality be given by

    {p*(u|s), p*(x|u, s)} = arg max [I(U; Y) - I(U; S)]    (41)

where the maximization is as in Fact 2 with E[w(X, S)] ≤ W, inducing the following distributions:

    p*(u|y)   and   p*(y, s).    (42)


Fig. 8. Coset code structure for SCSI and CCSI for the Gaussian case. In SCSI (respectively, CCSI) the set of source (channel) codewords denoted by small spheres are partitioned into cosets of channel (source) codewords denoted by big spheres.

If p*(s, u, x, y) is such that U → Y → S forms a Markov chain, then there exists a dual SCSI for the source p*(y, s), having side information S, input, side information, and reconstruction alphabets 𝒴, 𝒮, and 𝒳, respectively, a distortion measure d(y, x, s), and a distortion constraint D, such that

• the capacity–cost bound C*(W) with side information S is equal to the rate–distortion bound R*_{Y|S}(D) with side information S, i.e.,

    C*(W) = R*_{Y|S}(D)    (43)

• the conditional distributions p*(u|y) and p*(x|u, s) achieve rate–distortion optimality, i.e.,

    {p*(u|y), p*(x|u, s)} = arg min [I(Y; U) - I(S; U)]    (44)

where the distortion measure and distortion constraint are given, respectively, by

    d(y, x, s) = -c₁ log p(y|x, s) + d₀(y, s)   and   D = E_{p*}[d(Y, X, S)]    (45)

for arbitrary c₁ > 0 and d₀(y, s).

Remark 5: Before we interpret these theorems in terms of functional duality, we want to point out (see an illustration of the Gaussian case in Fig. 8) that using the forward part of the coding theorems [1], [11] in SCSI and CCSI, a codebook with block length n, consisting of roughly 2^{n I(X;U)} codewords, is partitioned (the so-called random binning as given in [4]) into 2^{nR} cosets containing roughly 2^{n I(S;U)} codewords (see the correspondence in (35)). In SCSI, this is a partition of a source codebook for quantizing X into cosets of "channel codebooks" for the fictitious channel between U and S. In CCSI, this is a partition of a channel codebook for the fictitious channel between U and Y into cosets of "source codebooks" for quantizing S. This concept of coset codes also ties these two problems together.

Remark 6: In Theorem 3a, the property of the dual CCSI, given by (38), implies that the minimum of I(X; U) - I(S; U) with respect to p(u|x) and p(x̂|u, s) under one set of constraints is equal to the maximum of the same objective function with respect to p(u|s) and p(x̂|u, s) under another set of constraints. Further, the property as given by (39) implies that the solution to the dual CCSI is exactly the distributions p*(u|s) and p*(x̂|u, s) induced from the solution to the given SCSI, provided we are careful about our choice of channel cost measure as given in (40). A similar interpretation can be given to the properties given by (43) and (44) in Theorem 3b. In other words, Theorem 3a states that for a given SCSI, we can associate a dual CCSI such that the joint distribution is identical, with the appropriate choice of the channel cost measure, and the rate–distortion bound is equal to the capacity–cost bound, R*_{X|S}(D) = C*(W).

Using the forward part [1], [11], the random codebooks are constructed with distribution p*(u). These codebooks are randomly partitioned into approximately 2^{nR} cosets (see Remark 5). The test channel characterized by p*(x|x̂, s) of SCSI becomes the channel of the dual CCSI and vice versa. The jointly typical encoding operation in SCSI is identical to the jointly typical decoding operation in CCSI in the following sense: find a codeword from the composite codebook which is jointly typical (in the sense of p*(x, u)) with the observed (source input in SCSI and channel output in CCSI) n-sequence coming from the same distribution p*(x). Then the encoder of SCSI sends the index of the coset containing this typical codeword as a message to the decoder, while the decoder of CCSI declares the index of the coset containing this typical codeword as the message sent from the encoder. The corresponding decoder and encoder of SCSI and CCSI, respectively, have access to a message. The jointly typical decoding operation in SCSI is identical to the jointly typical encoding operation in CCSI, and is done as follows: find a codeword from the coset whose index is given by the message that is jointly typical (in the sense of p*(u, s)) with the observed side information. Thus, p*(u|x) (p*(x̂|u, s), respectively) characterizing the function of the encoder (decoder) of an SCSI also characterizes the function of the decoder (encoder) of the dual CCSI. The reconstruction and the channel


Fig. 9. Venn diagram illustrating the conditions for duality and no rate loss.

input are the same, taken as a function of this typical codeword and the observed side information. Thus, the encoder (decoder, respectively) of SCSI and the decoder (encoder) of CCSI are functionally identical.
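The dual use of a single coset structure can be made concrete with a deliberately crude one-dimensional sketch (ours, not the paper's construction): a fine scalar quantizer plays the role of the composite codebook and its partition modulo a coarse step plays the role of the cosets; the same two operations serve as Wyner–Ziv-style encoder/decoder and as dirty-paper-style decoder/encoder (noise and the scaling by α are omitted here for brevity).

    import numpy as np

    q = 0.25            # fine quantizer step (codeword spacing)
    M = 8               # number of cosets; coarse step is M*q

    def nearest_codeword(v):
        """Quantize to the fine lattice q*Z (the composite codebook)."""
        return q * np.round(v / q)

    def coset_index(c):
        """Index of the coset of the coarse lattice (M*q)*Z containing codeword c."""
        return int(np.round(c / q)) % M

    def codeword_in_coset(index, near):
        """Pick the codeword of coset `index` closest to the value `near`."""
        base = q * index
        k = np.round((near - base) / (M * q))
        return base + k * M * q

    # SCSI use (Wyner-Ziv flavor): quantize the source, send only the coset index;
    # the decoder recovers the codeword using the correlated side information s.
    x, s = 1.37, 1.30
    msg = coset_index(nearest_codeword(x))          # encoder: quantize, send bin index
    u_hat = codeword_in_coset(msg, s)               # decoder: bin member closest to s

    # CCSI use (dirty-paper flavor): the message selects a coset; the encoder picks
    # the coset member closest to the interference s and transmits the difference.
    msg_tx = 3
    u = codeword_in_coset(msg_tx, s)                # encoder: quantize s within the coset
    y = (u - s) + s                                 # channel adds the interference back
    msg_rx = coset_index(nearest_codeword(y))       # decoder: quantize, read bin index
    print(u_hat, msg_rx == msg_tx)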

We want to emphasize that unlike conventional source and channel coding or SCSIB and CCSIB, in SCSI and CCSI, we have duality only under the condition as given in Theorem 3. At this point, we do not have any general characterization of the distributions of the source, the channel or the side information, the distortion measures, and the cost measures for which this condition holds. However, in the sequel we provide two sets of concrete examples where this condition indeed holds and we have duality.

2) Nonuniqueness of Duals: The well-studied Gaussian cases for SCSI [1] and CCSI [20] tempt one to associate duality with no rate loss. But, in general, functional duality does not equate to no rate loss in the given SCSI and similarly in the given CCSI. In other words, even if a given SCSI has rate loss compared to its corresponding SCSIB, we can find a dual CCSI if the optimizer in the given SCSI satisfies the three Markov chains as mentioned in Theorem 3a. This is illustrated using an example to be given in Section III-C4. A similar observation can be made for a given CCSI with the joint distribution induced by the optimizer p*(u|s) and p*(x|u, s), satisfying U → Y → S. This is illustrated using a Venn diagram in Fig. 9, where the box denotes the set of all tuples, given by a distribution, a distortion measure, and a cost measure defined on appropriate alphabets.¹² Condition A means that the distribution in the tuple is the optimizing distribution for SCSI with the source p(x, s) and the distortion measure d(x, x̂, s) (with p(x, s) obtained from the joint distribution using Bayes' rule) held fixed, under the two Markov chains natural to SCSI. Condition B means that the distribution in the tuple is the optimizing distribution for CCSI with the channel p(x|x̂, s), the side information p(s), and the cost measure w(x̂, s) (with p(x|x̂, s) and p(s) obtained from the joint distribution using Bayes' rule) held fixed, under the two Markov chains natural to CCSI.

¹² The distortion constraint D and cost constraint W are (not shown in the figure here for clarity) determined uniquely by the tuple: D = E[d(X, X̂, S)] and W = E[w(X̂, S)].

If the distribution given by such a tuple satisfies the two Markov chains and Condition A as given, then we can associate an SCSI problem with this tuple, where the cost measure will not have any significance. A similar property holds for the CCSI case. Finally, it should be noted that in the proof of Theorems 3a and 3b, the corresponding CCSIB and SCSIB problems serve as a bridge in obtaining the dual problems for SCSI and CCSI.

Note that for a general SCSI (say SCSI₁), with a given source p(x, s), side information S, and distortion measure d₁, even if the joint distribution (say, p₁(x, s, u, x̂)) induced by the optimizer p*(u|x) and p*(x̂|u, s) has the Markov property U → (X̂, S) → X (needed to define the dual CCSI problem), there might be rate loss (see the end of Section II-A for conditions for rate loss) compared to the corresponding SCSIB (say, SCSIB₁). We want to point out that its dual CCSI (say CCSI₁), with the cost measure given in Theorem 3a, which uses the same joint distribution p₁(x, s, u, x̂), has no rate loss (see the end of Section II-B) compared to its corresponding CCSIB (say, CCSIB₁), as can be seen from the proof of Theorem 3a. Further, if we had started with CCSI₁ as the given problem, then its dual, say SCSI₂, with the associated distortion measure (say, d₂) as given by Theorem 3b, will have no rate loss compared to its corresponding SCSIB. Clearly, both SCSI₁ and SCSI₂ use the same joint distribution p₁(x, s, u, x̂), but different distortion measures. Using the forward part of the Wyner–Ziv theorem, the encoder and the decoder of SCSI₂ are, respectively, identical to those of SCSI₁. This is illustrated as given at the bottom of this page. A similar sequence of arguments holds true for a general CCSI with the joint distribution satisfying U → Y → S.

Thus, in SCSI and CCSI, we cannot guarantee uniqueness of functionally dual problems. In other words, a class of SCSI with and held fixed (but with different distortion measures and the associated distortion constraints) can be functionally dual to another class of CCSI with and held fixed (but with different cost measures and the associated cost constraints), where every SCSI and CCSI problem in these classes results in the same joint distribution, say, . But we can guarantee (as will be shown in the sequel) that among this pair of classes there exists a unique (modulo affine transformation) pair of distortion and cost measures for which the corresponding SCSI and CCSI problems have no rate loss. This leads to a precise characterization of unique dual SCSI and CCSI problems. This is summarized by the following corollary.

[Diagram: SCSI1 is dual to CCSI1, and CCSI1 is dual to SCSI2; SCSI1 has rate loss with respect to SCSIB1, while CCSI1 and SCSI2 have no rate loss with respect to CCSIB1 and SCSIB2, respectively.]

Corollary 1: For every SCSI (respectively, CCSI) with a given source (channel ), side information , a distortion measure (cost measure ), and a distortion constraint (cost constraint ), if the optimal joint distribution satisfies ( ), then there exist a unique (modulo affine transformation) distortion measure (cost13 measure ) and a distortion constraint (cost constraint ) such that the SCSI (CCSI) using this distortion (cost) measure has as the optimal joint distribution, and there is no rate loss compared to its corresponding SCSIB (CCSIB). They are given as follows:

and, respectively,

(46)

where (respectively, ) is obtained from using Bayes' rule.

The uniqueness suggested in Corollary 1 follows from the following arguments. For the corresponding SCSIB (with source , side information ), the unique (see the arguments following Remark 3), modulo affine transformation, distortion measure for which the conditional distribution given by (obtained from using Bayes' rule) achieves optimality is given by

where are arbitrary, and is obtained from using Bayes' rule. This means that for any other distortion measure which is not an affine transformation of , will not be the optimizing distribution for this SCSIB. Thus, for any other distortion measure which is not an affine transformation of , for which is the optimizing distribution in the SCSI, the optimal performance of the corresponding SCSIB is strictly better, resulting in rate loss for the SCSI. Thus, is unique (modulo affine transformation) for which there is no rate loss for a given . A similar argument can be used for CCSIB, with the additional condition that , .
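For orientation, the familiar form of such an optimality-preserving pair of measures, written here in generic notation purely as an illustrative assumption (the constants c1, c2 and the offsets d0, w0 play the role of the arbitrary quantities referred to above and are not symbols defined in the text), is
\[ d(x,\hat{x},s) \;=\; -\,c_1 \log p(x \mid \hat{x}, s) \;+\; d_0(x,s), \qquad c_1 > 0, \]
for the distortion measure, and
\[ w(\hat{x},s) \;=\; c_2\, D\!\left( p(x \mid \hat{x}, s)\,\middle\|\, p(x \mid s) \right) \;+\; w_0(s), \qquad c_2 > 0, \]
for the cost measure; the freedom in c1, c2, d0, and w0 is exactly the affine (equivalence-class) freedom discussed in this subsection.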

In other words, if a SCSI given by

is dual to a CCSI given by

then a corresponding SCSI

is also dual to this CCSI, where

, and are arbitrary. Similarly, all these SCSI problems are also dual to CCSI problems given by

where , , and are arbitrary. These SCSI and CCSI form equivalence classes, say, denoted by

and

13 Under the condition that, for all s ∈ S and x ∈ X, p(x|s) > 0 for CCSI.

respectively, and let be the resulting joint distribution. The distortion/cost measures in an equivalence class are affine transformations of each other. Further, there might be another set of equivalence classes

of SCSI and

of CCSI such that all of them result in the same joint distribution. Thus, we can define a more general equivalence

class of SCSI

where every problem in this class results in . Similarly, define a more general equivalence class of CCSI

where every problem in the class results in . It should be recalled that the SCSI problems and the CCSI problems considered here are those for which all three Markov chains are satisfied. Using Corollary 1, among such

there exists a unique equivalence class

for which there is no rate loss. A similar result holds for CCSI under the condition that , , where the unique equivalence class is given by

Thus, among two equivalence classes of dual SCSI and CCSI given by

and

respectively, there exists a unique pair of equivalence classes of dual SCSI and CCSI, given by

and

respectively, for which there is no rate loss. Thus, for

we can define a unique distortion measure , where is obtained from using Bayes' rule. Similarly, for


we can define a unique cost measure , where is obtained from using Bayes' rule. Thus, we have an association of a distortion measure and a cost measure, which are unique modulo equivalence class, for a pair of dual SCSI and CCSI problems. Thus, there are two sources of nonuniqueness, where one is coming from affine transformations of distortion/cost measures, and the other is coming from the no-rate-loss condition. In Sections III-C3 and III-C4, we give examples of SCSI and CCSI as duals and obtain the corresponding unique (modulo equivalence class) distortion/cost measures, where we show that for the Gaussian example, these distortion and cost measures are quadratic in nature, and for the binary example, they are simple extensions of Hamming measures.

Summarizing the duality between SCSI and CCSI, we note the following.

• The objective function is convex in one variable , given , , , , and ; and concave in the other variable , given , , , , and .

• For SCSI, and are fixed; the optimization involves minimizing over .

• For CCSI, and are fixed; the optimization involves maximizing over .

• The encoder of SCSI and the decoder of CCSI are characterized by a mapping having the same domain and range: . Similarly, the decoder of SCSI and the encoder of CCSI are characterized by a mapping having the same domain and range: . Thus, the mappings of the encoder and decoder are reversed in these two problems.

• Using the forward14 part of the Wyner–Ziv [1] and Gelfand–Pinsker [11] theorems, in the limit of large block length, a rate–distortion optimal encoder of a SCSI is functionally identical to a capacity–cost optimal decoder of the dual CCSI, and vice versa, in the sense of Theorems 3a and 3b.

In the following, we consider an important generalization of the result of Wyner–Ziv [1] relating to no rate loss for SCSI from Gaussian to more arbitrary distributions. We have obtained this generalization as a dual to the recent generalization of Costa's result to arbitrary side information, obtained in [21], [22]. Finally, we consider a discrete example [1] of SCSI having rate loss compared to its corresponding SCSIB and obtain its dual CCSI.

3) Generalization of Gaussian Case of Wyner and Ziv With No Rate Loss: Recently, in [21], [22], a generalization of the "dirty paper" result of Costa [20] was considered, where no rate loss was shown to hold even if the channel side information were not Gaussian (the channel noise has to be Gaussian). That is, for the CCSI problem with (see

14 Recall that in the limit of large block length, in SCSI, the rate–distortion bound R(D) is approached from above, while in CCSI, the capacity–cost bound C(W) is approached from below.

Fig. 10. Duality: α, q, and Z are the same as in Fig. 7. The side information S has an arbitrary but fixed distribution. (a) CCSI: The cost measure is ω(x, s) = x². The joint distribution is represented in this form. The decoding operation is depicted as one of recovering U from X. (b) SCSI: The distortion measure is d(x, x̂, s) = (x̂ − x − s)². The joint distribution is represented in this form. The effect of encoding is illustrated as αX + q.

correspondence in (35)), where is independent and identically distributed (i.i.d.) Gaussian, there is no capacity loss over the corresponding CCSIB problem where is known at both ends, with being arbitrary. By the above duality, we are now in a position to correspondingly generalize the Wyner–Ziv result (with no rate loss) for the Gaussian source characterized by (where are all Gaussian) to the more general case where is Gaussian but and are otherwise arbitrary (but related by ): this follows directly from the above duality, and is a direct example of how generalization of a result in one problem leads to a corresponding impact in the dual problem. This important generalization is presented as a corollary in the following. While the SCSI claim of this corollary can be obtained "from first principles" similar to [21], [22], it is far more insightful and elegant to deduce this using the duality concepts developed in Theorem 3.

CCSI [21], [22] (see Fig. 10(a)): Let the channel be additive white Gaussian, given by , with , i.e., for some , and let the side information, given by , be arbitrary; let the cost measure be quadratic: , and let for some.15 Let as before. It was shown in [21], [22] that (no rate loss) and that the optimizer given by the distribution , , i.e.,

and the distribution characterizing the channel input induce the relation between and as given by , where

i.e.,

Note (as in Section III-B) that is equal to the coefficient in the MMSE estimate of from . We now provide an interpretation for the joint distribution depicted in Fig. 10(a) by connecting it to that depicted in the CCSIB problem in Fig. 7(b).

15 Using the correspondence given in (35) and renaming P = N − D, N = D, we get the notation for CCSI conventionally used in the literature.
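Since the coefficient in question is simply the linear MMSE coefficient for estimating one Gaussian variable from its noisy observation, it is easy to sanity-check numerically. The sketch below is a minimal illustration under assumed notation (P for the input power and N for the noise variance; these names are not taken from the text above):

# Monte Carlo check that the linear-MMSE coefficient for estimating X from
# Y = X + Z, with X ~ N(0, P) and Z ~ N(0, N) independent, is P / (P + N).
# P, N, and the block length L are illustrative values.
import numpy as np

rng = np.random.default_rng(0)
P, N, L = 1.0, 0.25, 1_000_000
X = rng.normal(0.0, np.sqrt(P), L)
Z = rng.normal(0.0, np.sqrt(N), L)
Y = X + Z

alpha_theory = P / (P + N)                      # linear-MMSE coefficient
alpha_empirical = np.dot(X, Y) / np.dot(Y, Y)   # least-squares fit of X on Y
print(alpha_theory, alpha_empirical)            # the two agree for large L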


It is observed that CCSI can be obtained from the corresponding CCSIB by moving the side-information adder (arrow labeled "−" in Fig. 7(b)) to the right past the addition by "" and wrapping around to the encoder with input as , resulting in . This is akin to moving the decision-feedback equalizer (DFE) from the receiver to the transmitter as a precoding unit in ISI channels [19]. Note that the channel input is statistically independent of the side information, i.e., . Further, it can be verified (see the Appendix for a note regarding this) that the joint distribution satisfies the Markov chain (a necessary condition for finding the dual SCSI). The channel output is related to the side information in the following way: , where

, i.e., .

Corollary 2: For the above CCSI, we can associate a dual SCSI (see Fig. 10(b)) given as follows. Given a conditional source characterized by , where , i.e., , and side information as above (chosen arbitrarily in CCSI); the conditional distribution given by the above CCSI, , i.e.,

and given by minimize , such that and and , implying , and the distortion measure is given by

(47)

by choosing and with distortion constraint .

Proof: Follows from Theorem 3b and the three Markov chains.

The reconstruction is the MMSE estimate of from and . It is observed that the joint distribution of SCSI as depicted in Fig. 10(b) can be obtained from that of SCSIB as depicted in Fig. 7(a), by moving the side-information adder (arrow labeled "−") in the encoder to the right past the addition by "" and wrapping around to the decoder with , resulting in . Note that in both problems, there is no rate loss compared to the corresponding CCSIB and SCSIB even though is chosen arbitrarily. This CCSI with the given quadratic cost measure is the problem of transmission over a Gaussian channel with known interference, similar to the one considered by Costa [20]. Thus, its dual is the SCSI with the decoder wishing to reconstruct under an MSE constraint.

As discussed in Section III-B, if we consider the channel (see Fig. 11(a)) to be additive white Gaussian, , i.e., , with the side information being arbitrary but known, and as the cost measure

Fig. 11. Duality: α, q, and Z are the same as in Fig. 10. The side information S has an arbitrary but fixed distribution. (a) CCSI: The cost measure is ω(x, s) = (x − s)². (b) SCSI: The distortion measure is d(x, x̂, s) = (x − x̂)².

with the cost constraint given by , then the optimal distributions , , and will remain the same as before, but the channel input becomes . The dual SCSI (see Fig. 11(b)) will have as the distortion measure, similar to the observation made in Remark 4, with as the conditional source and all the dual properties as given by Theorem 3b. With this cost measure, this CCSI corresponds to the watermarking channel [16], [17], i.e., a power constraint on , and its dual is the SCSI with the decoder wishing to reconstruct under an MSE constraint. In both of the above pairs of dual problems, the channel of CCSI becomes the test channel of SCSI, and the encoder of one of the problems is functionally identical to the decoder of the dual problem.

4) Example of Discrete SCSI and CCSI: Here we consider a discrete example which has the following interesting characteristics.

• We find a pair of SCSIB and CCSIB (given by SCSIB1 and CCSIB1) to be dual problems in the sense of Theorem 2a, but their corresponding SCSI and CCSI (given by SCSI1 and CCSI1), although looking similar, are not duals in the sense of Theorems 3a and 3b.

• SCSI1 and CCSI1 have rate loss (unlike the previous example) compared to their corresponding SCSIB1 and CCSIB1, respectively.

• The optimal distribution in SCSI1 satisfies the two Markov chains, and we find its dual CCSI, given by CCSI2, as shown in the diagram at the bottom of the page.

CCSI1: Given a binary (watermarking) channel , characterized by , where and , , for some , , and independent. Let the cost measure be (equivalent to a constraint on the Hamming distortion between and ).

[Diagram: SCSI1 has rate loss with respect to SCSIB1, and CCSI1 has rate loss with respect to CCSIB1; SCSI1 is dual to CCSI2.]


The encoder wishes to communicate with for some . The capacity of this channel is given as follows.

Lemma 2: The capacity of the above channel is , where

and

(48)

for , , and .

Proof: See the Appendix.

This is essentially the concave hull of the function . The capacity is plotted as a function of in Fig. 12(a). When (see Fig. 12(a)), , and we have , , and is independent of , and . For , we need time-sharing to achieve the optimal points. The joint distribution does not satisfy . Thus, it is not possible to find a dual SCSI problem.
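Since the capacity–cost curve in Lemma 2 is obtained as the upper concave envelope ("concave hull") of a one-variable function of the cost constraint, it can be computed numerically once that function is available. The sketch below shows one way to do this; the function g used here is only a placeholder stand-in (it is not the expression (48) of Lemma 2), and p = 0.1 matches the value used in Fig. 12(a):

# Numerical sketch: capacity-cost curve as the upper concave envelope of a
# function g(W) of the cost constraint W, as in Lemma 2. The g below is a
# placeholder assumption (not the expression in (48)); the envelope
# computation itself is the point of this sketch.
import numpy as np

def binary_entropy(p):
    p = np.clip(p, 1e-12, 1 - 1e-12)
    return -p * np.log2(p) - (1 - p) * np.log2(1 - p)

def upper_concave_envelope(w, g):
    # Monotone-chain upper hull over the sampled points (w[i], g[i]).
    hull = []
    for p in sorted(zip(w, g)):
        while len(hull) >= 2:
            (x1, y1), (x2, y2) = hull[-2], hull[-1]
            cross = (x2 - x1) * (p[1] - y1) - (y2 - y1) * (p[0] - x1)
            if cross >= 0:   # middle point lies on or below the chord: drop it
                hull.pop()
            else:
                break
        hull.append(p)
    hx = np.array([q[0] for q in hull])
    hy = np.array([q[1] for q in hull])
    return np.interp(w, hx, hy)   # piecewise-linear envelope on the original grid

W = np.linspace(0.0, 0.5, 501)
g = np.maximum(binary_entropy(W) - binary_entropy(0.1), 0.0)   # placeholder for (48)
C = upper_concave_envelope(W, g)   # cost values where C > g require time-sharing
print(C[::100])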

CCSIB1: The capacity of the corresponding channel when both the encoder and the decoder have access to the side information is given by .

SCSI1 [1]: Consider a doubly symmetric binary source such that , , and

where and independent of and , for some . The encoder wishes to represent such that , where and . The rate–distortion function [1] is

where

and

(49)

for and and . Thus, the rate–distortion function is the convex hull of the function . The rate–distortion function is plotted as a function of in Fig. 12(b). When (see Fig. 12(b)), , and we have , where and is independent of and . Further, the joint distribution satisfies .
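For reference, the function whose convex hull is taken has the familiar Wyner–Ziv form for the doubly symmetric binary source [1]; it is restated here in generic notation and should be read as an assumption about the notation rather than a reproduction of (49):
\[ g(D) \;=\; h(a \ast D) - h(D), \qquad a \ast D \;=\; a(1-D) + (1-a)D, \qquad 0 \le D \le a, \]
where h(·) is the binary entropy function; the rate–distortion function is the lower convex envelope of g together with the point (a, 0), and equals zero for D ≥ a.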

SCSIB1: The rate–distortion function when the side information is available at both the encoder and decoder is given by

if and if . Note that for a channel noise and cost , the capacity–cost function of CCSI1 is (which for CCSIB1 is ), and for a distortion and correlation noise , the rate–distortion bound for SCSI1 is (which for SCSIB1 is ). Thus, SCSIB1 and CCSIB1 are duals in the sense of Theorem 2a, but the corresponding SCSI1 and CCSI1 are not duals in the sense of Theorem 3a.

CCSI2: The given SCSI1 does satisfy , and we can associate with it a dual CCSI as given by Theorem 3a, with the same joint distributions and with the channel given by .

Fig. 12. (a) Capacity–cost function for the discrete channel with side information with p = 0.1. Solid line: capacity of CCSI1. Upper dotted line: capacity of CCSIB1. (b) Rate–distortion function of the doubly symmetric binary random variable with a = 0.3. Solid line: rate of SCSI1. Lower dotted line: rate of SCSIB1.

We consider a numerical example for illustration.

Example 4: Let and , and the minimization in SCSI1 induces the following distribution:

and

The cost measure (using Theorem 3a) is , where we choose and . Note that the Hamming distortion measure in SCSI1 corresponds to the Hamming cost measure in the dual CCSI2. The


TABLE I
SIDE-INFORMATION-DEPENDENT DISTORTION MEASURE FOR EXAMPLE 4

capacity is 0.456 bits/sample, which is equal to the rate of transmission for the dual SCSI1. Further, by following Corollary 1, for SCSI with the same source and side information we can find a distortion measure (which results in the same joint distribution ) such that there is no rate loss. By choosing and in (45), this is given by

, and Table I.

IV. GEOMETRIC TREATMENT OF SOURCE CODING WITH SIDE INFORMATION

In this section, we provide a geometric derivation of the achievable rate–distortion bound for the Gaussian SCSI to give intuition for practical constructions. Specifically, we consider the Gaussian SCSI problem with the distortion measure given by . Note that the typical SCSI problem considered in the literature [1] assumes , where is the side information, is the source, and is the additive white Gaussian noise, for some . The problem which we are considering assumes , , , and and are independent, which is equivalent to the above problem within a scaling factor. As a result of SCSI and CCSI being duals for the cases that we are considering, we will give the geometric derivation of one of the problems and use duality to obtain the geometric derivation of the other problem.

A. Preliminaries

The rate–distortion function for the case of is given by

(50)

where is the distortion constraint.16 We will use uppercase boldfaced letters to denote random vectors and lowercase boldfaced letters to denote outcomes of random vectors. For an -dimensional vector , we define , where denotes the th component of the random vector . The distance between two vectors is defined as . Let us denote an -dimensional sphere and an -dimensional spherical shell as follows:

(51)

We will follow the approach taken in [30]. Let and . It can then be noted that and , for sufficiently large , tend to (the variance of ) and , respectively. Let be the message sphere. Let be the projection of onto .

16 We will only give the forward part of the proof, as the converse is trivial.

Now if is quantized to the closest representation vector, the expected distortion is given [30] by

(52)

where for large L. Thus, we quantize instead of without any loss in performance.
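The concentration being used here, namely that the per-letter squared norm of a long i.i.d. vector settles at its variance so that projecting onto the message sphere before quantization costs essentially nothing, is easy to verify numerically; the sketch below assumes a unit-variance Gaussian source purely for illustration.

# For large L, ||x||^2 / L concentrates at the source variance, and the
# per-letter error incurred by projecting x onto the message sphere vanishes.
import numpy as np

rng = np.random.default_rng(3)
sigma2 = 1.0
for L in (16, 256, 4096, 65536):
    x = rng.normal(0.0, np.sqrt(sigma2), L)
    radius = np.sqrt(L * sigma2)                  # radius of the message sphere
    x_proj = x * radius / np.linalg.norm(x)       # projection of x onto the sphere
    print(L, np.dot(x, x) / L, np.sum((x - x_proj) ** 2) / L)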

The codebook that we will be discussing is generated as follows. First, randomly distribute codewords independently and uniformly on the surface of a sphere of radius , where is a real number which depends on , , and (the relation between and the other variables will be discussed in Section IV-C). Call this codebook a coset and give an index to it. Then, repeat this process independently times to generate a total of codewords which are evenly partitioned into cosets. Let . Thus, is the codeword sphere and there are indexes.

B. Sketch of the Proof

The encoding proceeds only if the outcome of is within the spherical shell for some (otherwise, an error is declared). Encoding involves finding a codeword from the space of codewords satisfying a distance criterion (within distance ) and sending the index of the coset containing the encoded codeword to the decoder. If none of the codewords satisfies the criterion, then an error is declared. It is shown geometrically that the probabilities of these error events can be made arbitrarily small for large by giving a lower bound on . During the error events, the distortion can be bounded. The side information is correlated with the encoded codeword. We quantify this correlation in terms of the distance between them. Thus, the decoder, with the help of the side information, obtains the correct codeword. The receiver starts by scaling the side information by the factor and decoding only if the scaled side information is found within the shell for any (otherwise, an error is declared). The first part of the decoder involves finding a codeword satisfying a distance criterion in the coset whose index is received from the encoder. If either more than one or none of the codewords satisfies the distance criterion, then an error is declared. It is shown geometrically that the probabilities of these error events, and of the error event that the decoded codeword is the wrong one, can be made arbitrarily small for large by giving an upper bound on . The second part of the decoder then computes the reconstruction vector as a function of the encoded codeword and the side-information vector.
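The construction just outlined, a codebook split into cosets, an encoder that quantizes the source and transmits only the coset index, and a decoder that disambiguates within the coset using the side information, can be illustrated with a toy simulation. Everything below (block length, rates, radii, the statistics tying the side information to the source, and the omission of the scaling step) is an illustrative assumption chosen for readability, not the optimizing choices derived in this section.

# Toy illustration of the coset code: codewords uniform on a sphere, grouped
# into cosets; the encoder sends only the coset index of the codeword nearest
# to the source; the decoder picks, within that coset, the codeword nearest to
# the side information. Sizes and statistics are illustrative assumptions.
import numpy as np

rng = np.random.default_rng(1)

def random_sphere_points(num, L, radius):
    g = rng.normal(size=(num, L))
    return radius * g / np.linalg.norm(g, axis=1, keepdims=True)

L = 64                              # block length
num_cosets, words_per_coset = 2 ** 6, 2 ** 6
sigma_x, sigma_n = 1.0, 0.05        # source std; side-information noise std (here S = X + N)
radius = np.sqrt(L) * sigma_x       # radius of the codeword sphere
codebook = np.stack([random_sphere_points(words_per_coset, L, radius)
                     for _ in range(num_cosets)])        # shape: (cosets, words, L)

def encode(x):
    # quantize x to the nearest codeword over the whole codebook; keep only its coset index
    dist = np.linalg.norm(codebook - x, axis=2)
    coset, word = np.unravel_index(np.argmin(dist), dist.shape)
    return int(coset), int(word)

def decode(coset, s):
    # within the received coset, pick the codeword nearest to the side information
    return int(np.argmin(np.linalg.norm(codebook[coset] - s, axis=1)))

x = rng.normal(0.0, sigma_x, L)
s = x + rng.normal(0.0, sigma_n, L)         # side information available at the decoder
coset, word_sent = encode(x)
word_found = decode(coset, s)
print(word_found == word_sent)              # typically True for these (generous) sizes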

C. Encoder, Decoder, and Error Events

Let us consider an arbitrary projection vector of the scaled source on . For this arbitrary point , we obtain the minimum expected distortion as a function of


the rate of transmission. For any , , and for a block length of , define the encoder as follows.

: If

declare an error and send any randomly chosen index to the decoder. If

then choose a codeword from the set of codewords such that , as shown in Fig. 13(a). If there is

no such codeword, then declare an error and send any randomly chosen index to the decoder. If there is any such codeword, then send the coset index of that codeword to the decoder. If there is more than one codeword, then send the lowest index to the decoder. When an error is declared by the encoder, the distortion incurred is upper-bounded by twice the variance of . Consider the two-dimensional projection of the spheres as shown in Fig. 13(a). is denoted by C. Now, associated with the encoder described above, define the following error events for any

, , and block length :

Given , define : no such that

where represents complementation. The encoded index is received by the decoder without any error. Thus, the decoder now reconstructs the random vector based on this index information and the outcome of the random vector . Let be the side-information sphere. We have two stages of decoding. The first part disambiguates, with the help of the random vector , the codeword used by the encoder to represent the random vector from the knowledge of the coset index information received. Let the random variable denote this decoded codeword. This codeword information, hence obtained, is passed to the second stage for further processing. The second part obtains the reconstruction as an optimal function of the codeword decoded by the first stage of decoding and the outcome of the side information . Now for the given encoder , define the following first stage of the decoder, , for any and block length of :

: If

declare an error and use as the reconstruction for , where is the all-zero vector. Otherwise, if only one codeword such that , then declare that (see Fig. 13(b)) and pass this codeword to the second part of the decoder . If none or more than one codeword such that , then declare an error and use as the reconstruction for .

The second part of the decoder, , for any and block length , is described in the latter part of this section. Now, associated with the first stage of the decoder described above, define the following error event. The following event is considered to occur only when the events associated with the

Fig. 13. (a) A two-dimensional projection of the spheres: x is shown as C; ‖AC‖ = ‖BC‖ = √(D) and ‖EC‖ = ‖FC‖ = √(D − ε). The search region of codewords is the annular region on the codeword sphere bounded by two planes as shown. (b) Geometry of decoding: the dotted sphere is given by S(√( − Q − ε)) and the scaled side information is denoted by C; ‖AC‖ = ‖BC‖ = N − D + ε and ‖DC‖ = ‖EC‖ = N − D − ε.

encoder and occur. Given and , define the following error event:

error is declared by or

the decoded codeword is not the correct codeword used in encoding.

(53)


Bounding : Using Chebyshev's inequality, for any , for large .

Bounding : Using the arguments of [30], for any , , , and (sufficiently close to ), such that for any ,

using encoder and

Thus, we have a lower bound on . Clearly, the encoder uses one index for a group of codewords forming a coset. The decoder disambiguates this using the side information. The codeword depends only on the message. The side information depends on , which induces a correlation between the random vectors and . This correlation has to be exploited by the decoder. This relation is also given in the information theory literature by the name of the Markov lemma [4] (see [4, Lemma 14.8.1]). Let the random vector represent . It is shown in the following theorem that, with high probability, will be close to . Thus, we get an upper bound on the number of codewords that can be distributed on one coset.

Theorem 4: For any (sufficiently close to ), such that for any

using encoder , where and are chosen such that (this is a mild technical condition, given by (75) in the Appendix).

Proof: See Appendix.

Bounding the Probability of : Using Theorem 4 and the geometric arguments of the conventional Gaussian channel coding theorem as given in [31], for any and (sufficiently close to ), and a decoder , such that (implying that ), for any

using encoder , such that

where , , and are chosen appropriately for a given . Now, let us define the second part of the decoder, . The first part of the decoder, , provides the second stage with the codeword and the outcome of the side information . We consider the expected distortion during the events and . Now, let be such that its coordinates are given by

as shown in Fig. 14(a) (we are changing the bases for ease of notation). Since has occurred, and .

: Project onto the region , which is given by

(54)

Fig. 14. (a) The geometry when ε = 0. OA, OC, and OH represent the codeword, the observed side information, and the reconstruction, respectively. A, H, and C are collinear. FD and GE represent the two spheres given in (56). H is the center of the region given by (56). (b) Plane passing through F, H, and D. UV denotes the intersection of the two (L − 1)-dimensional spheres given in (56). For L = 3, the intersection given in (56) is the set of two points U and V.

Let the random vector be the projection of onto . The reconstruction for is given by

(55)

and we have , which agrees with the reconstruction value given in Section III-C3. Note that the


Fig. 15. Geometry of decoding for ε > 0, corresponding to Fig. 14(a) and (b): let OA be the codeword vector. The dotted spheres represent the spherical shells around the message sphere and the side-information sphere. The annular region on the message sphere subtended between FD and fd is the set of all x which might have used OA as the codeword. JKLM denotes the two-dimensional projection of the region given by (58). The annular region on the message sphere between RP and NQ is the set of all x given that s ∈ JKLM.

reconstruction vector for is given by . We now describe the rationale behind choosing the decoder as given above. Consider the geometry of the imaginary case when , as shown in Fig. 14. and are denoted by OA and OC, respectively. Consider the following intersection of two -dimensional spheres in the -dimensional space:

(56)

FD and GE represent these two spheres, respectively. The intersection of these two spheres is shown in Fig. 14(b) as U and V. As an aid to understanding, when , the intersection of these spheres (circles for ) is the set of two points given by U and V. This is the most probable region in which might lie given that and . This intersection is an -dimensional sphere. We take the center of this sphere (denoted by H in Fig. 14) as the optimal representation for . The optimal representation of will then be . The distortion performance is given by the following lemma.

Lemma 3: For the case when , the reconstruction vector is given by OH, a combination of OA and OC (A, H, and C are collinear), and the optimal distortion (denoted in Fig. 14(b) by HU = HV) is given by

(57)

which implies that the optimal distortion between and is .

Proof: Omitted.

For any , it can be shown that the distortion , where is a continuous function of (for sufficiently close to zero) and , as illustrated in

Fig. 15. JKLM shows the two-dimensional projection of the region given by

(58)

Now given , belongs to the region given by (58). Given that OA and and , belongs to the region on bounded between the planes given by FD and fd ( AF AD , Af Ad ), and this region is denoted by

(59)

Now given that JKLM and , belongs to the region on , shown in Fig. 15 as a region bounded between the two planes RP and NQ, and is given by

for any (60)

The intersection of the two regions given by (59) and (60) clearly contains the points U and V corresponding to the case when . Thus, we have for any and

(61)

where . The expected distortion for any and for any , , is given by , which can be made arbitrarily close to for large enough.

V. GEOMETRIC TREATMENT OF CHANNEL CODING WITH SIDE INFORMATION

In this section, we obtain the channel capacity of CCSI using geometric arguments. The geometric arguments closely resemble those used to derive the rate–distortion bounds for SCSI because of duality. We only briefly outline this, while highlighting its duality to SCSI. The channel model17 is given by , with , i.e., for some positive real , and the side information is given by . There is a power constraint on the encoded message

17 As pointed out in Section III-C3, for the SCSI considered in Section IV, the dual CCSI corresponds to the watermarking channel with X = X̂ + Z and ω(x, s) = (x − s)². But in this section, we give the geometric treatment of the channel considered by Costa [20] due to its intuitive appeal. These two CCSI have a very similar structure in their joint distribution.


and . For CCSI, the capacity–cost function is

(62)

Sketch of Proof: Both the encoder and decoder have access to a codebook which is generated as follows. Randomly distribute codewords on the surface of an -dimensional sphere of radius , where as before. Call this coset . Repeat this process times to generate a total of codewords which are evenly partitioned into cosets.

The encoder wants to send bits per symbol across the channel. These bits will specify the index of the coset containing the codewords to which the side information can be encoded. Encoding proceeds only if the outcome of is within the spherical shell for some (otherwise, an error is declared). If the outcome of is within the spherical shell , then search the coset (containing codewords on the sphere ) with the index given by the message, for a codevector such that the distance between and is close to . If none of the codewords satisfies the criterion, then an error is declared. It can be shown geometrically that the probabilities of these error events can be made arbitrarily small for large by obtaining a lower bound on (given by ), implying that the probability of finding at least one codeword for encoding tends to . This bound can be approached arbitrarily closely. This is identical to the arguments used in the decoder for SCSI (see Fig. 13(b)). The encoder of CCSI then sends a linear combination of the found codeword and the side information across the channel, which corresponds exactly to the output of the second part of the decoder for SCSI.

For this input and side information, the channel produces an output at the decoder. The decoder finds a codeword from the set of codewords such that the distance between and is close to , and determines the index of the coset to which this codeword belongs. The index represents the decoded message. If either more than one or none of the codewords satisfies the distance criterion, then an error is declared. It can be shown geometrically that the probabilities of these error events, and of the error event that the decoded codeword is the wrong one, can be made arbitrarily small for large by obtaining an upper bound on (given by ). This bound can be approached arbitrarily closely. Thus, we have an upper bound on , and this can be approached arbitrarily closely.
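The duality is visible in a toy simulation as well: the CCSI encoder below performs exactly the search that the SCSI decoder performed (find, in the coset named by the message, a codeword close to the scaled side information) and then transmits a linear combination of codeword and side information, while the CCSI decoder performs the SCSI encoder's search (find the codeword closest to the scaled channel output) and reads off its coset index. The scaling alpha, the noise level, and all sizes below are illustrative assumptions, not the optimizing values.

# Toy dual of the previous sketch, for CCSI with known interference s.
# The message names a coset; the encoder quantizes alpha*s within that coset
# and sends u - alpha*s; the decoder scales the channel output by alpha and
# finds the nearest codeword over the whole codebook. All values illustrative.
import numpy as np

rng = np.random.default_rng(2)

def random_sphere_points(num, L, radius):
    g = rng.normal(size=(num, L))
    return radius * g / np.linalg.norm(g, axis=1, keepdims=True)

L, num_cosets, words_per_coset = 64, 2 ** 6, 2 ** 6
alpha, sigma_z = 0.9, 0.1
radius = np.sqrt(L)
codebook = np.stack([random_sphere_points(words_per_coset, L, radius)
                     for _ in range(num_cosets)])

message = 37                                   # coset index to be communicated
s = rng.normal(0.0, 1.0, L)                    # interference known at the encoder

# encoder: nearest codeword to alpha*s inside the message's coset, then precode
u = codebook[message][np.argmin(np.linalg.norm(codebook[message] - alpha * s, axis=1))]
x_in = u - alpha * s                           # transmitted signal
y = x_in + s + rng.normal(0.0, sigma_z, L)     # channel adds the interference and noise

# decoder: nearest codeword to alpha*y over the whole codebook; its coset is the message
dist = np.linalg.norm(codebook - alpha * y, axis=2)
coset_hat, _ = np.unravel_index(np.argmin(dist), dist.shape)
print(int(coset_hat) == message)               # typically True for these sizes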

VI. CONCLUSION

We have considered a mathematical formulation of duality between source and channel coding with side information. Our notion of duality is in the functional sense, i.e., where the encoder–decoder functional mappings can be exactly swapped in the dual problems. We have, as a necessary first step, first addressed the well-known duality between conventional source and channel coding and then extended this notion to the more general case of coding with side information. We have given several examples for illustration. By using this concept we have obtained a generalization of the Wyner–Ziv result relating to no rate loss in source coding with side information (with respect to the side information being available at both ends) from Gaussian to arbitrary distributions. A key observation that follows from our duality treatment is that, contrary to popular opinion, duality does not equate to no rate loss in coding with side information. This duality has been further illustrated using a geometric treatment of the direct coding theorems of these two problems. The main motivation for understanding the duality between these two important problems is to enable efficient constructions of the encoder and the decoder based on the framework of structured coset codes. Our goal is to inspire dual constructions for these two problems that can leverage this functional encoder–decoder duality, and allow progress made in one field to apply naturally to the other as well. This would have an impact on a wide range of emerging applications including sensor networks, data hiding, as well as broadcast and multiantenna communication systems. We note that the functional duality characterized through distributional relationships can be further extended to involve duality in codebooks, similar to that considered in [34].

APPENDIX

Proof of Theorem 3a: The outcome of the given source coding problem is and such that

and and , which defines a joint distribution . Consider the channel (derived from the joint distribution) with input and output , and side information present at both the encoder and the decoder (CCSIB). For this channel, given the input distribution (derived from the above joint distribution), a cost measure (using Lemma 1a) given by

such that

(63)

where . Now, clearly, the joint distribution (obtained from the above ) can be represented in the form such that

and and , the condition required for no rate loss. Thus, the capacity of this channel coding problem is the same even if the side information is present only at the encoder. Hence, and derived from the above joint distribution maximize

under the constraint , , , , and .

Markov Constraint in Section III-C3 [21], [22]: Using the induced joint distributions, we note that

Now


Since is independent of and , we can see that

implying that

Thus, using this and the Markov condition in the definition of the problem , we have

which implies .

Proof of Lemma 2: Clearly, is a monotone nondecreasing function of . For , it is a concave function of . First we prove that . Let

where and are independent, , and , which implies that . Now

(64)

Thus for . It can be noted that . Let be given, and and are given such that , , and . Thus,

(65)

We maximize the right-hand side to get the tightest lower bound.

Now we prove that . We use techniques similar to those given in [1]. Let be the random variables and be given by for some function , such that their joint distribution satisfies the constraint . Let . Thus, we have . Let the set be defined as . Now let

for

Now, for , if and , then , and if and then .

Thus, for

and . Now consider . Let be such that and . If

then , and if then . Thus, . Since

, we have

(66)

(67)

(68)

where and and for . Since for

and , we have

(69)

where . Clearly, , which implies that as . We have

(70)

Thus, we have shown that

such that and . Since is monotone in , we have proved that .

Proof of Theorem 4: Consider a setup such that the projection of the observed source vector is as shown in Fig. 13(a). The events in this proof assume that and have occurred. Let the -dimensional basis be such that is given by , as denoted by in Fig. 13(a). The encoder finds a codeword such that it meets the distance constraint with respect to this . It should be noted that even though we are quantizing , the relation between and is through . Now given and , the codeword used by the encoder can be given as

and

for

(71)

where , , for , and are independent, is an independent random variable which is a perturbation in the coordinates of due to the nature of the encoding process, and it is given by

and

where and are continuous functions of (for sufficiently close to ) as shown in Fig. 13(a), and

. Given , and the events and , the encoded codeword will lie between the two planes given by AB and EF. Thus, the probability of observing the codeword in this region is simulated by the generation of [32] random variables , , and for . This region is symmetric about the bases . The side information vector is given by , where are samples of the random variable , and is a random variable such that . Since is the projection of the source vector on the message sphere,


the random variable is introduced. Let . Now consider

(72)

where

(73)

(74)

For concise notation, we will omit the arguments and of the error events. Now and for large . Now when , , and is a continuous function of its arguments.

Clearly, where

(75)

Thus, can be chosen such that for any . Using the law of large numbers on and , we have

for any .

ACKNOWLEDGMENT

The authors would like to thank D. Tse, P. Viswanath, and M. Gastpar for inspiring discussions and comments regarding the paper. They also wish to thank the anonymous reviewers and the Associate Editor Philip Chou for critical comments on an earlier version of this manuscript that led to a much improved version.

REFERENCES

[1] A. D. Wyner and J. Ziv, "The rate–distortion function for source coding with side information at the decoder," IEEE Trans. Inform. Theory, vol. IT-22, pp. 1–10, Jan. 1976.

[2] D. P. Bertsekas, Nonlinear Programming. Belmont, MA: Athena Scientific, 1999.

[3] C. E. Shannon, "Coding theorems for a discrete source with a fidelity criterion," in IRE Nat. Conv. Rec., pt. 4, 1959, pp. 142–163.

[4] T. M. Cover and J. A. Thomas, Elements of Information Theory. New York: Wiley, 1991.

[5] R. E. Blahut, "Computation of channel capacity and rate-distortion functions," IEEE Trans. Inform. Theory, vol. IT-18, pp. 460–473, July 1972.

[6] M. W. Marcellin and T. R. Fischer, "Trellis coded quantization of memoryless and Gauss–Markov sources," IEEE Trans. Commun., vol. 38, pp. 82–93, Jan. 1990.

[7] M. V. Eyuboglu and G. D. Forney, "Lattice and trellis quantization with lattice and trellis-bounded codebooks—High rate theory for memoryless sources," IEEE Trans. Inform. Theory, vol. 39, pp. 46–59, Jan. 1993.

[8] G. Ungerboeck, "Channel coding with multilevel/phase signals," IEEE Trans. Inform. Theory, vol. IT-28, pp. 55–67, Jan. 1982.

[9] R. Laroia, N. Farvardin, and S. A. Tretter, "On optimal shaping of multidimensional constellations," IEEE Trans. Inform. Theory, vol. 40, pp. 1044–1056, July 1994.

[10] R. Laroia and N. Farvardin, "A structured fixed-rate vector quantizer derived from a variable length scalar quantizer—Part I: Memoryless sources," IEEE Trans. Inform. Theory, vol. 39, pp. 851–867, May 1993.

[11] S. Gel'fand and M. Pinsker, "Coding for channel with random parameters," Probl. Contr. Inform. Theory, vol. 9, pp. 19–31, 1980.

[12] M. Gastpar, B. Rimoldi, and M. Vetterli, "To code or not to code," in Proc. IEEE Int. Symp. Information Theory (ISIT'00), Sorrento, Italy, June 2000.

[13] M. Gastpar, B. Rimoldi, and M. Vetterli, "To code or not to code: Lossy source–channel communication revisited," IEEE Trans. Inform. Theory, vol. 49, pp. 1147–1158, May 2003.

[14] T. Berger, "Multiterminal source coding," in The Information Theory Approach to Communication (CISM Courses and Lecture Notes), G. Longo, Ed. New York: Springer-Verlag, 1977, vol. 229.

[15] S. Shamai (Shitz), S. Verdú, and R. Zamir, "Systematic lossy source/channel coding," IEEE Trans. Inform. Theory, vol. 44, pp. 564–579, Mar. 1998.

[16] J. A. O'Sullivan, P. Moulin, and J. M. Ettinger, "Information theoretic analysis of steganography," in Proc. IEEE Int. Symp. Information Theory (ISIT'98), Cambridge, MA, Aug. 1998, p. 297.

[17] B. Chen and G. W. Wornell, "An information-theoretic approach to the design of robust digital watermarking systems," in Proc. Int. Conf. Acoustics, Speech, and Signal Processing (ICASSP), Phoenix, AZ, Mar. 1999.

[18] K. Marton, "A coding theorem for the discrete memoryless broadcast channel," IEEE Trans. Inform. Theory, vol. IT-25, pp. 306–311, May 1979.

[19] M. Tomlinson, "New automatic equalizer employing modulo arithmetic," Electron. Lett., vol. 7, pp. 138–139, Mar. 1971.

[20] M. Costa, "Writing on dirty paper," IEEE Trans. Inform. Theory, vol. IT-29, pp. 439–441, May 1983.

[21] A. Cohen and A. Lapidoth, "The Gaussian watermarking game," IEEE Trans. Inform. Theory, vol. 48, pp. 1639–1667, June 2002.

[22] U. Erez, S. Shamai, and R. Zamir, "Capacity and lattice-strategies for cancelling known interference," in Proc. ISITA 2000, Honolulu, HI, Nov. 2000.

[23] J. Chou, S. S. Pradhan, and K. Ramchandran, "On the duality between distributed source coding and data hiding," in Proc. 33rd Asilomar Conf. Signals, Systems and Computers, Pacific Grove, CA, Nov. 1999.

[24] C. Heegard and A. El Gamal, "On the capacity of computer memory with defects," IEEE Trans. Inform. Theory, vol. IT-29, pp. 731–739, Sept. 1983.

[25] M. Kesal, M. K. Mihcak, R. Kotter, and P. Moulin, "Iteratively decodable codes for watermarking applications," in Proc. 2nd Int. Symp. Turbo Codes and Related Topics, Brest, France, Sept. 2000.

[26] R. Zamir and S. Shamai, "Nested linear/lattice codes for Wyner–Ziv encoding," in Proc. IEEE Information Theory Workshop, Killarney, Ireland, 1998.

[27] S. S. Pradhan and K. Ramchandran, "Distributed source coding using syndromes (DISCUS): Design and construction," in Proc. IEEE Data Compression Conf., Snowbird, UT, Mar. 1999.

[28] R. M. Gray, "A new class of lower bounds to information rates of stationary sources via conditional rate-distortion functions," IEEE Trans. Inform. Theory, vol. IT-19, pp. 480–489, July 1973.

[29] M. Chiang and T. M. Cover, "Unified duality between channel capacity and rate distortion with side information," in Proc. Int. Symp. Information Theory (ISIT'01), Washington, DC, June 2001, p. 301.

[30] D. J. Sakrison, "A geometric treatment of the source encoding of a Gaussian random variable," IEEE Trans. Inform. Theory, vol. IT-14, pp. 481–486, May 1968.

[31] J. M. Wozencraft and I. M. Jacobs, Principles of Communication Engineering. New York: Wiley, 1967.

[32] R. Durrett, Probability: Theory and Examples. Pacific Grove, CA: Wadsworth and Brooks, 1996.

[33] S. S. Pradhan and K. Ramchandran, "Distributed source coding: Symmetric rates and applications to sensor networks," in Proc. IEEE Data Compression Conf., Snowbird, UT, Mar. 2000.

[34] I. Csiszár and J. Körner, Information Theory: Coding Theorems for Discrete Memoryless Systems. New York: Academic, 1981.

[35] P. Moulin and J. A. O'Sullivan, "Information-theoretic analysis of information hiding," in Proc. IEEE Int. Symp. Information Theory (ISIT'00), Sorrento, Italy, June 2000.