
IEEE TRANSACTIONS ON SIGNAL PROCESSING, VOL. 54, NO. 4, APRIL 2006

Error-Resilient First-Order Multiplexed Source Codes: Performance Bounds, Design and Decoding Algorithms

Hervé Jégou, Associate Member, IEEE, and Christine Guillemot, Senior Member, IEEE

Abstract—This paper describes a new family of error-resilient variable-length source codes (VLCs). The codes introduced can be regarded as generalizations of the multiplexed codes described by Jégou and Guillemot. They make it possible to exploit first-order source statistics while, at the same time, being resilient to transmission errors. The design principle consists of creating a codebook of fixed-length codewords (FLCs) for high-priority information and of using the inherent codebook redundancy to describe low-priority information. The FLC codebook is partitioned into equivalence classes according to the conditional probabilities of the high-priority source. The error propagation phenomenon, which may result from the memory of the source coder, is controlled by choosing appropriate codebook partitions and index assignment strategies. In particular, a context-dependent index assignment strategy, called crossed-index assignment, is described. For the symbol error rate criterion, the approach turns out to maximize the cardinality of a set of codewords, called the code kernel, offering synchronization properties. The decoder resynchronization capability is shown to be further increased by periodic use of memoryless multiplexed codes. Theoretical and practical performance in terms of compression efficiency and error resilience is analyzed.

Index Terms—Data communication, data compression, entropy codes, source coding, variable-length codes, wireless communication.

I. INTRODUCTION

ENTROPY coding, making use of variable-length codes (VLCs), is a core component of any data compression scheme. However, the main drawback of VLCs is their high sensitivity to channel noise: when some bits are altered by the channel, synchronization losses can occur at the receiver, and the positions of symbol boundaries are not properly estimated, leading to dramatic symbol error rates. This observation has motivated studies of the synchronization ability of VLCs [3] as well as the design of codes with better synchronization properties, such as self-synchronizing Huffman codes [4], [5] or reversible variable-length codes (RVLCs) [6]–[8]. The increased resynchronization capability is, however, often obtained at the expense of redundancy, hence of decreased compression efficiency.

Manuscript received October 7, 2003; revised March 12, 2005. The associate editor coordinating the review of this manuscript and approving it for publication was Prof. Sheila A. Hemami.

The authors are with the IRISA/INRIA, University of Rennes I, 35042 Rennes Cedex, France (e-mail: [email protected]; [email protected]).

Digital Object Identifier 10.1109/TSP.2006.870609

Maximum a posteriori (MAP) or minimum mean-square error (MMSE) estimators, capitalizing on coder suboptimality by exploiting residual redundancy [9] (the so-called "excess rate"), have also been shown to reduce the "desynchronization" effect [8], [10]–[12], however at the expense of high decoding complexity.

The decoder desynchronization problem comes from the difficulty of properly segmenting the noisy bit stream into symbols. This segmentation problem can be addressed by introducing a priori information in the bit stream, often taking the form of synchronization patterns or of forbidden symbols (e.g., [13]). Again, however, the improved resynchronization capability is obtained at the expense of a loss in compression efficiency. The authors in [14] describe a bit stream structure called error-resilient entropy codes (ERECs), which reduces the error propagation phenomenon. A family of codes, called "multiplexed codes," making use of the fact that compression systems for real signals generate sources of information with different levels of priority (e.g., texture and motion information for a video signal), is introduced in [1]. These codes are such that the risk of "desynchronization" is confined to low-priority information. The idea consists of creating a fixed-length code (FLC) for the high-priority source and of exploiting the inherent redundancy to represent information of the low-priority source. Hence, the high-priority source inherits some of the properties of FLCs, such as synchronous decoding, random access in the data stream, and high error resilience. These codes naturally offer some form of unequal error protection (UEP), otherwise often supported via the use of error-correcting codes with different rates (e.g., [15]). Here, the property of error resilience is achieved at a very low cost in terms of redundancy. This redundancy corresponds to the so-called "excess rate" that, depending on the parameters chosen, may result from some compression suboptimality of the code. It is given by the difference between the actual high-priority source description length and the source's stationary entropy. Note that this "excess rate" becomes asymptotically very small as the length of the FLC codewords increases. However, the codes proposed in [1] only allow stationary statistical distributions to be exploited for the high-priority source. In the sequel, we will refer to this family of codes as memoryless multiplexed codes.

In this paper, we address the problem of designing a family of multiplexed codes exploiting first-order statistics of the high-priority source, in order to achieve higher compression performance. As with memoryless multiplexed codes, a set of FLCs is first created for the high-priority source $S$. The inherent redundancy of this code is then exploited to represent information of the low-priority source $S'$, which is assumed to be pre-encoded with a classical entropy code [e.g., arithmetic codes (ACs)].



The set of FLCs is then partitioned into equivalence classes according to conditional probabilities, in order to obtain a mean description length (mdl) approaching conditional entropy bounds. Theoretical performance upper bounds in terms of symbol error rate (SER) and mean-square error (MSE) are then derived for discrete memoryless channels. These bounds provide the SER and MSE performance expected with hard-decision decoding. They can be outperformed when soft-decision decoding [e.g., making use of maximum-likelihood (ML) or maximum a posteriori estimators] is used instead. As for vector quantizers [16], [17], it is shown that the efficiency depends on the codeword index assignment strategies, i.e., on the construction of the equivalence classes and on the index assignment used in the different classes. Index assignment (IA) strategies aiming at controlling both the error propagation resulting from erroneous contexts and the symbol reconstruction error are described. Note that here the term context refers to conditioning symbol values. For the SER criterion, a crossed-IA strategy controlling the error propagation turns out to maximize the cardinality of a set of codewords, called the code kernel, offering synchronization properties. The multiplexed codes can be decoded at low computational cost by using either hard- or soft-decision decoding techniques. The complexity of soft decoding of the high-priority source is linear with respect to the length $K$ of the sequence of symbols. In contrast, with classical VLCs (e.g., Huffman or arithmetic), the corresponding trellis is significantly larger [18]. The decoder resynchronization capability can be further increased by a periodic use of memoryless multiplexed codes to code some symbols of the sequence. An analysis of the tradeoff between compression efficiency and resynchronization capability is given. Theoretical and practical performances, both in terms of error resilience and compression efficiency, are discussed for different index assignment strategies and for different decoding techniques. Both hard- and soft-decision decoding relying on trellis-based decoding techniques [using, e.g., the Bahl–Cocke–Jelinek–Raviv (BCJR) algorithm [19]] and considering different estimation criteria [i.e., MAP, maximum of posterior marginals (MPM), and MMSE] have been used. In the sequel, for the sake of conciseness, soft-decision decoding techniques will be referred to as soft decoding techniques. The SER and signal-to-noise ratio (SNR) performances are evaluated for a binary symmetric channel (BSC) and for an additive white Gaussian noise (AWGN) channel. They are compared against state-of-the-art source codes (ACs). When used in a tandem source-channel coding structure with a rate-1/2 systematic convolutional code, coding gains of up to 0.75 and 1.5 dB have been achieved with respect to the structure based on ACs, for sequence lengths of 100 and 1000 symbols, respectively.

II. PROBLEM STATEMENT AND NOTATIONS

In the following, we reserve capital letters for random variables and small letters for realizations of these variables. Boldface characters denote vectors or sequences. Let $\mathbf{S} = (S_1, \ldots, S_K)$ be a sequence of source symbols of high priority taking their values in a finite alphabet $\mathcal{A} = \{a_1, \ldots, a_M\}$ composed of $M$ symbols. The source $S$ is assumed to be a first-order Markov process. The stationary and conditional probabilities of the source are respectively denoted $\mu_i = P(S_t = a_i)$ and $P_{i|j} = P(S_t = a_i \mid S_{t-1} = a_j)$, where $\mu_i$ stands for the probability of the symbol $a_i$ and $P_{i|j}$ stands for the conditional probability of $a_i$ given that the previous symbol is $a_j$. Let $\mathbf{S}' = (S'_1, \ldots, S'_{K'})$ be a sequence of source symbols of lower priority. This sequence is assumed to be pre-encoded with a classical VLC (e.g., Huffman or arithmetic codes). We assume that the bit stream produced by the encoder of the low-priority source is a uniformly distributed binary source. This assumption is closely verified if the encoder of the low-priority source is optimal or close to optimal. Note that the pre-encoding of $\mathbf{S}'$ can be made conditionally to the realization of $\mathbf{S}$. Emitted and received codewords are respectively denoted by the random variables $X$ and $Y$. At a given time instant $t$ of the sequence of high-priority symbols, the variable $X_t$ takes a realization $x$, where $x$ is a multiplexed codeword as defined in the sequel. The channel is assumed to follow a discrete memoryless channel (DMC) model with transition probabilities denoted $P(Y = y \mid X = x)$. The reconstructed high-priority sequence is denoted $\hat{\mathbf{S}} = (\hat{S}_1, \ldots, \hat{S}_K)$.

III. FIRST-ORDER MULTIPLEXED CODES

A. Memoryless Multiplexed Codes: A Review

Let $\mathcal{F}_c$ denote the set of binary codewords of length $c$. A memoryless multiplexed code is constructed by partitioning the codeword space $\mathcal{F}_c$ into $M$ subsets $\mathcal{C}_i$ of cardinal $N_i$, called equivalence classes. Note that $\sum_{i=1}^{M} N_i = 2^c$. Each equivalence class $\mathcal{C}_i$ is associated to a symbol $a_i$ of the alphabet $\mathcal{A}$. A codeword is thus associated to a symbol $a_i$ of the alphabet and to an index value $q$ with $0 \le q < N_i$. Thus, a multiplexed code is defined by the bijection

$\phi : \{(a_i, q) : a_i \in \mathcal{A},\; 0 \le q < N_i\} \longrightarrow \mathcal{F}_c. \qquad (1)$

Hence, each fixed-length codeword represents jointly a symbol $a_i$ of the high-priority source and an index value $q$. Since the value $q$ is the realization of an $N_i$-valued variable, the amount of data described by this index value is $\log_2 N_i$ bits. The expectation of this variable gives the multiplexing capacity per symbol of the high-priority sequence. Assuming that this capacity is fully used to describe data of the low-priority source, the description length of the symbol $a_i$ is given by $c - \log_2 N_i$ bits. The mdl of the high-priority source is hence given by $\mathrm{mdl} = \sum_{i=1}^{M} \mu_i\,(c - \log_2 N_i)$. This quantity is equal to the source stationary entropy if $c - \log_2 N_i = -\log_2 \mu_i$, $\forall i$, i.e., if $N_i = 2^c \mu_i$.

For example, let $\mathcal{A} = \{a_1, a_2, a_3, a_4\}$ be the alphabet of the source, with the stationary probabilities given by $\mu_1$, $\mu_2$, $\mu_3$, and $\mu_4$. Table I gives an example of partitioning of the set $\mathcal{F}_3$ into the four equivalence classes associated to the alphabet symbols.

The encoding of the two sources proceeds as follows. The realization of the sequence of high-priority symbols is a sequence $\mathbf{s} = (s_1, \ldots, s_K)$ of symbol values taken from the source alphabet. The low-priority bit stream is converted into a sequence of index values $\mathbf{q} = (q_1, \ldots, q_K)$. The sequence of pairs $(s_t, q_t)$ provides entries in the multiplexed codes table, from which the codewords $x_t$ are extracted.


TABLE I: EXAMPLE OF MEMORYLESS MULTIPLEXED CODES (c = 3)

In [2], two methods have been introduced for the conversion of the low-priority source.

1) The first approach regards the low-priority bit stream as an integer, which is then decomposed using a hierarchical decomposition. The conversion of this integer into a sequence of states, by itself, has a negligible impact on the compression efficiency (less than 1 bit) [2]. This approach is optimal in the sense that it allows a partition of the FLC codebook that follows closely the distribution of the source $S$.

2) A second approach relies on a constrained partition of the FLCs into equivalence classes with cardinals belonging to a subset of integer values. This approach still leads to a good approximation of the high-priority source distribution, hence is quasi-optimal.

In the following, either of these two methods applies. We will then only focus on the design of the partition, which will depend on the first-order statistics of the source $S$.
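To make the table-based encoding concrete, the following Python sketch builds a memoryless multiplexed code and encodes one (symbol, index) pair. It is a minimal illustration, not the authors' reference implementation: the helper names (make_classes, encode) are hypothetical, the apportionment of cardinalities is a simple variant of the two methods above, and the example probabilities are purely illustrative.

import math

def make_classes(probs, c):
    """Partition the 2**c codewords of length c into equivalence classes
    whose cardinalities N_i approximate the compression-optimal choice
    N_i = 2**c * mu_i discussed above."""
    total = 2 ** c
    n = [max(1, math.floor(p * total)) for p in probs]
    by_prob = sorted(range(len(probs)), key=lambda i: -probs[i])
    k = 0
    while sum(n) < total:          # hand leftover codewords to likely symbols
        n[by_prob[k % len(n)]] += 1
        k += 1
    classes, start = [], 0
    for ni in n:                   # lexicographic index assignment
        classes.append(list(range(start, start + ni)))
        start += ni
    return classes

def encode(i, q, classes, c):
    """Map (high-priority symbol a_i, low-priority index q < N_i) to a
    c-bit multiplexed codeword."""
    return format(classes[i][q], "0{}b".format(c))

# Hypothetical 4-symbol source with c = 3 (illustrative probabilities only):
classes = make_classes([0.5, 0.25, 0.125, 0.125], 3)
print(classes)                    # [[0, 1, 2, 3], [4, 5], [6], [7]]
print(encode(0, 2, classes, 3))   # '010': symbol a_1 carrying index q = 2

Decoding is the reverse lookup: the received codeword identifies its class (hence the high-priority symbol) and its position within the class (hence the multiplexed low-priority index).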

B. First-Order Multiplexed Codes: Definition

The memoryless multiplexed codes above are constructed for the stationary distribution of the high-priority source $S$. However, it is well known that, for Markov sources, higher compression efficiency can be obtained by adapting the codes to higher-order source statistics. We focus here on first-order statistics; however, the approach can be easily extended to higher orders.

First-order multiplexed codes can thus be constructed for the conditional probability distribution $(P_{i|j})$ of the high-priority source. The partition of the set of fixed-length codewords then becomes conditioned by the realization $a_j$ of the previous symbol in the sequence to be encoded. Let $\mathcal{C}_{i|j}$ and $N_{i|j}$ respectively denote the equivalence class associated to the symbol $a_i$ and its cardinal when the previous symbol realization is $a_j$. The set of codes, for all conditioning symbol values from $\mathcal{A}$, defines a so-called first-order multiplexed code, referred to as $\mathcal{C}_1$ in the sequel. The mdl of the code (for the high-priority source $S$) is given by

$\mathrm{mdl}(\mathcal{C}_1) = \sum_{j=1}^{M} \mu_j \sum_{i=1}^{M} P_{i|j}\,(c - \log_2 N_{i|j}) \qquad (2)$

$= c - \sum_{j=1}^{M} \mu_j \sum_{i=1}^{M} P_{i|j} \log_2 N_{i|j}. \qquad (3)$

TABLE II: THEORETICAL AND SIMULATED MEAN DESCRIPTION LENGTH FOR MULTIPLEXED CODES AND ARITHMETIC CODES (GAUSS–MARKOV SOURCE WITH CORRELATION ρ = 0.5) FOR K = 10, 100, 1000 AND FOR c = 5, 6

The mdl of the code will be equal to the first-order entropy of the source if and only if

$N_{i|j} = 2^c\,P_{i|j}, \quad \forall (i, j). \qquad (4)$

Note that the higher compression efficiency may not result in a shorter sequence for $\mathbf{S}$, as in classical VLCs, but rather in a higher multiplexing capacity to be used for describing the low-priority source $\mathbf{S}'$. This mdl corresponds to the asymptotic compression performance as the sequence length $K$ of the high-priority source increases to infinity. In practice, the sequence length also has an impact on the compression performance. First, the first symbol of the sequence has to be encoded with the marginal probability mass function (PMF). Second, the conversion of the low-priority source into a sequence of states leads to a penalty lying between 0 and 1 bit [2]. Table II gives mdl values obtained by simulation with different sequence lengths, in comparison with the corresponding theoretical mdl values.
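The theoretical mdl values follow directly from (3). As a hedged illustration (hypothetical helper name and example numbers, not the paper's Table II values), the following sketch evaluates (3) and shows that the mdl meets the first-order entropy when condition (4) holds exactly:

import math

def mdl_first_order(mu, P, N, c):
    """Theoretical mean description length of the high-priority source for
    a first-order multiplexed code, following (3):
        mdl = c - sum_j mu[j] * sum_i P[i][j] * log2(N[i][j]).
    mu[j]   -- stationary probability of the conditioning symbol a_j
    P[i][j] -- conditional probability P(S_t = a_i | S_{t-1} = a_j)
    N[i][j] -- cardinality of the equivalence class C_{i|j}"""
    M = len(mu)
    return c - sum(
        mu[j] * sum(P[i][j] * math.log2(N[i][j]) for i in range(M))
        for j in range(M)
    )

# Tiny 2-symbol example with hypothetical numbers: here N[i][j] equals
# 2**c * P[i][j] exactly, so the mdl equals the conditional entropy
# H(S_t | S_{t-1}), as stated by condition (4).
mu = [0.5, 0.5]
P = [[0.75, 0.25], [0.25, 0.75]]      # P[i][j]; each column sums to 1
N = [[6, 2], [2, 6]]                  # 2**3 * P[i][j] with c = 3
print(mdl_first_order(mu, P, N, 3))   # ~0.811 bits = conditional entropy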

The conditioning symbol values are often called contexts in source compression. In practice, when the size of the alphabet $\mathcal{A}$ is large, the number of contexts may be too high, leading to a problem of context dilution. This number can be reduced by considering different conditioning values as the same context. The context can then be defined by a function $\tau$ of the previous symbol realization, taking its value in the set of contexts, denoted $\mathcal{T}$:

$T_t = \tau(S_{t-1}), \qquad \tau : \mathcal{A} \to \mathcal{T}. \qquad (5)$

Together with the realization $a_i$ of the symbol to be encoded, this context $t$ identifies the equivalence class to be used, denoted $\mathcal{C}_{i|t}$. If $\tau$ is a constant function, the multiplexed code is a memoryless multiplexed code. In that case, the code is referred to as $\mathcal{C}_0$. The equivalence classes of the code $\mathcal{C}_0$ and the corresponding cardinals are respectively denoted $\mathcal{C}_i$ and $N_i$, $1 \le i \le M$.

IV. ERROR-RESILIENCE ANALYSIS ON A DMC

This section analyzes the impact of the use of first-order source statistics on the error resilience of the code, expressed both in terms of the SER and of the MSE obtained for the high-priority source $S$. Let us consider first-order multiplexed codes designed for a Markov source of transition probabilities $P_{i|j}$.


Note first that with first-order multiplexed codes, as with memoryless multiplexed codes, the high-priority source does not suffer from desynchronization problems: the segmentation of the bit stream into high-priority source symbols is deterministic. However, erroneous contexts (conditioning values) may lead to error propagation in the decoding of the symbol values.

A. Modeling the Dependencies in the Coding and Transmission Chain

In order to capture the error propagation phenomenon, the SER and MSE performance bounds must be expressed in terms of the probability distribution $P(S_t = a_i, \hat{S}_t = a_{i'})$ or, equivalently (with the appropriate renormalization factor), in terms of the transition probabilities of the pair $(S_t, \hat{S}_t)$. The analysis hence requires considering the global transmission chain formed by the source, the multiplexed source coder, the transmission channel, and the decoder. The coding, transmission, and decoding chain can be modeled by a process represented by the pair of variables $(S_t, \hat{S}_t)$. The process $(S_t, \hat{S}_t)$, $t = 1, \ldots, K$, is a Markov chain. Its transition probability matrix is given by

$\mathbb{P} = \big( p_{(j,j') \to (i,i')} \big)_{(a_j, a_{j'}) \in \mathcal{A}^2,\,(a_i, a_{i'}) \in \mathcal{A}^2} \qquad (6)$

where

$p_{(j,j') \to (i,i')} = P(S_t = a_i, \hat{S}_t = a_{i'} \mid S_{t-1} = a_j, \hat{S}_{t-1} = a_{j'}) \qquad (7)$

$= P(\hat{S}_t = a_{i'} \mid S_t = a_i, S_{t-1} = a_j, \hat{S}_{t-1} = a_{j'})\,P(S_t = a_i \mid S_{t-1} = a_j) \qquad (8)$

$= P_{i|j} \sum_{y \in \mathcal{F}_c} P(Y_t = y \mid S_t = a_i, S_{t-1} = a_j)\,P(\hat{S}_t = a_{i'} \mid Y_t = y, \hat{S}_{t-1} = a_{j'}). \qquad (9)$

It is shown in Appendix I that the transition probability is actually given by

$p_{(j,j') \to (i,i')} = \frac{P_{i|j}}{N_{i|j}} \sum_{x \in \mathcal{C}_{i|j}} \sum_{y \in \mathcal{C}_{i'|j'}} P(y \mid x) \qquad (10)$

where $P(y \mid x)$ represents the channel model, i.e., the probability of receiving the codeword $y$ if the codeword $x$ has been emitted. The entity $P_{i|j}$ represents the conditional probability of the high-priority source, and $\mathcal{C}_{i|j}$ the corresponding equivalence class, of cardinal $N_{i|j}$.

B. Asymptotic SER and MSE Bounds

For a sequence of infinite length, the Markov process $(S_t, \hat{S}_t)$ converges toward a stationary state of probability $P(S = a_i, \hat{S} = a_{i'})$. The corresponding asymptotic SER and MSE performance bounds are thus functions of the transition probability distribution $\mathbb{P}$ or, equivalently, of the joint probability $P(S = a_i, \hat{S} = a_{i'})$. This joint probability distribution can be regarded as the stationary probability of the Markov process defined by the pair of variables $(S_t, \hat{S}_t)$. Assuming that $\mathbb{P}$ is irreducible and aperiodic, the stationary probability is given by the solution of the equation $\mathbf{p}\,\mathbb{P} = \mathbf{p}$ (see, e.g., [20]). In other terms, this stationary distribution is obtained by computing the normalized eigenvector associated to the eigenvalue 1 of the transposed transition matrix $\mathbb{P}^{\mathsf{T}}$. In the following, this eigenvector is denoted $\mathbf{p}$. This PMF verifies the relation $\sum_{(i,i')} p_{(i,i')} = 1$. The asymptotic value of the SER can be expressed as a summation of probabilities over product states $(a_i, a_{i'})$ with $a_i \neq a_{i'}$, i.e.,

$\mathrm{SER} = \sum_{(i,i')\,:\,a_i \neq a_{i'}} P(S = a_i, \hat{S} = a_{i'}) \qquad (11)$

$= \mathbf{p} \cdot \mathbf{e} \qquad (12)$

where $\mathbf{e}$ denotes a vector of $\{0,1\}^{M^2}$ such that $e_{(i,i')} = 1$ if $a_i \neq a_{i'}$ and $e_{(i,i')} = 0$ otherwise. Similarly, the asymptotic performance in terms of MSE is given by

$\mathrm{MSE} = \mathbf{p} \cdot \mathbf{d} \qquad (13)$

where the distortion vector $\mathbf{d}$ is defined such that $d_{(i,i')} = (a_i - a_{i'})^2$. These bounds have been verified experimentally: as the number of channel realizations increases, the experimental SER and MSE values asymptotically converge toward the theoretical bounds given in (12) and (13).
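The bounds (12) and (13) are straightforward to evaluate numerically. The sketch below is an illustrative implementation (the flattening of product states into vector indices is an assumed convention, not specified by the paper): it extracts the stationary distribution as the eigenvector of the transition matrix for eigenvalue 1, exactly as described above, and forms the two inner products.

import numpy as np

def asymptotic_ser_mse(T, e, d):
    """Asymptotic SER and MSE bounds of (12)-(13).
    T -- row-stochastic transition matrix of the pair process (S, S-hat),
         one row/column per product state (a_i, a_i'), built from (10).
    e -- 0/1 error vector: e[(i, i')] = 1 iff a_i != a_i'
    d -- distortion vector: d[(i, i')] = (a_i - a_i')**2"""
    w, v = np.linalg.eig(T.T)                 # left eigenvectors of T
    p = np.real(v[:, np.argmin(np.abs(w - 1.0))])
    p /= p.sum()                              # stationary distribution: p @ T = p
    return float(p @ e), float(p @ d)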

C. Impact of Index Assignment on Error Resilience

The assignment of multiplexed codewords to the different equivalence classes for all the possible contexts (i.e., finding the partition) is referred to in the sequel as IA. In the context of vector quantization, it has been shown that error resilience can be improved by modifying the binary labeling of codewords. To improve an existing IA, the binary switching algorithm [16] finds a local optimum. Simulated annealing [17], [21] aims at finding a global optimum by avoiding local optima in a statistical manner; it generally provides better results than the binary switching algorithm. It has also been shown that channel-optimized vector quantization [17] can improve the overall rate-distortion efficiency, where by overall we mean that the distortion includes both the source quantization noise and the channel noise; in that case, however, the designed quantizer may not be optimum for noiseless channels. In this paper, the codebooks are assumed to be optimum in terms of compression, i.e., they approximately verify (4). Consequently, the analysis is restricted to the IA of binary codewords. The IA is observed to have a major impact on the error resilience of multiplexed codes, as shown in Fig. 2. The optimal IA depends on channel and source properties. However, it may also depend on the performance measure used (e.g., SER or SNR). Fig. 2 shows that an IA based on Gray codes has better SER performance than an IA with lexicographic codes. However, the lexicographic IA outperforms the Gray code when the SNR performance measure is used.

V. CODE DESIGN: CROSSED-IA AND CODE KERNEL

The design of a first-order multiplexed code can be decomposed into the following three steps.

1) The codeword length $c$ is first chosen so that the ratio between the mdl of the source and $c$ (that is, $\mathrm{mdl}/c$) corresponds to the expected ratio between the number of high-priority and low-priority bits required to encode the high-priority and the low-priority sequences of symbols, respectively.

2) For each possible context, the cardinals of the equivalence classes are chosen according to the probability of the symbols knowing the given context, as given in (4). This step has a strong impact on the compression efficiency of the code.

3) The last step is the IA. This step impacts the error resilience but not the compression efficiency of the code.

The first two steps have been addressed in detail in [2] and are briefly reviewed in Section III-A. The methods and analysis proposed for Step 1) and Step 2) still apply to first-order multiplexed codes. In this section, we focus on the last step, which turns out to be a key issue for error resilience. For memoryless multiplexed codes, finding the optimal IA has a prohibitive computational cost: the number of possible partitions of $\mathcal{F}_c$ into subsets of cardinals $N_1, \ldots, N_M$ is given by $(2^c)! / \prod_{i=1}^{M} N_i!$. This number corresponds to the number of all possible distinct partitions. For first-order multiplexed codes, the number of possible partitions is even higher and given by

$\prod_{t \in \mathcal{T}} \frac{(2^c)!}{\prod_{i=1}^{M} N_{i|t}!}. \qquad (14)$

The IA conditioned by a given context must take into account the IA conditioned by the other contexts. This leads to a new IA problem that we refer to as crossed-IA.

The IA can in addition be designed to optimize different criteria, e.g., the SER or the MSE, with performance bounds respectively given in (12) and (13), assuming hard-decision decoding. The optimal IA is hence given by the set of partitions (one per context) and corresponding codeword indexing minimizing these expressions. Each step of the IA optimization procedure according to (12) and (13) requires the computation of eigenvalues of matrices of size $M^2 \times M^2$. The resulting complexity is not easily tractable. One can instead design crossed-IA strategies that obtain good performance at a much reduced complexity. In the sequel, a specific approach of crossed-IA is designed. For the sake of clarity, this approach will also be called crossed-IA.

A. Crossed-IA Strategy

Simplified optimization criteria controlling 1) the error propagation due to erroneous contexts and 2) the reconstruction error of the current symbol can be considered. We first focus on reducing the error propagation effect by constructing the partition in such a way that the decoding of a symbol will result in a correct context even if the current context is erroneous.

1) Reduction of Error Propagation: The problem of IA at a given time instant is then to maximize the probability that the next symbol will be decoded with a correct context. Let us denote $P(t)$ the probability of having the context $t$ at a symbol clock instant. This probability is deduced from the source stationary probabilities as

$P(t) = \sum_{a \in \tau^{-1}(t)} \mu(a) \qquad (15)$

where $\tau^{-1}(t)$ is the set of symbols $a$ such that $\tau(a) = t$. Now, let us denote $\alpha_t(x)$ the symbol of $\mathcal{A}$ represented by the codeword $x$ if the current state of the decoder is the context $t$. If the received codeword is correct but the context is erroneous, for example $t'$ instead of $t$, the context used for the next symbol will be erroneous if and only if $\tau(\alpha_{t'}(x)) \neq \tau(\alpha_t(x))$. Then, we can derive the following similarity measure, which applies to codewords:

$q(x) = \sum_{t \in \mathcal{T}} \sum_{t' \in \mathcal{T}} P(t)\,P(t')\,\delta\big(\tau(\alpha_t(x)), \tau(\alpha_{t'}(x))\big) \qquad (16)$

where $\delta(\cdot,\cdot)$ denotes the Kronecker operator. The measure $q(x)$ represents the probability that decoding the codeword $x$ leads to a correct context for subsequent symbols, even if the current context is in error. In order to reduce the computational cost, i.e., to avoid seeking eigenvectors of large matrices, we assume that $P(X = x) = 2^{-c}$. Although this approximation is biased, it has been experimentally observed that it does not noticeably impact the error-resilience property of the code. This measure can be extended to the whole set of codewords, which leads to

$Q = \sum_{x \in \mathcal{F}_c} q(x). \qquad (17)$

The IA can then be chosen so that the measure $Q$ is optimized. The IA algorithm proceeds as follows: the IA is first initialized by taking, e.g., a lexicographic order. The algorithm then proceeds with permutations of the different symbols across the equivalence classes, and this for the different contexts. If a permutation leads to an increase of $Q$, then it is retained. The algorithm stops when no further increase of $Q$ is observed. The underlying optimization principle is close to the binary switching algorithm [16], insofar as a local optimum of (17) is searched using symbol permutations. A simulated annealing (SA) algorithm can also be applied. However, experimentally, we did not observe any significant improvement in the final value of $Q$.
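The following Python sketch illustrates one plausible instance of this greedy loop. The exact permutation move is only loosely specified above, so the sketch uses a swap of the symbols assigned to two codewords within one context (a move that preserves every class cardinality, hence the mdl); all function names are hypothetical.

import random

def q_measure(x, alpha, tau, Pctx):
    """Per-codeword measure (16), under the assumption P(X = x) = 2**-c.
    alpha[t][x] -- symbol decoded from codeword x under context t
    tau[a]      -- context induced by symbol a
    Pctx[t]     -- stationary probability of context t, from (15)"""
    T = range(len(Pctx))
    return sum(Pctx[t] * Pctx[u] * (tau[alpha[t][x]] == tau[alpha[u][x]])
               for t in T for u in T)

def global_measure(alpha, tau, Pctx):
    """Measure (17): sum of (16) over the whole codeword set."""
    return sum(q_measure(x, alpha, tau, Pctx) for x in range(len(alpha[0])))

def crossed_ia(alpha, tau, Pctx, iters=20000, seed=0):
    """Greedy crossed-IA sketch: try cardinality-preserving swaps and keep
    a swap only if the global measure (17) increases."""
    rng = random.Random(seed)
    best = global_measure(alpha, tau, Pctx)
    for _ in range(iters):
        t = rng.randrange(len(alpha))
        x, y = rng.sample(range(len(alpha[0])), 2)
        alpha[t][x], alpha[t][y] = alpha[t][y], alpha[t][x]
        q = global_measure(alpha, tau, Pctx)
        if q > best:
            best = q
        else:                       # revert: the swap did not improve Q
            alpha[t][x], alpha[t][y] = alpha[t][y], alpha[t][x]
    return alpha, best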

2) Constrained Optimization of the IA: The value $Q$ only depends on the assignment of the symbols of $\mathcal{A}$ to the different classes, taking into account the different possible contexts. It does not take into account the channel characteristics, nor the binary representations of the codewords. The permutation, for all the contexts, of the binary representations of two codewords¹ $x$ and $x'$ does not modify the value $Q$. One can then choose the binary representations of the different codewords in order to reduce the SER (12) or the MSE (13). This optimization of the crossed-IA can be carried out using an SA algorithm.

The SA algorithm can also be run so that the perturbations correspond to an arbitrary choice of the contexts and the codewords to be switched. The performance of this SA algorithm strongly depends on the initial choice of the parameters. We first applied the SA algorithm using a lexicographic code as the initial state. However, with the SER criterion, the resulting codes were, most of the time, less error-resilient than the ones obtained with the sequential approach described in Sections V-A-1) and V-A-2). Hence, in the sequel, we have chosen the code resulting from the approach referred to as crossed-IA,

¹This is equivalent to performing, for each context $t$, the permutation of the symbols $\alpha_t(x)$ and $\alpha_t(x')$.


and described in Sections V-A-1) and V-A-2), as the initialization of the SA algorithm.

B. Definition and Properties of the Kernel

The crossed-IA strategy according to the SER criterion actually turns out to construct a subset of codewords, referred to as the kernel of the code, with the following properties.

Definition 1: The kernel $\mathcal{K}$ of a multiplexed code is the set of codewords $x$ of $\mathcal{F}_c$ such that

$x \in \mathcal{K}$ if and only if $\alpha_t(x) = \alpha_{t'}(x), \quad \forall (t, t') \in \mathcal{T}^2. \qquad (18)$

In other words, a codeword that belongs to the kernel always represents the same symbol of $\mathcal{A}$, whatever the context. Thus, the kernel can be partitioned into $M$ sets of codewords, as

$\mathcal{K} = \bigcup_{i=1}^{M} \mathcal{K}_i \qquad (19)$

where $\mathcal{K}_i$ is the set of codewords of the kernel that represent the symbol $a_i$ for every context. For a memoryless multiplexed code, we obviously have $\mathcal{K} = \mathcal{F}_c$ and $\mathcal{K}_i = \mathcal{C}_i$. For a first-order multiplexed code, the way a codeword of the kernel is decoded does not depend on past realizations. Thus, these codewords act as hard synchronization points. The probability of encoding a symbol with a codeword of the kernel is given by

$P(X \in \mathcal{K}) = \sum_{x \in \mathcal{K}} P(X = x) \approx \frac{|\mathcal{K}|}{2^c}. \qquad (20)$

The proposed approximation is deduced from the approximation of the cardinalities given in (4). It is valid if the probability distribution function over codewords is almost uniform (i.e., $P(X = x) \approx 2^{-c}$, $\forall x \in \mathcal{F}_c$).

C. Maximum Cardinality of the Kernel

The cardinalities $N_{i|t}$ are first chosen in order to minimize the mdl given by (3). Assuming that these values have already been found, one can then try to maximize the rate of hard synchronization points, i.e., to maximize the cardinality of the kernel. From Definition 1, we deduce that

$|\mathcal{K}_i| \le \min_{t \in \mathcal{T}} N_{i|t} \qquad (21)$

which leads to

$|\mathcal{K}| \le \sum_{i=1}^{M} \min_{t \in \mathcal{T}} N_{i|t}. \qquad (22)$

This means that, for each symbol value $a_i$ of the high-priority source, a maximum number of $\min_{t} N_{i|t}$ codewords can be assigned to the classes $\mathcal{C}_{i|t}$ independently of the context values (or past symbol realizations). The equality can be obtained with an appropriate IA. In particular, we observed that the crossed-IA strategy with the SER criterion tends to maximize the kernel cardinality. The code kernel property can be used for random data access.
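Bound (22) is a one-liner to evaluate. The sketch below (hypothetical helper name, toy cardinalities) illustrates it for three symbols and two contexts:

def kernel_upper_bound(N):
    """Bound (22) on the kernel cardinality: for each symbol a_i, at most
    min over contexts t of N[i][t] codewords can represent a_i under
    every context. N[i][t] is the cardinality of class C_{i|t}."""
    return sum(min(row) for row in N)

# Hypothetical cardinalities (each context's column sums to 2**3 = 8):
# at most 2 + 2 + 2 = 6 of the 8 codewords can be hard synchronization points.
print(kernel_upper_bound([[4, 2], [2, 4], [2, 2]]))   # -> 6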

VI. HARD AND SOFT RESYNCHRONIZATION OF CONTEXTS

The SER and SNR performance can be further improved, at the expense of a slight decrease in compression efficiency, by encoding some symbols at known positions with a memoryless multiplexed code rather than with a first-order code. The codewords of the memoryless multiplexed code then act as hard synchronization points. One can also reduce the number of contexts in order to increase the source "excess rate," which can, in turn, be exploited by using the MPM (also called symbol-MAP) or MMSE decoding algorithms. Note that the BCJR algorithm [19] directly applies. Note also that, unlike with VLCs, an optimal² trellis decoding based on the BCJR algorithm can be performed with a complexity that is linear in the sequence length $K$, which allows for soft decoding of very large sequences.

A. Error-Resilience Analysis for Finite-Length Sequences

Let $\pi_t(i, i') = P(S_t = a_i, \hat{S}_t = a_{i'})$ denote the probability that the event $(S_t, \hat{S}_t) = (a_i, a_{i'})$ occurs at a given symbol clock instant $t$. Let $\boldsymbol{\pi}_t$ be the vector of probabilities that the coder-decoder product model is in the different states of the state space $\mathcal{A}^2$. Since we assume that the first symbol is encoded with a memoryless multiplexed code (according to the stationary probability of the source), the related vector is denoted $\boldsymbol{\pi}_1$. Its components are deduced from the stationary PMF $(\mu_i)$ of the source and from the channel model as

$\pi_1(i, i') = P(S_1 = a_i, \hat{S}_1 = a_{i'}) \qquad (23)$

$= P(\hat{S}_1 = a_{i'} \mid S_1 = a_i)\,\mu_i \qquad (24)$

$= \mu_i \sum_{y \in \mathcal{C}_{i'}} P(Y_1 = y \mid S_1 = a_i) \qquad (25)$

$= \frac{\mu_i}{N_i} \sum_{x \in \mathcal{C}_i} \sum_{y \in \mathcal{C}_{i'}} P(y \mid x). \qquad (26)$

For the symbol clock instant $t = 2$, the vector of probabilities is given by $\boldsymbol{\pi}_2 = \boldsymbol{\pi}_1 \mathbb{P}$. Similarly, the general expression of the state probability vector for any given symbol clock instant $t$ is given by

$\boldsymbol{\pi}_t = \boldsymbol{\pi}_1\,\mathbb{P}^{\,t-1}. \qquad (27)$

Then, the SER of a sequence of $K$ codewords is given by the average of the symbol error rates over the sequence, i.e.,

$\mathrm{SER} = \frac{1}{K} \sum_{t=1}^{K} \boldsymbol{\pi}_t \cdot \mathbf{e} \qquad (28)$

$= \frac{1}{K} \sum_{t=1}^{K} \boldsymbol{\pi}_1\,\mathbb{P}^{\,t-1} \cdot \mathbf{e}. \qquad (29)$

Similarly, the average mean-square error for the entire sequence is given by

$\mathrm{MSE} = \frac{1}{K} \sum_{t=1}^{K} \boldsymbol{\pi}_t \cdot \mathbf{d}. \qquad (30)$

²This is optimal in the sense that the "excess rate" is fully exploited; no pruning is done.
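Expressions (28)-(30) amount to a forward propagation of the product-model state distribution. A minimal sketch, assuming the same state-vectorization conventions as the asymptotic-bound sketch of Section IV-B:

import numpy as np

def finite_length_ser_mse(pi1, T, e, d, K):
    """Finite-length SER and MSE per (27)-(30): propagate the state
    distribution with pi_t = pi_(t-1) @ T and average the per-instant
    error probabilities over the K symbol clock instants.
    pi1 -- distribution of the first (stationary-encoded) symbol, from (26)
    T   -- product-model transition matrix, from (10)"""
    pi, ser, mse = pi1.copy(), 0.0, 0.0
    for _ in range(K):
        ser += pi @ e
        mse += pi @ d
        pi = pi @ T
    return ser / K, mse / K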


Fig. 1. Hard synchronization of the decoder using memoryless multiplexed codewords ($\mathcal{C}_0$) every three symbols ($L = 3$). The decoded symbols $\hat{S}_1$ and $\hat{S}_4$ depend only on the received codewords $Y_1$ and $Y_4$.

B. Synchronization Using Memoryless Multiplexed Codes

In order to further reduce error propagation, one can encode symbols at a priori known positions³ with a memoryless multiplexed code constructed from the stationary probability $\mu$. This code has already been introduced in Section III-B, where it is referred to as $\mathcal{C}_0$. It asymptotically reaches, for high values of $c$, the stationary entropy of the source. Here, we restrict the analysis to the case where this code is used at symbol clock positions $t$ such that $t \equiv 1 \pmod{L}$, where $L \geq 2$. The corresponding code is referred to as $\mathcal{C}_L$. Every $L$ symbols, the code $\mathcal{C}_0$ acts as a hard resynchronization point, as depicted in Fig. 1. The rate of symbols encoded using the conditional probabilities is given by $(L - 1)/L$, which leads to the following compression efficiency for $\mathcal{C}_L$:

$\mathrm{mdl}(\mathcal{C}_L) = \frac{1}{L}\,\mathrm{mdl}(\mathcal{C}_0) + \Big(1 - \frac{1}{L}\Big)\,\mathrm{mdl}(\mathcal{C}_1). \qquad (31)$

Since some storage capacity can be lost during the conversion of the pre-encoded low-priority source into a finite sequence of states, this conversion is performed for the whole sequence, according to the cardinalities $N_i$ and $N_{i|t}$ deduced from the code $\mathcal{C}_L$ and the realization of $\mathbf{S}$. Now, let us recall that the zero-order and first-order entropies of the source are respectively denoted $H$ and $H_1$. If we assume that the codeword length allows for a good precision in the stationary and conditional PMF approximations, the compression efficiency closely verifies

$\mathrm{mdl}(\mathcal{C}_L) \approx \frac{1}{L}\,H + \Big(1 - \frac{1}{L}\Big)\,H_1. \qquad (32)$
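The tradeoff (31) is linear in $1/L$, which the following sketch makes explicit (the mdl values are hypothetical placeholders, not the paper's measurements):

def mdl_hybrid(mdl_c0, mdl_c1, L):
    """Compression efficiency of the resynchronized code C_L, per (31):
    one symbol in L uses the memoryless code C0, the remaining (L-1)/L
    use the first-order code C1."""
    return mdl_c0 / L + (1.0 - 1.0 / L) * mdl_c1

# Hypothetical mdl values (bits/symbol) for C0 and C1:
for L in (2, 5, 10, 50):
    print(L, round(mdl_hybrid(2.8, 2.3, L), 3))
# L = 2 costs half the gap between the two codes; by L = 50 the penalty is
# only 1/50 of that gap, matching the tradeoff plotted in Fig. 5.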

Each codeword block of length $L$ can be seen as a sequence of multiplexed codewords of finite length $L$: the vector $\boldsymbol{\pi}_1$ provides the probabilities of observing the event $(S_t, \hat{S}_t) = (a_i, a_{i'})$ for symbols encoded/decoded with $\mathcal{C}_0$. Similarly, the vector $\boldsymbol{\pi}_t$ gives the probabilities for all the symbol clock instants $t$

³Equivalently, at fixed bit positions.

such that $2 \le t \le L$. The SER and MSE expressions for the code $\mathcal{C}_L$ are directly given by, respectively, (29) and (30), with $K = L$.

Soft decoding methods also apply when a memoryless multiplexed code is periodically used. Only one slight modification of the BCJR algorithm is required. For the symbol clock instants $t$ where the memoryless multiplexed code is used, the forward probabilities are computed as

$\alpha_t(a_i) = \mu_i\,P(Y_t = y_t \mid S_t = a_i) \qquad (33)$

$= \frac{\mu_i}{N_i} \sum_{x \in \mathcal{C}_i} P(y_t \mid x) \qquad (34)$

instead of using the BCJR recursion formula. Note that (34) is similar to the initialization of the probability $\alpha_1$ in the BCJR algorithm.

VII. SIMULATION RESULTS

The SER and SNR performance of first-order multiplexed codes has been evaluated considering order-1 Gauss–Markov sources with zero mean, unit variance, and different correlation factors $\rho$. The source is quantized on 8 levels with a uniform quantizer. Notice that the reconstruction of the quantized source leads to an SNR of 13.2 dB. Note that the reconstructed-symbol SNRs given here reflect only the noise induced by the channel and do not include the quantization noise. The simulations have been performed assuming BSC and AWGN channels. Unless stated otherwise, the experiments reported in this section have been performed with high-priority sequences of $K = 1000$ symbols, and the results are averaged over 1000 channel realizations. The first-order entropies $H_1$ of the two sources depend on their correlation factors, whereas their stationary entropy $H$ is the same for both. The curves given here show the performance obtained for the high-priority source $S$. The low-priority source is supposed to be pre-encoded with a classical AC.


Fig. 2. Theoretical SER and SNR performance obtained with different IA strategies. The source considered is a Gauss–Markov source (ρ = 0.5) quantized on eight levels. The first-order multiplexed code used is characterized by c = 5 and 8 contexts.

Fig. 3. SER and SNR performance of memoryless and first-order multiplexed codes. The IA methods used for the two codes are based on SA and crossed-IA + SA, respectively. The figure also shows the SER and SNR curves obtained with ACs. Hard, MPM, and MMSE decoding techniques have been considered.

A. SER and SNR Performance of Memoryless and First-Order Multiplexed Codes

The first set of experiments aimed at showing the respective performance of memoryless and first-order multiplexed codes in the presence of bit errors, considering first a BSC channel. We started by studying the impact of the different IA strategies. Fig. 2 shows the theoretical SER and SNR performance of the first-order multiplexed codes obtained for a range of bit error rates (BERs) when encoding a Gauss–Markov source ($\rho = 0.5$) with different IA strategies. It can be observed that the lexicographic IA and the crossed-IA optimized with an SA algorithm outperform the other IA methods. These two methods are retained in the rest of the experiments.

Fig. 3 shows how the memoryless and first-order multiplexed codes compare with classical codes (i.e., Huffman and ACs). The memoryless code $\mathcal{C}_0$ amounts to setting the number of contexts to one. For the first-order multiplexed code $\mathcal{C}_1$, the number of contexts considered is eight. The codes are constructed from an FLC codebook with a codeword length $c = 5$ and with the two IA strategies retained: the SA-based IA and the crossed-IA + SA techniques. Table II gives, for a correlation factor of 0.5, both


Fig. 4. SER and SNR performance of multiplexed and arithmetic codes when used in a tandem source-channel coding chain. The figure shows the SER and SNR values obtained with the ACs for high-priority sequences of length K = 100 and K = 1000. The multiplexed code considered relies on a lexicographic IA. The high-priority sequence length considered when the multiplexed code is used is K = 1000. The correlation factor of the source is ρ = 0.5.

the theoretical mdl values given by (3) and the mdl values obtained by simulation for several values of the codeword length $c$ and of the sequence length $K$. The mdl values given in Table II have been computed by taking into account the number of bits of the encoded sequence of low-priority symbols that have really been multiplexed together with the source $S$. For the more correlated source, the first-order multiplexed codes lead to an mdl of 1.70. Considering first hard decoding, Fig. 3 shows the strong advantage in terms of SER and SNR of first-order multiplexed codes versus ACs in the presence of bit errors, for a similar compression efficiency. Fig. 3 also shows the extra gain in SER and SNR obtained when using an MMSE soft decoding technique. As expected, when the source correlation is low, the gain brought by soft decoding techniques with respect to hard decoding is small. Similar results have been obtained with the MAP and MPM estimation criteria. However, for sources with high correlation, MPM and MMSE decoding allow us to improve, respectively, the SER and SNR performance.

B. Respective SER and SNR Performance of Multiplexed and Arithmetic Codes in a Tandem Source-Channel Coding Scheme

A second set of experiments aimed at comparing the performance of multiplexed codes and ACs when used in a tandem source-channel coding chain. We consider a Gauss–Markov source with a correlation factor $\rho = 0.5$. The arithmetic coder also exploits first-order source statistics. The source coders are followed by a rate-1/2 recursive systematic convolutional code (RSCC) of constraint length 9, for protection and error correction. The overall rates (source + channel) of both chains are similar. In order to estimate the coding gain brought by the multiplexed codes, we consider an AWGN channel. To make the comparison as fair as possible, we assume that the multiplexed source codes have been designed without knowing the channel characteristics (the SNR). In particular, a simple lexicographic IA strategy has been used. For both schemes, hard source decoding has been considered. Notice that, in both chains, soft outputs of the RSCC could be used to improve the source decoder performance.

The SER and SNR values obtained are shown in Fig. 4. A coding gain of about 0.75 and 1.5 dB has been obtained when using multiplexed codes versus ACs, for high-priority sequence lengths of $K = 100$ and $K = 1000$, respectively. The increasing gap with the sequence length results from the fact that, in contrast with ACs, multiplexed codes do not suffer from dramatic desynchronization problems if some residual bit errors occur. This leads to significantly higher SNR and SER performance. Fig. 4 also shows the results obtained if the code is transmitted without channel protection. The multiplexed codes offer an inherent unequal protection of the two sources without requiring the channel properties to be known. Notice, however, that when used in a classical tandem source-channel coding chain, the error-correcting code applies to both the high- and the low-priority source. When the two sources are encoded separately (as in the chain based on classical ACs), one has the additional freedom of using different channel codes for the two sources, in the direction of unequal error protection. However, this requires the channel properties to be known at the time of encoding.

C. Synchronization of the Codes

A third set of experiments aimed at evaluating the performance of the synchronization technique based on a periodic use of memoryless multiplexed codewords. Memoryless multiplexed codewords are inserted every $L$ symbols. The values of the parameter $L$ vary between 2 and 50 and are shown on the curve. When $L$ increases, the mdl obviously decreases toward the value obtained when encoding the entire sequence with first-order multiplexed codes.


Fig. 5. SER-mdl tradeoff of the synchronization scheme based on a periodic use of codewords of the memoryless multiplexed code every L = 2, 3, ..., 50 symbols. The source considered is a Gauss–Markov source with a correlation factor of ρ = 0.5. The circular points are those obtained with the memoryless code C0 and the first-order multiplexed code C1 (quasi-optimal in terms of mdl), respectively.

Fig. 5 illustrates the corresponding mdl-SER tradeoff on a BSC. The performance shown has been obtained with MPM soft decoding. The periodic use of memoryless multiplexed codes is clearly an efficient way to decrease the SER at a low cost in terms of mdl. The corresponding SER curve is convex and located below the dotted line linking the mdl-SER points of the memoryless ($\mathcal{C}_0$) and first-order ($\mathcal{C}_1$) multiplexed codes. This dotted line corresponds to the case where part of the sequence would be coded with memoryless codes and the rest with first-order codes, in a proportion that varies along the mdl axis. The fact that the SER curve lies below the dotted line evidences the effect of the periodic use of memoryless codes on the resynchronization capability of the decoder.

VIII. CONCLUSION

In this paper, we have described a new family of codes that make it possible to exploit first-order source statistics while still being error-resilient. Performance bounds in terms of SER and MSE are given. With respect to classical source codes (i.e., Huffman and arithmetic codes), the memoryless and first-order multiplexed codes are shown to have significantly higher SER and SNR performance in the presence of bit errors, for the same compression efficiency. Since the IA has an impact on the resulting efficiency of the codes, different IA strategies have been described. It is shown in particular that a significant SER and SNR performance gain can be obtained with an IA, referred to as crossed-IA, that takes into account the channel characteristics. A subset of codewords, the kernel of the code, is also shown to have properties allowing for improved error resilience and random data access. It is shown that, when used in a tandem source-channel coding chain, the resilience properties of the codes yield a coding gain of up to 1.5 dB in comparison with ACs for a sequence of 1000 high-priority symbols. The decoder resynchronization capability has been shown to be further improved by a periodic use of memoryless multiplexed codewords, at the expense of a controlled increase in mdl.

APPENDIX I
TRANSITION PROBABILITY OF THE CODER–DECODER PRODUCT MODEL

Given that the emitted codeword $X_t$ belongs to the class $\mathcal{C}_{i|j}$ when $(S_t, S_{t-1}) = (a_i, a_j)$, the first term on the right side of (9) can be rewritten as

$P(Y_t = y \mid S_t = a_i, S_{t-1} = a_j) = \sum_{x \in \mathcal{F}_c} P(y \mid x)\,P(X_t = x \mid S_t = a_i, S_{t-1} = a_j). \qquad (35)$

In addition, for given symbol realizations $s_t = a_i$ and $s_{t-1} = a_j$, the multiplexed codeword is such that $x \in \mathcal{C}_{i|j}$. Therefore, assuming that all the codewords belonging to a given equivalence class have the same probability (which is valid if the low-priority source is compressed and converted into a sequence of states in a quasi-optimal manner),

$P(X_t = x \mid S_t = a_i, S_{t-1} = a_j) = \frac{1}{N_{i|j}}, \quad \forall x \in \mathcal{C}_{i|j} \qquad (36)$

$P(X_t = x \mid S_t = a_i, S_{t-1} = a_j) = 0, \quad \forall x \notin \mathcal{C}_{i|j}. \qquad (37)$

From (35) and (36), we deduce

$P(Y_t = y \mid S_t = a_i, S_{t-1} = a_j) = \frac{1}{N_{i|j}} \sum_{x \in \mathcal{C}_{i|j}} P(y \mid x). \qquad (38)$

The other probability in (9) can be computed as follows, starting with the following simplification: since the hard decoder deterministically maps the received codeword onto a symbol, given the previously decoded symbol,

$P(\hat{S}_t = a_{i'} \mid Y_t = y, \hat{S}_{t-1} = a_{j'}) = \delta\big(\alpha_{\tau(a_{j'})}(y), a_{i'}\big). \qquad (39)$

Since, for a given symbol value $a_{i'}$ associated to a multiplexed codeword $y$, one has $y \in \mathcal{C}_{i'|j'}$ when the previous symbol value is $a_{j'}$,

$\delta\big(\alpha_{\tau(a_{j'})}(y), a_{i'}\big) = 1 \iff y \in \mathcal{C}_{i'|j'} \qquad (40)$

$\sum_{y \in \mathcal{F}_c} P(y \mid x)\,\delta\big(\alpha_{\tau(a_{j'})}(y), a_{i'}\big) = \sum_{y \in \mathcal{C}_{i'|j'}} P(y \mid x). \qquad (41)$

Then, from (37), (38), (40), and (41), one can finally deduce

$p_{(j,j') \to (i,i')} = \frac{P_{i|j}}{N_{i|j}} \sum_{x \in \mathcal{C}_{i|j}} \sum_{y \in \mathcal{C}_{i'|j'}} P(y \mid x). \qquad (42)$

REFERENCES

[1] H. Jégou and C. Guillemot, "Source multiplexed codes for error-prone channels," in Proc. IEEE Int. Conf. Communications (ICC), vol. 5, May 2003, pp. 3604–3608.

[2] ——, "Robust multiplexed codes for compression of heterogeneous data," IEEE Trans. Inf. Theory, vol. 51, no. 4, pp. 1393–1407, Apr. 2005.


[3] J. Maxted and J. Robinson, "Error recovery for variable length codes," IEEE Trans. Inf. Theory, vol. IT-31, no. 6, pp. 794–801, Nov. 1985.

[4] T. Ferguson and J. H. Rabinowitz, "Self-synchronizing Huffman codes," IEEE Trans. Inf. Theory, vol. IT-30, no. 4, pp. 687–693, Jul. 1984.

[5] W. Lam and A. Reibman, "Self-synchronizing variable-length codes for image transmission," in Proc. Int. Conf. Acoustics, Speech, Signal Processing (ICASSP), vol. 3, Mar. 1992, pp. 477–480.

[6] Y. Takishima, M. Wada, and H. Murakami, "Reversible variable length codes," IEEE Trans. Commun., vol. 43, no. 2/3/4, pp. 158–162, Feb. 1995.

[7] J. Wen and J. D. Villasenor, "Reversible variable length codes for efficient and robust image and video coding," in Proc. Data Compression Conf. (DCC), Apr. 1998, pp. 471–480.

[8] R. Bauer and J. Hagenauer, "Turbo FEC/VLC decoding and its application to text compression," in Proc. Conf. Information Sciences Systems (CISS), Mar. 2000, pp. WA6.6–WA6.11.

[9] K. Sayood and J. Borkenhagen, "Use of residual redundancy in the design of joint source/channel coders," IEEE Trans. Commun., vol. 39, no. 6, pp. 838–846, Jun. 1991.

[10] A. Murad and T. Fuja, "Joint source-channel decoding of variable length encoded sources," in Proc. Information Theory Workshop (ITW), Jun. 1998, pp. 94–95.

[11] N. Demir and K. Sayood, "Joint source-channel coding for variable length codes," in Proc. Data Compression Conf. (DCC), Mar. 1998, pp. 139–148.

[12] A. Guyader, E. Fabre, C. Guillemot, and M. Robert, "Joint source-channel turbo decoding of entropy coded sources," IEEE J. Sel. Areas Commun., vol. 19, no. 9, pp. 1680–1696, Sep. 2001.

[13] B. Pettijohn, K. Sayood, and M. Hoffman, "Joint source/channel coding using arithmetic codes," presented at the Data Compression Conf. (DCC), Snowbird, UT, Mar. 2000.

[14] D. Redmill and N. Kingsbury, "The EREC: An error-resilient technique for coding variable-length blocks of data," IEEE Trans. Image Process., vol. 5, no. 4, pp. 565–574, Apr. 1996.

[15] J. Hagenauer, "Rate-compatible punctured convolutional codes (RCPC codes) and their applications," IEEE Trans. Commun., vol. 36, no. 4, pp. 389–400, Apr. 1988.

[16] K. Zeger and A. Gersho, "Pseudo-Gray coding," IEEE Trans. Commun., vol. 38, no. 12, pp. 2147–2158, Dec. 1990.

[17] N. Farvardin, "A study of vector quantization for noisy channels," IEEE Trans. Inf. Theory, vol. 36, no. 4, pp. 799–809, Jul. 1990.

[18] C. Weidmann, "Reduced-complexity soft-in-soft-out decoding of variable length codes," presented at the Int. Symp. Information Theory (ISIT), Yokohama, Japan, Jul. 2003.

[19] L. R. Bahl, J. Cocke, F. Jelinek, and J. Raviv, "Optimal decoding of linear codes for minimizing symbol error rate," IEEE Trans. Inf. Theory, vol. 20, no. 2, pp. 284–287, Mar. 1974.

[20] T. M. Cover and J. A. Thomas, Elements of Information Theory. New York: Wiley, 1991, ch. 4.

[21] A. El Gamal, L. Hemachandra, I. Shperling, and V. Wei, "Using simulated annealing to design good codes," IEEE Trans. Inf. Theory, vol. 33, no. 1, pp. 116–123, Jan. 1987.

Hervé Jégou (S'05–A'05) received the M.E. degree in industrial management from the École Nationale Supérieure des Mines de Saint-Étienne, France, in 2000, the M.S. degree in computer science from the University of Rennes I, Rennes, France, in 2002, and the Ph.D. degree from the University of Rennes in 2005.

From 2001 to 2003, he was also with the École Normale Supérieure de Cachan, France. His research interests include joint source-channel coding and error-resilient source coding.

Christine Guillemot (M'93–SM'04) received the Ph.D. degree from the École Nationale Supérieure des Télécommunications (ENST), Paris, France.

From 1985 to October 1997, she was with France Telecom/CNET, where she was involved in various projects in the domain of coding for TV, HDTV, and multimedia applications. From January 1990 to mid-1991, she was a visiting scientist at Bellcore, NJ. She is currently Director of Research at IRISA/INRIA, University of Rennes I, Rennes, France, where she is in charge of a research group dealing with image modeling, processing, and video communication. Her research interests are signal and image processing, video coding, and joint source and channel coding for video transmission over the Internet and over wireless networks.

Dr. Guillemot has served as Associate Editor for the IEEE TRANSACTIONS ON IMAGE PROCESSING (2000–2003). She is currently serving as Associate Editor for the IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS FOR VIDEO TECHNOLOGY.