
Bayesian Symbol Detection in Wireless Relay Networks via Likelihood-Free Inference

Gareth W. Peters (1,2), Ido Nevat (3), Scott A. Sisson (1), Yanan Fan (1) and Jinhong Yuan (3)

(1) School of Mathematics and Statistics, University of New South Wales, Sydney, 2052, Australia; email: [email protected]

(2) CSIRO Sydney, Locked Bag 17, North Ryde, New South Wales, 1670, Australia

(3) School of Electrical Engineering and Telecommunications, University of New South Wales, Sydney, Australia.

Abstract—This paper presents a general stochastic model developed for a class of cooperative wireless relay networks, in which imperfect knowledge of the channel state information at the destination node is assumed. The framework incorporates multiple relay nodes operating under general known non-linear processing functions. When a non-linear relay function is considered, the likelihood function is generally intractable, resulting in the maximum likelihood and the maximum a posteriori detectors not admitting closed form solutions. We illustrate our methodology to overcome this intractability under the example of a popular optimal non-linear relay function choice and demonstrate how our algorithms are capable of solving the previously intractable detection problem. Overcoming this intractability involves development of specialised Bayesian models. We develop three novel algorithms to perform detection for this Bayesian model; these include a Markov chain Monte Carlo Approximate Bayesian Computation (MCMC-ABC) approach; an Auxiliary Variable MCMC (MCMC-AV) approach; and a Suboptimal Exhaustive Search Zero Forcing (SES-ZF) approach. Finally, numerical examples comparing the symbol error rate (SER) performance versus signal to noise ratio (SNR) of the three detection algorithms are studied in simulated examples.

I. BACKGROUND

Cooperative communications systems, [1] and [2], have become a major focus for communications engineers. Particular attention has been paid to the wireless network setting, as it incorporates spatial resources to gain diversity, and to enhance connection capability and throughput. More recently, the focus has shifted towards incorporation of relay nodes [3], which are known to improve energy efficiency and reduce the interference level of wireless channels [4].

In simple terms, such a system broadcasts a signal from a transmitter at the source through a wireless channel. The signal is then received by each relay node and a relay strategy is applied before the signal is retransmitted to the destination. A number of relay strategies have been studied in the literature ([5]; [2]). We focus on the amplify-and-forward strategy of [6], in which the relay sends a transformed version (determined by the relay function) of its received signal to the destination. The relay function can be optimized for different design objectives ([7]; [8]; [9]). It is common in the relay network literature to consider non-linear relay function choices which satisfy some concept of optimality. For example, in the estimate and forward (EF) scheme, in the case of BPSK signaling, the optimal relay function is the hyperbolic tangent [10].

Other criteria for which the optimal relay function is non-linear include: capacity maximisation [10], minimum error probability at the receiver [11], SNR maximisation [9], rate maximisation [12] and minimisation of the average error probability [13].

In all these cited works the authors solve the problem of choosing the non-linear relay function that satisfies the desired design constraint. However, we note that in all cases, whenever a non-linear relay function is considered, though it may satisfy the constraint of optimality, it results in an intractable detection problem. This intractability has not been considered in the literature and therefore it has not been possible to tackle the resulting detection problem. We specify explicitly how this intractability arises and then develop and demonstrate extensively our solution to this general relay network detection problem. In this paper we will demonstrate that the choice of relay function directly affects the tractability of the system model. We will focus on a single hop relay design with an arbitrary number of relays and known, general non-linear relay functions. However, our methodology extends trivially to arbitrary relay topologies and multiple hop networks. Fig. 1 presents the system model considered. Our model incorporates stochastic parameters that are associated with each relay channel link. That is, we consider parameters associated with the model as random variables, which must be jointly estimated with the unknown random vector of transmitted symbols.

Since this framework incorporates general relay functions, the resulting likelihood for the model typically cannot be evaluated pointwise, and in some cases cannot even be written down in closed form. Bayesian "likelihood-free" inference methodologies overcome the problems associated with the intractable likelihood by replacing explicit evaluation of the likelihood with simulation from the model. These methods are also collectively known as Approximate Bayesian Computation (ABC); see [14], [15], [16], [17], [18], [19], and references therein for a detailed overview of the methodological and theoretical aspects.

Applications of ABC methods are becoming widespread, and include: telecommunications [18]; extreme value theory [20]; protein networks ([14], [21]); SIR models [22]; species migration [23]; pathogen transmission [24]; non-life insurance claims reserving [16]; and operational risk [25].


Fig. 1. The system model studied in this paper involves transmission from one source, through each of the $L$ relay channels, $h^{(l)}$, to the relays. Additive complex Gaussian noise, $\mathbf{W}$, is included at the receiver of each relay; the signal is then processed and retransmitted by the relay to the destination. In this process the signal is transmitted again through $L$ channels denoted by $g^{(l)}$, and additive complex Gaussian noise, $\mathbf{V}$, is included at the destination.

A. Main contributions

In this paper we develop a novel sampling methodology based on the Markov chain Monte Carlo Approximate Bayesian Computation (MCMC-ABC) methodology [26]. Since the focus of this paper is on detection of the transmitted symbols, we will integrate out the nuisance channel parameters. We also develop two novel alternative solutions for comparison: an auxiliary variable MCMC (MCMC-AV) approach and a suboptimal exhaustive search zero forcing (SES-ZF) approach. In the MCMC-AV approach the addition of auxiliary variables results in closed form expressions for the full conditional posterior distributions. We use this fact to develop the MCMC-AV sampler and demonstrate that it works well when small numbers of relay nodes are considered. The SES-ZF approach involves an approximation based on known summary statistics of the channel model and an exhaustive search over the parameter space of code words. This performs well for relatively small numbers of relays and a high signal to noise ratio (SNR).

The paper is organised as follows: in Section II we introduce a stochastic system model for the wireless relay system and the associated Bayesian model. We discuss the intractability arising from these models when one considers arbitrary relay functions. Section III presents the likelihood-free methodology and the generic MCMC-ABC algorithm. In Section IV we present the proposed algorithm, namely maximum a posteriori sequence detection. In Section V we derive an alternative novel algorithm based on an auxiliary variable model for the joint problem of coherent detection and channel estimation for arbitrary relay functions. We contrast the performance of this approach with the performance of the MCMC-ABC based algorithm. Section VI presents a detector that is based on known summary statistics of the channel model and an explicit exhaustive search over the parameter space of code words. We shall demonstrate its performance, whilst acknowledging its flaws. Section VII presents results and analysis, and conclusions are provided in Section VIII.

The following notation is used throughout: random variables are denoted by upper case letters and their realizations by lower case letters. In addition, bold will be used to denote a vector or matrix quantity, superscripts refer to the relay node index and subscripts refer to the element of a vector or matrix. We define the following notation that shall be used throughout: $\mathbf{g} = g_{1:L} = (g^{(1)}, \cdots, g^{(L)})$; $g^{[n]}$ refers to the state of the Markov chain at iteration $n$; $g^{[n]}_i$ refers to the $i$-th element of $g^{[n]}$. We define the generic transition kernel for the Markov chain as $q(\theta^* \leftarrow \theta^{[n-1]})$, which updates the Markov chain for the posterior parameters, $\theta$, from the $(n-1)$-th Markov chain step to the $n$-th via the proposal $q$.

II. BAYESIAN SYSTEM MODEL AND MAP DETECTION

In this section we introduce the system model and a Bayesian model for inference on the system model parameters. In our system model, the channels in the relay network are modelled stochastically, where we do not know a priori the realized channel coefficient values. Instead, we consider partial channel state information (CSI), in which we assume known statistics of the distribution of the channel coefficients.

A. System model and assumptions

Here we present the system model and associated assumptions. The system model is depicted in Fig. 1.

1) Assume a wireless relay network with a single source node, transmitting sequences of $K$ symbols denoted $\mathbf{s} = s_{1:K}$, to a single destination via $L$ relay nodes.

2) The symbols in the sequence of $K$ symbols $\mathbf{s}$ are taken from a digital constellation with cardinality $M$.

3) There are $L$ relays which cannot receive and transmit on the same time slot and on the same frequency band. We thus consider a half duplex system model in which the data transmission is divided into two steps. In the first step, the source node broadcasts a code word $\mathbf{s} \in \Omega$ from the codebook to all the $L$ relay nodes. In the second step, the relay nodes then transmit their signals to the destination node on orthogonal non-interfering channels. We assume that all channels are independent with a coherence interval larger than the codeword length $K$.

4) Assume imperfect CSI, in which noisy estimates of the channel model coefficients for each relay link are known. This is a standard assumption based on the fact that a training phase has been performed a priori. This involves an assumption regarding the channel coefficients as follows:

• From source to relay there are $L$ i.i.d. channels parameterized by $\{H^{(l)} \sim F(h^{(l)}, \sigma^2_h)\}_{l=1}^{L}$, where $F(\cdot)$ is the distribution of the channel coefficients, $h^{(l)}$ is the estimated channel coefficient and $\sigma^2_h$ is the associated estimation error variance.

• From relay to destination there are $L$ i.i.d. channels parameterized by $\{G^{(l)} \sim F(g^{(l)}, \sigma^2_g)\}_{l=1}^{L}$, where $F(\cdot)$ is the distribution of the channel coefficients, $g^{(l)}$ is the estimated channel coefficient and $\sigma^2_g$ is the associated estimation error variance.

5) The received signal at the $l$-th relay is a random vector given by
\[ \mathbf{R}^{(l)} = \mathbf{S} H^{(l)} + \mathbf{W}^{(l)}, \quad l \in \{1, \cdots, L\}, \qquad (1) \]
where $H^{(l)}$ is the channel coefficient between the transmitter and the $l$-th relay, $\mathbf{S} \in \Omega$ is the transmitted code-word and $\mathbf{W}^{(l)}$ is the noise realization associated with the relay receiver.

6) The received signal at the destination is a random vector given by
\[ \mathbf{Y}^{(l)} = f^{(l)}\big(\mathbf{R}^{(l)}\big) G^{(l)} + \mathbf{V}^{(l)}, \quad l \in \{1, \cdots, L\}, \qquad (2) \]
where $G^{(l)}$ is the channel coefficient between the $l$-th relay and the receiver, $f^{(l)}\big(\mathbf{r}^{(l)}\big) \triangleq \big[f^{(l)}\big(r^{(l)}_1\big), \ldots, f^{(l)}\big(r^{(l)}_K\big)\big]^{\top}$ is the memoryless relay processing function (with possibly different functions at each of the relays) and $\mathbf{V}^{(l)}$ is the noise realization associated with the destination receiver.

7) Conditional on $\{h^{(l)}, g^{(l)}\}_{l=1}^{L}$, we have that all received signals are corrupted by zero-mean additive white complex Gaussian noise. At the $l$-th relay the noise corresponding to the $i$-th transmitted symbol is denoted by the random variable $W^{(l)}_i \sim \mathcal{CN}(0, \sigma^2_w)$. At the destination receiver this is denoted by the random variable $V^{(l)}_i \sim \mathcal{CN}(0, \sigma^2_v)$. Additionally, we assume the following properties:
\[ \mathbb{E}\big[W^{(l)}_i \overline{W}^{(m)}_j\big] = \mathbb{E}\big[V^{(l)}_i \overline{V}^{(m)}_j\big] = \mathbb{E}\big[W^{(l)}_i \overline{V}^{(m)}_j\big] = 0, \]
$\forall i, j \in \{1, \ldots, K\}$, $\forall l, m \in \{1, \ldots, L\}$, $i \neq j$, $l \neq m$, where $\overline{W}_j$ denotes the complex conjugate of $W_j$.
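To make the generative structure of (1)-(2) concrete, the following sketch simulates one frame of the relay system under the assumptions above. This forward simulation is exactly the operation the likelihood-free machinery of Sections III-IV relies on. All function and variable names (e.g. simulate_relay_frame, relay_fn) and the example values are illustrative, not from the paper.

```python
import numpy as np

def simulate_relay_frame(s, h, g, sigma_w, sigma_v, relay_fn=np.tanh, rng=None):
    """Simulate the destination observations Y^(l) of eq. (2) for one frame.

    s        : (K,) transmitted code word (symbols)
    h, g     : (L,) realized source-relay and relay-destination channel coefficients
    sigma_w  : std. dev. of the relay-side complex Gaussian noise W
    sigma_v  : std. dev. of the destination-side complex Gaussian noise V
    relay_fn : memoryless relay processing function f, applied element-wise
    """
    rng = np.random.default_rng() if rng is None else rng
    K, L = len(s), len(h)

    def cnoise(scale, size):
        # zero-mean circularly-symmetric complex Gaussian noise
        return scale * (rng.standard_normal(size) + 1j * rng.standard_normal(size)) / np.sqrt(2)

    y = np.empty((L, K), dtype=complex)
    for l in range(L):
        r = s * h[l] + cnoise(sigma_w, K)                 # eq. (1): signal received at relay l
        y[l] = relay_fn(r) * g[l] + cnoise(sigma_v, K)    # eq. (2): signal received at destination
    return y

# Example usage with illustrative values (4-PAM symbols, L = 2 relays)
rng = np.random.default_rng(0)
s = np.array([1.0, 3.0])
h = 1.0 + 0.3 * rng.standard_normal(2)
g = 1.0 + 0.3 * rng.standard_normal(2)
y = simulate_relay_frame(s, h, g, sigma_w=0.1, sigma_v=0.1, rng=rng)
```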

B. Prior specification and posterior

Here we present the relevant aspects of the Bayesian model and associated assumptions. In order to construct a Bayesian model we need to specify the likelihood $p(\mathbf{y}|\mathbf{s},\mathbf{h},\mathbf{g})$, and the priors for the parameters $\mathbf{s},\mathbf{h},\mathbf{g}$, which are combined under Bayes' theorem to obtain a posterior $p(\mathbf{s},\mathbf{g},\mathbf{h}|\mathbf{y})$. We begin by specifying the prior models for the sequence of symbols and the unknown channel coefficients. At this stage we note that in terms of system capacity it is only beneficial to transmit a sequence of symbols if it aids detection. This is achieved by having correlation in the transmitted symbol sequence $s_{1:K}$. We assume that since this is part of the system design, the prior structure for $p(s_{1:K})$ will be known and reflect this information.

1) Under the Bayesian model, the symbol sequence is treated as a random vector $\mathbf{S} = S_{1:K}$. The prior for the random symbol sequence (code word) $S_{1:K}$ is defined on a discrete support denoted $\Omega$ with $|\Omega| = M^K$ elements and probability mass function denoted by $p(s_{1:K})$.

2) The assumption of imperfect CSI is treated under a Bayesian paradigm by formulating priors for the channel coefficients as follows:

• From source to relay there are $L$ i.i.d. channels parameterized by $\{H^{(l)} \sim \mathcal{CN}(h^{(l)}, \sigma^2_h)\}_{l=1}^{L}$, where $h^{(l)}$ is the estimated channel coefficient and $\sigma^2_h$ the associated estimation error variance.

• From relay to destination there are $L$ i.i.d. channels parameterized by $\{G^{(l)} \sim \mathcal{CN}(g^{(l)}, \sigma^2_g)\}_{l=1}^{L}$, where $g^{(l)}$ is the estimated channel coefficient and $\sigma^2_g$ the associated estimation error variance.

C. Inference and MAP sequence detection

Since the primary concern in designing a relay network system is the SER versus SNR performance, our goal is oriented towards detection of transmitted symbols and not the associated problem of channel estimation. We will focus on an approach which samples $S_{1:K}$, $\mathbf{G} = G_{1:L}$, $\mathbf{H} = H_{1:L}$ jointly from the target posterior distribution, since our model also considers the channels to be stochastic and unknown.

In particular we consider the maximum a posteriori (MAP) sequence detector at the destination node. Therefore the goal is to design a MAP detection scheme for $\mathbf{s}$ at the destination, based on the received signals $\{\mathbf{y}^{(l)}\}_{l=1}^{L}$, the noisy channel estimates as given by the partial CSI $\{h^{(l)}\}_{l=1}^{L}$, $\{g^{(l)}\}_{l=1}^{L}$, $\sigma^2_h$, $\sigma^2_g$, and the noise variances $\sigma^2_w$ and $\sigma^2_v$. Since the channels are mutually independent, the received signals $\{\mathbf{r}^{(l)}\}_{l=1}^{L}$ and $\{\mathbf{y}^{(l)}\}_{l=1}^{L}$ are conditionally independent given $\mathbf{s},\mathbf{g},\mathbf{h}$. Thus, the MAP decision rule, after marginalizing out the unknown channel coefficients, is given by
\[ \hat{\mathbf{s}} = \arg\max_{\mathbf{s} \in \Omega} p(\mathbf{s}|\mathbf{y}) = \arg\max_{\mathbf{s} \in \Omega} \prod_{l=1}^{L} \int\!\!\int p\big(\mathbf{s}, g^{(l)}, h^{(l)}|\mathbf{y}^{(l)}\big)\, dh\, dg = \arg\max_{\mathbf{s} \in \Omega} \prod_{l=1}^{L} \int\!\!\int p\big(\mathbf{y}^{(l)}|\mathbf{s}, h^{(l)}, g^{(l)}\big)\, p(\mathbf{s})\, p(g)\, p(h)\, dh\, dg. \qquad (3) \]

Therefore, in order to perform detection of the transmitted symbols, we need to evaluate the likelihood function of the model. In the next section we will demonstrate that the likelihood function in our model is intractable and, as a result, we develop the likelihood-free based methodology and associated MCMC-ABC algorithm to perform the MAP detection.

D. Evaluation of the likelihood function

The likelihood model $p\big(\mathbf{y}^{(l)}|\mathbf{s}, h^{(l)}, g^{(l)}\big)$ for this relay system is in general computationally intractable, as we now show. There are two potential difficulties that arise when dealing with non-linear relay functions. The first relates to finding the distribution of the signal transmitted from each relay to the destination. This involves finding the density of the random vector $f^{(l)}\big(\mathbf{S} H^{(l)} + \mathbf{W}^{(l)}\big) G^{(l)}$ conditional on realizations $\mathbf{S} = \mathbf{s}$, $\mathbf{H} = \mathbf{h}$, $\mathbf{G} = \mathbf{g}$. This is not always possible for a general non-linear multivariate function $f^{(l)}$. Conditional on $\mathbf{S} = \mathbf{s}$, $\mathbf{H} = \mathbf{h}$, $\mathbf{G} = \mathbf{g}$, we know the distribution of $\mathbf{R}^{(l)}|\mathbf{s},\mathbf{g},\mathbf{h}$,
\[ p_{R}(\mathbf{r}|\mathbf{s},\mathbf{g},\mathbf{h}) = p\big(\mathbf{s} h^{(l)} + \mathbf{w}^{(l)}|\mathbf{s}, h^{(l)}, g^{(l)}\big) = \mathcal{CN}\big(\mathbf{s} h^{(l)}, \sigma^2_w \mathbf{I}\big). \]

However, finding the distribution of the random vector after the non-linear function is applied, i.e. the distribution of $f\big(\mathbf{R}^{(l)}\big)$, $f\big(\mathbf{R}^{(l)}\big) G^{(l)}$ given $\mathbf{s}, h^{(l)}, g^{(l)}$, involves the following change of variable formula
\[ p\big(f\big(\mathbf{r}^{(l)}\big)|\mathbf{s}, h^{(l)}, g^{(l)}\big) = p_{R}\Big(\big(f^{(l)}\big)^{-1}\big(\mathbf{r}^{(l)}\big)\,\big|\,\mathbf{s}, h^{(l)}, g^{(l)}\Big) \left| \frac{\partial f^{(l)}}{\partial \mathbf{r}^{(l)}} \right|^{-1}, \]

which cannot always be written down analytically for arbitrary $f$. The second, more serious, complication is that even in cases where the density of the transmitted signal is known, one must then solve a $K$-fold convolution to obtain the likelihood:

\[ p\big(\mathbf{y}^{(l)}|\mathbf{s},\mathbf{g},\mathbf{h}\big) = p\big(f\big(\mathbf{r}^{(l)}\big)|\mathbf{s},\mathbf{g},\mathbf{h}\big) \ast p_{V^{(l)}} = \int_{-\infty}^{\infty} \cdots \int_{-\infty}^{\infty} p\big(f(\mathbf{z})|\mathbf{s},\mathbf{g},\mathbf{h}\big)\, p_{V^{(l)}}\big(\mathbf{y}^{(l)} - \mathbf{z}\big)\, dz_1 \ldots dz_K, \qquad (4) \]

where $\ast$ denotes the convolution operation. Typically this will be intractable to evaluate pointwise. However, in the most simplistic case of a linear relay function, that is, $f^{(l)}(x) = x$, the likelihood can be obtained analytically as

\[ p\big(\mathbf{y}^{(l)}|\mathbf{s}, h^{(l)}, g^{(l)}\big) = \mathcal{CN}\Big(\mathbf{s}\, h^{(l)} g^{(l)},\ \big(\big|g^{(l)}\big|^2 \sigma^2_w + \sigma^2_v\big)\, \mathbf{I}\Big), \]
where $\mathbf{I}$ is the identity matrix. Hence, the resulting posterior distribution in (3) involves combining the likelihood given in (4) with the priors for the sequence of symbols and the channel coefficients.

III. LIKELIHOOD-FREE METHODOLOGY

In this section, we provide a concise background introduction on likelihood-free methodology, also known as approximate Bayesian computation, and the sampling algorithms utilised to obtain samples from the ABC posterior model, based on [18], [22], [26], [17]. Likelihood-free inference describes a suite of methods developed specifically for working with models in which the likelihood is computationally intractable. We consider the likelihood intractability to arise in the sense that we may not evaluate the likelihood pointwise, such as in (4). In particular, for the relay models considered, we can only obtain a general expression for the likelihood in terms of multivariate convolution integrals and hence we do not have an explicit closed form expression with which to evaluate the likelihood pointwise.

ABC Methodology and MCMC-ABC: All ABC methodologies are based upon the observation that one may replace the problem of evaluating the intractable likelihood point-wise with the typically trivial exercise of simulating from the likelihood model. Simulation from the likelihood usually involves trivial generation of random variables from a known parametric distribution, followed by application of a transformation corresponding to the imposed physical model.

It is shown in [27] and further developed in [18] that the ABC method embeds an "intractable" target posterior distribution (resulting from the intractability of the likelihood function (4)), denoted by $p(\mathbf{s}|\mathbf{y})$, into an augmented posterior model,
\[ p(\mathbf{s},\mathbf{x}|\mathbf{y}) \propto p(\mathbf{y}|\mathbf{x},\mathbf{s};\epsilon)\, p(\mathbf{x}|\mathbf{s})\, p(\mathbf{s}), \]
where $\mathbf{x} \in \mathcal{X}$ is an auxiliary vector on the same space as the observations $\mathbf{y}$. In this augmented Bayesian model, the density $p(\mathbf{y}|\mathbf{x},\mathbf{s};\epsilon)$ acts as a weight for the intractable posterior. It compares the observations $\mathbf{y}$ with the auxiliary ("synthetic") data $\mathbf{x}$ generated from the likelihood model, and determines whether or not they are within an $\epsilon$ tolerance of each other. Choices for this weighting, $p(\mathbf{y}|\mathbf{x},\mathbf{s};\epsilon)$, will be explained below. For detailed justification of this framework see [18], [16] and [26]. In this paper we consider a marginal sampler. Summarising the ideas in these papers, we note that the target marginal posterior $p(\mathbf{s}|\mathbf{y})$ can be represented in the ABC framework as follows

\[ \begin{aligned} p(\mathbf{s}|\mathbf{y}) &\propto p(\mathbf{s})\, p(\mathbf{y}|\mathbf{s}) \\ &= p(\mathbf{s}) \int_{\mathcal{X}} p(\mathbf{y}|\mathbf{x},\mathbf{s};\epsilon)\, p(\mathbf{x}|\mathbf{s})\, d\mathbf{x} \\ &= p(\mathbf{s})\, \mathbb{E}_{p(\mathbf{x}|\mathbf{s})}\big[p(\mathbf{y}|\mathbf{x},\mathbf{s};\epsilon)\big] \\ &\approx \frac{p(\mathbf{s})}{D} \sum_{d=1}^{D} p(\mathbf{y}|\mathbf{x}_d,\mathbf{s};\epsilon) =: p_{ABC}(\mathbf{s}|\mathbf{y}). \end{aligned} \qquad (5) \]

Presenting the ABC approximate posterior in this way allows us to provide intuition for the ABC mechanism. To understand this, we note that the Monte Carlo approximation, denoted by $p_{ABC}(\mathbf{s}|\mathbf{y})$, of the integral with respect to the intractable likelihood is obtained via $D$ draws of auxiliary variables $\mathbf{x}_d$ ("synthetic data") from $p(\mathbf{x}|\mathbf{s})$. Therefore, this demonstrates explicitly how one can replace the evaluation of the likelihood with simulation from the likelihood model. As discussed in [16], [17], the choice of weighting function can affect the Monte Carlo estimate. The function $p(\mathbf{y}|\mathbf{x},\mathbf{s};\epsilon)$ is typically a standard smoothing kernel [28] with scale parameter $\epsilon$ which weights the intractable posterior with high values in regions where the observed data $\mathbf{y}$ and auxiliary data $\mathbf{x}$ are similar. For example, uniform kernels are commonplace in likelihood-free models (e.g. [26]), although alternatives such as Epanechnikov [15] and Gaussian kernels [17] provide improved efficiency.

In this paper we consider two popular choices for the weighting function $p(\mathbf{y}|\mathbf{x},\mathbf{s};\epsilon)$, which are easily interpreted as hard and soft decision functions [17]. The popular "Hard Decision" (HD) rule is given by
\[ p(\mathbf{y}|\mathbf{x},\mathbf{s};\epsilon) \propto \begin{cases} 1, & \text{if } \rho(\mathbf{y},\mathbf{x}) \le \epsilon, \\ 0, & \text{otherwise.} \end{cases} \qquad (6) \]

This makes a hard decision to reward those simulated auxiliary variables, $\mathbf{x}$, that are within an $\epsilon$-tolerance of the actual observed data, $\mathbf{y}$, as measured by the distance metric $\rho$. Another example is a "Soft Decision" (SD) weighting function given by
\[ p(\mathbf{y}|\mathbf{x},\mathbf{s};\epsilon) \propto \exp\left( \frac{-\rho(\mathbf{y},\mathbf{x})}{\epsilon^2} \right). \qquad (7) \]

An important practical difference between the HD and SD kernels is that, even though the weighting of the intractable posterior obtained by the SD density may be small, it will remain non-zero, unlike under the HD rule.
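A minimal sketch of the two weighting rules, assuming a distance value rho has already been computed; the (unnormalised) return values mirror (6) and (7).

```python
import numpy as np

def hard_decision_weight(rho, eps):
    # eq. (6): indicator that the simulated data are within an eps-tolerance
    return 1.0 if rho <= eps else 0.0

def soft_decision_weight(rho, eps):
    # eq. (7): Gaussian-type kernel, always non-zero but decaying in rho
    return np.exp(-rho / eps**2)
```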

The choice of $\epsilon$ is therefore important for the performance of the ABC methodology. If $\epsilon$ is too large, the approximation $p_{ABC}(\mathbf{s}|\mathbf{y})$ is poor; for example, when $\epsilon \to \infty$ the resulting ABC posterior $p_{ABC}(\mathbf{s}|\mathbf{y})$ in (5) corresponds to the prior, since the weighting function is uniform for all values of the posterior parameters. However, if $\epsilon$ is sufficiently small, $p_{ABC}(\mathbf{s}|\mathbf{y})$ is a good approximation of $p(\mathbf{s}|\mathbf{y})$. We note that there is no approximation when $\epsilon = 0$. From a practical computational cost perspective, we will discuss why it is not feasible to set $\epsilon = 0$ and instead we must settle for selecting an $\epsilon > 0$ via a trade-off between accuracy and computation cost. There are several guidelines one may follow to specify a sequence of tolerances that achieves a given $\epsilon$, see [29], [17], [21].

The following two sections provide specific background detail related to specification of the weighting functions. We then finish this section by presenting the generic MCMC-ABC algorithm used to obtain samples from the marginal posterior $p_{ABC}(\mathbf{s}|\mathbf{y})$ in order to solve the MAP detection problem in (3).

A. Data Summaries

We note that when the total number of observations is large, comparing the simulated data $\mathbf{x}$ directly to the observed data $\mathbf{y}$ in the weighting function is not computationally efficient. Therefore, one considers dimension reduction. The standard approach to dimension reduction is to compare summary statistics, $T(\mathbf{y})$, which should summarise the information present in the data. The summary statistic $T(\mathbf{y})$ is called a sufficient statistic for $\mathbf{y}$ if and only if $p(\mathbf{s}|\mathbf{y}) \propto p(\mathbf{s}|T(\mathbf{y}))$, i.e., according to the Neyman-Fisher factorisation theorem, we can replace the conditional distribution of $\mathbf{y}$ given the parameters with the conditional distribution of the summary statistics given the parameters and a constant with respect to the parameters. If a sufficient statistic is available, then using the summary statistics is essentially the same as using the whole data. Therefore, as $\epsilon \to 0$, the ABC approximation $p(\mathbf{s}|T(\mathbf{y})) \to p(\mathbf{s}|\mathbf{y})$. However, in most real practical models, sufficient statistics are unknown and one must make alternative choices.

B. Distance Metrics

Having obtained summary statistic vectors $T(\mathbf{y})$ and $T(\mathbf{x})$, likelihood-free methodology then measures the distance between these vectors using a distance metric, denoted generically by $\rho(T(\mathbf{y}), T(\mathbf{x}))$. The most popular example involves the basic Euclidean distance metric, which sums the squared error between each summary statistic as follows:
\[ \rho(T(\mathbf{y}), T(\mathbf{x})) = \sum_{i}^{\dim(T)} (T_i(\mathbf{y}) - T_i(\mathbf{x}))^2. \qquad (8) \]

Recently more advanced choices have been proposed and their impact on the methodology has been assessed [17]. These include: scaled Euclidean distance, given by the square root of
\[ \rho(T(\mathbf{y}), T(\mathbf{x})) = \sum_{i}^{\dim(T)} \Lambda_i\, (T_i(\mathbf{y}) - T_i(\mathbf{x}))^2; \qquad (9) \]
Mahalanobis distance:
\[ \rho(T(\mathbf{y}), T(\mathbf{x})) = (T(\mathbf{y}) - T(\mathbf{x}))\, \Sigma^{-1}\, (T(\mathbf{y}) - T(\mathbf{x}))^{\top}; \qquad (10) \]
$L_p$ norm:
\[ \rho(T(\mathbf{y}), T(\mathbf{x})) = \left[ \sum_{i}^{\dim(T)} |T_i(\mathbf{y}) - T_i(\mathbf{x})|^p \right]^{1/p}; \qquad (11) \]
and city block distance:
\[ \rho(T(\mathbf{y}), T(\mathbf{x})) = \sum_{i}^{\dim(T)} |T_i(\mathbf{y}) - T_i(\mathbf{x})|. \qquad (12) \]
In particular, we note that distance metrics which include information regarding correlation between elements of each summary statistic vector result in improved estimates of the marginal posterior.
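The metrics (8)-(12) reduce to a few lines of NumPy. In this sketch, Lam and Sigma are assumed to be supplied by the user (for example, estimated from simulated summaries, as done for the Mahalanobis distance in Section IV-C).

```python
import numpy as np

def euclidean(Ty, Tx):                      # eq. (8): sum of squared errors
    d = Ty - Tx
    return np.sum(d**2)

def scaled_euclidean(Ty, Tx, Lam):          # eq. (9): the paper takes the square root
    d = Ty - Tx
    return np.sqrt(np.sum(Lam * d**2))

def mahalanobis(Ty, Tx, Sigma):             # eq. (10): quadratic form with Sigma^{-1}
    d = Ty - Tx
    return d @ np.linalg.solve(Sigma, d)

def lp_norm(Ty, Tx, p):                     # eq. (11)
    return np.sum(np.abs(Ty - Tx)**p) ** (1.0 / p)

def city_block(Ty, Tx):                     # eq. (12)
    return np.sum(np.abs(Ty - Tx))
```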

C. MCMC-ABC Samplers

MCMC-based likelihood-free samplers were introduced to avoid the basic rejection sampling algorithms (see for example [30]), which are inefficient when the posterior and prior are sufficiently different [26]. Therefore, an MCMC approach that is based on the ABC methodology has been devised [26]. The generic MCMC-ABC sampler is presented in Algorithm 1.

Algorithm 1 Generic MCMC-ABC sampler

Input: $p(\mathbf{s})$
Output: $\{\mathbf{s}^{[n]}\}_{n=1}^{N}$, samples from $p_{ABC}(\mathbf{s}|\mathbf{y})$
1: Initialise $\mathbf{s}^{[1]}$ to an arbitrary starting point
2: for $n = 2, \ldots, N$ do
3:   Propose $\mathbf{s}^* \sim q(\mathbf{s}^* \leftarrow \mathbf{s}^{[n-1]})$
4:   Simulate synthetic data from the model, $\mathbf{X} \sim p(\mathbf{x}|\mathbf{s}^*)$
5:   if $\rho(T(\mathbf{y}), T(\mathbf{x})) \le \epsilon$ then
6:     Compute the MCMC acceptance probability:
       $\alpha\big(\mathbf{s}^*, \mathbf{s}^{[n-1]}\big) = \min\left\{1,\ \frac{p(\mathbf{s}^*)}{p(\mathbf{s}^{[n-1]})} \times \frac{q(\mathbf{s}^{[n-1]} \leftarrow \mathbf{s}^*)}{q(\mathbf{s}^* \leftarrow \mathbf{s}^{[n-1]})}\right\}$
7:     Generate $u \sim \mathcal{U}[0, 1]$
8:     if $u \le \alpha\big(\mathbf{s}^*, \mathbf{s}^{[n-1]}\big)$ then
9:       $\mathbf{s}^{[n]} = \mathbf{s}^*$
10:    else
11:      $\mathbf{s}^{[n]} = \mathbf{s}^{[n-1]}$
12:    end if
13:  else
14:    $\mathbf{s}^{[n]} = \mathbf{s}^{[n-1]}$
15:  end if
16: end for
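A compact Python sketch of the generic sampler in Algorithm 1, using the hard-decision kernel (6). The helper functions (prior_pdf, prior_sample, propose, proposal_pdf, simulate, summarise, distance) are user-supplied assumptions; this is a generic template rather than the specific detector of Section IV. The convention proposal_pdf(a, b) is the density of proposing a from b, i.e. q(a <- b).

```python
import numpy as np

def mcmc_abc(y, N, eps, prior_pdf, prior_sample, propose, proposal_pdf,
             simulate, summarise, distance, rng=None):
    """Generic MCMC-ABC sampler (Algorithm 1) with a hard-decision kernel."""
    rng = np.random.default_rng() if rng is None else rng
    Ty = summarise(y)
    s = prior_sample(rng)                      # arbitrary starting point
    chain = [s]
    for _ in range(1, N):
        s_star = propose(s, rng)               # s* ~ q(s* <- s)
        x = simulate(s_star, rng)              # synthetic data X ~ p(x | s*)
        accept = False
        if distance(Ty, summarise(x)) <= eps:  # hard-decision screening, eq. (6)
            # Metropolis-Hastings ratio: prior ratio times proposal ratio
            alpha = min(1.0, (prior_pdf(s_star) * proposal_pdf(s, s_star)) /
                             (prior_pdf(s) * proposal_pdf(s_star, s)))
            accept = rng.uniform() <= alpha
        s = s_star if accept else s
        chain.append(s)
    return chain
```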

In the next section we present details of the choices that must be made when constructing a likelihood-free inference model.


IV. MCMC-ABC BASED DETECTOR

We now relate the generic MCMC-ABC sampler to the specific problem presented in this paper. As mentioned previously, the ABC method we consider embeds an intractable target posterior distribution, in our case denoted by $p(s_{1:K}, h_{1:L}, g_{1:L}|\mathbf{y})$, into a general augmented model
\[ p(s_{1:K}, h_{1:L}, g_{1:L}, \mathbf{x}|\mathbf{y}) \propto p(\mathbf{y}|\mathbf{x}, s_{1:K}, h_{1:L}, g_{1:L})\, p(\mathbf{x}|s_{1:K}, h_{1:L}, g_{1:L})\, p(s_{1:K})\, p(h_{1:L})\, p(g_{1:L}), \qquad (13) \]
where $\mathbf{x}$ is an auxiliary vector on the same space as $\mathbf{y}$. In this paper we make the standard ABC assumption for the weighting function, $p(\mathbf{y}|\mathbf{x}, s_{1:K}, h_{1:L}, g_{1:L}) = p(\mathbf{y}|\mathbf{x})$, see [21].

Hence, in the ABC context, we obtain a general approximation to the intractable full posterior, denoted by $p_{ABC}(s_{1:K}, h_{1:L}, g_{1:L}|\mathbf{y})$. We are interested in the marginal target posterior, $p(s_{1:K}|\mathbf{y})$, which, in the ABC framework, is approximated by
\[ \begin{aligned} p(s_{1:K}|\mathbf{y}) &\propto \int\!\!\int\!\!\int p(\mathbf{y}|\mathbf{x}, s_{1:K}, h_{1:L}, g_{1:L};\epsilon)\, p(\mathbf{x}|s_{1:K}, h_{1:L}, g_{1:L})\, p(s_{1:K})\, p(h_{1:L})\, p(g_{1:L})\, d\mathbf{h}\, d\mathbf{g}\, d\mathbf{x} \\ &\approx p(s_{1:K}) \sum_{n=1}^{N} \sum_{d=1}^{D} p\big(\mathbf{y}|\mathbf{x}^{d,[n]}, s_{1:K}, h^{[n]}_{1:L}, g^{[n]}_{1:L};\epsilon\big)\, p\big(h^{[n]}_{1:L}\big)\, p\big(g^{[n]}_{1:L}\big) =: p_{ABC}(s_{1:K}|\mathbf{y}), \end{aligned} \qquad (14) \]
where $\mathbf{x}^{d,[n]}$ represents the $d$-th realisation at the $n$-th step of the Markov chain. In this paper we are interested in the marginal posterior ABC approximation $p_{ABC}(s_{1:K}|\mathbf{y})$, as we wish to formulate the MAP detector for the symbols.

Next we present the resulting MCMC-ABC algorithm to perform MAP detection of a sequence of transmitted symbols. In particular, our MCMC sampler is comprised of a random scan Metropolis-Hastings (MH) within Gibbs sampler [31]. The Gibbs sampler is a special case of the MH algorithm. It is typically used when one wishes to update sub-blocks of the posterior parameters at each iteration of the Markov chain. This is especially beneficial when the full conditional posterior distributions for each sub-block can be expressed in closed form and sampled. In this case the transition kernel for the Markov chain on the full set of posterior parameters becomes a product of the full conditional posterior densities, resulting in an acceptance probability of one, see [32]. If, however, one cannot sample easily from any of the conditional posterior densities, an MH algorithm is used to produce a sample. This algorithm is referred to as a Metropolis-Hastings within Gibbs sampler.

Algorithm 2 presents the details of the sampler, where we use the compact notation $\Theta = (\mathbf{S}, \mathbf{G}, \mathbf{H})$. In order to utilise the Metropolis-Hastings algorithm, we now specify the Markov chain transition kernels for $\mathbf{S}, \mathbf{G}, \mathbf{H}$, otherwise known as the proposal distributions for the sampler. We design the sampler to update a single component of the posterior parameters at every iteration of the algorithm, as this allows us to design proposals that ensure a reasonable acceptance probability in the Markov chain rejection step.

Algorithm 2 MAP sequence detection algorithm using MCMC-ABC

Initialize Markov chain state:
1: Initialize $n = 1$, $\mathbf{S}^{[1]} \sim p(\mathbf{s})$, $g^{[1]}_{1:L} = g_{1:L}$, $h^{[1]}_{1:L} = h_{1:L}$
2: for $n = 2, \ldots, N$ do
   Propose new Markov chain state $\Theta^*$ given $\Theta^{[n-1]}$:
3:   Draw an index $i \sim \mathcal{U}[1, \ldots, K + 2L]$
4:   Draw proposal $\Theta^* = \big[\theta^{[n-1]}_{1:i-1}, \theta^*, \theta^{[n-1]}_{i+1:K+2L}\big]$ from proposal distribution $q(\theta^* \leftarrow \theta^{[n-1]}_i)$. (Note: the proposal depends on which element of the $\Theta$ vector is being sampled.)
   ABC posterior (14):
5:   Generate auxiliary variables $x^{(l)}_1, \ldots, x^{(l)}_K$ from the model, $p(\mathbf{x}^{(l)}|\theta^*)$, for $l = 1, \ldots, L$, to obtain a realization of $\mathbf{X} = \mathbf{x} = \big[\mathbf{x}^{(1)}, \ldots, \mathbf{x}^{(L)}\big]^{\top}$ by:
     (5.a) Sample $\mathbf{W}^{(l)*} \sim \mathcal{CN}(0, \sigma^2_w \mathbf{I})$, $l \in \{1, \cdots, L\}$.
     (5.b) Sample $\mathbf{V}^{(l)*} \sim \mathcal{CN}(0, \sigma^2_v \mathbf{I})$, $l \in \{1, \cdots, L\}$.
     (5.c) Evaluate $\mathbf{X}^{(l)} = f^{(l)}\big(\mathbf{S}^* h^{(l)*} + \mathbf{W}^{(l)*}\big) g^{(l)*} + \mathbf{V}^{(l)*}$, $l \in \{1, \cdots, L\}$.
6:   Calculate a measure of distance $\rho(T(\mathbf{y}), T(\mathbf{x}))$
7:   Evaluate the acceptance probability
     $\alpha\big(\Theta^{[n-1]}, \Theta^*\big) = \min\left\{1,\ \frac{p_{ABC}(\theta^*|\mathbf{y}, \epsilon_n)\, q(\theta^{[n-1]} \leftarrow \theta^*)}{p_{ABC}(\theta^{[n-1]}|\mathbf{y}, \epsilon_{n-1})\, q(\theta^* \leftarrow \theta^{[n-1]})}\right\}$,
     where $p_{ABC}(\theta^*|\mathbf{y}, \epsilon_n)$, depending on whether HD or SD is used, is given by:
     HD: $p_{ABC}(\theta^*|\mathbf{y}, \epsilon_n) \propto \begin{cases} p(s^*_{1:K})\, p(h^*_{1:L})\, p(g^*_{1:L}), & \text{if } \rho(T(\mathbf{y}), T(\mathbf{x})) \le \epsilon_n, \\ 0, & \text{otherwise;} \end{cases}$
     SD: $p_{ABC}(\theta^*|\mathbf{y}, \epsilon_n) \propto \exp\left(\frac{-\rho(T(\mathbf{y}), T(\mathbf{x}))}{\epsilon^2_n}\right) p(s^*_{1:K})\, p(h^*_{1:L})\, p(g^*_{1:L})$.
8:   Sample a random variate $u$, where $U \sim \mathcal{U}[0, 1]$.
9:   if $u \le \alpha\big(\Theta^{[n-1]}, \Theta^*\big)$ then
10:    $\Theta^{[n]} = \Theta^*$
11:  else
12:    $\Theta^{[n]} = \Theta^{[n-1]}$.
13:  end if
14: end for

The transition kernels, denoted generically by $q(\theta^* \leftarrow \theta^{[n-1]})$, update the Markov chain for the posterior parameters, $\theta$, from the $(n-1)$-th Markov chain step to the $n$-th via the proposal $q$. In this case we decompose $q$ into two components: the first, denoted by $q(i)$, specifies the transition probability mass function for sampling a proposed element of the posterior parameter vector to be updated; and the second, denoted by $q\big(\theta_i|\theta^{[n-1]}_i\big)$, specifies the probability density for proposing a new Markov chain state for element $\theta_i$ conditional on its previous state at iteration $n-1$. The specific transition kernels we specified for our algorithm are given by:

• Transition kernel for $S_{1:K}$: draw a proposal $S^*_{1:K}$ from the distribution
\[ q\big(s^*_{1:K} \leftarrow s^{[n-1]}_{1:K}\big) = q(i)\, q\big(s_i|s^{[n-1]}_i\big) = \frac{1}{K} \frac{1}{\log_2 M}\, \delta_{s^{[n-1]}_{1:i-1}}\, \delta_{s^{[n-1]}_{i+1:K}}. \qquad (15) \]

• Transition kernel for $G_i$: draw proposal $G^*_i$ from the distribution
\[ q\big(g^* \leftarrow g^{[n-1]}_i\big) = q(l)\, q\big(g^{(l)}_i|g^{[n-1]}_i\big) = \frac{1}{L}\, \mathcal{CN}\big(g^{[n-1]}_i, \sigma^2_{g\,rw}\big). \qquad (16) \]

• Transition kernel for $H_i$: draw proposal $H^*_i$ from the distribution
\[ q\big(h^* \leftarrow h^{[n-1]}_i\big) = q(l)\, q\big(h^{(l)}_i|h^{[n-1]}_i\big) = \frac{1}{L}\, \mathcal{CN}\big(h^{[n-1]}_i, \sigma^2_{h\,rw}\big), \qquad (17) \]

where $\delta_{\phi}$ denotes a Dirac mass at location $\phi$, $q(i)$ and $q(l)$ respectively denote the uniform probabilities of choosing indices $i \in \{1, \ldots, K\}$ and $l \in \{1, \ldots, L\}$, and $\sigma^2_{g\,rw}$, $\sigma^2_{h\,rw}$ represent the transition kernel variances. Thus, at every iteration we update a single component of $\Theta$: either one symbol of $s^{[n]}_{1:K}$, one relay-to-destination channel coefficient $g^{[n]}_i$, or one source-to-relay channel coefficient $h^{[n]}_i$.
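A sketch of the random-scan single-component proposal implied by (15)-(17): pick one coordinate of $\Theta = (s, g, h)$ uniformly, then either redraw that symbol from the constellation (a simple stand-in for the symbol proposal in (15)) or perturb that channel coefficient with a complex Gaussian random walk. The function name, the constellation argument and the step sizes sd_g, sd_h are illustrative assumptions.

```python
import numpy as np

def propose_component(s, g, h, constellation, sd_g, sd_h, rng):
    """Single-component random-scan proposal for Theta = (s_{1:K}, g_{1:L}, h_{1:L})."""
    s, g, h = s.copy(), g.copy(), h.copy()
    K, L = len(s), len(g)
    i = rng.integers(K + 2 * L)                # uniform index over all K + 2L components
    if i < K:
        s[i] = rng.choice(constellation)       # symbol update (cf. eq. (15))
    elif i < K + L:
        l = i - K                              # eq. (16): complex random walk on one g coefficient
        g[l] = g[l] + sd_g * (rng.standard_normal() + 1j * rng.standard_normal()) / np.sqrt(2)
    else:
        l = i - K - L                          # eq. (17): complex random walk on one h coefficient
        h[l] = h[l] + sd_h * (rng.standard_normal() + 1j * rng.standard_normal()) / np.sqrt(2)
    return s, g, h
```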

We now present details about the choices made for the ABC components of the likelihood-free methodology.

A. Observations and synthetic data

The data $\mathbf{y} = \mathbf{y}_{1:K}$ correspond to the observed sequence of symbols at the receiver. The generation of the synthetic data in the likelihood-free approach involves generating auxiliary variables $x^{(l)}_1, \ldots, x^{(l)}_K$ from the model, $p\big(\mathbf{x}^{(l)}|\mathbf{s},\mathbf{g},\mathbf{h}\big)$, for $l = 1, \ldots, L$, to obtain a realization $\mathbf{x} = \big[\mathbf{x}^{(1)}, \ldots, \mathbf{x}^{(L)}\big]^{\top}$. This is achieved under the following steps:

a. Sample $\mathbf{W}^{(l)*} \sim \mathcal{CN}(0, \sigma^2_w \mathbf{I})$.
b. Sample $\mathbf{V}^{(l)*} \sim \mathcal{CN}(0, \sigma^2_v \mathbf{I})$.
c. Evaluate the system model in (2), $\mathbf{X}^{(l)} = f^{(l)}\big(\mathbf{S}^* h^{(l)*} + \mathbf{W}^{(l)*}\big) g^{(l)*} + \mathbf{V}^{(l)*}$, $l \in \{1, \cdots, L\}$.

B. Summary statistics

As discussed in Section III-A, summary statistics $T(\cdot)$ are used in the comparison between the synthetic data and the observed data via the weighting function. There are many possible choices of summary statistic. Most critically, we want the summary statistics to be as close to sufficient statistics as possible whilst low in dimension. The simplest choice is to use $T(\mathbf{y}) = \mathbf{y}$, i.e. the complete dataset. This is optimal in the sense that it does not result in a loss of information from summarising the observations. The reason this choice is rarely used is that typically it will result in poor performance of the MCMC-ABC algorithm, as discussed previously. To understand this, consider the HD rule weighting function in (6). In this case, even if the true MAP estimates of the model parameters were utilized to generate the synthetic data $\mathbf{x}$, it would still be improbable to realise a non-zero weight as $\epsilon \to 0$. This is made worse as the number of observations increases, through the curse of dimensionality. As a result, the acceptance probability in the MCMC-ABC algorithm would be zero for long periods, resulting in poor sampler efficiency. See [25] for a related discussion on chain mixing. We note that the exception to this rule is when a moderate tolerance $\epsilon$ and a small number of observations are used.

A popular and practical alternative to utilizing the entire data set is to use empirical quantile estimates of the data distribution. Here we adopt the vector of quantiles $T(\mathbf{y}) = [q_{0.1}(\mathbf{y}), \ldots, q_{0.9}(\mathbf{y})]$, where $q_{\alpha}(\mathbf{y})$ denotes the $\alpha$-level empirical quantile of $\mathbf{y}$, see for example [20], [24].
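The quantile summary is straightforward to compute; a minimal sketch is given below. Since the observations are complex-valued, this version summarises the real and imaginary parts separately, which is an illustrative choice on our part and not a detail specified by the paper.

```python
import numpy as np

def quantile_summary(y, probs=np.arange(0.1, 1.0, 0.1)):
    """Vector of 0.1, ..., 0.9 empirical quantiles of the (flattened) data."""
    y = np.asarray(y).ravel()
    # real and imaginary parts summarised separately (illustrative assumption)
    return np.concatenate([np.quantile(y.real, probs), np.quantile(y.imag, probs)])
```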

C. Distance metric

For the distance metric $\rho(T(\mathbf{y}), T(\mathbf{x}))$, as a component of the weighting function, we use the Mahalanobis distance
\[ \rho(T(\mathbf{y}), T(\mathbf{x})) = [T(\mathbf{y}) - T(\mathbf{x})]^{\top} \Sigma^{-1}_x [T(\mathbf{y}) - T(\mathbf{x})], \]
where $\Sigma_x$ is the covariance matrix of $T(\mathbf{x})$. In Section VII we contrast this distance metric with the scaled Euclidean distance, obtained by substituting $\mathrm{diag}(\Sigma_x)$ for $\Sigma_x$ in the above.

We estimate $\Sigma_x$ as the sample covariance matrix of $T(\mathbf{x})$, based on 2,000 likelihood draws from $p\big(\mathbf{y}|s_{1:K}, h_{1:L}, g_{1:L}\big)$, where $s_{1:K}$ is a mode of the prior for the symbols and the channel coefficients are replaced with the partial CSI estimates.

We note that, in principle, the choice of matrix $\Sigma_x$ is immaterial, in the sense that in the limit as $\epsilon \to 0$, $p_{ABC}(s_{1:K}|\mathbf{y}) \to p(s_{1:K}|\mathbf{y})$ assuming sufficient statistics $T(\mathbf{x})$. However, in practice, algorithm efficiency is directly affected by the choice of $\Sigma_x$. We demonstrate this in Section VII.

D. Weighting function

For the weighting function $p(\mathbf{y}|\mathbf{x}, s_{1:K}, h_{1:L}, g_{1:L}) = p(\mathbf{y}|\mathbf{x})$ we consider the HD weighting function in (6) and the SD weighting function in (7).

E. Tolerance schedule

With a HD weighting function, an MCMC-ABC algorithm can experience low acceptance probabilities for extended periods, particularly when the chain explores the tails of the posterior distribution (this is known as "sticking", c.f. [25]). In order to achieve improved chain convergence, we implement an annealed tolerance schedule during Markov chain burn-in, through $\epsilon_n = \max\{N - 10n, \epsilon_{min}\}$, where $\epsilon_n$ is the tolerance at time $n$ in the Markov chain, $N$ is the overall number of Markov chain samples, and $\epsilon_{min}$ denotes the target tolerance of the sampler.

There is a trade-off between computational overheads (i.e. Markov chain acceptance rate) and the accuracy of the ABC posterior distribution relative to the true posterior. In this paper we determine $\epsilon_{min}$ via preliminary analysis of the Markov chain sampler mixing rates for a transition kernel with coefficient of variation set to one. In general, practitioners will have a required precision in posterior estimates. This precision can be directly used to determine, for a given computational budget, a suitable tolerance $\epsilon_{min}$.
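The annealed schedule itself is a simple piecewise rule, directly translating the expression above.

```python
def tolerance(n, N, eps_min):
    # annealed tolerance during burn-in: eps_n = max(N - 10 n, eps_min)
    return max(N - 10 * n, eps_min)
```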


F. Performance diagnostic

Given the above mixing issues, one should carefully monitor convergence diagnostics of the resulting Markov chain for a given tolerance schedule. For a Markov chain of length $N$, the performance diagnostic we consider is the autocorrelation evaluated on $\tilde{N} = N - N_b$ post-convergence samples after an initial burn-in period of $N_b$. Denoting by $\{\theta^{[n]}_i\}_{n=1:\tilde{N}}$ the Markov chain of the $i$-th parameter after burn-in, we define the autocorrelation estimate at lag $\tau$ by
\[ \widehat{ACF}(\theta_i, \tau) = \frac{1}{(\tilde{N} - \tau)\, \hat{\sigma}(\theta_i)} \sum_{n=1}^{\tilde{N}-\tau} \big[\theta^{[n]}_i - \hat{\mu}(\theta_i)\big]\big[\theta^{[n+\tau]}_i - \hat{\mu}(\theta_i)\big], \qquad (18) \]
where $\hat{\mu}(\theta_i)$ and $\hat{\sigma}(\theta_i)$ are the estimated mean and standard deviation of $\theta_i$.
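A direct translation of (18), with mu and sigma computed as the post burn-in sample mean and standard deviation. Note that, as written in (18), the normalisation uses the estimated standard deviation; replacing it by the sample variance gives the usual autocorrelation bounded in [-1, 1].

```python
import numpy as np

def acf(theta, tau):
    """Empirical autocorrelation estimate of eq. (18) at lag tau."""
    theta = np.asarray(theta, dtype=float)   # post burn-in chain of one parameter
    n = len(theta)
    mu, sigma = theta.mean(), theta.std()
    return np.sum((theta[: n - tau] - mu) * (theta[tau:] - mu)) / ((n - tau) * sigma)
```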

V. AUXILIARY VARIABLE MCMC APPROACH

In this section we demonstrate an alternative solution to the previously presented MCMC-ABC detector. We will show that, at the expense of increasing the dimension of the parameter vector, one can develop a standard MCMC algorithm without the requirement of the ABC methodology. We augment the parameter vector with the unknown noise realizations at each relay, $\mathbf{w}_{1:K}$, to obtain a new parameter vector $(s_{1:K}, g_{1:L}, h_{1:L}, \mathbf{w}_{1:K})$. The resulting posterior distribution $p(s_{1:K}, g_{1:L}, h_{1:L}, \mathbf{w}_{1:K}|\mathbf{y})$ may then be decomposed into the full conditional distributions:
\[ p(s_{1:K}|g_{1:L}, h_{1:L}, \mathbf{w}_{1:K}, \mathbf{y}) \propto p(\mathbf{y}|s_{1:K}, g_{1:L}, h_{1:L}, \mathbf{w}_{1:K})\, p(s_{1:K}), \qquad (19a) \]
\[ p(g_{1:L}|s_{1:K}, h_{1:L}, \mathbf{w}_{1:K}, \mathbf{y}) \propto p(\mathbf{y}|s_{1:K}, g_{1:L}, h_{1:L}, \mathbf{w}_{1:K})\, p(g_{1:L}), \qquad (19b) \]
\[ p(h_{1:L}|s_{1:K}, g_{1:L}, \mathbf{w}_{1:K}, \mathbf{y}) \propto p(\mathbf{y}|s_{1:K}, g_{1:L}, h_{1:L}, \mathbf{w}_{1:K})\, p(h_{1:L}), \qquad (19c) \]
\[ p(\mathbf{w}_{1:K}|s_{1:K}, g_{1:L}, h_{1:L}, \mathbf{y}) \propto p(\mathbf{y}|s_{1:K}, g_{1:L}, h_{1:L}, \mathbf{w}_{1:K})\, p(\mathbf{w}_{1:K}), \qquad (19d) \]

which form a block Gibbs sampling framework. Conditioning on the unknown noise random variables at the relays permits a simple closed form solution for the likelihood and results in tractable full conditional posterior distributions for (19a)-(19d). In this case the likelihood is given by
\[ p(\mathbf{y}|s_{1:K}, g_{1:L}, h_{1:L}, \mathbf{w}_{1:K}) = \prod_{l=1}^{L} p\big(\mathbf{y}^{(l)}|s_{1:K}, g^{(l)}, h^{(l)}, \mathbf{w}^{(l)}\big), \]
where
\[ p\big(\mathbf{y}^{(l)}|s_{1:K}, g^{(l)}, h^{(l)}, \mathbf{w}^{(l)}\big) = \mathcal{CN}\big(f^{(l)}\big(\mathbf{S} h^{(l)} + \mathbf{W}^{(l)}\big)\, g^{(l)}, \sigma^2_v \mathbf{I}\big). \]
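Once the relay noise $\mathbf{w}$ is conditioned on, the likelihood above is simply a product of complex Gaussian densities and can be evaluated directly; a sketch is given below, assuming the arrays y and w have shape (L, K) and the noise is isotropic. Variable names are illustrative.

```python
import numpy as np

def av_loglikelihood(y, s, g, h, w, sigma_v, relay_fn=np.tanh):
    """log p(y | s, g, h, w): complex Gaussian with mean f(s h^(l) + w^(l)) g^(l)."""
    ll = 0.0
    L, K = y.shape
    for l in range(L):
        mean = relay_fn(s * h[l] + w[l]) * g[l]          # conditional mean at relay l
        resid = y[l] - mean
        # log density of CN(mean, sigma_v^2 I): -|resid|^2 / sigma_v^2 - K log(pi sigma_v^2)
        ll += -np.sum(np.abs(resid) ** 2) / sigma_v**2 - K * np.log(np.pi * sigma_v**2)
    return ll
```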

The resulting Metropolis-within-Gibbs sampler for this block Gibbs framework is outlined in Algorithm 3, where we define the joint posterior parameter vector $\Theta = (\mathbf{S}, \mathbf{G}, \mathbf{H}, \mathbf{W})$. The Metropolis-Hastings proposals used to sample from each full conditional distribution were given by eqs. (15)-(17), together with the additional proposal for the auxiliary variables, given by

• Transition kernel for $W_i$: draw proposal $W^*_i$ from the distribution
\[ q\big(w^* \leftarrow w^{[n-1]}_i\big) = q(l)\, q(i)\, q\big(w^{(l)}_i|w^{[n-1]}_i\big) = \frac{1}{KL}\, \mathcal{CN}\big(w^{[n-1]}_i, \sigma^2_{w\,rw}\big). \qquad (20) \]

Algorithm 3 MAP sequence detection algorithm using MCMC-AV

Initialize Markov chain state:
1: Initialize $n = 1$, $\mathbf{S}^{[1]} \sim p(\mathbf{s})$, $g^{[1]}_{1:L} = g_{1:L}$, $h^{[1]}_{1:L} = h_{1:L}$, $\mathbf{W}^{[1]} \sim p(\mathbf{w})$
2: for $n = 2, \ldots, N$ do
   Propose new Markov chain state $\Theta^*$ given $\Theta^{[n-1]}$:
3:   Draw an index $i \sim \mathcal{U}[1, \ldots, K + 2L + KL]$
4:   Draw proposal $\Theta^* = \big[\theta^{[n-1]}_{1:i-1}, \theta^*, \theta^{[n-1]}_{i+1:K+2L+KL}\big]$ from proposal distribution $q(\theta^* \leftarrow \theta^{[n-1]}_i)$. (Note: the proposal depends on which element of the $\Theta$ vector is being sampled.)
5:   Evaluate the acceptance probability
     $\alpha\big(\Theta^{[n-1]}, \Theta^*\big) = \min\left\{1,\ \frac{p(\theta^*|\mathbf{y})\, q(\theta^{[n-1]} \leftarrow \theta^*)}{p(\theta^{[n-1]}|\mathbf{y})\, q(\theta^* \leftarrow \theta^{[n-1]})}\right\}$.
6:   Sample a random variate $u$, where $U \sim \mathcal{U}[0, 1]$.
7:   if $u \le \alpha\big(\Theta^{[n-1]}, \Theta^*\big)$ then
8:     $\Theta^{[n]} = \Theta^*$
9:   else
10:    $\Theta^{[n]} = \Theta^{[n-1]}$.
11:  end if
12: end for

The MCMC-AV approach presents an alternative to the likelihood-free Bayesian model sampler, and produces exact samples from the true posterior following chain convergence. While the MCMC-AV sampler still performs joint channel estimation and detection, the trade-off is that sampling the large number of extra parameters will typically result in the need for longer Markov chains to achieve the same performance as the ABC algorithm (in terms of joint estimation and detection performance). This is especially true in high dimensional problems, such as when the sequence of transmitted symbols $K$ is long and the number of relays $L$ present in the system is large, or when the posterior distribution of the additional auxiliary variables exhibits strong dependence.

VI. ALTERNATIVE MAP DETECTORS AND LOWER BOUNDS

One can define a suboptimal solution to the MAP detector, even with an intractable likelihood, involving a naive, highly computational algorithm based on a Zero Forcing (ZF) solution. The ZF solution is popular in simple system models where it can be efficient and performs well.

Under a ZF solution one conditions on some knowledge of the partial channel state information, and then performs an explicit search over the set of all possible symbol sequences. To our knowledge a ZF solution for MAP sequence detection in arbitrary non-linear relay systems has not been defined. Accordingly, we define the ZF solution for MAP sequence detection as the solution which conditions on the mean of the noise at the relay nodes, and also uses the noisy channel estimates given by the partial CSI information, to reduce the detection search space.


A. Sub-optimal exhaustive search Zero Forcing approach

In this approach we condition on the mean of the noise, $\mathbf{W}^{(l)} = 0$, and use the partial CSI estimates of the channel coefficients, $\{h^{(l)}, g^{(l)}\}_{l=1}^{L}$, to reduce the dimensionality of the MAP detector search space to just the symbol space $\Omega$. The SES-ZF-MAP sequence detector can be expressed as
\[ \begin{aligned} \hat{\mathbf{s}} &= \arg\max_{\mathbf{s} \in \Omega} \prod_{l=1}^{L} p\big(\mathbf{s}|\mathbf{y}^{(l)}, G^{(l)} = g^{(l)}, H^{(l)} = h^{(l)}, \mathbf{W}^{(l)} = 0\big) \\ &= \arg\max_{\mathbf{s} \in \Omega} \prod_{l=1}^{L} p\big(\mathbf{y}^{(l)}|\mathbf{s}, G^{(l)} = g^{(l)}, H^{(l)} = h^{(l)}, \mathbf{W}^{(l)} = 0\big)\, p(\mathbf{s}). \end{aligned} \qquad (21) \]

Thus, the likelihood model results in a complex Gaussian distribution for each relay channel, as follows
\[ p\big(\mathbf{y}^{(l)}|\mathbf{s}, G^{(l)} = g^{(l)}, H^{(l)} = h^{(l)}, \mathbf{W}^{(l)} = 0\big) = \mathcal{CN}\big(f^{(l)}\big(\mathbf{s}\, h^{(l)}\big)\, g^{(l)}, \sigma^2_v \mathbf{I}\big). \qquad (22) \]

As a result, the MAP detection can be solved exactly using an explicit search.
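Because (22) is Gaussian once the relay noise is zeroed, the SES-ZF search amounts to an argmax over all $M^K$ code words; a brute-force sketch is given below, feasible only for small $M$ and $K$ as discussed next. The function name, the prior_pmf and constellation arguments, and the default relay function are illustrative assumptions.

```python
import numpy as np
from itertools import product

def ses_zf_map(y, h_hat, g_hat, sigma_v, constellation, K, prior_pmf, relay_fn=np.tanh):
    """Exhaustive SES-ZF MAP detector of eq. (21): score every code word in Omega."""
    best, best_score = None, -np.inf
    for s in product(constellation, repeat=K):           # all M^K candidate code words
        s = np.asarray(s, dtype=float)
        score = np.log(prior_pmf(tuple(s)))              # log prior p(s)
        for l in range(len(h_hat)):                       # product over relays in eq. (21)
            mean = relay_fn(s * h_hat[l]) * g_hat[l]      # eq. (22) with W^(l) = 0
            score += -np.sum(np.abs(y[l] - mean) ** 2) / sigma_v**2
        if score > best_score:
            best, best_score = s, score
    return best
```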

Note, however, that this approach to symbol detection also involves a very high computational cost, as one must evaluate the posterior distribution for all $M^K$ code words in $\Omega$. It is usual for communications systems to utilise constellations with $M$ as large as 64-ary or 128-ary PAM, and the number of symbols can be anything from $K = 1$ to $K = 20$ depending on the channel capacity budget for the designed network and the typical operating SNR level. Typically this explicit search is not feasible to perform. However, the sub-optimal SES-ZF MAP detector provides a comparison for the MCMC-ABC approach; at low SNR it should be a reasonable upper bound for the SER and at high SNR an approximately optimal solution.

The SES-ZF-MAP sequence detector can be highly sub-optimal for low SNR values. This is trivial to see, since we are explicitly setting the noise realisations to zero when the variance of the noise distribution is large. For the same reason, at high SNR values, the ZF approach becomes close to optimal.

B. Lower bound MAP detector performance

We denote the theoretical lower bound for the MAP detector performance as the oracle MAP detector (OMAP). The OMAP detector involves conditioning on perfect oracular knowledge of the channel coefficients $\{h^{(l)}, g^{(l)}\}_{l=1}^{L}$ and of the realized noise sequence at each relay, $\mathbf{W}^{(l)}$. This results in the likelihood model for each relay channel being complex Gaussian, resulting in an explicit solution for the MAP detector. Accordingly, the OMAP detector provides the lower bound for the SER performance. The OMAP detector is expressed as
\[ \hat{\mathbf{s}} = \arg\max_{\mathbf{s} \in \Omega} \prod_{l=1}^{L} p\big(\mathbf{s}|\mathbf{y}^{(l)}, \Pi\big) = \arg\max_{\mathbf{s} \in \Omega} \prod_{l=1}^{L} p\big(\mathbf{y}^{(l)}|\mathbf{s}, \Pi\big)\, p(\mathbf{s}), \]
where $\Pi := \big(G^{(l)} = g^{(l)}, H^{(l)} = h^{(l)}, \mathbf{W}^{(l)} = \mathbf{w}^{(l)}\big)$.

In this case, the likelihood model results in a complex Gaussian distribution for each relay channel, as follows
\[ p\big(\mathbf{y}^{(l)}|\mathbf{s}, G^{(l)} = g^{(l)}, H^{(l)} = h^{(l)}, \mathbf{W}^{(l)} = \mathbf{w}^{(l)}\big) = \mathcal{CN}\big(f^{(l)}\big(\mathbf{s}\, h^{(l)} + \mathbf{w}^{(l)}\big)\, g^{(l)}, \sigma^2_v \mathbf{I}\big). \qquad (23) \]
However, clearly this is impossible to evaluate in a real system, since oracular knowledge is not available.

VII. RESULTS

We compare the performance of joint channel estimation and detection under the ABC relay methodology versus the auxiliary variable MCMC approach for different model configurations. Additionally, we compare the detection performance of the ABC relay methodology, the auxiliary variable MCMC approach, the optimal oracle MAP sequence detector and the SES-ZF MAP sequence detector. Each presented Monte Carlo sampler has a burn-in of 5,000 iterations followed by a further 15,000 recorded iterations ($N = 20{,}000$). Random walk proposal variances for each of the parameters were tuned off-line to ensure the average acceptance probability (post burn-in) was in the range 0.3 to 0.5.

The following specifications for the relay system model are used for all the simulations performed: the symbols are taken from a 4-PAM constellation with constellation points in $\{-3, -1, 1, 3\}$; each sequence contains $K = 2$ symbols; the prior for the sequence of symbols is $p((s_1, s_2) = [1, 1]) = p((s_1, s_2) = [-1, 1]) = 0.3$ and otherwise all other code words are equiprobable; the partial CSI is given by $\sigma^2_g = \sigma^2_h = 0.1$; the non-linear relay function is given by $f(\cdot) = \tanh(\cdot)$. These system parameters were utilised as they allow us to perform the zero forcing solution without a prohibitive computational burden.

A. Analysis of mixing and convergence of MCMC-ABC

We analyze the impact that the ABC tolerance level $\epsilon_{min}$ has on the estimation performance of the channel coefficients and the mixing properties of the Markov chain for the MCMC-ABC algorithm. The study involves joint estimation of channel coefficients and transmitted symbols at an SNR level of 15dB, with $L = 5$ relays present in the system. We adopt a scaled Euclidean distance metric with the HD weighting function (6) and empirically monitor the mixing of the Markov chain through the autocorrelation function (ACF).

In Fig. 2 we present a study of the ACF of the Markov chains for the channel estimates of $G^{(1)}$ and $H^{(1)}$ as a function of the tolerance $\epsilon$, and the associated estimated marginal posterior distributions $p(g^{(1)}|\mathbf{y})$ and $p(h^{(1)}|\mathbf{y})$. For large $\epsilon$ the Markov chain mixes over the posterior support efficiently, since when the tolerance is large, the HD weighting function, and therefore the acceptance probability, will regularly be non-zero. In addition, with a large tolerance the posterior almost exactly recovers the prior given by the partial CSI. This is expected since a large tolerance results in a weak contribution from the likelihood.


Fig. 2. Comparison of performance for MCMC-ABC with Hard Decision weighting and scaled Euclidean distance metric. Subplots on the left display how the estimated ACF changes as a function of the tolerance level $\epsilon$ for the estimated channels of the relay system. Subplots on the right display a sequence of smoothed marginal posterior distribution estimates for the first channel of the relay, as the tolerance decreases. Note, the indexing of each marginal distribution with labels $1, 2, \ldots$ corresponds to the tolerances given in the legend on the left hand plots.

As the tolerance $\epsilon$ decreases, the posterior distribution precision increases and there is a translation from the prior partial CSI channel estimates to the posterior distribution over the true generated channel coefficients for the given frame of data communications.

It is evident that the mixing properties of the MCMC-ABC algorithm are impacted by the choice of tolerance level. A decreasing tolerance results in a more accurate posterior distribution, albeit at the price of slower chain mixing. Clearly, the ACF tail decay rate is significantly slower as the tolerance reduces.

Note that, although the results are not presented here, we also performed analysis for all aspects of Algorithm 1 under the setting in which the relay function is linear. We confirmed that the MMSE estimates of the channel coefficients and the MAP sequence detector results were accurate for a range of SNR values. The results presented here are for a much more challenging setting in which the relay is highly non-linear, given by a hyperbolic tangent function.

B. Analysis of ABC model specifications

We now examine the effect of the distance metric and weighting function on the performance of the MCMC-ABC algorithm as a function of the tolerance. We consider HD and SD weighting functions with both Mahalanobis and scaled Euclidean distance metrics. We consider the estimated ACF of the Markov chains of each of the channel coefficients $G$ and $H$. The SNR level was set to 15dB and $L = 5$ relays were present. The results are presented for one channel; since all channels are i.i.d., this will be indicative of the performance of all channels.

For comparison, an equivalent ABC posterior precision should be obtained under each algorithm.


Fig. 3. Maximum distance between the EDF and the baseline "true" EDF (estimated cdf for $G^{(1)}$, the first channel), as a function of $\epsilon_{min}$, averaged over 20 independent data realisations, for the four combinations considered: SD Mahalanobis, HD Mahalanobis, HD scaled Euclidean and SD scaled Euclidean.

Since the weighting and distance functions are different, this will result in different $\epsilon_{min}$ values for each choice in the MCMC-ABC algorithm. As a result, the analysis proceeded by first taking a minimum base epsilon value, $\epsilon_b = 0.2$, and running the MCMC-ABC with soft decision Gaussian weighting and Mahalanobis distance for 100,000 simulations. We ensured that the average acceptance rate was between $[0.1, 0.3]$. This produced a "true" empirical distribution function (EDF) which we used as our baseline comparison estimate of the true cdf. Then, for a range of tolerance values $\epsilon_i = [0.25, 0.5, 0.75, 1]$ we ran the MCMC-ABC algorithm with soft decision Gaussian weighting and Mahalanobis distance for 20,000 iterations, ensuring the average acceptance rate was in the interval $[0.1, 0.3]$. This produced a set of random walk standard deviations $\sigma_b(i)$, one for each tolerance, that we used for the analysis in the remaining choices of the MCMC-ABC algorithms. For comparison purposes we recorded the estimated maximum error between the estimated EDF for each algorithm and the baseline "true" EDF. We repeated the simulations for each tolerance on 20 independently generated data sets.

In Fig. 3 we present the results of this analysis, which demonstrate that the algorithm producing the most accurate results utilised the soft decision and Mahalanobis distance. The worst performance involved the hard decision and scaled Euclidean distance. In this case, at low tolerances the average distance between the EDF and the baseline "true" EDF was at a maximum, since the algorithm was not mixing. This demonstrates that such low tolerances under this setting of the MCMC-ABC algorithm will produce poor performance, relative to the soft decision with Mahalanobis distance.

C. Comparisons of detector performance

Finally, we present an analysis of the symbol error rate (SER) under the MCMC-ABC algorithm (with an SD weighting function and Mahalanobis distance), the MCMC-AV detector algorithm, and the SES-ZF and Oracle detectors. Specifically, we systematically study the SER as a function of the number of relays, L ∈ {1, 2, 5, 10}, and the SNR ∈ {0, 5, 10, 15, 20, 25, 30} dB.
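The structure of this study can be summarised by the following outline (Python; the detector here is a placeholder stand-in, not one of the detectors developed in this paper), which estimates the SER for each (L, SNR) pair by averaging symbol errors over independently generated BPSK frames.

import numpy as np

rng = np.random.default_rng(3)

def run_detector(symbols, snr_db, num_relays):
    # Placeholder detector: flips each symbol with a probability that shrinks
    # with SNR and with the number of relays, purely to exercise the loop below.
    p_err = 0.5 * np.exp(-(10 ** (snr_db / 10)) * num_relays / 20)
    flips = rng.random(symbols.shape) < p_err
    return np.where(flips, -symbols, symbols)

def estimate_ser(num_relays, snr_db, frames=200, frame_len=64):
    errors = total = 0
    for _ in range(frames):
        s = rng.choice([-1, 1], size=frame_len)       # BPSK symbols
        s_hat = run_detector(s, snr_db, num_relays)
        errors += int(np.sum(s_hat != s))
        total += frame_len
    return errors / total

for L in (1, 2, 5, 10):
    sers = [estimate_ser(L, snr) for snr in (0, 5, 10, 15, 20, 25, 30)]
    print(f"L={L}:", ["%.3g" % v for v in sers])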


Fig. 4. SER performance of each of the proposed detector schemes (OMAP, SES-ZF, MCMC-AV and MCMC-ABC) as a function of the number of relay links, L ∈ {1, 2, 5, 10}. For each relay set-up the SER is reported as a function of the SNR in dB.

The results of this analysis are presented in Fig. 4. In summary, these results demonstrate that under our proposed system model and detection algorithms, the spatial diversity stemming from an increasing number of relays results in measurable improvements in SER performance. For example, Fig. 4 demonstrates that for L = 1 there is an insignificant difference between the results obtained for the MCMC-ABC, MCMC-AV and SES-ZF algorithms. However, as L increases, SES-ZF has the worst performance and degrades relative to the MCMC-based approaches, demonstrating the utility obtained by developing a more sophisticated detector algorithm. It is clear that SES-ZF suffers from an error-floor effect: as the SNR increases, the SER is almost constant for SNR values above 15dB.

Fig. 4 also shows that the two MCMC-based approaches demonstrate comparable performance for small L. However, as L increases, the difference in performance between the MCMC-AV and MCMC-ABC algorithms grows in the high SNR region. This could be due to the greater number of auxiliary parameters to be estimated in the auxiliary variable approach as L increases; in particular, adding an additional relay introduces K additional nuisance parameters into the auxiliary model posterior.

VIII. CONCLUSIONS

In this paper we proposed a novel cooperative relay system model and then obtained novel detector algorithms. In particular, this involved an approximate MAP sequence detector for a coherent multiple relay system in which the relay processing function is non-linear. Using the ABC methodology we were able to perform "likelihood-free" inference on the parameters of interest, and simulation results validate the effectiveness of the proposed scheme. In addition to the ABC approach, we developed an alternative exact novel algorithm, MCMC-AV, based on auxiliary variables, and a sub-optimal zero forcing solution (SES-ZF). We then studied the performance of each algorithm under different settings of our relay system model, including the size of the network and the noise level present. As a result of our findings, we recommend the use of the MCMC-ABC detector, especially when there are many relays present in the network or the number of symbols transmitted in each frame is large. In settings where the number of relays is moderate, one could consider using the MCMC-AV algorithm, as its performance was on par with the MCMC-ABC results and it does not involve an ABC approximation.

Future research includes the design of detection algorithms for relay systems with partial CSI in which the relay system topology may contain multiple hops on a given channel, or the relay network topology may be unknown. This can include aspects such as an unknown number of relay channels or hops per relay channel.

Acknowledgements
The authors would like to thank the reviewers for their thorough and thoughtful comments. GWP thanks Prof. Arnaud Doucet and Dr. Mark Briers for useful discussions, and the Department of Mathematics and Statistics at UNSW (through an APA) and CMIS, CSIRO for financial support. This material was based upon work partially supported by the National Science Foundation under Grant DMS-0635449 to the Statistical and Applied Mathematical Sciences Institute, North Carolina, USA. Any opinions, findings, and conclusions or recommendations expressed in this material are those of the author(s) and do not necessarily reflect the views of the NSF. SAS and YF were supported by the Australian Research Council through the Discovery Project scheme (DP0664970 and DP1092805). IN and JY were supported by the Australian Research Council through the Discovery Project scheme (DP0667030).

REFERENCES

[1] A. Nosratinia, T. Hunter, and A. Hedayat, "Cooperative communication in wireless networks," IEEE Communications Magazine, vol. 42, no. 10, pp. 74–80, 2004.

[2] J. Laneman, D. Tse, and G. Wornell, "Cooperative diversity in wireless networks: Efficient protocols and outage behavior," IEEE Transactions on Information Theory, vol. 50, no. 12, pp. 3062–3080, 2004.

[3] E. van der Meulen, "Three-terminal communication channels," Advances in Applied Probability, vol. 3, no. 1, pp. 120–154, 1971.

[4] J. Laneman and G. Wornell, "Distributed space-time-coded protocols for exploiting cooperative diversity in wireless networks," IEEE Transactions on Information Theory, vol. 49, no. 10, pp. 2415–2425, 2003.

[5] T. Cover and A. El Gamal, "Capacity theorems for the relay channel," IEEE Transactions on Information Theory, vol. 25, no. 5, pp. 572–584, 1979.

[6] D. Chen and J. Laneman, "Modulation and demodulation for cooperative diversity in wireless systems," IEEE Transactions on Wireless Communications, vol. 5, no. 7, pp. 1785–1794, 2006.

[7] G. Kramer, M. Gastpar, and P. Gupta, "Cooperative strategies and capacity theorems for relay channels," IEEE Transactions on Information Theory, vol. 51, no. 9, pp. 3037–3063, 2005.

[8] M. Khojastepour, A. Sabharwal, and B. Aazhang, "On the capacity of cheap relay networks," pp. 12–14, 2003.

[9] K. Gomadam and S. Jafar, "Optimal relay functionality for SNR maximization in memoryless relay networks," IEEE Journal on Selected Areas in Communications, vol. 25, no. 2, pp. 390–401, 2007.

[10] L. Ghabeli and M. Aref, "A new achievable rate and the capacity of some classes of multilevel relay network," EURASIP Journal on Wireless Communications and Networking, vol. 2008, p. 38, 2008.

[11] D. Crouse, C. Berger, S. Zhou, and P. Willett, "Optimal memoryless relays with noncoherent modulation," IEEE Transactions on Signal Processing, vol. 56, no. 12, p. 1, 2008.


[12] S. Yao, M. Khormuji, and M. Skoglund, "Sawtooth relaying," IEEE Communications Letters, vol. 12, no. 9, pp. 612–615, 2008.

[13] T. Cui, T. Ho, and J. Kliewer, "Some results on relay strategies for memoryless two-way relay channels," Proc. of Information Theory and Applications Workshop, pp. 158–164, 2008.

[14] O. Ratmann, O. Jørgensen, T. Hinkley, M. Stumpf, S. Richardson, and C. Wiuf, "Using likelihood-free inference to compare evolutionary dynamics of the protein networks of H. pylori and P. falciparum," PLoS Computational Biology, vol. 3, no. 11, pp. 2266–2276, 2007.

[15] M. Beaumont, W. Zhang, and D. Balding, "Approximate Bayesian computation in population genetics," Genetics, vol. 162, no. 4, pp. 2025–2035, 2002.

[16] G. Peters, M. Wuthrich, and P. Shevchenko, "Chain ladder method: Bayesian bootstrap versus classical bootstrap," accepted for publication in Insurance: Mathematics and Economics.

[17] G. Peters, Y. Fan, and S. Sisson, "On sequential Monte Carlo, partial rejection control and approximate Bayesian computation," arXiv:0808.3466v2, 2008.

[18] S. Sisson, G. Peters, Y. Fan, and M. Briers, "Likelihood-free samplers," preprint, UNSW Statistics.

[19] S. Sisson and Y. Fan, "Likelihood-free Markov chain Monte Carlo," arXiv preprint arXiv:1001.2058, 2010.

[20] P. Bortot, S. Coles, and S. Sisson, "Inference for stereological extremes," Journal of the American Statistical Association, vol. 102, no. 477, pp. 84–92, 2007.

[21] O. Ratmann, C. Andrieu, C. Wiuf, and S. Richardson, "Model criticism based on likelihood-free inference, with an application to protein network evolution," Proceedings of the National Academy of Sciences of the United States of America, 2009.

[22] T. Toni, D. Welch, N. Strelkowa, A. Ipsen, and M. Stumpf, "Approximate Bayesian computation scheme for parameter inference and model selection in dynamical systems," Journal of The Royal Society Interface, vol. 6, no. 31, p. 187, 2009.

[23] G. Hamilton, M. Currat, N. Ray, G. Heckel, M. Beaumont, and L. Excoffier, "Bayesian estimation of recent migration rates after a spatial expansion," Genetics, vol. 170, no. 1, pp. 409–417, 2005.

[24] M. Tanaka, A. Francis, F. Luciani, and S. Sisson, "Using approximate Bayesian computation to estimate tuberculosis transmission parameters from genotype data," Genetics, vol. 173, no. 3, p. 1511, 2006.

[25] G. Peters and S. Sisson, "Bayesian inference, Monte Carlo sampling and operational risk," Journal of Operational Risk, vol. 1, no. 3, pp. 27–50, 2006.

[26] P. Marjoram, J. Molitor, V. Plagnol, and S. Tavare, "Markov chain Monte Carlo without likelihoods," Proceedings of the National Academy of Sciences, vol. 100, no. 26, pp. 15324–15328, 2003.

[27] R. W. Reeves and A. N. Pettitt, "A theoretical framework for approximate Bayesian computation," Proceedings of the International Workshop for Statistical Modelling, pp. 393–396, 2005.

[28] M. Blum, "Approximate Bayesian computation: a non-parametric perspective," arXiv preprint arXiv:0904.0635, 2009.

[29] P. Del Moral, A. Doucet, and A. Jasra, "An adaptive sequential Monte Carlo method for approximate Bayesian computation," preprint, 2009.

[30] V. Plagnol and S. Tavare, Approximate Bayesian Computation and MCMC. Springer Verlag, 2004.

[31] S. Chib and E. Greenberg, "Understanding the Metropolis-Hastings algorithm," The American Statistician, vol. 49, no. 4, pp. 327–335, 1995.

[32] W. Gilks and D. Spiegelhalter, Markov Chain Monte Carlo in Practice. Chapman & Hall/CRC, 1996.