A low complexity branch-and-bound-based decoder for V-BLAST systems with PSK signals


ARTICLE IN PRESS

Signal Processing 89 (2009) 197–205

Contents lists available at ScienceDirect

Signal Processing

journal homepage: www.elsevier.com/locate/sigpro

Minh-Tuan Le a, Van-Su Pham b, Linh Mai b, Giwan Yoon b,*

a Faculty of Electronics No. 1, Posts and Telecommunications Institute of Technology (PTIT), Km 10, Nguyen Trai Street, Ha Dong City, Ha Tay, Viet Nam
b School of Engineering, Information and Communications University (ICU), 103-6 Munji-dong, Yusong-gu, Taejon 305-732, Republic of Korea

* Corresponding author. Tel.: +82 42 866 6201; fax: +82 42 866 6227.
E-mail addresses: [email protected] (M.-T. Le), [email protected] (V.-S. Pham), [email protected] (L. Mai), gwyoon@icu.ac.kr (G. Yoon).

0165-1684/$ - see front matter © 2008 Elsevier B.V. All rights reserved.
doi:10.1016/j.sigpro.2008.08.005

Article info

Article history:
Received 6 April 2008
Received in revised form 7 August 2008
Accepted 20 August 2008
Available online 29 August 2008

Keywords:
Space-time coding
Multiple-input multiple-output
Maximum likelihood detection
Diversity
Wireless communication

Abstract

In this paper, a fast sphere decoder is proposed for the joint detection of phase-shift keying (PSK) signals in uncoded Vertical Bell Laboratories Layered Space Time (V-BLAST) systems. The proposed decoder consists of a preprocessing stage and a search stage. The preprocessing stage is based on an enhancement of the ordered minimum mean square error decision feedback equalizer (MMSE-DFE). Its role is to generate a tree structure suitable for the search stage. The search stage relies on the depth-first branch-and-bound (BB) algorithm with "best-first" orders stored in lookup tables. Simulation results show that the proposed decoder provides the system with maximum likelihood (ML) performance at low complexity.

© 2008 Elsevier B.V. All rights reserved.

1. Introduction

Recently, research on the application of multiple transmit and receive antennas, i.e., multiple-input multiple-output (MIMO) systems, to wireless communication systems has been an important issue because MIMO systems are theoretically shown to significantly improve spectral efficiencies [1]. To achieve high spectral efficiencies, a MIMO technique known as the Vertical Bell Laboratories Layered Space Time (V-BLAST) architecture [2] has been reported. To achieve optimal system performance, a maximum likelihood (ML) decoder is required for V-BLAST. However, brute-force maximum likelihood (BFML) detection is not a practical approach because its complexity is exponential in the number of transmit antennas.

In order to achieve ML performance at reduced complexity, a class of detection algorithms, collectively referred to as sphere decoders (SDs), has been developed [3–11]. Both computer simulation and theoretical analysis showed that the average complexity of SDs was remarkably smaller than that of the BFML decoder in many practical scenarios [6,7].

Most of the SDs [3–8] were originally designed based on integer lattice theory. In this paper, these SDs are referred to as the real SDs. In order for the real SDs to be employed, a complex MIMO system needs to be decoupled into its real and imaginary parts so as to form an equivalent real-valued system. Therefore, real SDs are most appropriate for lattice-based modulation schemes such as quadrature amplitude modulation (QAM) or pulse amplitude modulation (PAM). For other complex constellations such as phase-shift keying (PSK), the real SDs are inefficient due to the existence of invalid candidates. In [9], Cui et al. proposed an efficient method of eliminating invalid candidates when a real SD was used to detect PSK signals. Thanks to this method, a real SD can provide a PSK-modulated MIMO system with ML performance.

The problem of invalid candidates facing the real SDs can also be solved by using the complex SD introduced in [10], because this complex SD does not require the decoupling of the complex system. A reduced complexity version of the complex SD in [10], referred to as the Pham SD, was presented in [11]. The key idea behind the Pham SD is the utilization of the depth-first branch-and-bound algorithm [12]. In the Pham SD, potential candidates at each layer are determined based on the original idea in [10]. Then, an ordering procedure is applied to those candidates so as to maximize the probability that the first feasible solution is the optimal one. In addition, instead of restarting the search from the root node, the search resumes with the path that is closest to completion. As a consequence, the Pham SD offers a noticeable reduction in the computational load as compared to the original SD in [10]. Nonetheless, one disadvantage of both the original complex SD in [10] and the Pham SD is that they have to deal with the computationally inefficient $\cos^{-1}$ operation, thereby slowing down the decoding process.

In this paper, we present a fast sphere decoder for the detection of PSK signals in V-BLAST systems. The proposed decoder, called the sphere decoder for PSK signals (or PSD for short), is composed of preprocessing and search stages. The preprocessing is based on the enhanced MMSE-DFE (E-MMSE-DFE) technique along with the sorted QR decomposition [15]. It has been shown that the MMSE-DFE filter can be established via the definition of the augmented channel matrix [13] or the extended channel matrix [14]. By the E-MMSE-DFE technique we mean that the extended channel matrix is given in a general form. With the aid of the preprocessing stage, the PSD is capable of detecting PSK signals in a V-BLAST system having any number of receive antennas without introducing any significant performance loss as compared to the optimal performance. Like the Pham SD, the search stage of the PSD is built on the depth-first branch-and-bound (BB) algorithm. However, having the same working principle as the PMLD decoder presented in [16], it differs from the Pham SD in the following points:

1. It eliminates the evaluation of the $\cos^{-1}$ operation.
2. It orders the distance metrics in a simpler way, without having to explicitly compute all the distance metrics.
3. The optimal (best-first) orders are stored in lookup tables in terms of indexes of symbols in the transmission constellation.

Simulation results show that, in comparison with the Pham SD [11], the proposed PSD offers a remarkable decrease in the computational load without causing any significant performance degradation.

The rest of the paper is organized as follows. In Section 2, the system model is introduced. Section 3 investigates the preprocessing stage. In Section 4, we briefly review the operation of the Pham SD and present the search stage of the PSD. Section 5 discusses the impact of the preprocessing stage on the performance and complexity of the PSD. Simulation results and discussion are presented in Section 6. Section 7 concludes the paper.

2. System model

We consider an uncoded V-BLAST configuration with $n_T$ transmit and $n_R$ receive antennas, denoted as an $(n_T, n_R)$ system.

At the transmitter, the input data sequence is partitioned into $n_T$ sub-streams (layers), each of which is then modulated by an $M$-PSK modulation scheme, $M = 2^n$ for some positive integer $n$, and transmitted from a different transmit antenna. For simplicity, we investigate a one-time-slot complex baseband signal model, where at each symbol period an $n_T \times 1$ transmit signal vector $\mathbf{s}$ consisting of $n_T$ symbols $s_i$, $i = 1, \ldots, n_T$, is sent through the $n_T$ transmit antennas. Under the assumptions that the signals are narrow-band and the channel is quasi-static, i.e., it remains constant during some block of arbitrary length and changes from one block to another, the relationship between transmitted and received signals can be expressed in the following form:

$$\mathbf{r} = \mathbf{H}\mathbf{s} + \mathbf{w} \tag{1}$$

where $\mathbf{r} = [r_1, \ldots, r_{n_R}]^T$ is the $n_R \times 1$ received signal vector, $(\cdot)^T$ denotes the transpose operator, and $\mathbf{w} = [w_1, \ldots, w_{n_R}]^T$ represents the noise samples at the $n_R$ receive antennas, which are modelled as independent samples of a zero-mean complex Gaussian random variable with noise variance $\sigma^2$. $\mathbf{H}$ is the $n_R \times n_T$ channel matrix, whose entries are the path gains between transmit and receive antennas, modelled as samples of a zero-mean complex Gaussian random variable with equal variance of 0.5 per real dimension. In addition, we assume that the signals transmitted from the individual antennas have equal powers of $P/n_T$, i.e., $E\{\mathbf{s}\mathbf{s}^H\} = (P/n_T)\mathbf{I}_{n_T}$, where $(\cdot)^H$ denotes the Hermitian transpose operator, $\mathbf{I}_{n_T}$ indicates the $n_T \times n_T$ identity matrix, and $E\{\cdot\}$ denotes the expectation operator.

Under the assumption that $\mathbf{H}$ is perfectly known at the receiver, the transmitted vector can be recovered by using the ML decoder according to

$$\hat{\mathbf{s}} = \arg\min_{\mathbf{s} \in \Omega} \|\mathbf{r} - \mathbf{H}\mathbf{s}\|^2 \tag{2}$$

where $\Omega$ is the transmission constellation, $\|\mathbf{A}\|$ denotes the Euclidean norm of matrix $\mathbf{A}$ defined by $\|\mathbf{A}\|^2 = \mathrm{tr}(\mathbf{A}^H\mathbf{A})$, and $\mathrm{tr}(\mathbf{A})$ denotes the trace of $\mathbf{A}$.
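To make the cost of rule (2) concrete, the following is a minimal sketch of brute-force ML detection for the model in (1), written with the Python standard library only. The helper names (`psk_constellation`, `matvec`, `sq_dist`, `bfml_detect`, `random_channel`), the fixed seed, and the noise-free test case are illustrative assumptions, not part of the paper; the PSK phase offset follows the constellation the paper defines later in (10).

```python
import cmath
import itertools
import random

def psk_constellation(M, rho=1.0):
    """M-PSK points rho * exp(j(2n+1)pi/M), n = 0, ..., M-1."""
    return [rho * cmath.exp(1j * (2 * n + 1) * cmath.pi / M) for n in range(M)]

def matvec(H, s):
    """Matrix-vector product H s for a list-of-rows matrix."""
    return [sum(h * x for h, x in zip(row, s)) for row in H]

def sq_dist(r, H, s):
    """Squared Euclidean distance ||r - H s||^2."""
    return sum(abs(ri - yi) ** 2 for ri, yi in zip(r, matvec(H, s)))

def bfml_detect(r, H, omega):
    """Brute-force ML: test all M**n_T candidate vectors, as in Eq. (2)."""
    n_t = len(H[0])
    return list(min(itertools.product(omega, repeat=n_t),
                    key=lambda s: sq_dist(r, H, s)))

def random_channel(n_r, n_t, rng):
    """Complex Gaussian path gains with variance 0.5 per real dimension."""
    sd = 0.5 ** 0.5
    return [[complex(rng.gauss(0, sd), rng.gauss(0, sd)) for _ in range(n_t)]
            for _ in range(n_r)]

rng = random.Random(0)                      # fixed seed, illustration only
omega = psk_constellation(4)                # 4-PSK
H = random_channel(3, 3, rng)               # a (3, 3) system
s = [rng.choice(omega) for _ in range(3)]   # transmitted vector
r = matvec(H, s)                            # noise-free observation (w = 0)
s_hat = bfml_detect(r, H, omega)            # recovers s in this noise-free case
```

The search visits $M^{n_T}$ candidates (64 here), which is the exponential growth that motivates sphere decoding.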

3. Preprocessing stage of the PSD

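Based on the concept of the extended channel matrix and the extended received signal vector in [14], we define an $(n_T + n_R) \times n_T$ extended channel matrix $\bar{\mathbf{H}}$ and an $(n_T + n_R) \times 1$ extended received signal vector $\bar{\mathbf{r}}$ as follows:

$$\bar{\mathbf{H}} = \begin{bmatrix} \mathbf{H} \\ \beta\mathbf{I}_{n_T} \end{bmatrix} \quad \text{and} \quad \bar{\mathbf{r}} = \begin{bmatrix} \mathbf{r} \\ \mathbf{0}_{n_T \times 1} \end{bmatrix} \tag{3}$$

where $\beta$ is an arbitrary real number.

By applying the sorted QR decomposition [15] to $\bar{\mathbf{H}}$ we obtain

$$\bar{\mathbf{H}} = \mathbf{Q}\mathbf{R} \tag{4}$$

where $\mathbf{Q}$ is an $(n_T + n_R) \times n_T$ matrix with orthonormal columns and $\mathbf{R}$ is an $n_T \times n_T$ upper triangular matrix.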

Fig. 1. Illustration of the search criterion of the Pham SD.

Rather than detecting the transmitted vector $\mathbf{s}$ based on (2), the proposed PSD detects $\mathbf{s}$ using

$$\hat{\mathbf{s}} = \arg\min_{\mathbf{s} \in \Omega} \|\mathbf{v} - \mathbf{R}\mathbf{s}\|^2 \tag{5}$$

$$\phantom{\hat{\mathbf{s}}} = \arg\min_{\mathbf{s} \in \Omega} \sum_{k=1}^{n_T} |\xi_k - R_{k,k}s_k|^2 \tag{6}$$

where $\mathbf{v} = \mathbf{Q}^H\bar{\mathbf{r}}$ is the $n_T \times 1$ received signal vector after being filtered by the feed-forward matrix $\mathbf{Q}$, and $\xi_k = v_k - \sum_{i=k+1}^{n_T} R_{k,i}s_i$.

From (3) one can see that if $\beta = 0$, $\mathbf{Q}$ and $\mathbf{R}$ are the zero-forcing DFE forward and backward filters, whereas if $\beta = \beta_{\mathrm{MMSE}} = \sqrt{n_T/\gamma}$, where $\gamma = P/\sigma^2$ is the SNR at each receive antenna (see Appendix A), $\mathbf{Q}$ and $\mathbf{R}$ become the MMSE-DFE forward and backward filters. $\beta$ can also be set equal to any other non-zero value. Therefore, the exploited preprocessing technique is referred to as the E-MMSE-DFE preprocessing. It is worth noting that by selecting $\beta \neq 0$, the extended channel matrix $\bar{\mathbf{H}}$ in (3) is always of full column rank for any $n_R$, thus resulting in an upper triangular matrix $\mathbf{R}$ with all positive diagonal entries. Therefore, the E-MMSE-DFE preprocessing enables the PSD to work with any number of receive antennas.
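The full-rank argument can be illustrated numerically. The sketch below builds the extended channel matrix of (3) for an underdetermined (4, 2) system and factors it with a plain modified Gram-Schmidt QR. This is a simplifying assumption: the paper uses the sorted QR decomposition of [15], which additionally reorders the columns and is not reproduced here. The function names, seed, and parameter values are hypothetical.

```python
import random

def extended_channel(H, beta):
    """Stack H on top of beta * I_{n_T}, as in Eq. (3)."""
    n_t = len(H[0])
    eye = [[complex(beta) if i == j else 0j for j in range(n_t)]
           for i in range(n_t)]
    return [list(row) for row in H] + eye

def qr_gram_schmidt(A):
    """Thin QR of a tall complex matrix by modified Gram-Schmidt (unsorted).

    Returns Q as a list of orthonormal columns and the upper triangular R.
    A zero column norm (possible when beta = 0 and n_R < n_T) would raise
    ZeroDivisionError; that is the rank-deficient case beta != 0 avoids.
    """
    m, n = len(A), len(A[0])
    cols = [[A[i][j] for i in range(m)] for j in range(n)]
    Q = []
    R = [[0j] * n for _ in range(n)]
    for j in range(n):
        u = cols[j][:]
        for i, q in enumerate(Q):
            # projection coefficient of the residual on the i-th column of Q
            R[i][j] = sum(qc.conjugate() * uc for qc, uc in zip(q, u))
            u = [uc - R[i][j] * qc for uc, qc in zip(u, q)]
        norm = sum(abs(uc) ** 2 for uc in u) ** 0.5
        R[j][j] = norm                     # real and positive by construction
        Q.append([uc / norm for uc in u])
    return Q, R

rng = random.Random(1)                     # fixed seed, illustration only
n_r, n_t, beta = 2, 4, 0.5                 # underdetermined: n_R < n_T
H = [[complex(rng.gauss(0, 0.7), rng.gauss(0, 0.7)) for _ in range(n_t)]
     for _ in range(n_r)]
H_bar = extended_channel(H, beta)
Q, R = qr_gram_schmidt(H_bar)
```

Even though $\mathbf{H}$ alone has rank at most 2 here, the stacked $\beta\mathbf{I}_{n_T}$ block keeps all four columns of $\bar{\mathbf{H}}$ linearly independent, so every $R_{k,k}$ is strictly positive.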

4. Search stage of the PSD

For completeness, in this section, we first briefly show how the Pham SD [11] operates, assuming that it is used to solve (5). Readers are referred to [11] for the details of the Pham SD.

4.1. Working principle of the Pham SD

The Pham SD mainly involves the computation of coordinate bounds on each element of $\mathbf{s}$, i.e., the bounds for each layer, and the search for the ML solution utilizing the bounds and the depth-first BB algorithm.

Computation of coordinate bounds: In the Pham SD, the bounds on $s_k$, $k = 1, \ldots, n_T$, are calculated in exactly the same way as in the original SD [10]. The idea of [10] is to represent a PSK constellation as a concentric ring, which is defined by $\rho e^{j\theta}$, where $\rho$ is the ring radius, $j = \sqrt{-1}$, and $\theta \in [0, 2\pi]$ is the phase associated with some constellation point on that ring. The bound corresponding to $s_k$ is determined by the intersection between the constellation ring and a circle (or a disk) of square radius $C$ centered at $\xi_k$.

To obtain the bounds, we define

$$\lambda_k = \frac{\tilde{\xi}_k}{R_{k,k}} = \frac{1}{R_{k,k}}\left(v_k - \sum_{i=k+1}^{n_T} R_{k,i}\tilde{s}_i\right) \tag{7}$$

where $\tilde{s}_i$ is the tentative decision of $s_i$, and $\tilde{\xi}_k$ is $\xi_k$ evaluated at $s_i = \tilde{s}_i$, i.e., $\tilde{\xi}_k = \xi_k|_{s_i = \tilde{s}_i}$. We also define the quantity

$$\eta = \frac{1}{2\rho|\lambda_k|}\left(\rho^2 + |\lambda_k|^2 - \frac{C}{R_{k,k}^2}\right) \tag{8}$$

where $C$ is the current square sphere radius.

Now, the bounds are specified based on $\eta$ as follows. If $\eta < -1$, all the signal points on the ring are candidates for $s_k$. If $\eta > 1$, no points on the ring are candidates for $s_k$. For the case $-1 \le \eta \le 1$, only the points whose phases lie within the range

$$\left[\theta_{\lambda_k} - \cos^{-1}(\eta),\ \theta_{\lambda_k} + \cos^{-1}(\eta)\right] \tag{9}$$

are candidates for $s_k$, where $\theta_{\lambda_k}$ is the phase of $\lambda_k$, and $0 \le \cos^{-1}(\cdot) \le \pi$.

Searching strategy: The search strategy of the Pham SD is illustrated in Fig. 1 for $n_T = 3$.

In Fig. 1, the boldface number indicates the order in which a node is visited by the search process and the number in parentheses indicates the total distance metric accumulated to that point. In the Pham SD, after finding a feasible solution, rather than restarting the search from the root node, the search resumes with the path closest to completion so that new feasible solutions are obtained as quickly as possible. For example, after obtaining the first feasible solution via the path 0 → 1 → 2 → 3, the decoder selects node 4 at level $k = 2$. The order of selecting a node at a level is decided by a "best-first" rule, that is, the node having the least accumulated distance metric is selected first. The selection of node 4 at level 2 leads to a new feasible solution indicated by the path 0 → 1 → 4 → 5 with an accumulated metric of 1.9. After updating the new solution and the current best distance metric, the decoder selects the remaining node 6 at level 2 and finds that the metric of this node is larger than 1.9. Thus, it terminates the search at level 2 and resumes with node 7 at level 3. The overall decoding process finishes when all nodes at level 3 are visited or when the metric accrued to some node at this level is greater than the current best distance metric.

One can observe from the operating principle of the Pham SD that it requires the evaluation of the $\cos^{-1}$ function, as well as the computation and sorting of the distance metrics at each level. We now present our method to eliminate the burden of the $\cos^{-1}$ computation and the sorting of the distance metrics at each level so as to reduce the detection complexity.
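The coordinate-bound test in (7)-(9) can be sketched as follows. This is an illustration under stated assumptions rather than the implementation of [10] or [11]: the function names are hypothetical, `c` stands for the squared radius available at the current layer, $\lambda_k$ is assumed non-zero, and the constellation is taken to be the $M$-PSK ring with points at phases $(2n+1)\pi/M$. A direct distance test is included as a reference to check the result.

```python
import cmath
import math

def pham_candidates(xi, r_kk, c, M, rho=1.0):
    """Indexes n of the M-PSK points kept by the bound test of Eqs. (7)-(9).

    xi   : interference-cancelled observation at the current layer
    r_kk : positive diagonal entry of R for this layer
    c    : squared sphere radius used at this layer
    """
    lam = xi / r_kk                                    # Eq. (7), assumed non-zero
    eta = (rho ** 2 + abs(lam) ** 2 - c / r_kk ** 2) / (2 * rho * abs(lam))  # Eq. (8)
    if eta < -1:
        return list(range(M))                          # whole ring lies in the disk
    if eta > 1:
        return []                                      # ring misses the disk
    half_width = math.acos(eta)                        # the costly inverse cosine
    theta = cmath.phase(lam)
    kept = []
    for n in range(M):
        phase = (2 * n + 1) * math.pi / M
        # wrap the phase difference into [-pi, pi) before applying Eq. (9)
        diff = (phase - theta + math.pi) % (2 * math.pi) - math.pi
        if abs(diff) <= half_width:
            kept.append(n)
    return kept

def direct_candidates(xi, r_kk, c, M, rho=1.0):
    """Reference test: keep points with |xi - r_kk * x|^2 <= c."""
    pts = [rho * cmath.exp(1j * (2 * n + 1) * math.pi / M) for n in range(M)]
    return [n for n, x in enumerate(pts) if abs(xi - r_kk * x) ** 2 <= c]

kept = pham_candidates(1.2 + 0.9j, 1.5, 1.0, 8)        # 8-PSK example
```

In this example $\lambda_k = 0.8 + 0.6j$ and $\eta \approx 0.78$, so only the two constellation points at phases $\pi/8$ and $3\pi/8$ survive, in agreement with the direct distance test.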

4.2. The search stage of the PSD

From the Pham SD, we see that at level $k$, if $\eta < -1$, then all the constellation points are candidates for $s_k$. In this case we do not have to evaluate the $\cos^{-1}$ function. This implies that, at level $k$, the Pham SD can operate by taking all the constellation points into consideration without having to calculate the bounds in (9). However, when all the constellation points are candidates for $s_k$, the Pham SD has to compute and sort the distance metrics for all these points, leading to another computational bottleneck. Therefore, our task is to resolve this computational problem efficiently. Illustrated below is the way to determine the best-first order based on lookup tables without having to explicitly compute and sort the distance metrics for all constellation points at level $k$.

Consider an $M$-PSK constellation defined by

$$\Omega = \left\{\rho e^{j(2n+1)\pi/M} : n = 0, 1, \ldots, M-1\right\} \tag{10}$$

Let $\mathbf{x} = (x_1, \ldots, x_M)$ be an $M \times 1$ vector containing the $M$ symbols of $\Omega$ in accordance with the order $n = 0, 1, \ldots, M-1$. From Eq. (6), the problem of finding the best-first order at level $k$ amounts to determining which among the $M$ values $R_{k,k}x_i$, $i = 1, \ldots, M$, are the first closest, second closest, and so on, to the value of $\xi_k$.

For illustration, let us consider the case $M = 8$, i.e., 8-PSK. Shown in Fig. 2 are the values $R_{k,k}x_i$, $i = 1, \ldots, 8$, and $\xi_k$ in the complex plane. Since $R_{k,k}$ is a real and positive number, the phases of $R_{k,k}x_i$ are the same as those of $x_i$ for all $i$. Fig. 2 suggests a very simple way to obtain the best-first order of all signal points at level $k$ just by comparing the slopes of different straight lines passing through the origin [16].

Fig. 2. Illustration of determining the "best-first" order of symbols at level $k$ for an 8-PSK constellation.

First, the slope of the straight line passing through the origin and the point $\xi_k$ can be computed as

$$a_k = \frac{\Im\{\xi_k\}}{\Re\{\xi_k\}} \tag{11}$$

where $\Re\{a\}$ and $\Im\{a\}$, respectively, denote the real and imaginary parts of the complex number $a$. Then, by comparing $a_k$ in (11) with the slopes $b_1$, $b_2$, $b_3$, and $b_4$ of the four solid lines (referred to as the "b-boundaries" for short), we are able to locate the region to which the point $\xi_k$ belongs. As a consequence, the first symbol to be visited, namely $x_l$, and the slope $c_l$ of the dashed line passing through the origin and the point $R_{k,k}x_l$ can easily be obtained. For simplicity, those dashed lines are called the "c-boundaries". Note that each PSK constellation has its own b-boundaries and c-boundaries, which can be predetermined. Finally, by comparing $a_k$ with $c_l$, the best-first order of the remaining symbols can be obtained without any difficulty.

For example, one can observe from Fig. 2 that $b_2 < a_k < b_3$, thus $x_2$ will be the first symbol to be visited. In addition, since $a_k < c_2$, the best-first order of the eight constellation points is $(x_2, x_1, x_3, x_8, x_4, x_7, x_5, x_6)$. In case $a_k > c_2$, the order becomes $(x_2, x_3, x_1, x_4, x_8, x_5, x_7, x_6)$.

In the proposed PSD, the best-first order at level $k$ is stored in the lookup table $\mathbf{i}_k = (i_1^{(k)}, \ldots, i_M^{(k)})$ in terms of the indexes of signal points in $\mathbf{x}$. For instance, $i_1^{(k)} = 6$ indicates that, at level $k$, $x_6$ will be the first signal point in $\Omega$ to be visited. It is clear that instead of having to compute and sort the distance metrics for all constellation points, the proposed approach is much simpler because it enables the decoder to get the best-first order of the signal points at level $k$ with only one real division.
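The lookup-table idea can be sketched as follows. The sketch precomputes one visiting order per half-sector of width $\pi/M$ (the regions delimited by the b- and c-boundaries) and selects a table from the phase of $\xi_k$. Locating the region through `cmath.phase` is a simplification chosen for brevity; the paper locates it by slope comparisons after a single real division. The function and variable names are hypothetical, and indexes are 1-based to match $x_1, \ldots, x_M$.

```python
import cmath
import math

def build_order_tables(M):
    """One best-first table per half-sector of width pi/M (2M tables in total).

    Table j holds the 1-based symbol indexes sorted by angular distance from
    the middle of the half-sector [j*pi/M, (j+1)*pi/M).
    """
    phases = [(2 * n + 1) * math.pi / M for n in range(M)]
    tables = []
    for j in range(2 * M):
        mid = (j + 0.5) * math.pi / M
        def ang_dist(n, mid=mid):
            d = abs(phases[n] - mid) % (2 * math.pi)
            return min(d, 2 * math.pi - d)
        tables.append(tuple(n + 1 for n in sorted(range(M), key=ang_dist)))
    return tables

def best_first_order(xi, tables):
    """Look up the visiting order for the observation xi."""
    M = len(tables) // 2
    phase = cmath.phase(xi) % (2 * math.pi)          # map into [0, 2*pi)
    sector = int(phase // (math.pi / M)) % (2 * M)
    return tables[sector]

TABLES_8PSK = build_order_tables(8)                  # computed once, offline
# xi_k with a phase of 50 degrees: between the 45-degree line and x_2
order_low = best_first_order(cmath.rect(1.0, math.radians(50)), TABLES_8PSK)
# xi_k with a phase of 80 degrees: between x_2 and the 90-degree line
order_high = best_first_order(cmath.rect(1.0, math.radians(80)), TABLES_8PSK)
# order_low  -> (2, 1, 3, 8, 4, 7, 5, 6)
# order_high -> (2, 3, 1, 4, 8, 5, 7, 6)
```

The two orders reproduce the example given for Fig. 2. Because $R_{k,k} > 0$ scales all points equally, sorting by angular distance gives the same order as sorting the metrics $|\xi_k - R_{k,k}x_i|^2$, so no metric has to be computed to build or use the tables.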

The search stage of the PSD is summarized as follows.

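PSD SEARCH STAGE: Input: $\mathbf{v}$, $\mathbf{R}$, $\mathbf{x}$, $M$, $C$. Output: $\hat{\mathbf{s}} = (\hat{s}_1, \hat{s}_2, \ldots, \hat{s}_{n_T})$.

Step 1. (Initialization) Set $k := n_T$, $T_k := 0$, $\xi_k := v_k$, $D_{\min} := C$.

Step 2. (Searching) Use $\xi_k$ to generate the lookup table $\mathbf{i}_k$ for level $k$, and set $l_k := 1$.

Step 3. If ($l_k \le M$), then {get the complex symbol $\tilde{s}_k$ by setting $\tilde{s}_k := x_{i_{l_k}^{(k)}}$ and let $D := |\xi_k - R_{k,k}\tilde{s}_k|^2 + T_k$}.

Step 4. If ($D \ge D_{\min}$) or ($l_k > M$), then if ($k = n_T$), terminate, else {set $k := k + 1$, $l_k := l_k + 1$, and go to Step 3}.

Step 5. If ($k > 1$), then {let $\xi_{k-1} := v_{k-1} - \sum_{i=k}^{n_T} R_{k-1,i}\tilde{s}_i$, $T_{k-1} := D$, $k := k - 1$, and go to Step 2}, else {set $D_{\min} := D$, save the new solution $\hat{\mathbf{s}} := \tilde{\mathbf{s}}$, let $k := 2$, $l_k := l_k + 1$, and go to Step 3}.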
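Steps 1 to 5 can be sketched in Python as shown below. This is a minimal illustration under stated assumptions, not the authors' C implementation: layers are 0-based (the top layer is `n - 1`), the visiting order is obtained by sorting on angular distance instead of reading a precomputed lookup table, and all names (`best_first`, `metric`, `psd_search`) and test data are hypothetical. A brute-force minimization of $\|\mathbf{v} - \mathbf{R}\mathbf{s}\|^2$ serves as a check.

```python
import cmath
import itertools
import math
import random

def best_first(xi, omega):
    """Symbol indexes sorted by angular distance from xi (stand-in for the lookup table)."""
    theta = cmath.phase(xi)
    def ang_dist(i):
        d = (cmath.phase(omega[i]) - theta) % (2 * math.pi)
        return min(d, 2 * math.pi - d)
    return sorted(range(len(omega)), key=ang_dist)

def metric(v, R, s):
    """||v - R s||^2 for an upper triangular R."""
    n = len(v)
    return sum(abs(v[k] - sum(R[k][i] * s[i] for i in range(k, n))) ** 2
               for k in range(n))

def psd_search(v, R, omega, C):
    """Depth-first BB search following Steps 1-5; returns (solution, metric)."""
    n, M = len(v), len(omega)
    s = [0j] * n                       # tentative decisions
    xi, T = [0j] * n, [0.0] * n
    order, l = [None] * n, [0] * n
    best, d_min, D = None, C, 0.0
    k = n - 1                          # Step 1
    xi[k] = v[k]
    enter = True
    while True:
        if enter:                      # Step 2: new layer, build its visiting order
            order[k], l[k] = best_first(xi[k], omega), 0
        if l[k] < M:                   # Step 3: next-best symbol at this layer
            s[k] = omega[order[k][l[k]]]
            D = abs(xi[k] - R[k][k] * s[k]) ** 2 + T[k]
        if l[k] >= M or D >= d_min:    # Step 4: prune and move back up
            if k == n - 1:
                return best, d_min
            k += 1
            l[k] += 1
            enter = False
        elif k > 0:                    # Step 5: descend one layer
            xi[k - 1] = v[k - 1] - sum(R[k - 1][i] * s[i] for i in range(k, n))
            T[k - 1] = D
            k -= 1
            enter = True
        else:                          # Step 5: full vector found, tighten the bound
            best, d_min = list(s), D
            if n == 1:
                return best, d_min
            k = 1
            l[k] += 1
            enter = False

rng = random.Random(2)                 # fixed seed, illustration only
omega = [cmath.exp(1j * (2 * m + 1) * math.pi / 4) for m in range(4)]   # 4-PSK
n = 3
R = [[complex(rng.gauss(0, 1), rng.gauss(0, 1)) if i > k
      else (0.5 + rng.random() if i == k else 0.0)
      for i in range(n)] for k in range(n)]
v = [complex(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(n)]
s_hat, d_best = psd_search(v, R, omega, float("inf"))
```

With an infinite initial radius the search always returns a solution. With a finite $C$ it can return `None` when no point lies inside the sphere, in which case the radius has to be enlarged and the search repeated.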

5. Impact of the preprocessing stage on performance and complexity of the PSD

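In Section 3, we defined the extended channel matrix with a parameter $\beta$. This section investigates how the parameter $\beta$ affects the performance and complexity of the PSD.

5.1. Impact of $\beta$ on performance of the PSD

With the equalities $\bar{\mathbf{H}} = \mathbf{Q}\mathbf{R}$ in (4), $\mathbf{Q}^H\mathbf{Q} = \mathbf{I}_{n_T}$, and $\mathbf{v} = \mathbf{Q}^H\bar{\mathbf{r}}$, it is straightforward to verify that

$$\|\mathbf{v} - \mathbf{R}\mathbf{s}\|^2 = \|\bar{\mathbf{r}} - \bar{\mathbf{H}}\mathbf{s}\|^2 - \bar{\mathbf{r}}^H\bar{\mathbf{r}} + \mathbf{v}^H\mathbf{v} \tag{12}$$

Therefore, the decision rule (5) becomes

$$\hat{\mathbf{s}} = \arg\min_{\mathbf{s} \in \Omega} \|\mathbf{v} - \mathbf{R}\mathbf{s}\|^2 = \arg\min_{\mathbf{s} \in \Omega} \left(\|\bar{\mathbf{r}} - \bar{\mathbf{H}}\mathbf{s}\|^2 - \bar{\mathbf{r}}^H\bar{\mathbf{r}} + \mathbf{v}^H\mathbf{v}\right) = \arg\min_{\mathbf{s} \in \Omega} \|\bar{\mathbf{r}} - \bar{\mathbf{H}}\mathbf{s}\|^2 \tag{13}$$

Eq. (13) holds since the term $(-\bar{\mathbf{r}}^H\bar{\mathbf{r}} + \mathbf{v}^H\mathbf{v})$ is a constant for any $\mathbf{s} \in \Omega$. Eqs. (5) and (13) imply that $\hat{\mathbf{s}}$ is the estimate of the transmitted signal vector $\mathbf{s}$ in a MIMO system with the input–output relation defined by

$$\bar{\mathbf{r}} = \bar{\mathbf{H}}\mathbf{s} + \bar{\mathbf{w}} \tag{14}$$

where $\bar{\mathbf{w}} = \begin{bmatrix} \mathbf{w} \\ -\beta\mathbf{s} \end{bmatrix}$ is the additive noise. Clearly, the MIMO system (14) is generally different from the V-BLAST system (1) except for $\beta = 0$. The additive noise $\bar{\mathbf{w}}$ in (14) contains both a Gaussian component $\mathbf{w}$ and a signal-dependent (non-Gaussian) component $-\beta\mathbf{s}$. Its autocorrelation matrix is given by

$$E\{\bar{\mathbf{w}}\bar{\mathbf{w}}^H\} = \begin{bmatrix} \sigma^2\mathbf{I}_{n_R} & \mathbf{0} \\ \mathbf{0} & \beta^2\frac{P}{n_T}\mathbf{I}_{n_T} \end{bmatrix} \tag{15}$$

From Eq. (15) it follows that, depending on the choice of $\beta$, the additive noise $\bar{\mathbf{w}}$ can be colored or white, and is obviously non-Gaussian and data dependent. Consequently, in this aspect, the decision rule (13) is suboptimal, and hence so is the decision rule (5). Interestingly, for a non-zero value of $\beta$, the noise $\bar{\mathbf{w}}$ is white when $\beta = \beta_{\mathrm{MMSE}}$. Consequently, $\beta = \beta_{\mathrm{MMSE}}$ is expected to make the decision rule (5) only slightly suboptimal.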
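The whiteness condition can be checked directly from the autocorrelation matrix in (15). The short derivation below is added for illustration and uses only quantities already defined, with $\gamma = P/\sigma^2$. The extended noise is white exactly when both diagonal blocks carry the same variance:

$$\beta^2\frac{P}{n_T} = \sigma^2 \;\Longleftrightarrow\; \beta^2 = \frac{n_T\sigma^2}{P} = \frac{n_T}{\gamma} \;\Longleftrightarrow\; |\beta| = \sqrt{\frac{n_T}{\gamma}} = \beta_{\mathrm{MMSE}}$$

For any other non-zero $\beta$ the two blocks differ, so the extended noise is colored.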

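The above analysis is, in general, true for a transmitted vector $\mathbf{s}$ drawn from any $M$-level modulation scheme. Now, let us look at the problem from another aspect. Due to the equality $\|\bar{\mathbf{r}} - \bar{\mathbf{H}}\mathbf{s}\|^2 = \|\mathbf{r} - \mathbf{H}\mathbf{s}\|^2 + \beta^2\mathbf{s}^H\mathbf{s}$, Eq. (13) can be rewritten as

$$\hat{\mathbf{s}} = \arg\min_{\mathbf{s} \in \Omega} \|\mathbf{v} - \mathbf{R}\mathbf{s}\|^2 = \arg\min_{\mathbf{s} \in \Omega} \left(\|\mathbf{r} - \mathbf{H}\mathbf{s}\|^2 + \beta^2\mathbf{s}^H\mathbf{s}\right) \tag{16}$$

When $\mathbf{s}$ is drawn from the $M$-PSK constellation (10), we have $\beta^2\mathbf{s}^H\mathbf{s} = \beta^2\rho^2 n_T$, which is a constant for a given $\beta$. Hence, Eq. (16) becomes

$$\hat{\mathbf{s}} = \arg\min_{\mathbf{s} \in \Omega} \|\mathbf{v} - \mathbf{R}\mathbf{s}\|^2 = \arg\min_{\mathbf{s} \in \Omega} \|\mathbf{r} - \mathbf{H}\mathbf{s}\|^2 \tag{17}$$

This equation means that, for PSK signals, the solution to (5) is also the solution to (2). In other words, for PSK signals, the decision rule (5) is optimal. Hence, when detecting PSK signals in V-BLAST systems, the proposed PSD is always an optimal decoder irrespective of $\beta$.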
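The equality behind (16) follows from the block structure of the extended quantities in (3). The derivation below is added for illustration, using the bar notation for the extended matrix and vector:

$$\|\bar{\mathbf{r}} - \bar{\mathbf{H}}\mathbf{s}\|^2 = \left\|\begin{bmatrix} \mathbf{r} - \mathbf{H}\mathbf{s} \\ -\beta\mathbf{s} \end{bmatrix}\right\|^2 = \|\mathbf{r} - \mathbf{H}\mathbf{s}\|^2 + \beta^2\mathbf{s}^H\mathbf{s}, \qquad \mathbf{s}^H\mathbf{s} = \sum_{i=1}^{n_T}|s_i|^2 = n_T\rho^2 \ \text{for } M\text{-PSK}$$

For a constellation with unequal symbol energies, such as 16-QAM, $\mathbf{s}^H\mathbf{s}$ depends on $\mathbf{s}$, so the extra term does not drop out of the minimization and the argument above does not apply.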

Fig. 3. Effect of $R_{k,k}$ on the number of constellation points to be visited at layer $k$, and thus on the detection complexity; 8-PSK modulation.

5.2. Impact of $\beta$ on complexity of the PSD

In this subsection, we qualitatively investigate the role of $\beta$ in the PSD's complexity. For simplicity, let us consider a V-BLAST system with 8-PSK modulation. For this system, the PSD tests only those $\mathbf{s} \in \Omega$ that satisfy

$$\|\mathbf{v} - \mathbf{R}\mathbf{s}\|^2 = \sum_{k=1}^{n_T} |\xi_k - R_{k,k}s_k|^2 \le C \tag{18}$$

Assume that at layer $k = m + 1$, the PSD has successfully selected the tentative decisions $\tilde{s}_{n_T}, \ldots, \tilde{s}_{m+1}$, respectively, for the transmitted symbols $s_{n_T}, \ldots, s_{m+1}$ such that $T_{m+1} = \sum_{k=m+1}^{n_T} |\tilde{\xi}_k - R_{k,k}\tilde{s}_k|^2 < C$. Then, at layer $k = m$, a constellation symbol $x_i$, $i = 1, \ldots, 8$, that is to be selected as a candidate for $s_k$ must satisfy

$$T_{m+1} + |\tilde{\xi}_k - R_{k,k}x_i|^2 < C \tag{19}$$

Or, equivalently,

$$|\tilde{\xi}_k - R_{k,k}x_i|^2 < C - T_{m+1} \tag{20}$$

This inequality implies that, in the complex plane, if $R_{k,k}x_i$ lies inside a circle whose center and radius are, respectively, given by $\tilde{\xi}_k$ and $\sqrt{C - T_{m+1}}$, then $x_i$ can be a candidate for $s_k$. Therefore, for a fixed radius $\sqrt{C - T_{m+1}}$ and a certain noise variance, $R_{k,k}$ plays an important role in the detection complexity. As illustrated in Fig. 3, a large value of $R_{k,k}$ widens the distances among the constellation points, thereby reducing the number of signal points to be examined. In contrast, a small value of $R_{k,k}$ reduces the distances among the signal points, leading to more signal points to be visited, and thus to a higher complexity. Because, for a given channel realization, $R_{k,k}$ increases with $\beta$, a large value should be assigned to $\beta$ in order to have low detection complexity.

On the other hand, if we partition the $(n_T + n_R) \times n_T$ matrix $\mathbf{Q}$ into the $n_R \times n_T$ matrix $\mathbf{Q}_1$ and the $n_T \times n_T$ matrix $\mathbf{Q}_2$, then the signal vector $\mathbf{v}$ in (5) can be rewritten as

$$\mathbf{v} = \mathbf{R}\mathbf{s} + \mathbf{Q}_1^H\mathbf{w} - \beta\mathbf{Q}_2^H\mathbf{s} \tag{21}$$

From (21) one can see that the power of the data-dependent noise term, $\beta\mathbf{Q}_2^H\mathbf{s}$, is in proportion to $\beta^2$. Therefore, if $\beta$ is large, the power of this noise term is large, thereby making the transmitted symbols more uncertain, and thus leading to a high detection complexity.
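Eq. (21) can be obtained in one line from (4) and (14). The derivation below is added for illustration and assumes the partition $\mathbf{Q} = \begin{bmatrix} \mathbf{Q}_1 \\ \mathbf{Q}_2 \end{bmatrix}$ described above:

$$\mathbf{v} = \mathbf{Q}^H\bar{\mathbf{r}} = \mathbf{Q}^H\left(\mathbf{Q}\mathbf{R}\mathbf{s} + \bar{\mathbf{w}}\right) = \mathbf{R}\mathbf{s} + \begin{bmatrix} \mathbf{Q}_1^H & \mathbf{Q}_2^H \end{bmatrix}\begin{bmatrix} \mathbf{w} \\ -\beta\mathbf{s} \end{bmatrix} = \mathbf{R}\mathbf{s} + \mathbf{Q}_1^H\mathbf{w} - \beta\mathbf{Q}_2^H\mathbf{s}$$

The second equality uses $\bar{\mathbf{H}} = \mathbf{Q}\mathbf{R}$ and the third uses $\mathbf{Q}^H\mathbf{Q} = \mathbf{I}_{n_T}$.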

From the above analyses we conclude that there exists an optimal value of $\beta$, which results in the PSD with the lowest detection complexity. However, finding the optimal $\beta$ through an exact theoretical analysis is beyond the paper's scope and is left as an open problem for future work. What we can suggest in this paper is that, in order for the PSD to achieve reasonably low complexity, $\beta$ should be neither too large nor too small.

6. Simulation results and discussion

This section verifies the performance and complexity of the proposed decoder by applying it to different V-BLAST architectures with different PSK modulation schemes. In the simulations, signals are transmitted on a burst-by-burst basis with a burst length of 100 symbol durations. In addition, the channel matrix $\mathbf{H}$ is assumed to remain fixed within one burst and to change randomly from one burst to the next. Thus, the preprocessing is carried out once per burst. The initial square sphere radius $C$ is set equal to 100. If no signal point exists inside the sphere, the radius is increased by a step of $0.2C$ until a point is found. All algorithms are implemented in floating point in the C programming language, compiled into MEX files, and run in MATLAB 6. We use the number of floating point operations (flops), i.e., addition, subtraction, multiplication, division, and $\cos^{-1}$ operations, as a measure of the complexity. We only count the flops of the search stage, without taking into account the complexity of the preprocessing stage.

Figs. 4 and 5, respectively, compare the bit error rate(BER) performance and complexity of the PSD with thoseof the Pham SD and of the Schnorr–Euchner spheredecoder (SE-SD) [6] in (4, 4) and (8, 8) systems employing8-PSK modulation. The SE-SD is incorporated with the

Fig. 4. Performance of the Pham SD and PS

method proposed by Cui et al. in [9] to eliminate invalidcandidates. Since the SE-SD also gives the systems MLperformance, for the sake of readability of the paper, weomit its BER curves. As can be seen from Fig. 4,performance of the PSD is almost identical to that of thePham SD in both systems for b ¼ bMMSE and 0:5 as well. Itis clear that the proposed PSD is an optimal decoder forany value of b. Moreover, one can observe from Fig. 5 thatnoticeable drops in complexity are offered by the PSD ascompared to the Pham SD and the SE-SD. For example, atSNR ¼ 24 dB and for b ¼ bMMSE, the complexities of thePSD are approximately 2.5 and 2.9 times lower than thoseof the Pham SD in (4, 4) and (8, 8) systems, respectively.For the same simulation conditions, the complexities ofthe PSD are around 2.3 and 5.9 times lower than those ofthe SE-SD. It is worth emphasizing that when the SE-SD isapplied to a PSK-modulated MIMO system, it has to dealwith more signal points in the constellation than usual.For example, when it detects 8-PSK modulated signal, ithas to deal with an equivalent constellation of 16 signalpoints, leading to a complexity-inefficient algorithm.Simulation results in Fig. 5 also indicate that a significantcomplexity reduction can be achieved for a wide range ofSNR when b is fixed at 0.5.

Unlike the Pham SD and the SE-SD, the PSD is able towork with underdetermined V-BLAST systems, i.e., sys-tems having nRonT. Shown in Fig. 6 are the BER curves ofthe BFML, Pham SD, and PSD in (6, 6) and (6, 3) systemswith 4-PSK modulation. With similar reason in Fig. 4, theBER curves of the SE-SD in these systems are omitted.From Fig. 6, we can see that the PSD has almost the sameperformance as that of the BFML in the underdetermined(6, 3) system.

The important point is that the PSD provides the (6, 3)system with ML performance at relatively low complexity.As illustrated in Fig. 7, the complexity of the PSD in the

D in (4, 4) and (8, 8) systems; 8-PSK.

ARTICLE IN PRESS

Fig. 5. Complexity versus SNR of the Pham SD, SE-SD, and PSD in (4, 4) and (8, 8) systems; 8-PSK.

Fig. 6. BER curves of the BFML, Pham SD, and PSD in (6, 6) and (6, 3) systems; 4-PSK.

M.-T. Le et al. / Signal Processing 89 (2009) 197–205 203

(6, 3) system is lower than those of both the Pham SD and the SE-SD in the (6, 6) system at low SNR and becomes comparable in the medium and high SNR regions. Besides, Fig. 7 demonstrates that, in the (6, 3) system, $\beta = 0.5$ yields lower complexity than $\beta = \beta_{\mathrm{MMSE}}$ at high SNR. This can be explained as follows. Since $\beta_{\mathrm{MMSE}}$ is inversely proportional to the square root of the SNR (see (A.13)), it becomes smaller and smaller as the SNR increases. Therefore, for underdetermined systems, selecting $\beta = \beta_{\mathrm{MMSE}}$ at high SNR would result in small diagonal elements at lower layers (i.e., $R_{k,k}$ at layers $k = n_T, n_T - 1, \ldots$ are small), leading to a high complexity. Because underdetermined V-BLAST systems usually operate at high SNR, $\beta$ should be chosen as an appropriate constant in order to keep the complexity reasonably low when the PSD is used to detect signals in those systems.
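The effect described above can be illustrated with a minimal numerical sketch. It assumes that the extended channel matrix of Eq. (3) has the stacked form $[\mathbf{H};\ \beta\mathbf{I}_{n_T}]$ (which is consistent with (A.12)), and it uses a plain, unordered QR decomposition instead of the ordered E-MMSE-DFE preprocessing, so it only shows the trend and does not reproduce the PSD itself. The function name `extended_r_diag`, the random channel, and the 30 dB operating point are illustrative choices.

```python
import numpy as np

def extended_r_diag(H, beta):
    """Return |R_kk| from the QR decomposition of the stacked matrix [H; beta*I].

    Assumption: this stacked form stands in for the extended channel
    matrix of Eq. (3); no column ordering is applied here.
    """
    n_t = H.shape[1]
    H_ext = np.vstack([H, beta * np.eye(n_t)])
    R = np.linalg.qr(H_ext, mode="r")  # upper-triangular factor only
    return np.abs(np.diag(R))

rng = np.random.default_rng(0)
n_t, n_r = 6, 3  # underdetermined (6, 3) system: n_R < n_T

# i.i.d. complex Gaussian channel with unit-variance entries (illustrative)
H = (rng.standard_normal((n_r, n_t))
     + 1j * rng.standard_normal((n_r, n_t))) / np.sqrt(2)

snr_db = 30.0
gamma = 10 ** (snr_db / 10)        # SNR per receive antenna
beta_mmse = np.sqrt(n_t / gamma)   # (A.13)

diag_mmse = extended_r_diag(H, beta_mmse)
diag_fixed = extended_r_diag(H, 0.5)

print("beta_MMSE :", beta_mmse)
print("|R_kk| with beta = beta_MMSE:", np.round(diag_mmse, 3))
print("|R_kk| with beta = 0.5      :", np.round(diag_fixed, 3))
```

In this sketch each $|R_{k,k}|$ can only shrink when $\beta$ is reduced, and the entries of the last $n_T - n_R$ layers tend to zero with $\beta$, since the channel itself contributes no further independent directions there. This matches the qualitative argument for preferring a fixed $\beta$ at high SNR in underdetermined systems.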

7. Conclusion

A computationally efficient sphere decoder, called PSD, is proposed for detecting PSK signals in V-BLAST systems.


Fig. 7. Complexity of the Pham SD, SE-SD, and PSD in (6, 6) and (6, 3) systems; 4-PSK.


The PSD consists of a preprocessing stage and a BB-based search stage. The heart of the preprocessing stage is the E-MMSE-DFE technique with a selectable parameter $\beta$. This parameter has been shown to have no effect on the optimality of the PSD. Nevertheless, it plays a key role in determining the PSD's complexity. A qualitative analysis shows that there exists an optimal $\beta$ that gives the PSD its lowest complexity. Yet, the problem of theoretically finding the exact optimal $\beta$ is left open for future work. Simulation results show that, while having almost identical BER performance, the proposed PSD offers an average complexity considerably lower than that of the Pham SD and SE-SD when $\beta = \beta_{\mathrm{MMSE}}$ or even when $\beta$ is fixed at 0.5.

Acknowledgment

This work was supported by the ERC program of MOST/KOSEF (Intelligent Radio Engineering Center) at ICU, Republic of Korea.

Appendix A. Derivation of $\beta_{\mathrm{MMSE}}$

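For the V-BLAST system in Eq. (1), we make the following assumptions:

$$E\{\mathbf{s}\mathbf{s}^H\} = \frac{P}{n_T}\,\mathbf{I}_{n_T} \tag{A.1}$$

$$E\{\mathbf{w}\mathbf{w}^H\} = \sigma^2\,\mathbf{I}_{n_R} \tag{A.2}$$

$$E\{\mathbf{s}\mathbf{w}^H\} = E\{\mathbf{w}\mathbf{s}^H\} = \mathbf{0} \tag{A.3}$$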

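Under those assumptions, the correlation matrix of the received signal vector $\mathbf{r}$ is

$$E\{\mathbf{r}\mathbf{r}^H\} = \mathbf{H}\,E\{\mathbf{s}\mathbf{s}^H\}\,\mathbf{H}^H + E\{\mathbf{w}\mathbf{w}^H\} = \frac{P}{n_T}\,\mathbf{H}\mathbf{H}^H + \sigma^2\,\mathbf{I}_{n_R} \tag{A.4}$$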

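The MMSE-DFE filter matrix, $\mathbf{G}_{\mathrm{MMSE}}$, is determined as follows:

$$\mathbf{G}_{\mathrm{MMSE}} = \arg\min_{\mathbf{G}}\, E\{\|\mathbf{s} - \mathbf{G}\mathbf{r}\|^2\} \tag{A.5}$$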

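Using the orthogonality principle, we can write

$$E\{(\mathbf{G}_{\mathrm{MMSE}}\mathbf{r} - \mathbf{s})\,\mathbf{r}^H\} = \mathbf{0} \tag{A.6}$$

or, equivalently,

$$\mathbf{G}_{\mathrm{MMSE}}\,E\{\mathbf{r}\mathbf{r}^H\} = E\{\mathbf{s}\mathbf{r}^H\} \tag{A.7}$$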

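The cross-correlation matrix between $\mathbf{s}$ and $\mathbf{r}$ is given by

$$E\{\mathbf{s}\mathbf{r}^H\} = E\{\mathbf{s}\mathbf{s}^H\}\,\mathbf{H}^H = \frac{P}{n_T}\,\mathbf{H}^H \tag{A.8}$$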

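From (A.4), (A.7), and (A.8), we can write

$$\mathbf{G}_{\mathrm{MMSE}}\left(\frac{P}{n_T}\,\mathbf{H}\mathbf{H}^H + \sigma^2\,\mathbf{I}_{n_R}\right) = \frac{P}{n_T}\,\mathbf{H}^H \tag{A.9}$$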

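After some manipulations, we get

$$\mathbf{G}_{\mathrm{MMSE}} = \left(\mathbf{H}^H\mathbf{H} + \frac{n_T}{\gamma}\,\mathbf{I}_{n_T}\right)^{-1}\mathbf{H}^H \tag{A.10}$$

where $\gamma = P/\sigma^2$ is the SNR per receive antenna.

The output of the MMSE-DFE filter is given by

$$\tilde{\mathbf{s}}_{\mathrm{MMSE}} = \mathbf{G}_{\mathrm{MMSE}}\,\mathbf{r} \tag{A.11}$$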
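The manipulations leading from (A.9) to (A.10) can be sketched as follows. The shorthand $c = n_T/\gamma = n_T\sigma^2/P$ is introduced here only for brevity and is not part of the original notation. Multiplying (A.9) by $n_T/P$ and inverting the bracketed matrix, which is positive definite for $\sigma^2 > 0$, gives

$$\mathbf{G}_{\mathrm{MMSE}} = \mathbf{H}^H\left(\mathbf{H}\mathbf{H}^H + c\,\mathbf{I}_{n_R}\right)^{-1}.$$

Since

$$\left(\mathbf{H}^H\mathbf{H} + c\,\mathbf{I}_{n_T}\right)\mathbf{H}^H = \mathbf{H}^H\mathbf{H}\mathbf{H}^H + c\,\mathbf{H}^H = \mathbf{H}^H\left(\mathbf{H}\mathbf{H}^H + c\,\mathbf{I}_{n_R}\right),$$

multiplying on the left by $\left(\mathbf{H}^H\mathbf{H} + c\,\mathbf{I}_{n_T}\right)^{-1}$ and on the right by $\left(\mathbf{H}\mathbf{H}^H + c\,\mathbf{I}_{n_R}\right)^{-1}$ yields

$$\mathbf{H}^H\left(\mathbf{H}\mathbf{H}^H + c\,\mathbf{I}_{n_R}\right)^{-1} = \left(\mathbf{H}^H\mathbf{H} + c\,\mathbf{I}_{n_T}\right)^{-1}\mathbf{H}^H,$$

which is (A.10) with $c = n_T/\gamma$.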

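With the definition of the extended channel matrix and extended received signal vector in Eq. (3) (written here as $\bar{\mathbf{H}}$ and $\bar{\mathbf{r}}$), by setting $\beta = \beta_{\mathrm{MMSE}}$ the filter output (A.11) can be re-expressed as [14]

$$\tilde{\mathbf{s}}_{\mathrm{MMSE}} = \left(\bar{\mathbf{H}}^H\bar{\mathbf{H}}\right)^{-1}\bar{\mathbf{H}}^H\bar{\mathbf{r}} = \left(\mathbf{H}^H\mathbf{H} + \beta_{\mathrm{MMSE}}^2\,\mathbf{I}_{n_T}\right)^{-1}\mathbf{H}^H\mathbf{r} \tag{A.12}$$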


From Eqs. (A.10)–(A.12), it follows that

$$\beta_{\mathrm{MMSE}} = \sqrt{\frac{n_T}{\gamma}} \tag{A.13}$$
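The chain (A.7), (A.10), (A.12), (A.13) can be checked numerically with a short sketch. It assumes that the extended quantities of Eq. (3) take the stacked forms $[\mathbf{H};\ \beta\mathbf{I}_{n_T}]$ and $[\mathbf{r};\ \mathbf{0}]$, as in the MMSE extension of [14]. The variable names, the (4, 4) dimensions, and the values of $P$ and $\sigma^2$ are illustrative and not taken from the paper.

```python
import numpy as np

rng = np.random.default_rng(1)
n_t, n_r = 4, 4
P, sigma2 = 1.0, 0.05              # illustrative transmit power and noise variance
gamma = P / sigma2                 # SNR per receive antenna
beta = np.sqrt(n_t / gamma)        # beta_MMSE from (A.13)

# Random complex channel and an arbitrary received vector (illustrative)
H = (rng.standard_normal((n_r, n_t))
     + 1j * rng.standard_normal((n_r, n_t))) / np.sqrt(2)
r = rng.standard_normal(n_r) + 1j * rng.standard_normal(n_r)
Hh = H.conj().T

# Wiener form implied by (A.7): G = E{s r^H} (E{r r^H})^{-1}, using (A.4) and (A.8)
R_rr = (P / n_t) * H @ Hh + sigma2 * np.eye(n_r)
R_sr = (P / n_t) * Hh
G_wiener = R_sr @ np.linalg.inv(R_rr)

# Closed form (A.10)
G_a10 = np.linalg.inv(Hh @ H + (n_t / gamma) * np.eye(n_t)) @ Hh

# Least-squares solution on the assumed extended system, as in (A.12)
H_ext = np.vstack([H, beta * np.eye(n_t)])
r_ext = np.concatenate([r, np.zeros(n_t)])
s_ext, *_ = np.linalg.lstsq(H_ext, r_ext, rcond=None)

print("(A.7) vs (A.10) agree :", np.allclose(G_wiener, G_a10))
print("(A.11) vs (A.12) agree:", np.allclose(G_a10 @ r, s_ext))
```

Under these assumptions, the plain least-squares solution on the extended system coincides with the MMSE filter output only when $\beta^2 = n_T/\gamma$, which is the content of (A.13).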

References

[1] G.J. Foschini, M.J. Gans, On limits of wireless communications in a fading environment when using multiple antennas, Wireless Per. Commun. 6 (1998) 311–335.

[2] G.D. Golden, G.J. Foschini, R.A. Valenzuela, P.W. Wolniansky, Detection algorithm and initial laboratory results using the V-BLAST space-time communication architecture, Electron. Lett. 35 (1) (January 1999) 14–15.

[3] E. Viterbo, J. Boutros, A universal lattice code decoder for fading channels, IEEE Trans. Inf. Theory 45 (July 1999) 1639–1642.

[4] M.O. Damen, A. Chkeif, J. Belfiore, Lattice code decoder for space-time codes, IEEE Commun. Lett. 4 (May 2000) 161–163.

[5] E. Agrell, T. Eriksson, A. Vardy, K. Zeger, Closest point search in lattices, IEEE Trans. Inf. Theory 48 (8) (August 2002) 2201–2214.

[6] M.O. Damen, H.E. Gamal, G. Caire, On maximum-likelihood detection and the search for the closest lattice point, IEEE Trans. Inf. Theory 49 (10) (October 2003).

[7] B. Hassibi, H. Vikalo, On the sphere-decoding algorithm I. Expected complexity, IEEE Trans. Signal Process. 53 (8) (August 2005) 2806–2818.

[8] A.M. Chan, I. Lee, A new reduced-complexity sphere decoder for multiple antenna systems, in: Proceedings of the IEEE International Conference on Communications, vol. 1, May 2002, pp. 460–464.

[9] T. Cui, C. Tellambura, Joint channel estimation and data detection for OFDM systems via sphere decoding, in: IEEE Proceedings of Globecom'04, vol. 6, December 2004, pp. 3656–3660.

[10] B.M. Hochwald, S.T. Brink, Achieving near-capacity on a multiple-antenna channel, IEEE Trans. Commun. 51 (3) (March 2003).

[11] D. Pham, K.R. Pattipati, P.K. Willett, J. Luo, An improved complex sphere decoder for V-BLAST systems, IEEE Signal Process. Lett. 11 (9) (September 2004) 748–751.

[12] J. Luo, K. Pattipati, P. Willett, L. Brunel, Branch-and-bound-based fast optimal algorithm for multiuser detection in synchronous CDMA, in: Proceedings of the IEEE International Conference on Communications, vol. 5, May 2003, pp. 3336–3340.

[13] A.D. Murugan, H.E. Gamal, M.O. Damen, G. Caire, A unified framework for tree search decoding: rediscovering the sequential decoder, IEEE Trans. Inf. Theory 52 (3) (March 2006).

[14] D. Wubben, R. Bohnke, V. Kuhn, K.D. Kammeyer, MMSE extension of V-BLAST based on sorted QR decomposition, in: IEEE Proceedings of the VTC 2003-Fall, vol. 1, October 2003, pp. 508–512.

[15] D. Wubben, R. Bohnke, J. Rinas, V. Kuhn, K.D. Kammeyer, Efficient algorithm for decoding layered space-time codes, IEE Electron. Lett. 37 (22) (October 2001) 1348–1350.

[16] M.-T. Le, V.-S. Pham, L. Mai, G. Yoon, Rate-one full-diversity quasi-orthogonal STBCs with low decoding complexity, IEICE Trans. Commun. E 89-B (12) (December 2006) 3376–3385.