Joint Stream-Wise THP Transceiver Design for the Multiuser MIMO Downlink


IEICE TRANS. COMMUN., VOL.E92–B, NO.1 JANUARY 2009, p. 209

PAPER

Joint Stream-Wise THP Transceiver Design for the Multiuser MIMO Downlink∗

Wei MIAO†a), Xiang CHEN†, Student Members, Ming ZHAO†, Member, Shidong ZHOU†, and Jing WANG†, Nonmembers

SUMMARY This paper addresses the problem of joint transceiver design for Tomlinson-Harashima Precoding (THP) in the multiuser multiple-input-multiple-output (MIMO) downlink under both perfect and imperfect channel state information at the transmitter (CSIT). For the case of perfect CSIT, we differ from previous work by performing stream-wise (both inter-user and intra-user) interference pre-cancelation at the transmitter. A minimum total mean square error (MT-MSE) criterion is used to formulate our optimization problem. By convex analysis of the problem, we obtain the necessary conditions for the optimal solution. An iterative algorithm is proposed to handle this problem and its convergence is proved. Then we extend our algorithm to a robust version by minimizing the conditional expectation of the T-MSE under imperfect CSIT. Simulation results are given to verify the efficacy of our proposed schemes and to show their superiority over existing MMSE-based THP schemes.
key words: multiuser MIMO, THP, transceiver, interference pre-cancelation, robust design

Manuscript received December 7, 2007.
Manuscript revised June 27, 2008.
†The authors are with the State Key Laboratory on Microwave and Digital Communications, Tsinghua National Laboratory for Information Science and Technology, and Department of Electronic Engineering, Tsinghua University, Beijing 100084, P.R. China.
∗This work was supported by Tsinghua-Philips/NXP cooperation project, China’s 863 Project-No. 2006AA01Z282 and Program for New Century Excellent Talents in University. Part of this paper was presented at IEEE Wireless Communications and Networking Conference 2008.
a) E-mail: [email protected]
DOI: 10.1587/transcom.E92.B.209

1. Introduction

The multiuser multiple-input-multiple-output (MIMO) downlink has attracted great research interest because of its potential to increase the system capacity [1]–[4]. Many transmitter precoding schemes have been reported to mitigate the cochannel interference (CCI) and to exploit the spatial multiplexing of the multiuser MIMO downlink. Tomlinson-Harashima precoding (THP) has become a promising scheme since its successive interference pre-cancelation structure makes THP outperform linear precoding schemes [5], [6] with only a small increase in complexity.

Many THP schemes based on different criteria have been reported in the literature [7]–[12]. A THP scheme based on the zero-forcing (ZF) criterion was proposed in [7] for the multiuser multiple-input-single-output (MISO) system where each user has a single antenna. Later, in [8], ZF-THP was extended to the multiuser MIMO system where each user has multiple antennas. The minimum mean square error (MMSE) criterion has also been widely used in the area of THP design. The authors in [9] designed THP based on the MMSE criterion for the MISO system where the users are restricted to use a common scalar receiving weight. This restriction was relaxed in [10], i.e., the users may use different scalar receiving weights, where the authors used the MSE duality between the uplink and downlink and an exhaustive search method to tackle the problem. The problem of joint THP transceiver design for multiuser MIMO systems has been studied in [11] based on the MMSE criterion. However, a per-user power constraint is imposed, which may not be reasonable in the downlink. Moreover, only the inter-user interference is pre-canceled nonlinearly, whereas the data streams of the same user are linearly precoded. The work of [11] has been improved in [12] under a total transmit power constraint, where the authors apply the MSE duality between the uplink and downlink and the projected gradient algorithm to calculate the solution iteratively. Again, only the inter-user interference is pre-subtracted.

The above schemes share the assumption that the base station (BS), i.e., the transmitter, has perfect channel state information (CSI). In a realistic scenario, however, the CSI is generally imperfect due to the limited number of training symbols or channel time-variations. Therefore, a robust transceiver design which takes into account the uncertainties of CSI at the transmitter (CSIT) is required. Several robust schemes have been proposed for THP in the multiuser MISO downlink, which can be classified into the worst-case approach [13], [14] and the stochastic approach [14]–[16]. The worst-case approach optimizes the worst system performance for any channel error in a predefined uncertainty region. In [13] a robust power allocation scheme for THP was proposed, which maximizes the achievable rates for the worst-case errors in the CSI in the small errors regime. The authors of [14] designed the THP transmitter to minimize the worst-case MSE over all admissible channel uncertainties subject to power constraints on each antenna, or a total power constraint. On the other hand, the stochastic approach optimizes a statistical measure of the system performance assuming that the statistics of the uncertainty are known. A robust nonlinear transmit zero-forcing filter with THP was presented in [15] using a conditional-expectation approach, and has been extended lately in [14] by relaxing the zero-forcing constraint and using the MMSE criterion. The problem of combined optimization of channel estimation and THP was considered in [16] and a conditional-expectation approach is adopted to solve the problem. All the above robust schemes are designed for the MISO downlink where each user has only a single antenna.

Copyright © 2009 The Institute of Electronics, Information and Communication Engineers

In this paper, we propose novel joint THP transceiver designs for the multiuser MIMO downlink with both perfect and imperfect CSIT. The transmitter performs nonlinear stream-wise (both inter-user and intra-user) interference pre-cancelation. We first consider the transceiver optimization problem under perfect CSIT and formulate it as minimizing the total mean square error (T-MSE) of the downlink [6] under a total transmit power constraint. Under the optimization criterion of minimizing the T-MSE, the stream-wise interference pre-cancelation structure is superior to the structure of inter-user only interference pre-cancelation combined with intra-user linear precoding adopted in [11] and [12], which has already been proven to be true in a particular case, i.e., the single-user MIMO case [17]. By convex analysis of the optimization problem, we find the necessary conditions for the optimal solution, by which the optimal transmitter and receivers are inter-dependent. We extend the iterative algorithm developed in [6] to handle our problem. Although the iterative algorithm is not guaranteed to converge to the globally optimal solution, it is guaranteed to converge to a locally optimal solution. Then, we extend our design under perfect CSIT to the imperfect CSIT case, which leads to a transceiver design that is robust against the channel uncertainty. The robust optimization problem is mathematically formulated as minimizing the expectation of the T-MSE conditioned on the channel estimates at the BS under a total transmit power constraint. An iterative optimization algorithm similar to its perfect CSIT counterpart can also be applied. Extensive simulation results are presented to illustrate the efficacy of our proposed schemes and their superiority over existing MMSE-based THP schemes.

The remainder of this paper is organized as follows. In Sect. 2, the system model for the multiuser MIMO downlink with THP is established. In Sect. 3, we formulate our joint THP transceiver design problem under perfect CSIT and propose an iterative algorithm to find a locally optimal solution. Then we tackle the problem of robust transceiver design under imperfect CSIT in Sect. 4. The simulation results are given in Sect. 5 to verify the performance of the proposed schemes. Finally, conclusions are drawn in Sect. 6.

Throughout the paper, the symbols $(\cdot)^T$, $(\cdot)^*$ and $(\cdot)^H$ represent matrix transposition, complex conjugate and Hermitian transposition, respectively. $\|\cdot\|_2$ and $\|\cdot\|_F$ represent the Euclidean norm of a vector and the Frobenius norm of a matrix, respectively. $\mathrm{tr}(\cdot)$ denotes the trace of a square matrix. $[\cdot]_{m,n}$ represents the element in the $m$th row and the $n$th column of a matrix. $\mathrm{blockdiag}(\cdot, \ldots, \cdot)$ denotes the block-diagonal matrix formed from a set of matrices. $E\{\cdot\}$ represents the expectation operator. $\mathbf{I}_N$ denotes the $N \times N$ identity matrix.

2. System Model of Multiuser MIMO Downlink with Stream-Wise THP

Consider a multiuser MIMO downlink composed of a base station (BS) with $M$ transmit antennas and $K$ users with $N_k$ receive antennas at the $k$th user, $k = 1, \ldots, K$ (see Fig. 1). Let $\mathbf{H}_k \in \mathbb{C}^{N_k \times M}$ denote the channel between the BS and the $k$th user. The vector $\mathbf{d}_k \in \mathbb{C}^{L_k \times 1}$ represents the transmitted data vector for user $k$, where each entry belongs to the interval $[-\tau/2, \tau/2) + j \cdot [-\tau/2, \tau/2)$ ($\tau$ is the modulo base of THP as introduced later) and $L_k$ is the number of data streams transmitted for user $k$. The data vectors are stacked into $\mathbf{d} \triangleq [\mathbf{d}_1^T\ \mathbf{d}_2^T\ \ldots\ \mathbf{d}_K^T]^T$, which is first reordered by a permutation matrix $\boldsymbol{\Pi} \in \mathbb{C}^{L \times L}$ ($\boldsymbol{\Pi}\boldsymbol{\Pi}^T = \boldsymbol{\Pi}^T\boldsymbol{\Pi} = \mathbf{I}_L$, $L \triangleq \sum_{k=1}^{K} L_k$) and then successively precoded using THP (see Fig. 1). The feedback matrix $\mathbf{F} \in \mathbb{C}^{L \times L}$ is a lower triangular matrix with zero diagonal. The structure of $\mathbf{F}$ enables inter-stream interference pre-cancelation and is different from the one used in [12], which only enables inter-user interference pre-cancelation. The modulo device performs a mod $\tau$ operation to avoid transmit power enhancement. Each entry of the output $\mathbf{w}$ of the modulo device is constrained to the interval $[-\tau/2, \tau/2) + j \cdot [-\tau/2, \tau/2)$. A common assumption in the literature is that the entries of $\mathbf{w}$ are uniformly distributed with unit variance (i.e., $\tau = \sqrt{6}$) and are mutually uncorrelated. Then $\mathbf{w}$ is linearly precoded by a feedforward matrix $\mathbf{P} \in \mathbb{C}^{M \times L}$ and transmitted over the downlink channel to the $K$ users.

Fig. 1 System diagram of Tomlinson-Harashima precoded multiuser MIMO downlink.
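The modulo operation and the choice $\tau = \sqrt{6}$ can be illustrated with a minimal NumPy sketch. The function name `thp_modulo` is hypothetical and not taken from the paper; the sketch only assumes the usual definition of the complex modulo, applied separately to the real and imaginary parts.

```python
import numpy as np

def thp_modulo(x, tau):
    """Map the real and imaginary parts of x into [-tau/2, tau/2).

    Illustrative helper (name is an assumption, not from the paper).
    """
    x = np.asarray(x, dtype=complex)
    re = x.real - tau * np.floor(x.real / tau + 0.5)
    im = x.imag - tau * np.floor(x.imag / tau + 0.5)
    return re + 1j * im

tau = np.sqrt(6.0)
# A real variable uniform on [-tau/2, tau/2) has variance tau^2 / 12.
# The real and imaginary parts are independent, so the complex variance
# is 2 * tau^2 / 12 = tau^2 / 6, which equals 1 for tau = sqrt(6).
uniform_variance = 2.0 * tau**2 / 12.0
```

The difference between the input and the output of `thp_modulo` is always an integer multiple of $\tau$ in each dimension, which is what the linear representation in Sect. 3.1 exploits.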

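At the $k$th receiver, a decoding matrix $\mathbf{G}_k \in \mathbb{C}^{L_k \times N_k}$ and a modulo device are employed to estimate the data vector $\mathbf{d}_k$. Denote the estimate of $\mathbf{d}_k$ by $\hat{\mathbf{d}}_k$; then it is given by

$$\hat{\mathbf{d}}_k = (\mathbf{G}_k\mathbf{H}_k\mathbf{P}\mathbf{w} + \mathbf{G}_k\mathbf{n}_k) \bmod \tau, \tag{1}$$

in which $\mathbf{n}_k \in \mathbb{C}^{N_k \times 1}$ is the additive Gaussian noise vector at user $k$ with zero mean and covariance matrix $E\{\mathbf{n}_k\mathbf{n}_k^H\} = \sigma_{n,k}^2\mathbf{I}_{N_k}$. We assume that there is a total power constraint $P_T$ at the BS so that $\mathrm{tr}(\mathbf{P}^H\mathbf{P}) = P_T$.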

3. Transceiver Optimization under Perfect CSIT

In this section, we propose our joint THP transceiver design under perfect CSIT using a minimum total mean square error (MT-MSE) criterion. Using convex analysis of the optimization problem, we derive the necessary conditions for the optimal transceiver in Sect. 3.1. Then the iterative algorithm proposed in [6] is extended in Sect. 3.2 to obtain a locally optimal transceiver.

3.1 Problem Formulation

Our design is based on the linear representation [9] (see Fig. 2) of the system in Fig. 1, where the modulo devices at the transmitter and receivers are replaced by the additive vectors $\mathbf{a} \triangleq [\mathbf{a}_1^T\ \mathbf{a}_2^T\ \ldots\ \mathbf{a}_K^T]^T$ and $-\hat{\mathbf{a}}_k$, $k = 1, \ldots, K$, where $\mathbf{a} \in \tau(\mathbb{Z}^{L \times 1} + j \cdot \mathbb{Z}^{L \times 1})$ and $\hat{\mathbf{a}}_k \in \tau(\mathbb{Z}^{L_k \times 1} + j \cdot \mathbb{Z}^{L_k \times 1})$. The vectors $\mathbf{a}$ and $\hat{\mathbf{a}}_k$ are chosen to produce the same $\mathbf{w}$ and $\hat{\mathbf{d}}_k$ as the outputs of the modulo devices at the transmitter and receivers, respectively.

Fig. 2 Equivalent linear representation of THP in Fig. 1.

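Define $\mathbf{b}_k \triangleq \mathbf{d}_k + \mathbf{a}_k$ and $\hat{\mathbf{b}}_k \triangleq \hat{\mathbf{d}}_k + \hat{\mathbf{a}}_k$ and stack them into $\mathbf{b} \triangleq [\mathbf{b}_1^T\ \ldots\ \mathbf{b}_K^T]^T$ and $\hat{\mathbf{b}} \triangleq [\hat{\mathbf{b}}_1^T\ \ldots\ \hat{\mathbf{b}}_K^T]^T$. Let $\mathbf{H} \triangleq [\mathbf{H}_1^T\ \ldots\ \mathbf{H}_K^T]^T$, $\mathbf{n} \triangleq [\mathbf{n}_1^T\ \ldots\ \mathbf{n}_K^T]^T$ and $\mathbf{G} \triangleq \mathrm{blockdiag}(\mathbf{G}_1, \ldots, \mathbf{G}_K)$; then from Fig. 2 we have

$$\boldsymbol{\Pi}\mathbf{b} + \mathbf{F}\mathbf{w} = \mathbf{w} \;\Rightarrow\; \mathbf{b} = \boldsymbol{\Pi}^T(\mathbf{I}_L - \mathbf{F})\mathbf{w} \tag{2}$$

and

$$\hat{\mathbf{b}} = \mathbf{G}\mathbf{H}\mathbf{P}\mathbf{w} + \mathbf{G}\mathbf{n}. \tag{3}$$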

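We consider the MSE between $\hat{\mathbf{b}}$ and $\mathbf{b}$ rather than $\hat{\mathbf{d}}$ and $\mathbf{d}$ in order to bypass the impact of the modulo operations and define it as the total MSE (T-MSE) of the downlink, which is written as follows:

$$\begin{aligned}
\text{T-MSE} &= E_{\mathbf{w},\mathbf{n}}\left\{\left\|\hat{\mathbf{b}} - \mathbf{b}\right\|_2^2\right\} \\
&= E_{\mathbf{w},\mathbf{n}}\left\{\left\|\left(\mathbf{G}\mathbf{H}\mathbf{P} - \boldsymbol{\Pi}^T(\mathbf{I}_L - \mathbf{F})\right)\mathbf{w} + \mathbf{G}\mathbf{n}\right\|_2^2\right\} \\
&= \left\|\mathbf{G}\mathbf{H}\mathbf{P} - \boldsymbol{\Pi}^T(\mathbf{I}_L - \mathbf{F})\right\|_F^2 + \mathrm{tr}(\mathbf{G}^H\mathbf{G}\boldsymbol{\Sigma}_n),
\end{aligned} \tag{4}$$

where $\boldsymbol{\Sigma}_n \triangleq E\{\mathbf{n}\mathbf{n}^H\} = \mathrm{blockdiag}(\sigma_{n,1}^2\mathbf{I}_{N_1}, \ldots, \sigma_{n,K}^2\mathbf{I}_{N_K})$.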
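The closed form in the last line of (4) is straightforward to evaluate numerically. The following is a minimal sketch under stated assumptions: the function name `total_mse` and its argument layout are illustrative only, and the noise covariance is passed as a vector holding one variance per receive antenna (the diagonal of $\boldsymbol{\Sigma}_n$).

```python
import numpy as np

def total_mse(G, H, P, Pi, F, noise_var_per_antenna):
    """Evaluate the closed-form T-MSE of (4).

    G: L x N block-diagonal receive matrix, H: N x M stacked channel,
    P: M x L feedforward matrix, Pi: L x L permutation, F: L x L feedback.
    noise_var_per_antenna: length-N vector (diagonal of Sigma_n).
    """
    L = F.shape[0]
    error_matrix = G @ H @ P - Pi.T @ (np.eye(L) - F)
    sigma_n = np.diag(noise_var_per_antenna)
    signal_term = np.linalg.norm(error_matrix, "fro") ** 2
    noise_term = np.trace(G.conj().T @ G @ sigma_n).real
    return signal_term + noise_term
```

With an identity channel, identity filters and no feedback, the first term vanishes and only the noise term $\mathrm{tr}(\boldsymbol{\Sigma}_n)$ remains.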

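So our transceiver design problem is to find a set $\{\boldsymbol{\Pi}, \mathbf{F}, \mathbf{P}, \{\mathbf{G}_k\}_{k=1}^{K}\}$ that minimizes the T-MSE defined in (4) under a total transmit power constraint. Mathematically it can be formulated as follows:

$$\begin{aligned}
\min_{\boldsymbol{\Pi},\, \mathbf{F},\, \mathbf{P},\, \{\mathbf{G}_k\}_{k=1}^{K}} \quad & \text{T-MSE} \\
\text{s.t.} \quad & \mathrm{tr}(\mathbf{P}^H\mathbf{P}) = P_T, \\
& [\mathbf{F}]_{m,n} = 0, \quad \forall\, 1 \le m, n \le L \text{ and } m \le n.
\end{aligned} \tag{5}$$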

3.2 Iterative Algorithm

In this subsection, we find the necessary conditions for the optimal $\boldsymbol{\Pi}$, $\mathbf{F}$, $\mathbf{P}$ and $\{\mathbf{G}_k\}_{k=1}^{K}$, which form an inter-dependence among them. This inter-dependence leads to an iterative algorithm similar to the one proposed in [6]. In each iteration, we first determine the suboptimal reordering matrix $\boldsymbol{\Pi}$ and update $\mathbf{P}$ and $\mathbf{F}$ using the $\{\mathbf{G}_k\}_{k=1}^{K}$ updated in the last iteration, then update $\{\mathbf{G}_k\}_{k=1}^{K}$ using the newly updated $\boldsymbol{\Pi}$, $\mathbf{P}$ and $\mathbf{F}$.

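For ease of derivation, we introduce two new matrix variables $\mathbf{T} \triangleq \beta^{-1}\mathbf{P}$ and $\mathbf{R} \triangleq \beta\mathbf{G}$ to replace $\mathbf{P}$ and $\mathbf{G}$, where $\beta$ is a positive real number. Then (4) is rewritten as

$$\text{T-MSE} = \left\|\mathbf{R}\mathbf{H}\mathbf{T} - \boldsymbol{\Pi}^T(\mathbf{I}_L - \mathbf{F})\right\|_F^2 + \beta^{-2} \cdot \mathrm{tr}(\mathbf{R}^H\mathbf{R}\boldsymbol{\Sigma}_n). \tag{6}$$

Moreover, using the total power constraint in (5) we obtain

$$\beta = P_T^{\frac{1}{2}}\left(\mathrm{tr}(\mathbf{T}\mathbf{T}^H)\right)^{-\frac{1}{2}}. \tag{7}$$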

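Note that $\mathbf{F}$ only appears in the first term of (6). We expand the first term of (6) as follows:

$$\left\|\mathbf{R}\mathbf{H}\mathbf{T} - \boldsymbol{\Pi}^T(\mathbf{I}_L - \mathbf{F})\right\|_F^2 = \left\|\boldsymbol{\Pi}\mathbf{R}\mathbf{H}\mathbf{T} - (\mathbf{I}_L - \mathbf{F})\right\|_F^2 \quad (\text{for } \boldsymbol{\Pi}^T\boldsymbol{\Pi} = \boldsymbol{\Pi}\boldsymbol{\Pi}^T = \mathbf{I}_L) \tag{8}$$

$$= \sum_{i=1}^{L}\left\|\begin{bmatrix}\mathbf{A}_i \\ \mathbf{B}_i\end{bmatrix}\mathbf{t}_i - \mathbf{e}_i + \mathbf{f}_i\right\|_2^2, \tag{9}$$

where

$$\begin{bmatrix}\mathbf{A}_i \\ \mathbf{B}_i\end{bmatrix} = \boldsymbol{\Pi}\mathbf{R}\mathbf{H}, \quad \mathbf{A}_i \in \mathbb{C}^{i \times M}, \quad \mathbf{B}_i \in \mathbb{C}^{(L-i) \times M}, \tag{10}$$

and $\mathbf{t}_i$, $\mathbf{e}_i$ and $\mathbf{f}_i$ are the $i$th columns of $\mathbf{T}$, $\mathbf{I}_L$ and $\mathbf{F}$ respectively. The equality in (8) follows from the fact that the Frobenius norm of a matrix remains constant after multiplication by a unitary matrix [18]. For fixed $\boldsymbol{\Pi}$, $\mathbf{T}$ and $\mathbf{R}$, each term in the summation in (9) can be minimized separately. With the lower triangular and zero diagonal structure of $\mathbf{F}$, the optimal $\mathbf{f}_i$ that minimizes the $i$th term of (9) is easily computed as:

$$\mathbf{f}_i = -\begin{bmatrix}\mathbf{0}_{i \times M} \\ \mathbf{B}_i\end{bmatrix}\mathbf{t}_i, \quad i = 1, \ldots, L. \tag{11}$$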

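By substituting (7) and (11) into (6), we rewrite the T-MSE as:

$$\text{T-MSE} = \sum_{i=1}^{L}\left\|\begin{bmatrix}\mathbf{A}_i \\ \mathbf{0}\end{bmatrix}\mathbf{t}_i - \mathbf{e}_i\right\|_2^2 + \xi\sum_{i=1}^{L}\mathrm{tr}(\mathbf{t}_i^H\mathbf{t}_i), \tag{12}$$

where $\xi \triangleq P_T^{-1}\sum_{k=1}^{K}\sigma_{n,k}^2\,\mathrm{tr}(\mathbf{R}_k^H\mathbf{R}_k)$ is a nonnegative real number, i.e., $\xi \ge 0$.

For fixed $\boldsymbol{\Pi}$ and $\mathbf{R}$, the optimization problem in (5) can be reformulated as:

$$\min_{\mathbf{T}}\ g(\mathbf{T}) \triangleq \sum_{i=1}^{L}\left\|\begin{bmatrix}\mathbf{A}_i \\ \mathbf{0}\end{bmatrix}\mathbf{t}_i - \mathbf{e}_i\right\|_2^2 + \xi\sum_{i=1}^{L}\mathrm{tr}(\mathbf{t}_i^H\mathbf{t}_i). \tag{13}$$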

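Notice that by the introduction of $\beta$ the power constraint in the original optimization problem has been absorbed into the objective function, so (13) is an unconstrained optimization problem. The Hessian matrix of (12) with respect to $\mathbf{t}_i$ is calculated and shown below:

$$\nabla_{\mathbf{t}_i^T}\left[\nabla_{\mathbf{t}_i^*}\, g(\mathbf{T})\right] = \mathbf{A}_i^H\mathbf{A}_i + \xi\mathbf{I}_M \succeq 0. \tag{14}$$

The Hessian matrix in (14) being positive semidefinite indicates that $g(\mathbf{T})$ is convex with respect to $\mathbf{t}_i$. Then the optimal $\mathbf{t}_i$ is derived by calculating the first order derivative with respect to $\mathbf{t}_i^*$ and setting it to zero, i.e.,

$$\begin{aligned}
& \frac{\partial g(\mathbf{T})}{\partial \mathbf{t}_i^*} = \mathbf{A}_i^H\mathbf{A}_i\mathbf{t}_i - \begin{bmatrix}\mathbf{A}_i \\ \mathbf{0}\end{bmatrix}^H\mathbf{e}_i + \xi\mathbf{t}_i = \mathbf{0} \\
& \Rightarrow\ \mathbf{t}_i = \left(\mathbf{A}_i^H\mathbf{A}_i + \xi\mathbf{I}_M\right)^{-1}\begin{bmatrix}\mathbf{A}_i^H & \mathbf{0}\end{bmatrix}\mathbf{e}_i.
\end{aligned} \tag{15}$$

Table 1 The suboptimal ordering algorithm for THP.

Initialization: $\mathbf{A} = \mathbf{R}\mathbf{H}$, $\boldsymbol{\Pi} = \mathbf{0}_{L \times L}$, $\xi = P_T^{-1}\sum_{k=1}^{K}\sigma_{n,k}^2\,\mathrm{tr}(\mathbf{R}_k^H\mathbf{R}_k)$.
For $i = L : -1 : 1$,
    $\mathbf{M} = \mathbf{A}\left(\mathbf{A}^H\mathbf{A} + \xi\mathbf{I}_M\right)^{-1}\mathbf{A}^H$.
    $l^* = \arg\max_{1 \le l \le L}\,[\mathbf{M}]_{l,l}$.
    The $i$th row $\boldsymbol{\pi}_i$ of $\boldsymbol{\Pi}$ is obtained as: $\boldsymbol{\pi}_i = \mathbf{e}_{l^*}^T$.
    Then set the entries of the $l^*$th row of $\mathbf{A}$ to zeros.
end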
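The closed-form transmit-side updates (15), (11) and (7) can be sketched in a few lines of NumPy. This is only an illustration under stated assumptions: the function name `transmit_update` is hypothetical, $\xi$ is passed in precomputed, and $\xi > 0$ is assumed so that the linear system in (15) is always solvable.

```python
import numpy as np

def transmit_update(Pi, R, H, xi, P_T):
    """Return T, F and beta from (15), (11) and (7) for fixed Pi and R.

    Pi: L x L permutation, R: L x N receive matrix, H: N x M channel,
    xi: positive scalar as defined below (12), P_T: total transmit power.
    """
    A_full = Pi @ R @ H                       # L x M, rows in precoding order
    L, M = A_full.shape
    T = np.zeros((M, L), dtype=complex)
    F = np.zeros((L, L), dtype=complex)
    for i in range(1, L + 1):                 # 1-based stream index as in the text
        A_i = A_full[:i, :]                   # first i rows, see (10)
        B_i = A_full[i:, :]                   # remaining L - i rows
        # [A_i^H 0] e_i is the i-th column of A_i^H.
        rhs = A_i.conj().T[:, i - 1]
        t_i = np.linalg.solve(A_i.conj().T @ A_i + xi * np.eye(M), rhs)
        T[:, i - 1] = t_i
        F[i:, i - 1] = -(B_i @ t_i)           # strictly below the diagonal, (11)
    beta = np.sqrt(P_T / np.trace(T @ T.conj().T).real)   # (7)
    return T, F, beta
```

By construction $\mathbf{F}$ is strictly lower triangular, the precoder $\mathbf{P} = \beta\mathbf{T}$ meets the power constraint with equality, and the entries of $\boldsymbol{\Pi}\mathbf{R}\mathbf{H}\mathbf{T} - (\mathbf{I}_L - \mathbf{F})$ below the diagonal are cancelled exactly.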

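Now we consider the problem of the optimal ordering, i.e., the optimal $\boldsymbol{\Pi}$. Equation (12) can be rewritten as:

$$\text{T-MSE} = \sum_{i=1}^{L}\left(\mathbf{t}_i^H\left(\mathbf{A}_i^H\mathbf{A}_i + \xi\mathbf{I}_M\right)\mathbf{t}_i - \mathbf{t}_i^H\begin{bmatrix}\mathbf{A}_i^H & \mathbf{0}\end{bmatrix}\mathbf{e}_i - \mathbf{e}_i^H\begin{bmatrix}\mathbf{A}_i \\ \mathbf{0}\end{bmatrix}\mathbf{t}_i + 1\right). \tag{16}$$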

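Substituting (15) into (16) and after some algebraic manipulations, we rewrite the T-MSE as

$$\text{T-MSE} = L - \sum_{i=1}^{L}\mathrm{tr}\left(\mathbf{e}_i^H\begin{bmatrix}\mathbf{A}_i \\ \mathbf{0}\end{bmatrix}\left(\mathbf{A}_i^H\mathbf{A}_i + \xi\mathbf{I}_M\right)^{-1}\begin{bmatrix}\mathbf{A}_i^H & \mathbf{0}\end{bmatrix}\mathbf{e}_i\right). \tag{17}$$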
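The algebraic step from (16) to (17) is short enough to spell out. In the sketch below the shorthand $\mathbf{C}_i$ and $\mathbf{c}_i$ is introduced for illustration only and does not appear in the paper.

```latex
% Shorthand (assumed for this sketch only):
%   C_i := A_i^H A_i + \xi I_M   (Hermitian, invertible for \xi > 0)
%   c_i := [A_i^H  0] e_i
% so that (15) reads t_i = C_i^{-1} c_i.
\begin{align*}
\mathbf{t}_i^H \mathbf{C}_i \mathbf{t}_i
  &= \mathbf{c}_i^H \mathbf{C}_i^{-1} \mathbf{C}_i \mathbf{C}_i^{-1} \mathbf{c}_i
   = \mathbf{c}_i^H \mathbf{C}_i^{-1} \mathbf{c}_i, \\
\mathbf{t}_i^H \mathbf{c}_i
  &= \mathbf{c}_i^H \mathbf{C}_i^{-1} \mathbf{c}_i
   = \mathbf{c}_i^H \mathbf{t}_i, \\
\text{$i$th term of (16)}
  &= \mathbf{c}_i^H \mathbf{C}_i^{-1} \mathbf{c}_i
     - 2\,\mathbf{c}_i^H \mathbf{C}_i^{-1} \mathbf{c}_i + 1
   = 1 - \mathbf{c}_i^H \mathbf{C}_i^{-1} \mathbf{c}_i .
\end{align*}
% Summing over i = 1, ..., L and using c_i^H = e_i^H [A_i; 0] gives (17).
```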

The T-MSE in (17) is a function of $\boldsymbol{\Pi}$ for fixed $\mathbf{R}$. An exhaustive search is needed to find the optimal reordering matrix that minimizes (17). To avoid the high complexity of this globally optimal approach, we adopt a suboptimal successive reordering algorithm that maximizes only one term of the summation in (17) at a time, starting from the $L$th term and proceeding down to the 1st term. The maximization of the $i$th term determines the $i$th row of $\boldsymbol{\Pi}$. The procedure of the reordering algorithm is listed in Table 1.
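The successive reordering procedure of Table 1 can be sketched as follows. The function name `suboptimal_ordering` is hypothetical. One detail is an assumption added here rather than part of Table 1: rows that have already been selected are masked out explicitly, which guards against ties at zero when a remaining row of $\mathbf{A}$ happens to be all-zero.

```python
import numpy as np

def suboptimal_ordering(R, H, xi):
    """Greedy ordering following Table 1; returns the L x L matrix Pi.

    R: L x N receive matrix, H: N x M channel, xi: positive scalar.
    """
    A = np.array(R @ H, dtype=complex)        # working copy, L x M
    L, M = A.shape
    Pi = np.zeros((L, L))
    available = np.ones(L, dtype=bool)        # guard (assumption, see lead-in)
    for i in range(L - 1, -1, -1):            # i = L, ..., 1 in 1-based notation
        gram = A.conj().T @ A + xi * np.eye(M)
        M_mat = A @ np.linalg.solve(gram, A.conj().T)
        scores = np.where(available, np.diag(M_mat).real, -np.inf)
        l_star = int(np.argmax(scores))
        Pi[i, l_star] = 1.0                   # i-th row of Pi is e_{l*}^T
        available[l_star] = False
        A[l_star, :] = 0.0                    # remove the selected stream
    return Pi
```

For a diagonal effective channel the stream with the largest gain is assigned to the last position, the next largest to the position before it, and so on.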

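So far we have found the suboptimal $\boldsymbol{\Pi}$ and the optimal $\mathbf{F}$, $\mathbf{T}$ and $\beta$ for fixed $\mathbf{R}$. Next we calculate the optimal $\mathbf{R}$ under fixed $\boldsymbol{\Pi}$, $\mathbf{F}$ and $\mathbf{T}$.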

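The T-MSE in (6) can be expanded as the summation of the $K$ users’ MSEs, and the MSE for the $k$th user is written as follows:

$$\text{MSE}_k = E_{\mathbf{w},\mathbf{n}}\left\{\left\|\hat{\mathbf{b}}_k - \mathbf{b}_k\right\|_2^2\right\} = \left\|\mathbf{R}_k\mathbf{H}_k\mathbf{T} - \mathbf{E}_k^T\boldsymbol{\Pi}^T(\mathbf{I}_L - \mathbf{F})\right\|_F^2 + \zeta_k \cdot \mathrm{tr}(\mathbf{R}_k^H\mathbf{R}_k), \tag{18}$$

in which $\mathbf{E}_k \triangleq \left[\mathbf{e}_{\sum_{i=1}^{k-1}L_i + 1}, \ldots, \mathbf{e}_{\sum_{i=1}^{k}L_i}\right]$ and $\zeta_k \triangleq P_T^{-1}\sigma_{n,k}^2\,\mathrm{tr}(\mathbf{T}^H\mathbf{T}) \ge 0$.

Since $\mathbf{R}_k$ is only related to $\text{MSE}_k$, the Hessian matrix of T-MSE with respect to $\mathbf{R}_k$ is equal to that of $\text{MSE}_k$, which is calculated as

$$\frac{\partial}{\partial(\mathrm{vec}(\mathbf{R}_k))^T}\left(\frac{\partial\,\text{T-MSE}}{\partial(\mathrm{vec}(\mathbf{R}_k))^*}\right) = \frac{\partial}{\partial(\mathrm{vec}(\mathbf{R}_k))^T}\left(\frac{\partial\,\text{MSE}_k}{\partial(\mathrm{vec}(\mathbf{R}_k))^*}\right) = \left(\mathbf{H}_k\mathbf{T}\mathbf{T}^H\mathbf{H}_k^H + \zeta_k\mathbf{I}_{N_k}\right)^T \otimes \mathbf{I}_{L_k} \succeq 0. \tag{19}$$

The Hessian matrix in (19) being positive semidefinite indicates that the T-MSE is also convex with respect to $\mathbf{R}_k$. Then the optimal $\mathbf{R}_k$ is calculated in the same way as (15):

$$\begin{aligned}
& \frac{\partial\,\text{T-MSE}}{\partial \mathbf{R}_k^*} = \frac{\partial\,\text{MSE}_k}{\partial \mathbf{R}_k^*} = \mathbf{R}_k\mathbf{H}_k\mathbf{T}\mathbf{T}^H\mathbf{H}_k^H - \mathbf{E}_k^T\boldsymbol{\Pi}^T(\mathbf{I}_L - \mathbf{F})\mathbf{T}^H\mathbf{H}_k^H + \zeta_k\mathbf{R}_k = \mathbf{0} \\
& \Rightarrow\ \mathbf{R}_k = \mathbf{E}_k^T\boldsymbol{\Pi}^T(\mathbf{I}_L - \mathbf{F})\mathbf{T}^H\mathbf{H}_k^H\left(\mathbf{H}_k\mathbf{T}\mathbf{T}^H\mathbf{H}_k^H + \zeta_k\mathbf{I}_{N_k}\right)^{-1}.
\end{aligned} \tag{20}$$

Table 2 The iterative algorithm for joint THP transceiver design.

Step (1) Set the iteration number $n = 0$ and initialize $\boldsymbol{\Pi}^{(0)} = \mathbf{I}_L$ and $\mathbf{R}_k^{(0)} = \mathbf{U}_k^H$, where $\mathbf{U}_k$ comprises the $L_k$ dominant left singular vectors of $\mathbf{H}_k$.
Step (2) Set $n = n + 1$. Calculate the reordering matrix $\tilde{\boldsymbol{\Pi}}$ using the algorithm described in Table 1 and $\mathbf{R}^{(n-1)}$.
    Calculate (17) using $\tilde{\boldsymbol{\Pi}}$ and $\mathbf{R}^{(n-1)}$ and denote the result as $\tilde{C}$.
    Calculate (17) using $\boldsymbol{\Pi}^{(n-1)}$ and $\mathbf{R}^{(n-1)}$ and denote the result as $C$.
    If $\tilde{C} \le C$
        $\boldsymbol{\Pi}^{(n)} = \tilde{\boldsymbol{\Pi}}$,
    else
        $\boldsymbol{\Pi}^{(n)} = \boldsymbol{\Pi}^{(n-1)}$.
    end
    Update transmit processing:
        $\mathbf{t}_i^{(n)} = \left(\mathbf{A}_i^{(n),H}\mathbf{A}_i^{(n)} + \xi\mathbf{I}_M\right)^{-1}\begin{bmatrix}\mathbf{A}_i^{(n),H} & \mathbf{0}\end{bmatrix}\mathbf{e}_i$,
        $\mathbf{f}_i^{(n)} = -\begin{bmatrix}\mathbf{0} \\ \mathbf{B}_i^{(n)}\end{bmatrix}\mathbf{t}_i^{(n)}$, $i = 1, \ldots, L$,
        and $\beta^{(n)} = P_T^{\frac{1}{2}}\left(\mathrm{tr}(\mathbf{T}^{(n)}\mathbf{T}^{(n),H})\right)^{-\frac{1}{2}}$,
        where $\begin{bmatrix}\mathbf{A}_i^{(n)} \\ \mathbf{B}_i^{(n)}\end{bmatrix} = \boldsymbol{\Pi}^{(n)}\mathbf{R}^{(n-1)}\mathbf{H}$, $\mathbf{A}_i^{(n)} \in \mathbb{C}^{i \times M}$, $\mathbf{B}_i^{(n)} \in \mathbb{C}^{(L-i) \times M}$,
        and $\xi = P_T^{-1}\sum_{k=1}^{K}\sigma_{n,k}^2\,\mathrm{tr}\left(\mathbf{R}_k^{(n-1),H}\mathbf{R}_k^{(n-1)}\right)$.
    Update receiver processing:
        $\mathbf{R}_k^{(n)} = \mathbf{E}_k^T\boldsymbol{\Pi}^{(n),T}(\mathbf{I}_L - \mathbf{F}^{(n)})\mathbf{T}^{(n),H}\mathbf{H}_k^H \cdot \left(\mathbf{H}_k\mathbf{T}^{(n)}\mathbf{T}^{(n),H}\mathbf{H}_k^H + \zeta_k\mathbf{I}_{N_k}\right)^{-1}$, $k = 1, \ldots, K$,
        where $\zeta_k = P_T^{-1}\sigma_{n,k}^2\,\mathrm{tr}(\mathbf{T}^{(n),H}\mathbf{T}^{(n)})$.
Step (3) If $\|\mathbf{R}_k^{(n)} - \mathbf{R}_k^{(n-1)}\|_F^2 \ge \varepsilon$ for some $k \in \{1, \ldots, K\}$, then go to Step (2). Otherwise, stop the iteration; the solution is given by $\boldsymbol{\Pi} = \boldsymbol{\Pi}^{(n)}$, $\mathbf{P} = \beta^{(n)}\mathbf{T}^{(n)}$, $\mathbf{F} = \mathbf{F}^{(n)}$, $\beta = \beta^{(n)}$ and $\mathbf{G}_k = \left(\beta^{(n)}\right)^{-1}\mathbf{R}_k^{(n)}$.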
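The receiver update (20) is a regularized least-squares solution and can be sketched as below. The function name `receiver_update` and the argument layout are assumptions made for illustration; `E_k` is the $L \times L_k$ selection matrix defined below (18).

```python
import numpy as np

def receiver_update(H_k, T, F, Pi, E_k, noise_var_k, P_T):
    """Return R_k from (20) and the weight zeta_k from (18).

    H_k: N_k x M channel of user k, T: M x L, F: L x L feedback,
    Pi: L x L permutation, E_k: L x L_k selection matrix.
    """
    L = F.shape[0]
    zeta_k = noise_var_k * np.trace(T.conj().T @ T).real / P_T
    target = E_k.T @ Pi.T @ (np.eye(L) - F)          # L_k x L
    HT = H_k @ T                                     # N_k x L
    gram = HT @ HT.conj().T + zeta_k * np.eye(H_k.shape[0])
    R_k = target @ HT.conj().T @ np.linalg.inv(gram)
    return R_k, zeta_k
```

The returned matrix zeroes the gradient in (20), and because (18) is convex in $\mathbf{R}_k$, any perturbation of it cannot lower $\text{MSE}_k$.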

As the inter-dependence among the optimal $\boldsymbol{\Pi}$, $\mathbf{F}$, $\mathbf{T}$, $\beta$ and $\{\mathbf{R}_k\}_{k=1}^{K}$ has been found, we now summarize our iterative algorithm in Table 2, where the notations with the superscript $(\cdot)^{(n)}$ denote the related variables in the $n$th iteration.

The convergence of our proposed iterative algorithm can be guaranteed. The proof of convergence is given in the Appendix.

4. Robust Optimization of Transceivers under Imperfect CSIT

In the previous section, we have developed an iterative algorithm for joint THP transceiver design using the MMSE criterion under perfect CSIT. When the CSIT is imperfect, one possible solution is to neglect the channel errors and use the imperfect CSIT as if it were perfect. This simple solution does not take into account any uncertainty in the CSIT and therefore suffers from severe performance degradation. In this section, we introduce a robust THP transceiver design for the multiuser MIMO downlink which is more effective against the uncertainty in the CSIT than the above simple solution. The robust optimization problem is mathematically formulated as minimizing the expectation of the T-MSE conditioned on the channel estimates at the BS under a total transmit power constraint. Then the iterative algorithm proposed in the previous section is applied to solve the problem.

4.1 Channel Uncertainty Model

We consider a TDD system where the BS estimates the CSI using the training sequences in the uplink. The maximum-likelihood estimate of the actual channel matrix $\mathbf{H}_k$ can be modeled as [19] $\hat{\mathbf{H}}_k = \mathbf{H}_k + \Delta\mathbf{H}_k$, where $\Delta\mathbf{H}_k$ denotes the error matrix whose entries are i.i.d. complex Gaussian distributed with zero mean and variance $\sigma_{e,k}^2$. $\Delta\mathbf{H}_k$ is statistically independent of $\mathbf{H}_k$. According to [20], the distribution of $\mathbf{H}_k$ conditioned on $\hat{\mathbf{H}}_k$ is Gaussian and can be expressed as

$$\mathbf{H}_k \mid \hat{\mathbf{H}}_k = \rho_k\hat{\mathbf{H}}_k + \Delta\tilde{\mathbf{H}}_k, \tag{21}$$

where $\rho_k = \sigma_{h,k}^2/(\sigma_{h,k}^2 + \sigma_{e,k}^2)$ and the entries of $\Delta\tilde{\mathbf{H}}_k$ are i.i.d. complex Gaussian distributed with zero mean and variance $\sigma_k^2 = \sigma_{e,k}^2\sigma_{h,k}^2/(\sigma_{h,k}^2 + \sigma_{e,k}^2)$. We assume that the information of $\hat{\mathbf{H}}_k$, $\sigma_{h,k}^2$ and $\sigma_{e,k}^2$, $k = 1, \ldots, K$ is known at the BS.

Note that the channel uncertainty caused by the slow time-variations of the channel can also be modeled in the same manner as (21) except that $\rho_k$ has a different relationship with $\sigma_k^2$ [21].

4.2 Robust Optimization Problem Formulation and Iterative Algorithm

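When only the channel estimates $\hat{\mathbf{H}}_k$, $k = 1, \ldots, K$ are available at the BS, the definition of T-MSE in (6) and $\text{MSE}_k$ in (18) cannot be directly applied to the transceiver design. Instead, the expectation of the MSE conditioned on $\hat{\mathbf{H}}_k$ is an applicable performance measure and provides robustness against the channel uncertainties in an average manner [14], [16]. By (18), the conditional expectation of the MSE of user $k$ is expressed as

$$\begin{aligned}
E_{\mathbf{H}_k|\hat{\mathbf{H}}_k}\{\text{MSE}_k\} &= E_{\mathbf{H}_k|\hat{\mathbf{H}}_k}\left\{\left\|\mathbf{R}_k\mathbf{H}_k\mathbf{T} - \mathbf{E}_k^T\boldsymbol{\Pi}^T(\mathbf{I}_L - \mathbf{F})\right\|_F^2 + \zeta_k \cdot \mathrm{tr}(\mathbf{R}_k^H\mathbf{R}_k)\right\} \\
&= \mathrm{tr}\Big(\mathbf{R}_k \cdot E_{\mathbf{H}_k|\hat{\mathbf{H}}_k}\left\{\mathbf{H}_k\mathbf{T}\mathbf{T}^H\mathbf{H}_k^H\right\} \cdot \mathbf{R}_k^H - \mathbf{R}_k \cdot E_{\mathbf{H}_k|\hat{\mathbf{H}}_k}\{\mathbf{H}_k\} \cdot \mathbf{T}\left(\mathbf{E}_k^T\boldsymbol{\Pi}^T(\mathbf{I}_L - \mathbf{F})\right)^H \\
&\qquad - \mathbf{E}_k^T\boldsymbol{\Pi}^T(\mathbf{I}_L - \mathbf{F})\mathbf{T}^H \cdot E_{\mathbf{H}_k|\hat{\mathbf{H}}_k}\left\{\mathbf{H}_k^H\right\} \cdot \mathbf{R}_k^H \\
&\qquad + \mathbf{E}_k^T\boldsymbol{\Pi}^T(\mathbf{I}_L - \mathbf{F})\left(\mathbf{E}_k^T\boldsymbol{\Pi}^T(\mathbf{I}_L - \mathbf{F})\right)^H\Big) + \zeta_k \cdot \mathrm{tr}(\mathbf{R}_k^H\mathbf{R}_k).
\end{aligned} \tag{22}$$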

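Using (21) it can be easily verified that

$$E_{\mathbf{H}_k|\hat{\mathbf{H}}_k}\{\mathbf{H}_k\} = \rho_k\hat{\mathbf{H}}_k, \tag{23}$$

$$E_{\mathbf{H}_k|\hat{\mathbf{H}}_k}\left\{\mathbf{H}_k^H\right\} = \rho_k\hat{\mathbf{H}}_k^H. \tag{24}$$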

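Next we calculate the quadratic term $\mathbf{Q}_k \triangleq E_{\mathbf{H}_k|\hat{\mathbf{H}}_k}\left\{\mathbf{H}_k\mathbf{T}\mathbf{T}^H\mathbf{H}_k^H\right\}$. Let $\mathbf{H}_k = \left[\mathbf{h}_{k,1}^T\ \mathbf{h}_{k,2}^T\ \ldots\ \mathbf{h}_{k,N_k}^T\right]^T$, where $\mathbf{h}_{k,m}$ is the $m$th row of $\mathbf{H}_k$. Then the element at the $m$th row and $n$th column of $\mathbf{Q}_k$ can be written as

$$\begin{aligned}
[\mathbf{Q}_k]_{m,n} &= E_{\mathbf{H}_k|\hat{\mathbf{H}}_k}\left\{\mathbf{h}_{k,m}\mathbf{T}\mathbf{T}^H\mathbf{h}_{k,n}^H\right\} \\
&= E_{\mathbf{H}_k|\hat{\mathbf{H}}_k}\left\{\mathrm{tr}\left(\mathbf{h}_{k,m}\mathbf{T}\mathbf{T}^H\mathbf{h}_{k,n}^H\right)\right\} \\
&= E_{\mathbf{H}_k|\hat{\mathbf{H}}_k}\left\{\mathrm{tr}\left(\mathbf{h}_{k,n}^H\mathbf{h}_{k,m}\mathbf{T}\mathbf{T}^H\right)\right\} \\
&= \mathrm{tr}\left(E_{\mathbf{H}_k|\hat{\mathbf{H}}_k}\left\{\mathbf{h}_{k,n}^H\mathbf{h}_{k,m}\right\}\mathbf{T}\mathbf{T}^H\right).
\end{aligned} \tag{25}$$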

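According to (21), we have

$$E_{\mathbf{H}_k|\hat{\mathbf{H}}_k}\left\{\mathbf{h}_{k,n}^H\mathbf{h}_{k,m}\right\} = \rho_k^2\,\hat{\mathbf{h}}_{k,n}^H\hat{\mathbf{h}}_{k,m} + E\left\{\Delta\tilde{\mathbf{h}}_{k,n}^H\Delta\tilde{\mathbf{h}}_{k,m}\right\}, \tag{26}$$

where $\hat{\mathbf{h}}_{k,m}$ and $\Delta\tilde{\mathbf{h}}_{k,m}$ denote the $m$th rows of $\hat{\mathbf{H}}_k$ and $\Delta\tilde{\mathbf{H}}_k$, and

$$E\left\{\Delta\tilde{\mathbf{h}}_{k,n}^H\Delta\tilde{\mathbf{h}}_{k,m}\right\} = \begin{cases}\sigma_k^2\mathbf{I}_M, & m = n \\ \mathbf{0}, & m \ne n.\end{cases}$$

Therefore,

$$[\mathbf{Q}_k]_{m,n} = \begin{cases}\rho_k^2\,\hat{\mathbf{h}}_{k,m}\mathbf{T}\mathbf{T}^H\hat{\mathbf{h}}_{k,n}^H + \sigma_k^2\,\mathrm{tr}(\mathbf{T}\mathbf{T}^H), & m = n \\ \rho_k^2\,\hat{\mathbf{h}}_{k,m}\mathbf{T}\mathbf{T}^H\hat{\mathbf{h}}_{k,n}^H, & m \ne n.\end{cases}$$

Finally, $\mathbf{Q}_k$ can be expressed as

$$\mathbf{Q}_k = \rho_k^2\hat{\mathbf{H}}_k\mathbf{T}\mathbf{T}^H\hat{\mathbf{H}}_k^H + \sigma_k^2\,\mathrm{tr}(\mathbf{T}\mathbf{T}^H)\,\mathbf{I}_{N_k}. \tag{27}$$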
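The closed form (27) can be checked against a sample average drawn from the conditional model (21). The sketch below is only an illustration: all function names are hypothetical, the sample size and random seed are arbitrary choices, and the complex Gaussian error is generated with variance $\sigma_k^2$ split equally between its real and imaginary parts.

```python
import numpy as np

def conditional_channel_stats(sigma_h2, sigma_e2):
    """Return rho_k and sigma_k^2 as defined below (21)."""
    rho = sigma_h2 / (sigma_h2 + sigma_e2)
    sigma2 = sigma_e2 * sigma_h2 / (sigma_h2 + sigma_e2)
    return rho, sigma2

def q_closed_form(H_hat, T, rho, sigma2):
    """Closed-form Q_k of (27)."""
    TTh = T @ T.conj().T
    n_rx = H_hat.shape[0]
    return (rho**2 * H_hat @ TTh @ H_hat.conj().T
            + sigma2 * np.trace(TTh).real * np.eye(n_rx))

def q_monte_carlo(H_hat, T, rho, sigma2, n_samples, rng):
    """Sample average of H_k T T^H H_k^H with H_k drawn from (21)."""
    n_rx, n_tx = H_hat.shape
    TTh = T @ T.conj().T
    acc = np.zeros((n_rx, n_rx), dtype=complex)
    for _ in range(n_samples):
        err = np.sqrt(sigma2 / 2.0) * (rng.standard_normal((n_rx, n_tx))
                                       + 1j * rng.standard_normal((n_rx, n_tx)))
        H = rho * H_hat + err
        acc += H @ TTh @ H.conj().T
    return acc / n_samples
```

The cross terms between $\rho_k\hat{\mathbf{H}}_k$ and the error vanish in expectation, so the sample average approaches (27) as the number of samples grows.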

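By substituting (23), (24) and (27) into (22), we can obtain the explicit expression of $E_{\mathbf{H}_k|\hat{\mathbf{H}}_k}\{\text{MSE}_k\}$ as follows:

$$E_{\mathbf{H}_k|\hat{\mathbf{H}}_k}\{\text{MSE}_k\} = \left\|\rho_k\mathbf{R}_k\hat{\mathbf{H}}_k\mathbf{T} - \mathbf{E}_k^T\boldsymbol{\Pi}^T(\mathbf{I}_L - \mathbf{F})\right\|_F^2 + \zeta_k \cdot \mathrm{tr}(\mathbf{R}_k^H\mathbf{R}_k), \tag{28}$$

in which $\zeta_k \triangleq (P_T^{-1}\sigma_{n,k}^2 + \sigma_k^2) \cdot \mathrm{tr}(\mathbf{T}^H\mathbf{T})$. Then the conditional expectation of the T-MSE can be expressed as:

$$E_{\mathbf{H}|\hat{\mathbf{H}}}\{\text{T-MSE}\} = \sum_{k=1}^{K}E_{\mathbf{H}_k|\hat{\mathbf{H}}_k}\{\text{MSE}_k\} = \left\|\mathbf{R}\bar{\mathbf{H}}\mathbf{T} - \boldsymbol{\Pi}^T(\mathbf{I}_L - \mathbf{F})\right\|_F^2 + \xi \cdot \mathrm{tr}(\mathbf{T}^H\mathbf{T}), \tag{29}$$

where $\bar{\mathbf{H}} \triangleq \left[\rho_1\hat{\mathbf{H}}_1^T\ \ldots\ \rho_K\hat{\mathbf{H}}_K^T\right]^T$ and $\xi \triangleq \sum_{k=1}^{K}\left(P_T^{-1}\sigma_{n,k}^2 + \sigma_k^2\right)\mathrm{tr}\left(\mathbf{R}_k^H\mathbf{R}_k\right)$.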


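Finally, the robust transceiver optimization problem is mathematically formulated as shown below:

$$\begin{aligned}
\min_{\boldsymbol{\Pi},\, \mathbf{F},\, \mathbf{T},\, \{\mathbf{R}_k\}_{k=1}^{K}} \quad & E_{\mathbf{H}|\hat{\mathbf{H}}}\{\text{T-MSE}\} \\
\text{s.t.} \quad & [\mathbf{F}]_{m,n} = 0, \quad \forall\, 1 \le m, n \le L \text{ and } m \le n.
\end{aligned} \tag{30}$$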

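Following the derivation of the transceiver design under perfect CSIT, we can easily obtain the necessary conditions for the optimal robust transceiver under imperfect CSIT as follows. For fixed $\boldsymbol{\Pi}$ and $\mathbf{R}$, the optimal $\mathbf{f}_i$ and $\mathbf{t}_i$ ($i = 1, \ldots, L$) are listed below:

$$\mathbf{f}_i = -\begin{bmatrix}\mathbf{0}_{i \times M} \\ \bar{\mathbf{B}}_i\end{bmatrix}\mathbf{t}_i, \tag{31}$$

$$\mathbf{t}_i = \left(\bar{\mathbf{A}}_i^H\bar{\mathbf{A}}_i + \xi\mathbf{I}_M\right)^{-1}\begin{bmatrix}\bar{\mathbf{A}}_i^H & \mathbf{0}\end{bmatrix}\mathbf{e}_i, \quad i = 1, \ldots, L, \tag{32}$$

where

$$\begin{bmatrix}\bar{\mathbf{A}}_i \\ \bar{\mathbf{B}}_i\end{bmatrix} = \boldsymbol{\Pi}\mathbf{R}\bar{\mathbf{H}}, \quad \bar{\mathbf{A}}_i \in \mathbb{C}^{i \times M}, \quad \bar{\mathbf{B}}_i \in \mathbb{C}^{(L-i) \times M}. \tag{33}$$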

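By substituting (31) and (32) into (29), we reformulate $E_{\mathbf{H}|\hat{\mathbf{H}}}\{\text{T-MSE}\}$ as:

$$E_{\mathbf{H}|\hat{\mathbf{H}}}\{\text{T-MSE}\} = L - \sum_{i=1}^{L}\mathrm{tr}\left(\mathbf{e}_i^H\begin{bmatrix}\bar{\mathbf{A}}_i \\ \mathbf{0}\end{bmatrix}\left(\bar{\mathbf{A}}_i^H\bar{\mathbf{A}}_i + \xi\mathbf{I}_M\right)^{-1}\begin{bmatrix}\bar{\mathbf{A}}_i^H & \mathbf{0}\end{bmatrix}\mathbf{e}_i\right). \tag{34}$$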

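Then the suboptimal successive ordering algorithm presented in Table 1 can be applied with $\mathbf{H}$ replaced by $\bar{\mathbf{H}}$ and the expression of $\xi$ replaced by $\xi = \sum_{k=1}^{K}\left(P_T^{-1}\sigma_{n,k}^2 + \sigma_k^2\right)\mathrm{tr}\left(\mathbf{R}_k^H\mathbf{R}_k\right)$.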

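For fixed $\boldsymbol{\Pi}$, $\mathbf{F}$ and $\mathbf{T}$, the optimal $\mathbf{R}_k$ is given by

$$\mathbf{R}_k = \rho_k\mathbf{E}_k^T\boldsymbol{\Pi}^T(\mathbf{I}_L - \mathbf{F})\mathbf{T}^H\hat{\mathbf{H}}_k^H\left(\rho_k^2\hat{\mathbf{H}}_k\mathbf{T}\mathbf{T}^H\hat{\mathbf{H}}_k^H + \zeta_k\mathbf{I}_{N_k}\right)^{-1}. \tag{35}$$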
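The robust receiver (35) differs from (20) only through the scaling $\rho_k$ and the enlarged weight $\zeta_k$ of (28). A minimal sketch follows; the function name `robust_receiver_update` and its argument layout are assumptions for illustration.

```python
import numpy as np

def robust_receiver_update(H_hat_k, T, F, Pi, E_k, noise_var_k, P_T,
                           rho_k, sigma2_k):
    """Return R_k from (35) and the weight zeta_k defined below (28).

    H_hat_k: N_k x M channel estimate of user k, E_k: L x L_k selection
    matrix, rho_k and sigma2_k: parameters of the conditional model (21).
    """
    L = F.shape[0]
    zeta_k = (noise_var_k / P_T + sigma2_k) * np.trace(T.conj().T @ T).real
    target = E_k.T @ Pi.T @ (np.eye(L) - F)          # L_k x L
    HT = H_hat_k @ T                                 # N_k x L
    gram = rho_k**2 * HT @ HT.conj().T + zeta_k * np.eye(H_hat_k.shape[0])
    R_k = rho_k * target @ HT.conj().T @ np.linalg.inv(gram)
    return R_k, zeta_k
```

Setting $\rho_k = 1$ and $\sigma_k^2 = 0$, i.e., no estimation error, recovers the perfect-CSIT receiver of (20).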

Now that we have found the inter-dependence among $\boldsymbol{\Pi}$, $\mathbf{F}$, $\mathbf{T}$ and $\{\mathbf{R}_k\}_{k=1}^{K}$ as shown in (31)–(35), the framework of the iterative algorithm proposed in Sect. 3 can again be adopted here to compute the robust transceiver. In each iteration, we first determine the suboptimal reordering matrix $\boldsymbol{\Pi}$ and update $\mathbf{T}$ and $\mathbf{F}$ using the $\{\mathbf{R}_k\}_{k=1}^{K}$ updated in the last iteration, then update $\{\mathbf{R}_k\}_{k=1}^{K}$ using the newly updated $\boldsymbol{\Pi}$, $\mathbf{T}$ and $\mathbf{F}$, until convergence. The formulation of the algorithm is similar to that described in Table 2 except for some notation and expression changes, so we do not list the details of the algorithm.

The convergence of the iterative algorithm is also guaranteed. The proof of convergence is similar to that in the Appendix.

5. Simulation Results

In this section, extensive simulation results are presented to show the performance superiority of our proposed joint THP transceiver designs in comparison with some existing THP schemes. The illustration is divided into two parts. The first part of this section illustrates the performance under perfect CSIT and the second part focuses on the performance under imperfect CSIT. We assume a quasi-static i.i.d. Rayleigh flat fading channel with unit channel variance between each transmit antenna at the BS and each receive antenna at each user. We also assume $\sigma_{n,k}^2 = 1$, $k = 1, \ldots, K$. The signal-to-noise ratio (SNR) in the following figures is defined as $\text{SNR} \triangleq 10 \cdot \log_{10}P_T$. Both QPSK and 16-QAM modulations are used in the simulations. We set the convergence threshold in the iterative algorithms to $\varepsilon = 10^{-5}$.

Fig. 3 Performance comparison of different schemes with M = 6, K = 3, Nk = 2, Lk = 2, ∀k and QPSK.

Fig. 4 Performance comparison of different schemes with M = 6, K = 3, Nk = 2, Lk = 2, ∀k and 16-QAM.

5.1 Performance under Perfect CSIT

We examine the performance of our proposed joint THP transceiver design under perfect CSIT in comparison with some existing MMSE-based THP schemes.

Fig. 5 Performance comparison of different schemes with $M = 6$, $K = 3$, $N_k = 3$, $L_k = 2$, $\forall k$ and QPSK.

Fig. 6 Performance comparison of different schemes with $M = 6$, $K = 3$, $N_k = 3$, $L_k = 2$, $\forall k$ and 16-QAM.

Figures 3–6 compare our proposed stream-wise THP transceiver (denoted as “SW-THP”) with the user-wise THP transceiver in [12] (denoted as “UW-THP”) and “TxWF-THP” in [9] in terms of average uncoded bit error rate (BER). In Fig. 3 and Fig. 4, we set $M = 6$, $K = 3$, $N_k = N = 2$, $L_k = L = 2$, $\forall k$, and use QPSK and 16-QAM, respectively. For “TxWF-THP” we assume that the 2 antennas at the same receiver are decentralized, i.e., there are 6 virtual users in the system. The simulation results show that our scheme clearly outperforms the other two schemes. The superiority over “UW-THP” arises because our scheme performs stream-wise interference pre-cancelation, whereas “UW-THP” only enables inter-user interference pre-cancelation and the multiple streams of the same user are linearly precoded. Moreover, our scheme outperforms “TxWF-THP” because in our scheme the received signals from the multiple antennas of one user can be jointly processed, while in “TxWF-THP” the receive antennas of the same user are assumed to be decentralized and a common scaling factor is imposed on every receive antenna, which is suboptimal.

Fig. 7 Average uncoded 16-QAM BER performance of SW-THP under different numbers of iterations at SNR = 22 dB.

In Fig. 5 and Fig. 6, we increase $N$ by 1 and keep the other parameters unchanged. Since $N > L$, “TxWF-THP” cannot be directly applied. Here we simply select $\mathbf{U}_k^H$ as the receiver for user $k$, where $\mathbf{U}_k$ comprises the $L_k$ dominant left singular vectors of $\mathbf{H}_k$, and then use $\mathbf{U}_k^H \mathbf{H}_k$ as the equivalent $L_k \times M$ channel matrix, which is suitable for implementation of “TxWF-THP.” These two figures show that our scheme still performs best.
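The receiver $\mathbf{U}_k^H$ used for “TxWF-THP” in Figs. 5 and 6 can be made explicit through the singular value decomposition of $\mathbf{H}_k$. The symbols $\mathbf{U}$, $\mathbf{S}$, $\mathbf{V}$ and $s_i$ below are notation assumed for this illustration only, with the singular values sorted as $s_1 \ge s_2 \ge \cdots$:

```latex
\[
\mathbf{H}_k = \mathbf{U}\,\mathbf{S}\,\mathbf{V}^H,
\qquad
\mathbf{U}_k = \left[\mathbf{u}_1, \ldots, \mathbf{u}_{L_k}\right],
\qquad
\mathbf{U}_k^H \mathbf{H}_k
  = \operatorname{diag}\left(s_1, \ldots, s_{L_k}\right)
    \left[\mathbf{v}_1, \ldots, \mathbf{v}_{L_k}\right]^H .
\]
```

The product is an $L_k \times M$ matrix whose rows are mutually orthogonal. Because $\mathbf{U}_k^H \mathbf{U}_k = \mathbf{I}_{L_k}$, white noise with variance $\sigma_{n,k}^2$ stays white with the same variance after this receiver.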

Interestingly, the relative ordering of “UW-THP” and “TxWF-THP” in Fig. 3 is the opposite of that in Fig. 5, and likewise for Fig. 4 and Fig. 6. This can be explained as follows. In Figs. 3 and 4, more interference is pre-canceled in “TxWF-THP” than in “UW-THP.” In Figs. 5 and 6, however, the additional receive antenna at each user provides more diversity for each data stream in “UW-THP” through joint transceiver design, while the simple and suboptimal receiver structure we have adopted for “TxWF-THP” restricts its performance improvement.

We also present some simulation results to verify the convergence of our proposed iterative algorithm. We use the notation $[M, N_1(L_1), \ldots, N_K(L_K)]$ to denote a $K$-user MIMO system that has $M$ antennas at the BS and $N_k$ antennas at the $k$th receiver to support $L_k$ data streams for user $k$, $k = 1, \ldots, K$. Figure 7 shows the average uncoded BER of 16-QAM versus the number of iterations for our scheme under two system configurations at a fixed SNR = 22 dB. The dashed lines denote the steady-state performance. It can be seen that after about 10 iterations (for [6, 2(2), 2(2), 2(2)]) or 25 iterations (for [6, 3(2), 3(2), 3(2)]) the BER performance is quite close to the steady-state performance. Similar results hold for QPSK and are not shown here for brevity.

5.2 Performance under Imperfect CSIT

In this subsection, we examine the performance of SW-THP introduced in Sect. 3 and the robust version of SW-THP introduced in Sect. 4 under imperfect CSIT.

Fig. 8 Performance comparison of non-robust and robust SW-THP with $M = 6$, $K = 3$, $L = 2$, $\sigma_e^2 = 0.03$ and QPSK.

Fig. 9 Performance comparison of non-robust and robust SW-THP with $M = 6$, $K = 3$, $L = 2$, $\sigma_e^2 = 0.01$ and 16-QAM.

Figure 8 and Fig. 9 compare “SW-THP” and “Robust SW-THP” for $M = 6$, $K = 3$, $L_k = 2$, $N_k = N = 2$ and 3 (the parameters are consistent with those in Sect. 5.1). We assume $\sigma_{e,k}^2 = \sigma_e^2 = 0.03$, $k = 1, \ldots, K$ for QPSK and 0.01 for 16-QAM, since 16-QAM is more sensitive to channel errors than QPSK. The non-robust and robust schemes perform similarly at low SNR, where the noise is dominant. However, the gap between the two schemes widens as the SNR increases, because the channel uncertainty gradually dominates and the benefit of the robust scheme becomes visible.
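The observation that channel uncertainty dominates at high SNR can be illustrated with a rough back-of-the-envelope model. The sketch below is a heuristic assumed for illustration and is not taken from the paper: it supposes that a non-robust design leaves residual interference whose power grows as $\sigma_e^2 P_T$, so the effective SINR behaves like $P_T / (1 + \sigma_e^2 P_T)$ and saturates at $1/\sigma_e^2$. The function names are hypothetical.

```python
import math


def effective_sinr_db(snr_db: float, sigma_e2: float) -> float:
    """Heuristic SINR (assumption, not the paper's model):
    unit-variance noise plus residual interference sigma_e2 * P_T."""
    p_t = 10.0 ** (snr_db / 10.0)
    sinr = p_t / (1.0 + sigma_e2 * p_t)
    return 10.0 * math.log10(sinr)


def sinr_ceiling_db(sigma_e2: float) -> float:
    """High-SNR limit of the heuristic above, i.e. 1 / sigma_e2 in dB."""
    return -10.0 * math.log10(sigma_e2)


# The channel error variances used for QPSK and 16-QAM in Figs. 8 and 9.
for sigma_e2 in (0.03, 0.01):
    for snr_db in (0.0, 10.0, 20.0, 30.0, 40.0):
        print(sigma_e2, snr_db, effective_sinr_db(snr_db, sigma_e2))
    print("ceiling", sigma_e2, sinr_ceiling_db(sigma_e2))
```

Under this simplified model, noise limits the SINR at low SNR and the channel error caps it at high SNR, which matches the qualitative trend described above. It does not reproduce the simulated BER curves.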

In Fig. 10 and Fig. 11 we test the performance of “SW-THP” and “Robust SW-THP” under different channel errors. The antenna configurations are the same as above, i.e., $M = 6$, $K = 3$, $L_k = 2$, $N_k = N = 2$ and 3. We fix SNR = 30 dB and vary $\sigma_e^2$ from 0.02 to 0.2 for QPSK in Fig. 10, and fix SNR = 40 dB and vary $\sigma_e^2$ from 0.01 to 0.1 for 16-QAM in Fig. 11. The simulated curves reveal that “Robust SW-THP” maintains a considerable advantage over the non-robust “SW-THP” as the channel error increases.

Fig. 10 Performance comparison of non-robust and robust SW-THP with $M = 6$, $K = 3$, $L = 2$, SNR = 30 dB and QPSK for different channel errors.

Fig. 11 Performance comparison of non-robust and robust SW-THP with $M = 6$, $K = 3$, $L = 2$, SNR = 40 dB and 16-QAM for different channel errors.

Fig. 12 Performance comparison of robust SW-THP and Stat. Robust THP in [14] with $M = 4$, $K = 2$, $N = 2$, $L = 2$ and QPSK.

Fig. 13 Performance comparison of robust SW-THP and Stat. Robust THP in [14] with $M = 4$, $K = 2$, $N = 2$, $L = 2$ and 16-QAM.

We also compare our robust THP scheme with the “Statistically Robust Tomlinson-Harashima precoding” (denoted as “Stat. Robust THP”) recently reported in [14]. The results are shown in Fig. 12 and Fig. 13. We choose $M = 4$, $K = 2$, $L_k = L = 2$, $N_k = N = 2$. For QPSK we set $\sigma_{e,k}^2 = \sigma_e^2 = 0.02$ and 0.04, while for 16-QAM we set $\sigma_{e,k}^2 = \sigma_e^2 = 0.005$ and 0.01. Since “Stat. Robust THP” is designed only for the MISO downlink, it cannot be directly applied here. So for “Stat. Robust THP,” we assume that the 2 antennas at the same receiver are decentralized, i.e., there are 4 virtual users in the system. Fig. 12 and Fig. 13 show that “Robust SW-THP” outperforms “Stat. Robust THP” over the entire SNR range and also has a much lower BER floor at high SNR. The advantage of our scheme comes from two aspects. First, we jointly optimize the transmitter, the receivers and the precoding order while taking the channel errors into account. In contrast, only the transmitter of “Stat. Robust THP” is robustly designed, and the receivers and the precoding order are determined as if the estimated CSI were the true one. Second, the received signals from the two antennas of the same user can be jointly processed using “Robust SW-THP” but can only be processed independently by “Stat. Robust THP.”

6. Conclusion

We have jointly designed the THP transceiver for multiuser MIMO systems under both perfect and imperfect CSIT in this paper. First, we deal with the perfect CSIT case. Based on the equivalent linear representation of the THP downlink, we formulate a T-MSE minimization problem under the total power constraint. After some analysis, an iterative algorithm is proposed to obtain a locally optimal solution. Next, we consider the imperfect CSIT case. By minimizing the conditional expectation of the T-MSE, we extend the algorithm derived under perfect CSIT to a robust version that is more effective against channel uncertainty. The two characteristics of our schemes, i.e., the stream-wise interference pre-cancelation and the joint design of the transceiver, ensure their performance superiority over the existing MMSE-based THP schemes, which has been verified by the presented simulation results.

References

[1] G. Caire and S. Shamai (Shitz), “On the achievable throughput of a multi-antenna Gaussian broadcast channel,” IEEE Trans. Inf. Theory, vol.49, no.7, pp.1691–1706, July 2003.

[2] P. Viswanath and D. Tse, “Sum capacity of the vector Gaussian broadcast channel and uplink-downlink duality,” IEEE Trans. Inf. Theory, vol.49, no.8, pp.1912–1921, Aug. 2003.

[3] S. Vishwanath, N. Jindal, and A. Goldsmith, “Duality, achievable rates and sum capacity of Gaussian MIMO broadcast channels,” IEEE Trans. Inf. Theory, vol.49, no.10, pp.2658–2668, Oct. 2003.

[4] H. Weingarten, Y. Steinberg, and S. Shamai (Shitz), “The capacity region of the Gaussian multiple-input multiple-output broadcast channel,” IEEE Trans. Inf. Theory, vol.52, no.9, pp.3936–3964, Sept. 2006.

[5] R.L.U. Choi and R.D. Murch, “A transmit pre-processing technique for multiuser MIMO systems: A decomposition approach,” IEEE Trans. Wireless Commun., vol.3, no.1, pp.20–24, Jan. 2004.

[6] J. Zhang, Y. Wu, S. Zhou, and J. Wang, “Joint linear transmitter and receiver design for the downlink of multiuser MIMO systems,” IEEE Commun. Lett., vol.9, no.11, pp.991–993, Nov. 2005.

[7] C. Windpassinger, R. Fischer, T. Vencel, and J.B. Huber, “Precoding in multiantenna and multiuser communications,” IEEE Trans. Wireless Commun., vol.3, no.4, pp.1305–1316, July 2004.

[8] V. Stankovic and M. Haardt, “Successive optimization Tomlinson-Harashima precoding (SO THP) for multi-user MIMO systems,” Proc. IEEE Int. Conf. Acoustics, Speech, and Signal Processing 2005, vol.3, pp.1117–1120, Philadelphia, USA, March 2005.

[9] M. Joham, J. Brehmer, and W. Utschick, “MMSE approaches to multiuser spatio-temporal Tomlinson-Harashima precoding,” 5th Int. ITG Conf. on Source and Channel Coding (SCC04), Erlangen, Germany, Jan. 2004.

[10] M. Schubert and S. Shi, “MMSE transmit optimization with interference pre-compensation,” Proc. Vehicular Technology Conf. 2005-Spring, vol.2, pp.845–849, Stockholm, Sweden, May 2005.

[11] R. Doostnejad, T.J. Lim, and E. Sousa, “Joint precoding and beamforming design for the downlink in a multiuser MIMO system,” Proc. Conf. on Wireless and Mobile Computing, Networking and Communications 2005 (WiMob’05), vol.1, pp.153–159, Montreal, Canada, Aug. 2005.

[12] A. Mezghani, R. Hunger, M. Joham, and W. Utschick, “Iterative THP transceiver optimization for multi-user MIMO systems based on weighted sum-MSE minimization,” Proc. IEEE 7th Workshop on Signal Processing Advances in Wireless Communications (SPAWC’06), pp.1–5, Cannes, France, July 2006.

[13] M. Payaro, A. Pascual-Iserte, A. Perez-Neira, and M. Lagunas, “Robust design of spatial Tomlinson-Harashima precoding in the presence of errors in the CSI,” IEEE Trans. Wireless Commun., vol.6, no.7, pp.2396–2401, July 2007.

[14] M.B. Shenouda and T.N. Davidson, “Tomlinson-Harashima precoding for broadcast channels with uncertainty,” IEEE J. Sel. Areas Commun., vol.25, no.7, pp.1380–1389, Sept. 2007.

[15] R. Hunger, F. Dietrich, M. Joham, and W. Utschick, “Robust transmit zero-forcing filters,” Proc. ITG Workshop Smart Antennas, pp.130–137, Munich, Germany, March 2004.

[16] F. Dietrich, P. Breun, and W. Utschick, “Robust Tomlinson-Harashima precoding for the wireless broadcast channel,” IEEE Trans. Signal Process., vol.55, no.2, pp.631–644, Feb. 2007.

[17] M.B. Shenouda and T.N. Davidson, “A framework for designing MIMO systems with decision feedback equalization or Tomlinson-Harashima precoding,” IEEE J. Sel. Areas Commun., vol.26, no.2, pp.401–411, Feb. 2008.

[18] R.A. Horn and C.R. Johnson, Matrix Analysis, Cambridge University Press, 1985.

[19] B. Hassibi and B. Hochwald, “How much training is needed in multiple-antenna wireless links?,” IEEE Trans. Inf. Theory, vol.49, no.4, pp.951–963, April 2003.

[20] S.M. Kay, Fundamentals of Statistical Signal Processing, Volume I: Estimation Theory, Prentice Hall PTR, 1993.

[21] N. Khaled, G. Leus, C. Desset, and H.D. Man, “A robust joint linear precoder and decoder MMSE design for slowly time-varying MIMO channels,” Proc. IEEE Int. Conf. Acoustics, Speech, and Signal Processing, vol.4, pp.485–488, Montreal, Canada, May 2004.

Appendix: Proof of the Convergence of the Proposed Iterative Algorithm

We use the notation $f\left(\boldsymbol{\Pi}^{(n)}, \mathbf{F}^{(n)}, \mathbf{T}^{(n)}, \mathbf{R}^{(n)}\right)$ to represent the T-MSE calculated after the $n$th iteration of our iterative algorithm. If we fix $\boldsymbol{\Pi}^{(n)}$ and $\mathbf{R}^{(n)}$, and update $\mathbf{F}^{(n)}$ and $\mathbf{T}^{(n)}$ using the optimal conditions derived in (11) and (15), then due to the convexity of the T-MSE with respect to $\mathbf{t}_i$, we have the following inequality:

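\[
f\left(\boldsymbol{\Pi}^{(n)}, \mathbf{F}^{(n)}, \mathbf{T}^{(n)}, \mathbf{R}^{(n)}\right) \ge f\left(\boldsymbol{\Pi}^{(n)}, \mathbf{F}', \mathbf{T}', \mathbf{R}^{(n)}\right), \tag{A$\cdot$1}
\]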

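where $\mathbf{F}'$ and $\mathbf{T}'$ are the updated matrices using $\boldsymbol{\Pi}^{(n)}$ and $\mathbf{R}^{(n)}$.

We define $h(\boldsymbol{\Pi}, \mathbf{R}) \triangleq \min_{\mathbf{F}, \mathbf{T}} f(\boldsymbol{\Pi}, \mathbf{F}, \mathbf{T}, \mathbf{R})$. Obviously $h(\boldsymbol{\Pi}, \mathbf{R})$ is equal to (17). From the definition of $h$ we know that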

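\[
h\left(\boldsymbol{\Pi}^{(n)}, \mathbf{R}^{(n)}\right) = f\left(\boldsymbol{\Pi}^{(n)}, \mathbf{F}', \mathbf{T}', \mathbf{R}^{(n)}\right). \tag{A$\cdot$2}
\]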

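We denote the permutation matrix calculated using $\mathbf{R}^{(n)}$ and the suboptimal ordering algorithm in Table 1 as $\boldsymbol{\Pi}'$. As stated in Table 2, if $\boldsymbol{\Pi}'$ leads to a smaller value of (17) than $\boldsymbol{\Pi}^{(n)}$, then $\boldsymbol{\Pi}^{(n+1)} = \boldsymbol{\Pi}'$. Otherwise, $\boldsymbol{\Pi}^{(n+1)} = \boldsymbol{\Pi}^{(n)}$. Then the following inequality holds: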

\[
h\left(\boldsymbol{\Pi}^{(n)}, \mathbf{R}^{(n)}\right) \ge h\left(\boldsymbol{\Pi}^{(n+1)}, \mathbf{R}^{(n)}\right). \tag{A$\cdot$3}
\]

From the definition of $h$ and the iteration procedure, we have:

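\[
h\left(\boldsymbol{\Pi}^{(n+1)}, \mathbf{R}^{(n)}\right) = f\left(\boldsymbol{\Pi}^{(n+1)}, \mathbf{F}^{(n+1)}, \mathbf{T}^{(n+1)}, \mathbf{R}^{(n)}\right). \tag{A$\cdot$4}
\]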

Since $f$ is also convex with respect to $\mathbf{R}_k$, the following inequality holds:

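\[
f\left(\boldsymbol{\Pi}^{(n+1)}, \mathbf{F}^{(n+1)}, \mathbf{T}^{(n+1)}, \mathbf{R}^{(n)}\right) \ge f\left(\boldsymbol{\Pi}^{(n+1)}, \mathbf{F}^{(n+1)}, \mathbf{T}^{(n+1)}, \mathbf{R}^{(n+1)}\right). \tag{A$\cdot$5}
\]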

Combining (A·1)–(A·5), we finally get:

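\[
f\left(\boldsymbol{\Pi}^{(n+1)}, \mathbf{F}^{(n+1)}, \mathbf{T}^{(n+1)}, \mathbf{R}^{(n+1)}\right) \le f\left(\boldsymbol{\Pi}^{(n)}, \mathbf{F}^{(n)}, \mathbf{T}^{(n)}, \mathbf{R}^{(n)}\right). \tag{A$\cdot$6}
\]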
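For reference, the steps (A·1)–(A·5) can be written as a single chain, which makes explicit how each update contributes to the final inequality. The arrangement below is only a restatement of those five relations:

```latex
\begin{align*}
f\left(\boldsymbol{\Pi}^{(n)}, \mathbf{F}^{(n)}, \mathbf{T}^{(n)}, \mathbf{R}^{(n)}\right)
  &\ge f\left(\boldsymbol{\Pi}^{(n)}, \mathbf{F}', \mathbf{T}', \mathbf{R}^{(n)}\right)
   = h\left(\boldsymbol{\Pi}^{(n)}, \mathbf{R}^{(n)}\right) \\
  &\ge h\left(\boldsymbol{\Pi}^{(n+1)}, \mathbf{R}^{(n)}\right)
   = f\left(\boldsymbol{\Pi}^{(n+1)}, \mathbf{F}^{(n+1)}, \mathbf{T}^{(n+1)}, \mathbf{R}^{(n)}\right) \\
  &\ge f\left(\boldsymbol{\Pi}^{(n+1)}, \mathbf{F}^{(n+1)}, \mathbf{T}^{(n+1)}, \mathbf{R}^{(n+1)}\right).
\end{align*}
```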

The inequality in (A·6) indicates that the T-MSE is non-increasing over the iterations of our proposed algorithm. Moreover, the T-MSE is lower bounded by 0. Since a non-increasing sequence that is bounded below converges, the sequence of T-MSE values produced by the iterative algorithm converges.
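The monotonicity argument can be seen on a toy problem. The Python sketch below is not the paper's algorithm: it applies the same alternating pattern (minimize over one block of variables while the other is fixed) to a hypothetical two-variable cost $f(x, y) = (xy - 1)^2 + \lambda (x^2 + y^2)$, which is convex in each variable separately. The stopping threshold mirrors the $\varepsilon = 10^{-5}$ used in Sect. 5, and all names are assumptions made for illustration.

```python
def cost(x: float, y: float, lam: float = 0.1) -> float:
    """Toy stand-in for the T-MSE: convex in x for fixed y and vice versa."""
    return (x * y - 1.0) ** 2 + lam * (x * x + y * y)


def alternating_minimization(x, y, lam=0.1, eps=1e-5, max_iter=1000):
    """Alternate exact per-variable minimizations until the cost
    decreases by less than eps between consecutive iterations."""
    history = [cost(x, y, lam)]
    for _ in range(max_iter):
        # Setting d(cost)/dx = 0 with y fixed gives x = y / (y^2 + lam).
        x = y / (y * y + lam)
        # Setting d(cost)/dy = 0 with x fixed gives y = x / (x^2 + lam).
        y = x / (x * x + lam)
        history.append(cost(x, y, lam))
        if history[-2] - history[-1] < eps:
            break
    return x, y, history


x_opt, y_opt, history = alternating_minimization(x=3.0, y=0.2)
for n, value in enumerate(history):
    print(n, value)
```

Each update is an exact minimizer of a convex one-dimensional problem, so the printed cost never increases, and it is bounded below by 0. This is the same two-part argument as in the proof above. As with the proposed algorithm, this guarantees convergence of the cost sequence, not global optimality.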

Thus, we have completed the proof.

Wei Miao received the B.S. degree from the Department of Electronic Engineering, Tsinghua University, Beijing, China, in 2004, and is currently pursuing the Ph.D. degree with the Wireless and Mobile Communication Technology R&D Center, Research Institute of Information Technology, Tsinghua University, Beijing, China. His research interests are in the area of multi-user multi-antenna communication systems.

Xiang Chen received the B.S. degree from the Department of Electronic Engineering, Tsinghua University, Beijing, China, in 2002, and is currently pursuing the Ph.D. degree with the Wireless and Mobile Communication Technology R&D Center, Research Institute of Information Technology, Tsinghua University, Beijing, China. His research interests include statistical signal processing, digital signal processing, and wireless communications.

Ming Zhao received the B.S. and Ph.D. degrees in electronic engineering from Tsinghua University, Beijing, China, in 1993 and 1998, respectively. He has been on the faculty of Tsinghua University since 1998. He is currently an associate professor. His research interests are in the area of wireless digital communications, including information theory, channel coding, multi-user detection, statistical signal processing, estimation theory, spread-spectrum communications, and multiple antenna systems.

Shidong Zhou is a professor at Tsinghua University, China. He received a Ph.D. degree in communication and information systems from Tsinghua University in 1998. His B.S. and M.S. degrees in wireless communications were received from Southeast University, Nanjing, in 1991 and 1994, respectively. From 1999 to 2001 he was in charge of several projects in the China 3G Mobile Communication R&D Project. He is now a member of the China FuTURE Project. His research interests are in the area of wireless and mobile communications.

Jing Wang received B.S. and M.S. degrees in electronic engineering from Tsinghua University, Beijing, China, in 1983 and 1986, respectively. He has been on the faculty of Tsinghua University since 1986. He is currently a professor and the vice dean of the Tsinghua National Laboratory for Information Science and Technology. His research interests are in the area of wireless digital communications, including modulation, channel coding, multi-user detection, and 2D RAKE receivers. He has published more than 100 conference and journal papers. He is a member of the Technical Group of the China 3G Mobile Communication R&D Project. He serves as an expert of communication technology in the National 863 Program. He is also a member of the Radio Communication Committee of the Chinese Institute of Communications and a senior member of the Chinese Institute of Electronics.