Combining beamforming and orthogonal space-time block coding

KUNGL TEKNISKA HÖGSKOLAN, Institutionen för Signaler, Sensorer & System, Signalbehandling, 100 44 STOCKHOLM
ROYAL INSTITUTE OF TECHNOLOGY, Department of Signals, Sensors & Systems, Signal Processing, S-100 44 STOCKHOLM
Combining Beamforming and Orthogonal Space-Time Block Coding
George Jöngren, Mikael Skoglund and Björn Ottersten
2001-10-01
IR–S3–SB–0127
Accepted for publication in IEEE Transactions on Information Theory


Abstract— Multiple transmit and receive antennas can be used in wireless systems to achieve high data rate communication. Recently, efficient space-time codes have been developed that utilize a large portion of the available capacity. These codes are designed under the assumption that the transmitter has no knowledge about the channel. In this work, on the other hand, we consider the case when the transmitter has partial, but not perfect, knowledge about the channel and how to improve a predetermined code so that this fact is taken into account. A performance criterion is derived for a frequency-non-selective fading channel and then utilized to optimize a linear transformation of the predetermined code. The resulting optimization problem turns out to be convex and can thus be efficiently solved using standard methods. In addition, a particularly efficient solution method is developed for the special case of independently fading channel coefficients. The proposed transmission scheme combines the benefits of conventional beamforming with those given by orthogonal space-time block coding. Simulation results for a narrowband system with multiple transmit antennas and one or more receive antennas demonstrate significant gains over conventional methods in a scenario with non-perfect channel knowledge.

Keywords— Array processing, space-time codes, diversity, fading channels, wireless communication.

I. Introduction

The use of transmit diversity in wireless communication systems has recently received considerable attention [1–4]. By utilizing antenna arrays at both the transmitter and the receiver, the limitations of the radio channel may be overcome and the data rates increased. The high data rates that these multi-input multi-output (MIMO) systems may offer were demonstrated in [5, 6]. Therein, calculations of the information theoretic capacity assuming a flat Rayleigh fading environment were presented. This triggered the development of space-time codes that utilize both the spatial and temporal dimension to achieve a significant portion of the aforementioned capacity [2–4].

The present work considers how side information in the form of channel estimates at the transmitter can be used in conjunction with certain space-time codes. A transmission scheme is developed which adapts a predetermined space-time code to the available channel knowledge by means of a linear transformation. Accurate estimates are in practice not always possible, particularly in environments where the parameters of the channel are rapidly time-varying. This fact is taken into account in the proposed transmission scheme by assuming that the channel knowledge is non-perfect. In fact, the scheme continues to work well also when the quality of the side information is low.

This paper was presented in part at the Asilomar Conference on Signals, Systems and Computers, October 1999 and at the IEEE Sensor Array and Multichannel Signal Processing Workshop, March 2000. The authors are with the Department of Signals, Sensors and Systems, Royal Institute of Technology, SE-100 44 Stockholm, Sweden.

Early attempts at designing transmission schemes for exploiting the potential offered by antenna arrays at the transmit side are generally concerned with increasing the diversity order of the system. Examples of such work include techniques for introducing artificial frequency/phase offsets [7–9] and time offsets [10] between the transmitted signals. The latter technique is commonly referred to as delay diversity and is closely related to the layered space-time architecture in [11]. A more systematic approach to finding appropriate codes was pioneered in [2, 12]. A major contribution in [2, 12] was the development of a design criterion involving the rank and eigenvalues of certain matrices. The design criterion was later generalized to multiple receive antennas and to other channel models in [3], where the now popular notion of space-time coding was introduced. Examples of trellis codes based on the design criterion were also provided. A simple block code for two transmit antennas that leads to a low-complexity receiver was developed in [13]. In [4], this concept was extended to up to eight transmit antennas. Furthermore, it was shown that these so-called orthogonal space-time block codes satisfy the rank constraint of the previously mentioned design criterion. Consequently, these codes also provide the system with its maximum diversity order.

Common to the space-time coding schemes mentioned above is that they do not exploit channel knowledge at the transmitter. Information about the channel realization, if it is available, should of course be utilized in order to maximize the performance. As a simple example consider a scenario with perfect channel state information at both sides of the communication link. It is well-known that by appropriate linear processing at the transmitter and the receiver, the system can be transformed into a set of parallel scalar channels. The available transmit power may then be optimally allocated to the individual channels. An example of combining such an approach with coding is found in [14].

In this paper we present a transmission scheme that combines the two extremes regarding the degree of channel knowledge. Codes belonging to the class of orthogonal space-time block codes [4] are linearly processed in order to take the side information into account. The side information is modeled using a purely statistical approach. Previous and related work includes [15, 16] and [17–19]. While a similar model for the side information was utilized in [17, 18] for determining suitable transmit beamformers in the presence of channel estimation errors, those papers neither considered space-time codes nor multiple antennas at the receiver. Another possibility for modeling partial channel knowledge is to take on a physical perspective. This is the approach used in [19] for adapting a predetermined space-time code to the particular channel. The assumption in that work is that the signals propagate along a finite number of directions, known at the transmitter, before entering a rich local scattering environment near the receiver.

The main contribution of the present work is the development and study of a low complexity transmission scheme which combines the benefits of transmit beamforming and orthogonal space-time block coding. To accomplish this, a performance criterion is derived which takes the available side information into account. The performance criterion could, in principle, be used for designing codes from scratch. Although the derivation is to some extent similar in spirit to the derivation of the design criterion in [3], that paper did not treat side information and therefore utilized certain approximations which prohibit the resulting design criterion from exploiting any channel knowledge. The setup considered herein may also be motivated by current standardization proposals for the WCDMA system, where an orthogonal space-time block code is used in one of the proposed transmission modes whereas one of the other proposed modes uses transmit beamforming [20, 21].

The paper is organized as follows. In Section II, the data model and the model for the side information are introduced. A derivation of the performance criterion as well as some general interpretations are given in Section III. The proposed transmission scheme is described in Section IV, leading to what first seems like a non-trivial optimization problem. However, a simple reformulation of the parameters to be determined turns out to give a convex optimization problem which permits a reasonably efficient implementation. Also, closed-form expressions for asymptotically optimum linear transformations are derived for a number of cases. These also reveal the highly intuitive behavior of the transmission scheme. Furthermore, a particularly efficient optimization algorithm is presented for a class of simplified fading scenarios. This algorithm is then applied in Section V to an actual example of a simplified fading scenario. Finally, simulation examples are presented in Section VI showing significant gains compared to conventional beamforming as well as conventional orthogonal space-time block coding.

II. System Model

As illustrated in Figure 1, a wireless communication system employing multiple antennas at both the transmitter and the receiver is considered. The transmitter is assumed to have some, but not necessarily perfect, knowledge about the current channel realization. In order to utilize this channel knowledge, while at the same time not sacrificing the benefits offered by conventional space-time codes, we propose the use of a transmitter consisting of a space-time encoder followed by a linear transformation $W$. The space-time encoder maps the data to be transmitted $s(n)$, where $n$ is a discrete-time index, into codewords that are split into a set of parallel and generally different symbol sequences. These codewords are linearly transformed in order to adapt the code to the available channel knowledge. As a result, a new set of parallel symbol sequences is formed. Each symbol sequence is first pulse shaped and then transmitted over the corresponding antenna. Finally, the transmitted data is recovered at the receiver by means of maximum likelihood (ML) decoding.

[Figure 1: block diagram. Recoverable labels: input s(n), Space-Time Encoder, W, Receiver, output s(n).]

Fig. 1. An overview of the system.

The information carrying signals are transmitted over a wireless fading channel. The time dispersion introduced by the channel is assumed to be short compared with the symbol period. Therefore, the individual channel between each transmit and receive antenna is frequency-non-selective. Let $M$ and $N$ denote the number of transmit and receive antennas, respectively. The signal output from each receive antenna is then a weighted superposition of the $M$ transmitted signals, corrupted by additive noise.

By collecting the filtered and symbol sampled complex baseband equivalent outputs from the receiving antenna array in an $N \times 1$ vector $x(n)$, the received signal at time $n$ can be written as

$$x(n) = H^* c(n) + e(n),$$

where $(\cdot)^*$ denotes the complex conjugate transpose operator and where the linearly transformed symbols, transmitted from the $M$ antennas at time instant $n$, are represented by

$$c(n) = \begin{bmatrix} c_1(n) & c_2(n) & \cdots & c_M(n) \end{bmatrix}^T = W \bar{c}(n). \tag{1}$$

Here, $\bar{c}(n)$ is the corresponding output from the space-time encoder and $W$ is the previously mentioned linear transformation matrix. As will be seen in the following sections, $W$ is determined so as to minimize a certain upper bound on the probability of a codeword error. The noise term, $e(n)$, is assumed to be a zero-mean, temporally and spatially white, complex Gaussian random process with covariance matrix $\sigma^2 I_N$. Furthermore, the MIMO channel is described by the $M \times N$ matrix

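$$H = \begin{bmatrix} h_{11} & h_{12} & \cdots & h_{1N} \\ h_{21} & h_{22} & \cdots & h_{2N} \\ \vdots & \vdots & & \vdots \\ h_{M1} & h_{M2} & \cdots & h_{MN} \end{bmatrix},$$

where $h_{ij}^*$ is a complex scalar denoting the channel between the $i$th transmit antenna and the $j$th receive antenna¹. The MIMO channel is also represented by the $MN \times 1$ vector

$$h = \operatorname{vec}(H),$$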

where $\operatorname{vec}(\cdot)$ denotes the vectorization operator which stacks the columns of its argument into a vector. The fading of the channel is assumed to obey a complex Gaussian distribution with mean vector $m_h$ and covariance matrix $R_{hh}$. Note that this assumption includes both independent Rayleigh fading and independent Ricean fading as special cases. In addition, more realistic fading environments with correlated channel coefficients may be modeled.

A quasi-static scenario is considered where the channel is assumed to be constant during the transmission of a burst of codewords but may vary from one burst to another in a statistically stationary fashion. For simplicity, the channel realizations are in this work assumed to be independent. However, it is straightforward to generalize the proposed methods to a correlated fading scenario.
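The data model of this section can be exercised numerically. The following is a minimal sketch, not the authors' simulator: the function names `sample_channel` and `received_block`, the dimensions, the noise level and the example encoder output are illustrative assumptions. It draws one quasi-static channel realization from the complex Gaussian fading model and forms the received block for one encoder output block.

```python
import numpy as np

def sample_channel(m_h, R_hh, M, N, rng):
    """Draw h ~ CN(m_h, R_hh) and reshape it so that h = vec(H), H being M x N."""
    chol = np.linalg.cholesky(R_hh)
    w = (rng.standard_normal(M * N) + 1j * rng.standard_normal(M * N)) / np.sqrt(2)
    h = m_h + chol @ w
    H = h.reshape((M, N), order="F")  # column-major reshape inverts vec(.)
    return h, H

def received_block(H, W, C_bar, sigma2, rng):
    """Return X = H^* W C_bar + E for one encoder output block C_bar (M x L)."""
    N, L = H.shape[1], C_bar.shape[1]
    # Spatially and temporally white complex Gaussian noise with variance sigma2.
    E = np.sqrt(sigma2 / 2) * (rng.standard_normal((N, L))
                               + 1j * rng.standard_normal((N, L)))
    return H.conj().T @ (W @ C_bar) + E

# Hypothetical example: 2 x 2 system with independent Rayleigh fading.
rng = np.random.default_rng(0)
M, N = 2, 2
h, H = sample_channel(np.zeros(M * N, dtype=complex), np.eye(M * N), M, N, rng)
W = np.eye(M) / np.sqrt(M)                           # a neutral, scaled-identity choice
C_bar = np.array([[1, 1], [-1, 1]], dtype=complex)   # illustrative encoder output
X = received_block(H, W, C_bar, sigma2=0.1, rng=rng)
```

Because the channel is held fixed over the whole block, the same `H` multiplies every column of the transformed codeword, which is the quasi-static assumption stated above.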

A. Side Information at the Transmitter

An estimate of the channel realization is assumed to be available at the transmitter. There are several examples of how such an estimate may be obtained. An explicit channel can be used to feed back the channel estimates from the receiver or, if a time division duplex system is used, it is possible for the transmitter to estimate the channel directly. In order to use the latter method in frequency division duplex systems, estimates of the channel in the receive mode must first be transformed to the desired carrier frequency before they can be utilized in the transmission mode. In general, the channel estimates are not only noisy but may also be outdated due to feedback delay or duplex time. In addition, the frequency shift transformation in frequency division duplex systems is another possible source of error.

To simplify the exposition, the receiver is from now on assumed to have perfect channel knowledge. However, it is straightforward to extend the following development to also take channel estimation errors at the receiver into account. Moreover, keep in mind that the development in this section is exemplified by, but not limited to, systems where the channel estimates are obtained through a dedicated feedback channel.

The channel estimates at the transmitter are assumed to be correlated (to an arbitrary degree) with the true channel realization. This assumption is motivated, for example, by the well-known Jakes model [22, p. 26], which describes the variations of the channel, due to movement of the mobile receiver, as a function of time. In this model, the channel coefficients are samples of a stationary Gaussian process with an autocorrelation function proportional to $J_0(2\pi f_m \tau)$, where $J_0(x)$ is the zero-order Bessel function of the first kind, $\tau$ denotes the time lag and $f_m$ is the maximum Doppler frequency. Hence, the outdated channel estimates available at the transmitter are correlated with the current channel and the amount of such correlation is determined by the time it takes to feed back the estimates.

¹ Since the focus of this work is on the transmitting side, it is convenient to define the MIMO channel using $H^*$, as opposed to $H$, so that each column of $H$ represents the vector channel between the transmitter's antenna array and the corresponding receive antenna.
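To give a feeling for how quickly the correlation decays with feedback delay, the following sketch evaluates the normalized Jakes autocorrelation $J_0(2\pi f_m \tau)$. The helper names `bessel_j0` and `jakes_correlation`, as well as the Doppler frequency and delay values, are assumptions chosen for illustration. The Bessel function is computed from its integral representation $J_0(x) = \frac{1}{\pi}\int_0^{\pi} \cos(x \sin\theta)\, d\theta$ with a midpoint rule, so only NumPy is needed.

```python
import numpy as np

def bessel_j0(x, num=512):
    """Zero-order Bessel function of the first kind via its integral form."""
    theta = (np.arange(num) + 0.5) * np.pi / num   # midpoint nodes on [0, pi]
    return float(np.mean(np.cos(x * np.sin(theta))))

def jakes_correlation(f_m, tau):
    """Normalized correlation between channel samples separated by a lag tau."""
    return bessel_j0(2.0 * np.pi * f_m * tau)

# Hypothetical numbers: 50 Hz maximum Doppler frequency, 1 ms feedback delay.
rho = jakes_correlation(f_m=50.0, tau=1e-3)
```

A longer feedback delay or a faster moving receiver moves the argument of $J_0$ away from zero, so the estimate available at the transmitter becomes less correlated with the current channel.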

For the purpose of describing the side information, let the matrix $\hat{H}$, with the corresponding channel coefficients $\hat{h}_{ij}$, denote the estimate of $H$ available at the transmitter. Let $\hat{h}$ denote the vectorized counterpart. We assume that $h$ and $\hat{h}$ are jointly complex Gaussian. With obvious notation, the statistics of the side information and its relation to the true channel are now completely described by the mean vector $m_{\hat{h}}$, the covariance matrix $R_{\hat{h}\hat{h}}$ and the cross-covariance matrix $R_{h\hat{h}}$. In view of the Jakes model, the joint Gaussian assumption is reasonable since the side information and the true channel are samples of the same Gaussian random process.

Clearly, the quality of the side information is closely related to the degree of correlation with the true channel, as represented by the cross-covariance matrix. A more general measure of the quality of the side information that will be used extensively in this paper is the covariance of the true channel, conditioned on the side information. Let $R_{hh|\hat{h}}$ denote this quantity. Since $R_{hh|\hat{h}}$ describes the remaining uncertainty when the side information is known, it should be apparent that, loosely speaking, high quality side information corresponds to a small $R_{hh|\hat{h}}$ (measured in a suitable norm) whereas a large $R_{hh|\hat{h}}$ corresponds to side information of low quality. Let us now formally define the two notions of “perfect side information” (or “perfect channel knowledge”) and “no side information” (or “no channel knowledge”) as:

• “Perfect side information” $\Leftrightarrow \|R_{hh|\hat{h}}\| \to 0$

• “No side information” $\Leftrightarrow \|R_{hh|\hat{h}}^{-1}\| \to 0$

Here, $\|\cdot\|$ denotes the spectral norm [23, p. 295]. A salient consequence of this choice of measure is that the distribution of the true channel is considered as part of the channel knowledge as well.
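The conditional statistics used as the quality measure follow from the joint Gaussian assumption. The sketch below uses the standard conditioning formulas for jointly Gaussian vectors, $m_{h|\hat{h}} = m_h + R_{h\hat{h}} R_{\hat{h}\hat{h}}^{-1}(\hat{h} - m_{\hat{h}})$ and $R_{hh|\hat{h}} = R_{hh} - R_{h\hat{h}} R_{\hat{h}\hat{h}}^{-1} R_{h\hat{h}}^*$, which this section does not state explicitly and which are therefore assumed here. The function name `conditional_stats` and the toy model with a single correlation coefficient `rho` are hypothetical.

```python
import numpy as np

def conditional_stats(h_hat, m_h, m_hhat, R_hh, R_hhat, R_cross):
    """Mean and covariance of h given h_hat for jointly Gaussian (h, h_hat).

    R_cross is the cross-covariance between h and h_hat.
    """
    gain = R_cross @ np.linalg.inv(R_hhat)
    m_cond = m_h + gain @ (h_hat - m_hhat)
    R_cond = R_hh - gain @ R_cross.conj().T
    return m_cond, R_cond

# Toy model (assumption): zero means, unit variances, cross-covariance rho * I.
MN = 2
rho = 0.9
h_hat = np.array([1.0 + 0.5j, -0.3j])
m_cond, R_cond = conditional_stats(
    h_hat, np.zeros(MN), np.zeros(MN), np.eye(MN), np.eye(MN), rho * np.eye(MN)
)
quality = np.linalg.norm(R_cond, 2)   # spectral norm of the conditional covariance
```

In this toy model the conditional covariance is $(1 - \rho^2) I$, so its spectral norm shrinks toward zero as $\rho \to 1$, which is the "perfect side information" limit defined above.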

III. Performance Criterion

In this section, we derive a performance criterion for space-time codes which takes the quality of the side information into account. Rather than at this point limiting the application of the performance criterion to the proposed transmission scheme, the development in this section is structured in such a manner that the derived performance criterion is applicable to the design of a wide class of space-time codes. In the following sections, the performance criterion will be applied to the proposed transmission scheme.

As is shown next, a suitable performance criterion is based on the pairwise codeword error probability, conditioned on the side information. We start by motivating this fact. Let $\mathcal{C} = \{C_1, \cdots, C_K\}$ denote the set of codewords, where $K$ is the number of codewords. Assume the codewords are of length $L$. Each codeword is described by an $M \times L$ matrix

$$C_k = \begin{bmatrix} c_k(0) & c_k(1) & \cdots & c_k(L-1) \end{bmatrix}, \quad k = 1, \cdots, K,$$

where $c_k(n)$ is the $n$th transmitted vector of the $k$th codeword. Assume that a codeword $C \in \mathcal{C}$ is transmitted. The received signal vectors corresponding to one codeword may then be arranged in an $N \times L$ matrix $X$, given by

$$X = H^* C + E,$$

where $E$ is a matrix of noise vectors. The receiver is assumed to employ ML decoding of the codewords based on ideal channel state information. For the problem at hand this amounts to decoding the codewords according to

$$\hat{C} = \arg\min_{C \in \mathcal{C}} \|X - H^* C\|_F^2,$$

where $\hat{C}$ denotes the codeword chosen by the receiver and $\|\cdot\|_F$ is the Frobenius norm. The previously mentioned pairwise codeword error probability, conditioned on the side information², can now be given a clear meaning. It is denoted by $P(C_k \to C_l \,|\, \hat{h})$ and defined, for $k \neq l$, as

$$\Pr\left[\, \|X - H^* C_k\|_F^2 > \|X - H^* C_l\|_F^2 \;\middle|\; C = C_k, \hat{h} \,\right],$$

which is the probability that, given a transmitted codeword $C_k$, the metric corresponding to the codeword $C_l$ is smaller. Obvious variations of this notation will also be used. Let $\Pr[\hat{C} \neq C]$ denote the codeword error probability, i.e. the probability that $\hat{C}$ is different from $C$. The overall design goal is to minimize this quantity with respect to the set of codewords. Since side information is available, the set of possible codewords, $\mathcal{C}$, is a function of the channel estimate $\hat{h}$. Conditioning on the side information gives the following relation

$$\Pr[\hat{C} \neq C] = \int \Pr[\hat{C} \neq C \,|\, \hat{h}] \, p_{\hat{h}}(\hat{h}) \, d\hat{h},$$

where $p_{\hat{h}}(\hat{h})$ is the probability density function (PDF) of the side information. It is now clear that minimizing $\Pr[\hat{C} \neq C \,|\, \hat{h}]$ for each $\hat{h}$ also minimizes $\Pr[\hat{C} \neq C]$, since $p_{\hat{h}}(\hat{h}) > 0$. In order to obtain a closed-form expression for $\Pr[\hat{C} \neq C \,|\, \hat{h}]$, we make the common assumption that the signal-to-noise ratio (SNR) is sufficiently high for the union bound technique to be applicable. More specifically, it is assumed that the largest pairwise codeword error probability, conditioned on the side information, is the dominating term in the union bound. Thus, $P(C_k \to C_l \,|\, \hat{h})$ is a reasonable performance criterion. Note that, although based on a high SNR assumption, we will demonstrate that the performance criterion can be successfully utilized also in situations when the SNR cannot be considered high.
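The ML decoding rule introduced earlier in this section, minimizing $\|X - H^* C\|_F^2$ over the codebook, can be sketched as an exhaustive search. This is only an illustration: the names `ml_decode` and `alamouti`, the BPSK alphabet, the fixed channel and the noise level are assumptions, and a practical receiver for orthogonal codes would use their structure rather than a brute-force search.

```python
import numpy as np
from itertools import product

def ml_decode(X, H, codebook):
    """Return the index of the codeword minimizing ||X - H^* C||_F^2."""
    metrics = [np.linalg.norm(X - H.conj().T @ C, "fro") ** 2 for C in codebook]
    return int(np.argmin(metrics))

def alamouti(s1, s2):
    """Two-antenna orthogonal block code used here as an example codebook."""
    return np.array([[s1, s2], [-np.conj(s2), np.conj(s1)]], dtype=complex)

# Hypothetical setup: BPSK symbols, M = 2 transmit antennas, N = 1 receive antenna.
codebook = [alamouti(s1, s2) for s1, s2 in product([1.0, -1.0], repeat=2)]
H = np.array([[1.0 + 1.0j], [0.5 - 0.2j]])           # M x N channel matrix
rng = np.random.default_rng(2)
sent = 3
noise = 0.05 * (rng.standard_normal((1, 2)) + 1j * rng.standard_normal((1, 2)))
X = H.conj().T @ codebook[sent] + noise
decoded = ml_decode(X, H, codebook)
```

A pairwise error event in the sense defined above occurs when some other codeword attains a smaller metric than the transmitted one.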

² For notational convenience, we will sometimes refer to this as simply the “pairwise error probability”.

We now turn our attention to deriving a closed-form expression for the performance criterion. Similarly to [3], we start by conditioning on the true channel realization and utilize a well-known upper bound on the Gaussian tail function to arrive at

$$P(C_k \to C_l \,|\, h, \hat{h}) \leq \frac{1}{2} e^{-d^2(C_k, C_l)/4\sigma^2}, \tag{2}$$

where

$$d^2(C_k, C_l) = \|H^*(C_k - C_l)\|_F^2 \tag{3}$$

is the Euclidean distance between the codewords. At this point, the following standard relations, found in e.g. [24, p. 121-122], turn out to be useful,

$$\operatorname{tr}(AB) = \big(\operatorname{vec}(A^*)\big)^* \operatorname{vec}(B) \tag{4}$$

$$\operatorname{vec}(ABC) = (C^T \otimes A) \operatorname{vec}(B). \tag{5}$$

Here, $\otimes$ denotes the Kronecker product and $\operatorname{tr}(\cdot)$ is the trace operator. By utilizing (4) and (5) it is possible to rewrite (3) as

$$d^2(C_k, C_l) = \operatorname{tr}\big(H^* A(C_k, C_l) H\big) = \big(\operatorname{vec}(A(C_k, C_l) H)\big)^* \operatorname{vec}(H) = h^* \big(I_N \otimes A(C_k, C_l)\big) h, \tag{6}$$

where, similarly to [3],

$$A(C_k, C_l) = (C_k - C_l)(C_k - C_l)^*$$

contains the codeword pair. Since the true channel and the side information are jointly complex Gaussian, the PDF of the true channel, conditioned on the side information, is also a complex Gaussian PDF, given by

$$p_{h|\hat{h}}(h \,|\, \hat{h}) = \frac{e^{-(h - m_{h|\hat{h}})^* R_{hh|\hat{h}}^{-1} (h - m_{h|\hat{h}})}}{\pi^{MN} \det(R_{hh|\hat{h}})}, \tag{7}$$

where $\det(\cdot)$ denotes the determinant operator and where $m_{h|\hat{h}}$ and $R_{hh|\hat{h}}$ represent the conditional mean and covariance, respectively. By averaging both sides of (2) over the distribution in (7), an upper bound to the pairwise error probability is formed as

$$P(C_k \to C_l \,|\, \hat{h}) \leq \int \frac{1}{2} e^{-d^2(C_k, C_l)/4\sigma^2} \, p_{h|\hat{h}}(h \,|\, \hat{h}) \, dh. \tag{8}$$

This upper bound is denoted by $V(C_k, C_l)$. Now, introduce the following expression,

$$\Psi(C_k, C_l) = \big(I_N \otimes A(C_k, C_l)\big)/4\sigma^2 + R_{hh|\hat{h}}^{-1}.$$
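The rewriting of the codeword distance in (6) as a quadratic form in the vectorized channel is easy to check numerically. The following sketch does so for randomly drawn matrices; the dimensions and the random codewords are arbitrary illustrative choices, not part of the paper.

```python
import numpy as np

rng = np.random.default_rng(1)
M, N, L_len = 3, 2, 4   # transmit antennas, receive antennas, codeword length

def crandn(*shape):
    """Complex standard normal array (illustrative helper)."""
    return rng.standard_normal(shape) + 1j * rng.standard_normal(shape)

H = crandn(M, N)
D = crandn(M, L_len) - crandn(M, L_len)   # stands in for C_k - C_l
A = D @ D.conj().T                        # A(C_k, C_l)
h = H.flatten(order="F")                  # h = vec(H), columns stacked

d2_frobenius = np.linalg.norm(H.conj().T @ D, "fro") ** 2           # eq. (3)
d2_trace = np.trace(H.conj().T @ A @ H).real                        # first form in (6)
d2_quadratic = (h.conj() @ np.kron(np.eye(N), A) @ h).real          # last form in (6)
```

The block-diagonal structure of $I_N \otimes A$ reflects that each receive antenna sees the same codeword difference through its own column of $H$.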

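After expanding the exponent of (7) and combining it with (6), it is straightforward to verify that the sum of the exponents in the integrand of (8) can be written as

$$m_{h|\hat{h}}^* R_{hh|\hat{h}}^{-1} \big(\Psi^{-1} - R_{hh|\hat{h}}\big) R_{hh|\hat{h}}^{-1} m_{h|\hat{h}} - \big(h - \Psi^{-1} R_{hh|\hat{h}}^{-1} m_{h|\hat{h}}\big)^* \Psi \big(h - \Psi^{-1} R_{hh|\hat{h}}^{-1} m_{h|\hat{h}}\big),$$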

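where the dependence on the codeword pair has been temporarily omitted in order to simplify the notation. The integral in (8) is now easily solved by making use of the fact that

$$\int \frac{e^{-\left(h - \Psi^{-1} R_{hh|\hat{h}}^{-1} m_{h|\hat{h}}\right)^* \Psi \left(h - \Psi^{-1} R_{hh|\hat{h}}^{-1} m_{h|\hat{h}}\right)}}{\pi^{MN} \det(\Psi^{-1})} \, dh$$

is the integral of a complex Gaussian PDF and thus equal to one. Consequently, the upper bound in (8) can be expressed as

$$V(C_k, C_l) = \frac{e^{\, m_{h|\hat{h}}^* R_{hh|\hat{h}}^{-1} \left(\Psi^{-1} - R_{hh|\hat{h}}\right) R_{hh|\hat{h}}^{-1} m_{h|\hat{h}}}}{2 \det(R_{hh|\hat{h}}) \det(\Psi)}.$$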

Taking the logarithm and neglecting parameter independent terms yields the desired form of the performance criterion as

$$\ell(C_k, C_l) = m_{h|\hat{h}}^* R_{hh|\hat{h}}^{-1} \Psi(C_k, C_l)^{-1} R_{hh|\hat{h}}^{-1} m_{h|\hat{h}} - \log\det\big(\Psi(C_k, C_l)\big). \tag{9}$$
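The closed-form bound and the criterion (9) translate directly into a few lines of linear algebra. The sketch below is an illustration under assumed values: the function names `psi_matrix`, `bound_V` and `criterion`, the conditional statistics and the noise variance are all hypothetical.

```python
import numpy as np

def psi_matrix(A, R_cond, sigma2, N):
    """Psi = (I_N kron A) / (4 sigma^2) + R_cond^{-1}."""
    return np.kron(np.eye(N), A) / (4.0 * sigma2) + np.linalg.inv(R_cond)

def bound_V(A, m_cond, R_cond, sigma2, N):
    """Closed-form upper bound V on the pairwise error probability."""
    Psi = psi_matrix(A, R_cond, sigma2, N)
    a = np.linalg.solve(R_cond, m_cond)                  # R_cond^{-1} m_cond
    exponent = (a.conj() @ (np.linalg.inv(Psi) - R_cond) @ a).real
    return np.exp(exponent) / (2.0 * np.linalg.det(R_cond).real
                               * np.linalg.det(Psi).real)

def criterion(A, m_cond, R_cond, sigma2, N):
    """Performance criterion (9): log V without codeword-independent terms."""
    Psi = psi_matrix(A, R_cond, sigma2, N)
    a = np.linalg.solve(R_cond, m_cond)
    quad = (a.conj() @ np.linalg.solve(Psi, a)).real
    return quad - np.linalg.slogdet(Psi)[1]

# Hypothetical numbers: M = 2, N = 1, conditional covariance 0.5 * I.
M, N, sigma2 = 2, 1, 0.1
m_cond = np.array([1.0, 0.5j])
R_cond = 0.5 * np.eye(M * N)
V_same = bound_V(np.zeros((M, M)), m_cond, R_cond, sigma2, N)   # identical codewords
V_far = bound_V(4.0 * np.eye(M), m_cond, R_cond, sigma2, N)     # well separated pair
```

As a sanity check, a zero codeword difference gives $\Psi = R_{hh|\hat{h}}^{-1}$ and the bound collapses to $1/2$, the value of the right-hand side of (2) at zero distance, while a larger $A$ drives the bound down.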

We stress that this result constitutes a new performance criterion for channel estimate dependent space-time codes. One possible approach to designing the corresponding codewords is to minimize the maximum, taken over all codeword pairs, of $\ell(C_k, C_l)$. For a similar solution based on the classic design criterion, see e.g. [25]. However, this procedure is deemed too computationally demanding for the side information model under consideration, since the optimum codewords depend on the actual channel estimate and the number of possible channel estimate realizations is infinite. Hence, a complicated optimization problem would have to be solved in real time for each new channel estimate that arrives at the transmitter.

On the other hand, for the case of quantized channel estimates, such an approach may be perfectly viable due to the finite number of channel estimate realizations. The entire channel estimate dependent space-time code can therefore be precalculated and stored in a lookup table, suitable for real-time use. Variations of this approach are further explored in [26]. However, such a study is beyond the scope of the present work, since we focus on unquantized channel information. Instead, certain constraints on the code will be introduced in the sections to follow in order to arrive at a tractable scheme.

A. Interpretations of the Performance Criterion

The two terms in (9), i.e. in the performance criterion, can be given some interesting interpretations. The first term mainly deals with the channel knowledge obtained from the actual realization of the channel estimate, as contained in $m_{h|\hat{h}}$. The second term, on the other hand, does not depend on the realization of the channel estimate and therefore strives for a code design suitable for an open-loop system, which has no side information except prior knowledge of the distribution of the true channel. This interpretation is further supported by considering the two special cases of perfect side information ($\|R_{hh|\hat{h}}\| \to 0$) and of no side information ($\|R_{hh|\hat{h}}^{-1}\| \to 0$), respectively. In the first case, the first term is seen to dominate and in the second case, the second term is dominating. The performance criterion is in the second case equivalent to

$$\ell(C_k, C_l) = \frac{1}{\det A(C_k, C_l)}, \tag{10}$$

which is basically the same as the criterion used in [2, 3] for designing conventional space-time codes.

As a final remark, it should be emphasized that the proposed performance criterion can also be used for designing conventional space-time codes in various scenarios involving open-loop systems. To see this, bear in mind that if the side information is statistically independent of the true channel, the performance criterion reduces to

$$\ell(C_k, C_l) = m_h^* R_{hh}^{-1} \Psi(C_k, C_l)^{-1} R_{hh}^{-1} m_h - \log\det\big(\Psi(C_k, C_l)\big), \tag{11}$$

where now $\Psi(C_k, C_l) = \big(I_N \otimes A(C_k, C_l)\big)/4\sigma^2 + R_{hh}^{-1}$. Thus, the development in the present work also applies to situations where the transmitter only knows the distribution of the true channel and nothing about the current realization. This version of the performance criterion is therefore closely related to the design criterion in [2, 3]. In fact, after some simple manipulations, it can be shown that (11) includes several of the results from the various fading scenarios in [2, 3] as special cases.

IV. The Transmission Scheme

This section deals with the construction of a tractable transmission scheme based on the performance criterion given by (9). As discussed in the previous section, one obvious alternative is to design a space-time code using the proposed performance criterion and an exhaustive search over all possible codewords. However, a more viable approach, for the scenario under consideration, is taken in this work.

As outlined in Section II, we assume that a space-time code is already determined and try to improve the code by a linear transformation. Hence, from (1) it follows that the $k$th transformed codeword may be written as

$$C_k = W \bar{C}_k,$$

where $W$ is an $M \times M$ matrix, shared by all codewords, and $\bar{C}_k \in \bar{\mathcal{C}}$ is the $k$th predetermined codeword. Here, the predetermined set of codewords, $\bar{\mathcal{C}}$, is defined in a similar manner as $\mathcal{C}$. In order to limit the average output power, the constraint $\|W\|_F^2 = 1$ is imposed. Furthermore, orthogonal space-time block codes as found in [4] are considered. These codes are designed for open-loop systems and have the appealing property that

$$A(\bar{C}_k, \bar{C}_l) = \mu_{kl} I_M, \quad \forall k \neq l, \tag{12}$$

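where $\mu_{kl}$ is a scaling factor which depends on the codeword pair. Substituting $C_k = W \bar{C}_k$, $C_l = W \bar{C}_l$ and (12) into (9) leads to the performance criterion

$$\ell(WW^*, \mu_{kl}) = m_{h|\hat{h}}^* R_{hh|\hat{h}}^{-1} \Psi(WW^*, \mu_{kl})^{-1} R_{hh|\hat{h}}^{-1} m_{h|\hat{h}} - \log\det\big(\Psi(WW^*, \mu_{kl})\big), \tag{13}$$

where

$$\Psi(WW^*, \mu_{kl}) = (I_N \otimes WW^*)\,\mu_{kl}/4\sigma^2 + R_{hh|\hat{h}}^{-1}.$$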


As seen, we have made a slight abuse of notation and retained $\ell(\cdot)$ and $\Psi(\cdot)$ as function names, even though the arguments have changed. Obvious variations of this notation will be used in the following.
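Property (12) can be verified by enumeration for a small code. The sketch below takes the two-antenna block code of [13] with unit-energy QPSK symbols as an assumed example of a predetermined code; the function name `alamouti`, the constellation and the variable names are illustrative choices. It checks that every codeword difference gives a scaled identity and records the smallest scaling factor.

```python
import numpy as np
from itertools import product

def alamouti(s1, s2):
    """Predetermined 2 x 2 orthogonal block codeword (assumed example code)."""
    return np.array([[s1, s2], [-np.conj(s2), np.conj(s1)]], dtype=complex)

# Unit-energy QPSK alphabet (assumption for this illustration).
qpsk = [(a + 1j * b) / np.sqrt(2) for a, b in product([1, -1], repeat=2)]
codebook = [alamouti(s1, s2) for s1, s2 in product(qpsk, repeat=2)]

mu = {}
is_scaled_identity = True
for k, l in product(range(len(codebook)), repeat=2):
    if k == l:
        continue
    D = codebook[k] - codebook[l]
    A = D @ D.conj().T                       # A(C_k, C_l) for the predetermined code
    mu[(k, l)] = A[0, 0].real                # candidate scaling factor mu_kl
    is_scaled_identity &= bool(np.allclose(A, mu[(k, l)] * np.eye(2)))

mu_min = min(mu.values())                    # smallest scaling factor over all pairs
```

For this code the scaling factor equals the total squared distance between the two symbol pairs, so the smallest value arises when the codewords differ in a single, nearest-neighbor symbol.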

The dependence on the codeword pair is now only through the scaling factor $\mu_{kl}$. Since (13) is a decreasing function of $\mu_{kl}$, the error probability is dominated by the codeword pairs corresponding to the minimum $\mu_{kl}$. Thus, only one such pair is considered in the optimization procedure. An optimal $W$ (optimal in the sense that it minimizes the criterion function under consideration) could now, at least in principle, be determined by minimizing $\ell(WW^*, \mu_{\min})$ with respect to $W$, while satisfying the power constraint. Here, $\mu_{\min}$ denotes the minimum $\mu_{kl}$. However, a highly challenging optimization problem, with a criterion function possessing multiple minima, would need to be solved.

In order to obtain a tractable solution, we take on an alternative approach involving a reparameterization. An inspection of both (13) and the power constraint suggests the parameterization $Z = WW^*$. A two step procedure is now used for finding an optimal solution to the problem outlined in the previous paragraph. Rewriting the criterion function and the constraints in terms of the new parameters gives the following optimization problem

$$Z_{\mathrm{opt}} = \arg\min_{\substack{Z = Z^* \succeq 0 \\ \operatorname{tr}(Z) = 1}} \ell(Z), \tag{14}$$

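where $Z \succeq 0$ means that $Z$ is positive semidefinite³. With a slight abuse of notation, the performance criterion $\ell(WW^*, \mu_{\min})$ is here written as

$$\ell(Z) = m_{h|\hat{h}}^* R_{hh|\hat{h}}^{-1} \big((I_N \otimes Z)\eta + R_{hh|\hat{h}}^{-1}\big)^{-1} R_{hh|\hat{h}}^{-1} m_{h|\hat{h}} - \log\det\big((I_N \otimes Z)\eta + R_{hh|\hat{h}}^{-1}\big), \tag{15}$$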

where $\eta = \mu_{\min}/4\sigma^2$. An optimal linear transformation is finally obtained as $W_{\mathrm{opt}} = Z_{\mathrm{opt}}^{1/2}$, where $(\cdot)^{1/2}$ is a matrix square root such that $Z_{\mathrm{opt}} = W_{\mathrm{opt}} W_{\mathrm{opt}}^*$. Note that a square root always exists since $Z_{\mathrm{opt}}$ is a non-negative definite matrix. Clearly, the solution is not unique.
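The second step of the procedure, recovering a linear transformation from $Z_{\mathrm{opt}}$, can be sketched with an eigendecomposition. The Hermitian square root is only one of many valid choices; the function name `sqrt_psd` and the example matrix are assumptions for illustration.

```python
import numpy as np

def sqrt_psd(Z):
    """Hermitian square root W of a positive semidefinite Z, so that W W^* = Z."""
    lam, U = np.linalg.eigh((Z + Z.conj().T) / 2)   # enforce Hermitian symmetry
    lam = np.clip(lam, 0.0, None)                   # guard against round-off
    return (U * np.sqrt(lam)) @ U.conj().T

# Hypothetical feasible point: Hermitian, positive definite, unit trace.
Z_opt = np.array([[0.7, 0.2 - 0.1j], [0.2 + 0.1j, 0.3]])
W_opt = sqrt_psd(Z_opt)

# Right-multiplying by any unitary matrix gives another valid square root.
theta = 0.4
Q = np.array([[np.cos(theta), -np.sin(theta)], [np.sin(theta), np.cos(theta)]])
W_alt = W_opt @ Q
```

Since $\|W\|_F^2 = \operatorname{tr}(WW^*) = \operatorname{tr}(Z)$, the trace constraint in (14) is exactly the power constraint on $W$, and the unitary factor `Q` illustrates why the solution is not unique.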

The described reformulation is attractive since (14) is now a convex optimization problem. To see that the criterion function is convex, first note that $\Psi(Z, \mu_{\min})$ defines an affine transformation of $Z$ and that $\Psi(Z, \mu_{\min})$ is positive definite over the set of all positive semidefinite $Z$. Since affine transformations preserve convexity [27, 28], we can now establish the convexity of the criterion function by showing that

$$m_{h|\hat{h}}^* R_{hh|\hat{h}}^{-1} \Psi^{-1} R_{hh|\hat{h}}^{-1} m_{h|\hat{h}} + \log\det(\Psi^{-1}) \tag{16}$$

is convex over the set of positive definite matrices $\Psi$.

³ In general, we take $A \succ B$ and $A \succeq B$ to mean that the matrix $A - B$ is positive definite and positive semidefinite, respectively.

To see that the first term is convex, we utilize a theorem saying that a function is convex over a set $X$ if it is convex when restricted to any line that intersects $X$ [29, p. 94]. For this purpose, let $\Psi = \lambda \Psi_1 + (1 - \lambda)\Psi_2$, where $\Psi_1 = \Psi_1^*$, $\Psi_2 = \Psi_2^*$ are arbitrary positive definite matrices and $0 \leq \lambda \leq 1$, represent any line in the set of positive definite matrices. By making use of the identity

$$\frac{\partial X(\theta)^{-1}}{\partial \theta} = -X(\theta)^{-1} \frac{\partial X(\theta)}{\partial \theta} X(\theta)^{-1},$$

the second order derivative of the first term with respect to $\lambda$ is easily obtained as

$$2\, m_{h|\hat{h}}^* R_{hh|\hat{h}}^{-1} \Psi^{-1} (\Psi_1 - \Psi_2) \Psi^{-1} (\Psi_1 - \Psi_2) \Psi^{-1} R_{hh|\hat{h}}^{-1} m_{h|\hat{h}}.$$

This quadratic form is non-negative since the matrix $R_{hh|\hat{h}}^{-1} \Psi^{-1} (\Psi_1 - \Psi_2) \Psi^{-1} (\Psi_1 - \Psi_2) \Psi^{-1} R_{hh|\hat{h}}^{-1}$ is positive semidefinite. Because the second order derivative is non-negative it follows that the first term is convex with respect to $\lambda$ (see e.g. [29, p. 91]) and thus, according to the theorem in [29, p. 94], also over the set of positive definite matrices $\Psi$.

The convexity of the second term is established in e.g. [23, p. 466]. Consequently, (16) is convex over all positive definite $\Psi$. Due to the affine relation between $\Psi$ and $Z$, the criterion function is convex also with respect to $Z$. It is easily verified that the constraints are convex [27, 28]. The entire optimization problem is therefore convex, which implies that all local minima are also global minima.
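The convexity argument can be spot-checked numerically by comparing the criterion on a line segment between two feasible points with the corresponding chord. This is a sanity check on one random instance, not a proof; the function names `ell` and `random_feasible` and all numerical values are assumptions.

```python
import numpy as np

def ell(Z, m_cond, R_cond, eta, N):
    """Criterion (15) as a function of Z."""
    Psi = eta * np.kron(np.eye(N), Z) + np.linalg.inv(R_cond)
    a = np.linalg.solve(R_cond, m_cond)
    return (a.conj() @ np.linalg.solve(Psi, a)).real - np.linalg.slogdet(Psi)[1]

def random_feasible(M, rng):
    """Random Hermitian positive semidefinite matrix with unit trace."""
    B = rng.standard_normal((M, M)) + 1j * rng.standard_normal((M, M))
    Z = B @ B.conj().T
    return Z / np.trace(Z).real

rng = np.random.default_rng(3)
M, N, eta = 2, 2, 5.0
m_cond = rng.standard_normal(M * N) + 1j * rng.standard_normal(M * N)
R_cond = 0.3 * np.eye(M * N)
Z1, Z2 = random_feasible(M, rng), random_feasible(M, rng)

gaps = []
for lam in np.linspace(0.0, 1.0, 11):
    on_segment = ell(lam * Z1 + (1 - lam) * Z2, m_cond, R_cond, eta, N)
    on_chord = (lam * ell(Z1, m_cond, R_cond, eta, N)
                + (1 - lam) * ell(Z2, m_cond, R_cond, eta, N))
    gaps.append(on_chord - on_segment)   # non-negative for a convex function
min_gap = min(gaps)
```

A negative gap beyond round-off would contradict convexity, so the check is a cheap guard against implementation mistakes in the criterion.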

We will not go into great detail describing an algorithm that solves this particular optimization problem, since there are a number of standard techniques that are applicable. For example, interior point methods can be used for efficiently solving this kind of problem [30].
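As a rough illustration of how (14) might be solved, the sketch below uses projected gradient descent with backtracking. This is a simple technique swapped in here for readability; it is not the interior point approach suggested by the text, and it is not the authors' algorithm. The gradient expression, the function names (`ell_and_grad`, `project_feasible`, `solve_Z`) and the test values are assumptions of this sketch. The projection onto $\{Z = Z^* \succeq 0, \operatorname{tr}(Z) = 1\}$ is done by projecting the eigenvalues onto the unit simplex.

```python
import numpy as np

def ell_and_grad(Z, a, R_inv, eta, N):
    """Criterion (15) and its gradient with respect to Z; a = R_cond^{-1} m_cond."""
    M = Z.shape[0]
    Psi_inv = np.linalg.inv(eta * np.kron(np.eye(N), Z) + R_inv)
    b = Psi_inv @ a
    value = (a.conj() @ b).real + np.linalg.slogdet(Psi_inv)[1]
    G = np.outer(b, b.conj()) + Psi_inv
    # Sum of the N diagonal M x M blocks, from d(Psi) = eta * (I_N kron dZ).
    grad = -eta * sum(G[k * M:(k + 1) * M, k * M:(k + 1) * M] for k in range(N))
    return value, grad

def project_feasible(Z):
    """Frobenius-norm projection onto {Z Hermitian, Z >= 0, tr(Z) = 1}."""
    lam, U = np.linalg.eigh((Z + Z.conj().T) / 2)
    u = np.sort(lam)[::-1]
    css = np.cumsum(u) - 1.0
    rho = np.nonzero(u - css / np.arange(1, len(u) + 1) > 0)[0][-1]
    lam = np.maximum(lam - css[rho] / (rho + 1), 0.0)   # simplex projection
    return (U * lam) @ U.conj().T

def solve_Z(m_cond, R_cond, eta, M, N, iters=200, tol=1e-10):
    R_inv = np.linalg.inv(R_cond)
    a = R_inv @ m_cond
    Z = np.eye(M, dtype=complex) / M        # start from the neutral solution
    step = 1.0
    for _ in range(iters):
        f, g = ell_and_grad(Z, a, R_inv, eta, N)
        for _halving in range(60):          # backtracking on the step size
            Z_new = project_feasible(Z - step * g)
            dZ = Z_new - Z
            f_new = ell_and_grad(Z_new, a, R_inv, eta, N)[0]
            bound = f + np.vdot(g, dZ).real + np.linalg.norm(dZ) ** 2 / (2 * step)
            if f_new <= bound + 1e-12:
                break
            step *= 0.5
        Z = Z_new
        if np.linalg.norm(dZ) < tol:
            break
    return Z

# Hypothetical cases: accurate estimate along antenna 1, and a very poor estimate.
Z_good = solve_Z(np.array([1.0, 0.0]), 0.05 * np.eye(2), eta=10.0, M=2, N=1)
Z_poor = solve_Z(np.zeros(2), 100.0 * np.eye(2), eta=10.0, M=2, N=1)
```

In these two toy cases the iterations end at a rank-one $Z$ aligned with the estimate and at $I_M/M$, respectively. A production implementation would more likely rely on a dedicated convex solver.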

A. Asymptotic Properties of the Solution

Although the optimization problem given by (14) must in general be solved numerically, there are a few special cases that permit a closed-form solution. These special cases concern the asymptotic properties of the solution. In particular, attention is here turned toward the behavior of the solution when the channel knowledge is perfect and when there is no channel information, respectively. In addition, the influence of the SNR level is investigated. The solutions turn out to agree well with intuition and allow for some interesting interpretations. Detailed derivations can be found in Appendix A.

In the first case, no channel knowledge is assumed, i.e. $\|R_{hh|\hat{h}}^{-1}\| \to 0$. The criterion function used in (14) is then minimized for $Z_{as} = I_M/M$. As a result, the optimal linear transformation is a scaled unitary matrix, an obvious choice given by $W_{as} = I_M/\sqrt{M}$. Thus, the codewords are transmitted without modification. This makes sense considering the assumptions under which the predetermined space-time code was designed. It also makes sense in view of the fact that the transmitter does not know the channel and therefore has to choose a "neutral" solution. We refer to the resulting transmission technique as conventional orthogonal space-time block coding (OSTBC).


The second case concerns infinite SNR, in the sense that $\eta = \mu_{\min}/(4\sigma^2) \to \infty$. Similarly to the case of no channel knowledge, the optimal linear transformation can be chosen to be a scaled unitary matrix. This indicates that the usefulness of channel knowledge diminishes as the SNR increases. Simulation results described in Section VI further support this claim.

In the third case, the channel knowledge is assumed to be perfect. Let $\Omega_k^{(m)}$ denote the kth block of size M × M on the diagonal of $m_{h|\hat{h}} m_{h|\hat{h}}^{*}$ and define Θ as

$$\Theta = \sum_{k=1}^{N} \Omega_k^{(m)} . \tag{17}$$

To simplify the analysis it is assumed that one of the eigenvalues of Θ is strictly larger than all the others. This assumption is further commented on in the following. Studying the behavior of the solution as $\|R_{hh|\hat{h}}\| \to 0$ gives the following asymptotically optimal linear transformation,

$$W_{as} = \begin{bmatrix} v_M & 0 & \cdots & 0 \end{bmatrix} , \tag{18}$$

where $v_M$ is the eigenvector of Θ corresponding to the largest eigenvalue.

Due to the special structure of orthogonal space-time block codes, and since only one column of $W_{as}$ is non-zero, (18) may be interpreted as beamforming in the direction of $v_M$. To see this, consider for example the two transmit antenna case and assume that the codewords of the predetermined space-time code are given by

$$C = \begin{bmatrix} s(n) & s(n+1) \\ -s(n+1)^{*} & s(n)^{*} \end{bmatrix} , \tag{19}$$

where s(n) is a sequence representing the data symbols to be transmitted. The code in (19) is the well-known Alamouti space-time code [13]. By utilizing the asymptotic result in (18) and the expression for the space-time code it is seen that the signal transmitted over the two antennas during time instants n and n + 1 can be written as

$$\bar{C} = W_{as} C = \begin{bmatrix} \bar{c}(n) & \bar{c}(n+1) \end{bmatrix} = \begin{bmatrix} v_M s(n) & v_M s(n+1) \end{bmatrix} .$$

Clearly, beamforming in the direction of $v_M$ is performed. The present development can be generalized to all the orthogonal space-time block codes found in [4].
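The product of the rank-one transformation in (18) with the Alamouti codeword in (19) can be checked numerically. The following is a minimal sketch using only the Python standard library; the beamforming vector `v` and the BPSK symbols are hypothetical values chosen for illustration, and the helper names are not taken from the paper.

```python
def alamouti(s0, s1):
    """2 x 2 Alamouti codeword laid out as in (19)."""
    return [[s0, s1],
            [-s1.conjugate(), s0.conjugate()]]

def matmul(A, B):
    """Plain matrix product for small lists of lists."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

# Hypothetical unit-norm beamforming vector v_M and two BPSK symbols.
v = [complex(3, 0) / 5, complex(0, 4) / 5]
W_as = [[v[0], 0j],
        [v[1], 0j]]                      # W_as = [v_M 0], as in (18)
s0, s1 = complex(1, 0), complex(-1, 0)

C_bar = matmul(W_as, alamouti(s0, s1))

# Column t of C_bar is the vector sent over the two antennas at time n + t.
col0 = [C_bar[0][0], C_bar[1][0]]
col1 = [C_bar[0][1], C_bar[1][1]]
```

Each transmitted column equals `v` scaled by the symbol of that time instant, so the second row of the codeword never reaches the antennas; this is the beamforming interpretation described above.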

Note that because of the perfect channel knowledge, $m_{h|\hat{h}} \to h$ in the mean square sense, i.e. $m_{h|\hat{h}}$ is essentially the same as h. Consequently, $m_{h|\hat{h}} m_{h|\hat{h}}^{*} \to hh^{*}$, which in turn means that $\Theta \to HH^{*}$, both in mean square. Hence, for all practical purposes, $v_M$ can be considered equal to the left singular vector of H corresponding to the largest singular value [23, p. 414]. Thus, the transmission is now conducted in much the same way as in a scheme which utilizes the singular value decomposition of the channel matrix to convert the MIMO system into a set of parallel subchannels. Such a method was examined for example in [14] where a water-filling procedure was used for allocating transmit power (and thus distribution of data rates) among all the subchannels. However, our transmission scheme differs, among other things, in that only the strongest subchannel is used. This is due to the structure of the underlying orthogonal space-time block code. Because of the orthogonality, the decoding of the constituent data symbols decouples, allowing the transmission scheme to be studied by considering the symbols separately from each other. For example, consider again the Alamouti code in (19) and observe that the contribution to the transmitted signal from s(n) is

$$\bar{C} = W_{as} \begin{bmatrix} s(n) & 0 \\ 0 & s(n)^{*} \end{bmatrix} = \begin{bmatrix} \bar{c}(n) & \bar{c}(n+1) \end{bmatrix} = \begin{bmatrix} w_1 s(n) & w_2 s(n)^{*} \end{bmatrix} ,$$

where $w_1$ and $w_2$ are the columns of $W_{as}$. It is now clear that to maximize the SNR for both $\bar{c}(n)$ and $\bar{c}(n+1)$, the two columns of $W_{as}$ should be matched to the channel, i.e. both should be parallel to the strongest left singular vector of the channel. Similar arguments apply to s(n + 1) and also to other orthogonal space-time block codes.
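The relation between Θ and $HH^{*}$ can be illustrated with a small numerical check. The sketch below assumes that h is formed by stacking the N columns of an M × N matrix H, so that the kth diagonal block of $hh^{*}$ is the outer product of the kth column with itself. The channel values, the helper names and the use of power iteration (a stand-in for any eigenvalue routine) are illustrative choices only.

```python
def theta_from_vec(h, M, N):
    """Sum of the N diagonal M x M blocks of h h^*, with h = vec(H)."""
    theta = [[0j] * M for _ in range(M)]
    for k in range(N):
        col = h[k * M:(k + 1) * M]          # k-th column of H
        for i in range(M):
            for j in range(M):
                theta[i][j] += col[i] * col[j].conjugate()
    return theta

def dominant_eigvec(A, iters=200):
    """Power iteration; assumes a strictly largest eigenvalue."""
    n = len(A)
    v = [complex(1, 0)] * n
    for _ in range(iters):
        w = [sum(A[i][j] * v[j] for j in range(n)) for i in range(n)]
        norm = sum(abs(x) ** 2 for x in w) ** 0.5
        v = [x / norm for x in w]
    return v

# Hypothetical channel with M = 2 transmit and N = 2 receive antennas.
M, N = 2, 2
H = [[1 + 1j, 0.5j],
     [0.2 + 0j, -1 + 0j]]
h = [H[i][k] for k in range(N) for i in range(M)]   # stack the columns of H

theta = theta_from_vec(h, M, N)
HH = [[sum(H[i][k] * H[j][k].conjugate() for k in range(N))
       for j in range(M)] for i in range(M)]
v_M = dominant_eigvec(theta)                         # beamforming direction
```

Under the stated stacking assumption `theta` coincides with `HH`, and `v_M` is therefore the strongest left singular vector of H up to a phase factor.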

Also note that the assumption that Θ has a strictly largest eigenvalue is weak. The reason is the often random nature of $m_{h|\hat{h}}$ (or h, since $m_{h|\hat{h}} \to h$), and thus also of Θ, in practical fading scenarios. In particular, the probability that the assumption is violated is, except for some degenerate choices of the first and second order moments of the channel, in general vanishingly small for the Gaussian fading model used herein. This is for example the case in the simplified fading scenario described in Section V.

Finally, in the fourth case we consider an SNR value tending to zero, i.e. η → 0. It turns out that the result is similar to the one derived in the previous case. Hence, the asymptotically optimal linear transformation is again given by (18). However, Θ is now defined as

$$\Theta = \sum_{k=1}^{N} \left( \Omega_k^{(m)} + \Omega_k^{(R)} \right) ,$$

where $\Omega_k^{(R)}$ is the kth block of size M × M on the diagonal of $R_{hh|\hat{h}}$. Again, the existence of an eigenvalue of Θ that is strictly larger than all the others is assumed.

Note that in the case of one receive antenna, the beamforming strategy proposed in [17], although derived using a different performance criterion, is seen to also give a beamformer proportional to $v_M$. The approach taken in [17] was to maximize the average SNR. As also pointed out therein, such a performance criterion makes sense for small SNR values. Hence, the result for the fourth case in this section provides a generalization of the corresponding result in [17] to multiple receive antennas when a predetermined space-time code is used.

B. An Algorithm for a Simplified Scenario

In this section we consider a simplified fading scenario in order to obtain a semi closed-form solution of the optimization problem given in (14). In spite of the existence of a fairly efficient numerical optimization technique for the general case, the complexity of the algorithm described in this section is substantially lower.

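Let us first make the simplifying assumption that the conditional covariance matrix is diagonal, also expressed as $R_{hh|\hat{h}} = \alpha I_{MN}$. Here, α represents the conditional variance of the channel coefficients. A scenario where this assumption is reasonable is considered in Section V. By introducing $\Upsilon = \frac{1}{\alpha} \sum_{k=1}^{N} \Omega_k^{(m)}$, and neglecting parameter independent terms, it is now possible to rewrite the performance criterion (15) in the following way

$$\begin{aligned}
\ell(Z) &= \frac{1}{\alpha}\, m_{h|\hat{h}}^{*} \big( (I_N \otimes Z\alpha\eta) + I_{MN} \big)^{-1} m_{h|\hat{h}} - \log\det\big( (I_N \otimes Z\alpha\eta) + I_{MN} \big) \\
&= \frac{1}{\alpha}\, \mathrm{tr}\Big( \big( I_N \otimes (Z\alpha\eta + I_M) \big)^{-1} m_{h|\hat{h}} m_{h|\hat{h}}^{*} \Big) - \log\det\big( I_N \otimes (Z\alpha\eta + I_M) \big) \\
&= \mathrm{tr}\big( (Z\alpha\eta + I_M)^{-1} \Upsilon \big) - N \log\det(Z\alpha\eta + I_M) ,
\end{aligned}$$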

where the second equality is due to the well-known relation tr(AB) = tr(BA). In order to minimize this criterion we let $Z = V \Lambda V^{*}$ and $\Upsilon = \bar{V} \bar{\Lambda} \bar{V}^{*}$ represent the eigenvalue decomposition (EVD) of Z and Υ, respectively. The diagonal elements of Λ and $\bar{\Lambda}$, representing the eigenvalues, are here denoted by $\{\lambda_i\}_{i=1}^{M}$ and $\{\bar{\lambda}_i\}_{i=1}^{M}$, respectively. In each set the eigenvalues are assumed to be sorted in ascending order. It is also assumed that V and $\bar{V}$ are unitary. Substituting this new parameterization into ℓ(Z) and into the constraints results in an equivalent optimization problem given by the criterion function

$$\ell(\{\lambda_i\}_{i=1}^{M}, V) = \mathrm{tr}\big( (\Lambda\alpha\eta + I_M)^{-1} V^{*} \bar{V} \bar{\Lambda} \bar{V}^{*} V \big) - N \log\det(\Lambda\alpha\eta + I_M) , \tag{20}$$

subject to the constraints

$$\sum_{k=1}^{M} \lambda_k = 1 \tag{21}$$

$$\lambda_i \ge 0, \quad i = 1, \cdots, M \tag{22}$$

$$\lambda_1 \le \lambda_2 \le \cdots \le \lambda_M . \tag{23}$$

It is seen that V is independent of the constraints and that it only affects the first term in (20). Keeping Λ constant and following the development in [31, p. 131], the optimum V can then be chosen as $V = \bar{V}$. For this to hold, (23) is needed.

The remaining optimization problem is clearly convex. The solution may therefore be obtained by means of the Karush-Kuhn-Tucker (KKT) conditions [29, p. 164]. Temporarily relaxing the problem by omitting the last constraint, and then finding a set of eigenvalues which satisfy the KKT conditions for the relaxed problem, is the approach used for deriving the solution. A detailed derivation is provided in Appendix B. The optimal eigenvalues for the relaxed problem turn out to be given by

$$\lambda_i = \max\left\{ 0,\; \frac{\alpha\eta N + \sqrt{\alpha^2\eta^2 N^2 + 4\alpha\eta\bar{\lambda}_i\mu}}{2\alpha\eta\mu} - \frac{1}{\alpha\eta} \right\} , \tag{24}$$

where µ is the Lagrange multiplier corresponding to the power constraint. Note that this is also the optimum for the original problem since the above solution automatically satisfies (23).

The value of µ is obtained by inserting (24) into the power constraint (21) and solving the resulting equation. One possible procedure for accomplishing this is now described. To start with, assume that the number of eigenvalues equal to zero in the optimum solution is known. Let l − 1 denote this quantity. Inserting (24) into the power constraint (21) then gives the equation

$$1 + \frac{M - l + 1}{\alpha\eta} - \sum_{k=l}^{M} \frac{\alpha\eta N + \sqrt{\alpha^2\eta^2 N^2 + 4\alpha\eta\bar{\lambda}_k\mu}}{2\alpha\eta\mu} = 0 , \tag{25}$$

from which µ can be determined. Let f(µ, l) represent the left hand side of the equation. Since f(µ, l) is strictly increasing as a function of µ, the solution is unique and may be found numerically. For example, applying Newton's method gives rapid convergence. In this case, a suitable starting value is

$$\mu = \frac{\alpha\eta (M - l + 1)^2}{(M - l + 1 + \alpha\eta)^2} \left( \bar{\lambda}_1 + \frac{N (M - l + 1 + \alpha\eta)}{M - l + 1} \right) ,$$

obtained by using equal power on all eigenvectors whose eigenvalues are assumed to be non-zero. In order to arrive at the correct value of l, an iterative approach is used where, starting at l = 1, successive values of l are tried. An algorithm similar to the one utilized when computing the well-known water-filling power profile can be used for this purpose [32, p. 253]. The optimum linear transformation is finally obtained by an appropriate matrix square root of Z. Thus, the whole procedure can be summarized as follows

1. Set l = 1
2. Solve f(µ, l) = 0 with respect to µ
3. Compute $\lambda_i = \frac{\alpha\eta N + \sqrt{\alpha^2\eta^2 N^2 + 4\alpha\eta\bar{\lambda}_i\mu}}{2\alpha\eta\mu} - \frac{1}{\alpha\eta}$, i = l, · · · , M
4. If $\lambda_l < 0$ then set $\lambda_l = 0$, l = l + 1 and repeat from 2
5. Compute $W_{opt} = \bar{V} \Lambda^{1/2}$
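Steps 1 to 4 of the procedure can be sketched in a few lines of Python. The sketch below is not the authors' implementation: it replaces Newton's method by plain bisection, which is slower but needs no starting value and relies only on f(µ, l) being increasing in µ. The function names are hypothetical, `a` stands for the product αη, and `ups` holds the eigenvalues of Υ in ascending order. Step 5 is omitted since it only requires the eigenvectors of Υ.

```python
import math

def active_eigs(mu, ups_active, a, N):
    """Non-clipped expression from (24) for each active eigenvalue of Upsilon."""
    return [(a * N + math.sqrt(a * a * N * N + 4 * a * lb * mu)) / (2 * a * mu) - 1 / a
            for lb in ups_active]

def solve_mu(ups_active, a, N):
    """Bisection on f(mu, l) from (25), i.e. 1 minus the sum of active eigenvalues."""
    f = lambda mu: 1.0 - sum(active_eigs(mu, ups_active, a, N))
    lo, hi = 1e-12, 1.0
    while f(hi) < 0:            # enlarge the bracket until the sign changes
        hi *= 2
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if f(mid) < 0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def allocate_power(ups, a, N):
    """Eigenvalues of Z for eigenvalues `ups` of Upsilon (ascending), a = alpha * eta."""
    M = len(ups)
    l = 0                       # zero-based counterpart of l = 1 in step 1
    while True:
        mu = solve_mu(ups[l:], a, N)             # step 2
        lam = active_eigs(mu, ups[l:], a, N)     # step 3
        if lam[0] < 0 and l < M - 1:             # step 4
            l += 1
        else:
            return [0.0] * l + lam
```

For example, `allocate_power([0.0, 0.0, 0.0, 0.0], 2.0, 1)` spreads the power equally, which corresponds to conventional OSTBC, whereas a dominant eigenvalue such as in `allocate_power([0.0, 100.0], 1.0, 1)` puts all power on the last eigenvector.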

Note that we have tacitly assumed that the predetermined space-time code is designed for M transmit antennas, since W is a square matrix. However, the algorithm for the simplified scenario can easily be adapted to also handle the important case of an M × M′ linear transformation, where M′ denotes the number of rows in the predetermined codeword matrix and where M′ ≤ M. Towards this end, the starting value of l should be modified to M − M′ + 1 and only the columns of $W_{opt}$ that correspond to $\{\lambda_i\}_{i=M-M'+1}^{M}$ should be retained from execution step five. In this way, a simple predetermined code, designed for a small number of transmit antennas, can be used in conjunction with a much larger antenna array. Such a transmission scheme is also interesting in view of the fact that orthogonal space-time codes exist for only a limited number of transmit antennas [4].

V. A Simplified Scenario

Let us now detour from the general complex Gaussian fading assumption and instead consider a simplified fading scenario. The transmission scheme from Section IV-B will in this section be tailored specifically to this scenario.

In the simplified scenario, it is assumed that the antennas at both the transmitter and the receiver are spaced sufficiently far apart so that the fading is independent. A rich scattering environment with non-line-of-sight conditions is also assumed. It is then reasonable to model the true channel coefficients $h_{ij}$ as independent and identically distributed (i.i.d.), zero-mean complex Gaussian. Let $\sigma_h^2$ denote the variance of each individual channel coefficient. The coefficients of the channel estimates $\hat{h}_{ij}$ are modeled in the same way. Similar to [17], each estimated channel coefficient $\hat{h}_{ij}$ is assumed to be correlated with the corresponding true channel coefficient $h_{ij}$, and uncorrelated with all others. In order to describe the degree of correlation, introduce the normalized correlation coefficient $\rho = E[h_{ij} \hat{h}_{ij}^{*}]/\sigma_h^2$. Thus, assuming $h_{ij}$ and $\hat{h}_{ij}$ are jointly complex Gaussian, the distribution of the true channel and the side information is completely characterized by the covariance matrices $R_{hh} = \sigma_h^2 I_{MN}$, $R_{h\hat{h}} = \sigma_h^2 \rho I_{MN}$, $R_{\hat{h}\hat{h}} = \sigma_h^2 I_{MN}$ and the mean vectors $m_h = m_{\hat{h}} = 0$. Straightforward calculations show that this model leads to a conditional channel distribution described by

$$m_{h|\hat{h}} = \rho \hat{h}, \qquad R_{hh|\hat{h}} = \sigma_h^2 (1 - |\rho|^2) I_{MN} . \tag{26}$$
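The calculation behind (26) can be sketched with the standard expressions for the conditional mean and covariance of jointly Gaussian vectors. These expressions are textbook results rather than formulas quoted from this section, and $R_{\hat{h}h} = R_{h\hat{h}}^{*}$ is used.

$$\begin{aligned}
m_{h|\hat{h}} &= m_h + R_{h\hat{h}} R_{\hat{h}\hat{h}}^{-1} (\hat{h} - m_{\hat{h}})
 = \sigma_h^2 \rho\, I_{MN}\, (\sigma_h^2 I_{MN})^{-1}\, \hat{h} = \rho \hat{h}, \\
R_{hh|\hat{h}} &= R_{hh} - R_{h\hat{h}} R_{\hat{h}\hat{h}}^{-1} R_{\hat{h}h}
 = \sigma_h^2 I_{MN} - \sigma_h^2 \rho \cdot \frac{1}{\sigma_h^2} \cdot \sigma_h^2 \rho^{*}\, I_{MN}
 = \sigma_h^2 (1 - |\rho|^2) I_{MN} .
\end{aligned}$$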

Although the previous measure of channel quality, as represented by $R_{hh|\hat{h}}$, can be retained in this scenario, we opt for ρ as the quantity describing the channel quality. Such a quantity was also used in [17]. Perfect channel knowledge now corresponds to ρ → 1. As seen from (26), this in turn implies that $\|R_{hh|\hat{h}}\| \to 0$. Hence, we also have perfect channel quality as defined by our original channel quality measure. On the other hand, no channel knowledge corresponds to ρ → 0. For this case, $\|R_{hh|\hat{h}}^{-1}\|$ does not tend to zero. The two measures thus disagree. However, what seems like an inconsistency is really not, since the asymptotically optimum linear transformation can for this case be shown to be $W_{as} = I_M/\sqrt{M}$, regardless of which of the two quality measures is used. The similarity in the asymptotic results is explained by the inherent symmetry in the distribution implied by (26) as ρ → 0. Due to the symmetry, the distribution can be considered non-informative from the perspective of a transmitter, resulting in an open-loop type of system. This is clearly not true in general, since even if h and $\hat{h}$ are uncorrelated, the distribution of the true channel represents a form of channel knowledge on its own.

A. Applying the Transmission Scheme

This subsection deals with how the transmission scheme that was described in Section IV-B can be customized for the simplified scenario. In addition, the behavior of the optimal linear transformation is studied.

In order to use the transmission scheme for the simplified scenario, α and Υ need to be computed. Based on (26), it is seen that

$$\Upsilon = \frac{1}{\alpha} \sum_{k=1}^{N} \Omega_k^{(m)} = \frac{|\rho|^2}{\alpha} \hat{H} \hat{H}^{*} , \tag{27}$$

where $\alpha = \sigma_h^2 (1 - |\rho|^2)$. It is now straightforward to apply the algorithm described in Section IV-B.

Let us investigate how the transmission scheme distributes the available power. Assuming perfect side information, i.e. ρ → 1, the asymptotic result for the perfect channel knowledge case, discussed in Section IV-A, is applicable. Hence, all the power is allocated to the eigenvector $v_M$, corresponding to the largest eigenvalue of Υ, i.e. only $\lambda_M$ is non-zero. In view of (27), it is clear that $v_M$ is also the strongest left singular vector of $\hat{H}$.

The second case considers no channel knowledge, i.e. ρ → 0. From (27) it is obvious that Υ then tends to zero, which means that the corresponding eigenvalues $\{\bar{\lambda}_i\}_{i=1}^{M}$ also tend to zero. Hence, from (24) it follows that $\{\lambda_i\}_{i=1}^{M}$ will all be equal. As a result, $W_{as} = \bar{V}/\sqrt{M}$. Such a linear transformation implies that $Z_{as} = I_M/M$, which constitutes a transmission scheme equivalent$^4$ to conventional OSTBC.

For the special case when the number of antennas at either the transmitter or the receiver is two or lower, i.e. min{M, N} ≤ 2, the transmission scheme can be further simplified. At most two of the eigenvalues $\bar{\lambda}_i$ are then non-zero, since Υ is the sum of N rank one matrices of size M × M. This allows the transmission scheme to be simplified by reorganizing the terms in (25) and then squaring repeatedly so that a polynomial equation is obtained. For example, consider a system with one receive antenna and assume the simplified scenario. It follows that

$$\Upsilon = \frac{|\rho|^2}{\alpha} \hat{h} \hat{h}^{*} .$$

Analytical expressions for the eigenvalues, as well as for the eigenvector corresponding to the largest eigenvalue, are easily found to be given by

$$\bar{\lambda}_1 = \cdots = \bar{\lambda}_{M-1} = 0, \qquad \bar{\lambda}_M = \frac{|\rho|^2}{\alpha} \|\hat{h}\|^2$$

and

$$v_M = \frac{\hat{h}}{\|\hat{h}\|} , \tag{28}$$

respectively. Tedious but straightforward calculations now show that the procedure for determining the optimum eigenvalues reduces to

1. Let κ = α(M + αη) and compute
$$\mu = \frac{\eta \left( \kappa(2M-1) + |\rho|^2 \|\hat{h}\|^2 + \sqrt{2\kappa(2M-1)|\rho|^2 \|\hat{h}\|^2 + |\rho|^4 \|\hat{h}\|^4 + \kappa^2} \right)}{2 (M + \alpha\eta)^2}$$
2. Compute $\lambda = \frac{1}{\mu} - \frac{1}{\alpha\eta}$
3. If λ > 0 then set $\lambda_1 = \cdots = \lambda_{M-1} = \lambda$ and compute $\lambda_M = 1 - (M-1)\lambda$
4. If λ ≤ 0 then set $\lambda_1 = \cdots = \lambda_{M-1} = 0$, $\lambda_M = 1$

Although we have assumed the simplified fading scenario, the development generalizes easily to all scenarios where $R_{hh|\hat{h}}$ is diagonal. One important example of such a scenario is line-of-sight conditions in which the mean value of the true channel is non-zero, e.g. an environment with Ricean fading.

$^4$ Equivalent in the sense that the corresponding values of the criterion function are the same.
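The four steps for the single receive antenna case translate directly into code. The following sketch assumes |ρ| < 1 and η > 0 so that α and µ are positive; the function name and argument names are hypothetical, and `h_hat_norm2` denotes the squared norm of the channel estimate.

```python
import math

def one_rx_eigenvalues(M, rho, h_hat_norm2, sigma_h2, eta):
    """Eigenvalues of Z (ascending) for one receive antenna, following steps 1 to 4."""
    alpha = sigma_h2 * (1 - abs(rho) ** 2)       # conditional variance, see (26)
    g = abs(rho) ** 2 * h_hat_norm2              # |rho|^2 * ||h_hat||^2
    kappa = alpha * (M + alpha * eta)            # step 1
    root = math.sqrt(2 * kappa * (2 * M - 1) * g + g * g + kappa * kappa)
    mu = eta * (kappa * (2 * M - 1) + g + root) / (2 * (M + alpha * eta) ** 2)
    lam = 1 / mu - 1 / (alpha * eta)             # step 2
    if lam > 0:                                  # step 3: power is shared
        return [lam] * (M - 1) + [1 - (M - 1) * lam]
    return [0.0] * (M - 1) + [1.0]               # step 4: pure beamforming

# Uninformative estimate (rho = 0): equal power on all M directions.
equal = one_rx_eigenvalues(4, 0.0, 3.0, 1.0, 2.0)
# Good estimate with large norm: all power in the direction of the estimate.
beam = one_rx_eigenvalues(2, 0.9, 10.0, 1.0, 10.0)
# Intermediate case: most, but not all, power in the direction of the estimate.
mixed = one_rx_eigenvalues(2, 0.5, 1.0, 1.0, 1.0)
```

The three calls illustrate the threshold behaviour of the scheme: it moves from OSTBC-like equal allocation to pure beamforming as the channel estimate becomes stronger and more reliable.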

By analyzing the above procedure it is possible to make some interesting observations regarding the distribution of power among the eigenmodes. The expression for λ in the second step of the procedure is clearly decreasing as a function of $\|\hat{h}\|$. Hence, when $\|\hat{h}\|$ is above some threshold, the expression after the comparison in step four will be executed and all the power is allocated to the direction of the channel estimate $\hat{h}$. On the other hand, falling below the threshold means that a part of the total power is allocated to $\hat{h}$ and the remaining power is divided equally between the M − 1 directions orthogonal to the channel estimate. Recall that $\sigma^2$ is inversely proportional to η. A slightly more involved analysis then shows that the power distribution behaves similarly with respect to the noise variance $\sigma^2$ as well. The opposite behavior is observed for the fading variance $\sigma_h^2$. Thus, when $\sigma_h^2$ is below a certain threshold all power is allocated to $\hat{h}$, whereas exceeding the same threshold leads to a portion of the total power being allocated to $\hat{h}$ and the remaining power equally divided among the orthogonal directions. Simulation results presented in Section VI illustrate how the allocation of power affects the performance.

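In the case of only two transmit antennas, a closed-form expression for $W_{opt}$ may be formulated. For this purpose, let $\hat{h}_1$ and $\hat{h}_2$ denote the two elements of $\hat{h}$. The eigenvector $v_2$ is obtained from (28), whereas the remaining eigenvector is obtained by forming a vector orthogonal to $\hat{h}$. It follows that the optimal linear transformation can be written as

$$W_{opt} = \frac{1}{\sqrt{|\hat{h}_1|^2 + |\hat{h}_2|^2}} \begin{bmatrix} -\hat{h}_2^{*} & \hat{h}_1 \\ \hat{h}_1^{*} & \hat{h}_2 \end{bmatrix} \begin{bmatrix} \sqrt{\lambda_1} & 0 \\ 0 & \sqrt{\lambda_2} \end{bmatrix} .$$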
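The closed-form expression can be evaluated directly. The sketch below assumes that the scalar in front of the matrix normalizes both eigenvectors to unit length, i.e. that it equals one over the norm of the channel estimate; the function name and the numerical values are hypothetical.

```python
import math

def w_opt_two_antennas(h1, h2, lam1, lam2):
    """W_opt = V diag(sqrt(lam1), sqrt(lam2)) for two transmit antennas."""
    norm = math.sqrt(abs(h1) ** 2 + abs(h2) ** 2)
    # First column is orthogonal to the estimate, second column is parallel to it.
    V = [[-h2.conjugate() / norm, h1 / norm],
         [h1.conjugate() / norm, h2 / norm]]
    d = [math.sqrt(lam1), math.sqrt(lam2)]
    return [[V[i][j] * d[j] for j in range(2)] for i in range(2)]

# Hypothetical channel estimate and eigenvalue pair (lam1 + lam2 = 1).
h1, h2 = 1 + 2j, 3 - 1j
W = w_opt_two_antennas(h1, h2, 0.2, 0.8)

# Transmit power tr(W W^*) and inner product between the two columns.
power = sum(abs(W[i][j]) ** 2 for i in range(2) for j in range(2))
inner = sum(W[i][0].conjugate() * W[i][1] for i in range(2))
```

With these inputs `power` equals the sum of the two eigenvalues and `inner` vanishes, so the power constraint is met and the two transmit directions are orthogonal.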

Thus, our transmission scheme basically consists of a threshold test and some simple computations which are easily implemented using a lookup table. The complexity of the algorithm must therefore be considered very low. Note that this agrees well with the information theoretic results outlined in [33], where a similar threshold effect was observed.

Based on the assumptions in the present section and using the corresponding transmission scheme, $W_{opt}$ may now be efficiently determined. However, in order for the optimization to be carried out, the variances $\sigma^2$ and $\sigma_h^2$ and the correlation coefficient ρ must be known. In practice, these may be estimated at the receiver and fed back to the transmitter. Another approach is to treat them as design parameters chosen such that they roughly match the conditions the system is operating in. Nevertheless, in the simulations to follow, we assume these parameters are perfectly known at the transmitter.

VI. Simulation Results

In order to examine the performance of the proposed transmission scheme, and to investigate how it compares with conventional methods, simulations were conducted for several different cases. The performance was compared with three other methods: conventional OSTBC, conventional beamforming and what is here referred to as ideal beamforming. Ideal beamforming is similar to conventional beamforming except that the beamformer is based on perfect channel knowledge.

For all examined cases, the simplified scenario with perfect knowledge of $\sigma^2$, $\sigma_h^2$ and ρ was assumed. The variance of the channel coefficients was arbitrarily set at $\sigma_h^2 = 1$. The channel was constant during the transmission of a codeword and independently fading from one codeword to another. Furthermore, the predetermined orthogonal space-time block codes were taken from the rate one codes found in [4]. The particular code used in each case is therefore directly determined by the number of transmit antennas. All elements of the codewords were taken from a binary phase shift keying (BPSK) constellation. The input to the space-time encoder was assumed to form an i.i.d. sequence of equally probable symbol alternatives. Throughout the simulations, the bit error rate (BER) was used as the performance measure. Finally, the SNR was defined as

$$\mathrm{SNR} = \frac{E[\|H^{*} C\|_F^2]}{L M N \sigma^2} .$$

For a conventional OSTBC system, the expression for the SNR is equal to the total received average signal power, divided by the total noise power. Since the codes under consideration span as many time instants as the number of transmit antennas, L is here equal to M.

A. Varying the SNR

In the first case, a system with two transmit antennas and one receive antenna was considered. The channel quality was set to ρ = 0.9. The BER as a function of the SNR for the various transmission methods is depicted in Figure 2. As seen, the performance of the proposed transmission scheme is for all SNR values better than conventional OSTBC but, as expected, worse than ideal beamforming. As the SNR decreases, the curve for the proposed scheme approaches the one for ideal beamforming whereas for increasing SNR it approaches the performance of conventional OSTBC. Thus, the proposed scheme combines the advantages of both beamforming and OSTBC. This is also in good agreement with both the asymptotic results of Section IV-A and the observations in Section V-A regarding the allocation of power among the eigenmodes. Note that the two curves for conventional OSTBC and ideal beamforming also show the performance of our transmission scheme in the case of ρ → 0 and ρ → 1, respectively. Conventional beamforming is seen to give good performance at low SNR values, but as the SNR increases, the lack of correct channel knowledge leads to a serious performance degradation.

In the second case the number of transmit antennas was increased to eight. This was done in order to illustrate how the number of transmit antennas influences the performance. The channel quality was now set to ρ = 0.7. The BER versus the SNR for the four methods is presented in Figure 3. As seen, the potential gains due to channel knowledge are now considerably higher. These gains remain to a large extent even when the number of receive antennas is increased, as illustrated in a comparison between the proposed method and conventional OSTBC in Figure 4. Although not presented, simulation results demonstrating significant gains were also obtained for scenarios with fewer transmit antennas.

[Figure 2 plot: BER versus SNR [dB]. Curves: Conventional OSTBC / Proposed scheme: ρ → 0; Proposed scheme: ρ = 0.9; Conventional beamforming: ρ = 0.9; Ideal beamforming / Proposed scheme: ρ → 1.]

Fig. 2. Two transmit antennas, one receive antenna and BPSK modulation.

B. Varying the Channel Quality

The third and last case concerns how the channel quality affects the performance. Again, a system with two transmit antennas and one receive antenna is considered. The SNR was set at 10 dB and the BER versus the channel quality was plotted. The result is shown in Figure 5, which thus provides an illustration of how the proposed scheme adapts to the variations in the channel quality. Hence, when the channel quality is low it is similar to conventional OSTBC and when it is high it is essentially the same as ideal beamforming.

VII. Conclusions

In this work, side information was utilized for improving a predetermined orthogonal space-time block code by means of a linear transformation. A transmission scheme that effectively combines conventional transmit beamforming with orthogonal space-time block coding was proposed. The resulting optimization problem was shown to be convex and could therefore be solved efficiently. Closed-form solutions were derived under certain asymptotic assumptions. Furthermore, the assumption of a simplified fading scenario resulted in a particularly efficient optimization algorithm. Numerical results demonstrated significant gains over both an open-loop system and a system using conventional beamforming.

[Figure 3 plot: BER versus SNR [dB]. Curves: Conventional OSTBC / Proposed scheme: ρ → 0; Proposed scheme: ρ = 0.7; Conventional beamforming: ρ = 0.7; Ideal beamforming / Proposed scheme: ρ → 1.]

Fig. 3. Eight transmit antennas, one receive antenna and BPSK modulation.

[Figure 4 plot: BER versus SNR [dB]. Curves: Conventional OSTBC and Proposed scheme (ρ = 0.7), each for N = 1, 2, 4 and 8.]

Fig. 4. Eight transmit antennas and BPSK modulation.

Appendix

I. Asymptotic Results

The strategy for deriving the asymptotic results presented in Section IV-A is to make use of the fact that, under certain conditions, it is possible to interchange the order of the limit and minimization operator, i.e.

$$x_{as} = \lim_{\rho \to a} \arg\min_{x \in X} V_\rho(x) = \arg\min_{x \in X} \lim_{\rho \to a} V_\rho(x) = \arg\min_{x \in X} V(x) ,$$

where X denotes the feasibility set and $V(x) \triangleq \lim_{\rho \to a} V_\rho(x)$. From [34, p. 221] it follows that this holds if $V_\rho(x)$ converges uniformly in x over X to the limit function V(x), X is a compact set (i.e. closed and bounded) and V(x) is continuous and has a unique global minimum.

[Figure 5 plot: BER versus ρ. Curves: Conventional OSTBC / Proposed scheme: ρ → 0; Proposed scheme; Conventional beamforming; Ideal beamforming / Proposed scheme: ρ → 1.]

Fig. 5. Two transmit antennas, one receive antenna, SNR = 10 dB and BPSK modulation.

To apply this theorem for the problem at hand, introduce a criterion function ℓ′(Z) that is equal to the original criterion function ℓ(Z), except for parameter independent terms and factors. Let $\bar{\ell}_i(Z) = \lim \ell'(Z)$, where the limit is taken as either $\|R_{hh|\hat{h}}\| \to 0$, η → ∞, $\|R_{hh|\hat{h}}^{-1}\| \to 0$ or η → 0, depending on the asymptotic case under consideration. Moreover, define the set of allowable parameters as

$$\mathcal{Z}(\varepsilon) = \{ Z \mid Z = Z^{*} \succeq \varepsilon I_M,\ \mathrm{tr}(Z) = 1 \} . \tag{29}$$

Clearly, the requirement that $\mathcal{Z}(\varepsilon)$ is compact is satisfied. Normally, ε is taken to be zero. The set in (29) then corresponds to the feasibility set of the original optimization problem, as described in (14). However, in order to satisfy the requirement of a continuous limit function $\bar{\ell}_i(Z)$, we will for some of the cases first restrict $\mathcal{Z}(\varepsilon)$ by assuming that ε is small and positive and then argue why we can let ε = 0 without affecting the result. Proving that ℓ′(Z) converges uniformly to $\bar{\ell}_i(Z)$ over $\mathcal{Z}(\varepsilon)$ amounts to showing that

$$\lim \sup_{Z \in \mathcal{Z}(\varepsilon)} |\ell'(Z) - \bar{\ell}_i(Z)| = 0 .$$

In order to simplify the notation, the criterion function is for the remaining part of this section written as

$$\ell(Z) = m^{*} R^{-1} \big( (I_N \otimes Z\eta) + R^{-1} \big)^{-1} R^{-1} m - \log\det\big( (I_N \otimes Z\eta) + R^{-1} \big) ,$$

where we note that $\eta = \mu_{\min}/(4\sigma^2)$ is a quantity proportional to the SNR. Furthermore, recall that $\|X\| = \sigma_{\max}$, where $\sigma_{\max}$ is the maximum singular value of X. Keep also in mind that if the argument is a vector, the result is the usual vector norm.

A. Case 1: No Channel Knowledge

The first case that is considered is no channel knowledge, i.e. $\|R^{-1}\| \to 0$. To remove parameter independent terms in the limit function, the equivalent criterion function ℓ′(Z) = ℓ(Z) + MN log(η) is considered. For now, assume that ε > 0. Since Z is then non-singular, ℓ′(Z) can be written as

$$\begin{aligned}
\ell'(Z) ={}& m^{*} R^{-1} \big( (I_N \otimes Z\eta) + R^{-1} \big)^{-1} R^{-1} m \\
& - \log\det\big( I_{MN} + (I_N \otimes Z\eta)^{-1/2} R^{-1} (I_N \otimes Z\eta)^{-1/2} \big) \\
& - \log\det(I_N \otimes Z\eta) + MN \log(\eta) ,
\end{aligned} \tag{30}$$

where $(\cdot)^{1/2}$ is now a matrix square root with Hermitian symmetry. As shown below, this function converges uniformly in Z to the limit function

$$\bar{\ell}_1(Z) = \lim_{\|R^{-1}\| \to 0} \ell'(Z) = -\log\det(I_N \otimes Z\eta) + MN \log \eta = -N \log\det(Z) . \tag{31}$$

The limit function is obviously continuous. By utilizing Lagrangian multipliers and an EVD of Z, it is straightforward to show that $\bar{\ell}_1(Z)$, subject to $Z \in \mathcal{Z}(\varepsilon)$, has a unique global minimum $Z_{as} = I_M/M$. This fact, and the uniform convergence in Z over $\mathcal{Z}(\varepsilon)$, implies that, for a fixed positive ε ≤ 1/M,

$$Z_{as} = \lim_{\|R^{-1}\| \to 0} \arg\min_{Z \in \mathcal{Z}(\varepsilon)} \ell'(Z) = I_M/M . \tag{32}$$

However, the solution is valid even if ε = 0. The reason is that ℓ′(Z) is a convex function and the solution to the above problem does not render the first constraint tight. Hence, relaxing the first constraint to ε = 0 does not change the optimum. Since $WW^{*} = Z$ and $Z_{as} = I_M/M$, the optimum linear transformation in the case of no channel knowledge may therefore be chosen as $W_{as} = I_M/\sqrt{M}$.
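The Lagrangian argument mentioned above can be sketched as follows, ignoring the lower bound ε on the eigenvalues, which is inactive at the solution. The multiplier ν and the eigenvalues $\lambda_i$ of Z are notation introduced only for this sketch.

$$\begin{aligned}
\bar{\ell}_1(Z) &= -N \log\det(Z) = -N \sum_{i=1}^{M} \log \lambda_i , \qquad \sum_{i=1}^{M} \lambda_i = 1 , \\
\frac{\partial}{\partial \lambda_i} \Big( -N \sum_{j=1}^{M} \log \lambda_j + \nu \Big( \sum_{j=1}^{M} \lambda_j - 1 \Big) \Big) &= -\frac{N}{\lambda_i} + \nu = 0
\;\;\Rightarrow\;\; \lambda_i = \frac{N}{\nu} , \\
\sum_{i=1}^{M} \lambda_i = 1 &\;\;\Rightarrow\;\; \nu = MN , \quad \lambda_i = \frac{1}{M} , \quad Z_{as} = V \frac{I_M}{M} V^{*} = \frac{I_M}{M} .
\end{aligned}$$

Since the objective is strictly convex in the eigenvalues, the stationary point is the unique minimum, and the result does not depend on the unitary matrix V.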

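To see the uniform convergence in Z over $\mathcal{Z}(\varepsilon)$ as $\|R^{-1}\| \to 0$, consider the difference

$$\ell'(Z) - \bar{\ell}_1(Z) = m^{*} R^{-1} \big( (I_N \otimes Z\eta) + R^{-1} \big)^{-1} R^{-1} m - \log\det\big( I_{MN} + (I_N \otimes Z\eta)^{-1/2} R^{-1} (I_N \otimes Z\eta)^{-1/2} \big) . \tag{33}$$

From [23, p. 471] it readily follows that

$$(A + B)^{-1} \preceq B^{-1}, \qquad A \succeq 0,\ B \succ 0 , \tag{34}$$

and the first term in (33) is therefore upper bounded as

$$\big| m^{*} R^{-1} \big( (I_N \otimes Z\eta) + R^{-1} \big)^{-1} R^{-1} m \big| \le \|m\|^2 \|R^{-1}\| .$$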

Since the determinant equals the product of the eigenvalues, the second term can be written as

$$\sum_{k=1}^{MN} \log(1 + \lambda_k) ,$$

where $\lambda_k$ is the kth eigenvalue of

$$(I_N \otimes Z\eta)^{-1/2} R^{-1} (I_N \otimes Z\eta)^{-1/2} .$$

This matrix is Hermitian and positive definite which means that its eigenvalues and its singular values are the same. Thus, the eigenvalues can be upper bounded as

$$\lambda_k = \sigma_k \le \sigma_{\max} = \|(I_N \otimes Z\eta)^{-1/2} R^{-1} (I_N \otimes Z\eta)^{-1/2}\| \le \frac{\|(I_N \otimes Z)^{-1/2}\|^2 \|R^{-1}\|}{\eta} \le \frac{\|R^{-1}\|}{\varepsilon\eta} ,$$

where $\sigma_k$ represents the kth singular value and $\sigma_{\max}$ denotes the largest singular value. The second equality is due to the fact that the spectral norm is equal to the maximum singular value of its argument. An upper bound to the second term in (33) may be formed as

$$\sum_{k=1}^{MN} \log(1 + \lambda_k) \le MN \log\left( 1 + \frac{\|R^{-1}\|}{\varepsilon\eta} \right) . \tag{35}$$

By utilizing the triangle inequality it is now clear that

$$\sup_{Z \in \mathcal{Z}(\varepsilon)} |\ell'(Z) - \bar{\ell}_1(Z)| \le \|m\|^2 \|R^{-1}\| + MN \log\left( 1 + \frac{\|R^{-1}\|}{\varepsilon\eta} \right) .$$

Since this expression, for a constant ε > 0, clearly tends to zero as $\|R^{-1}\| \to 0$, we have shown that ℓ′(Z) converges uniformly to $\bar{\ell}_1(Z)$ within the parameter set defined by $\mathcal{Z}(\varepsilon)$. This completes the derivation for the no channel knowledge case.

B. Case 2: Infinite SNR

In the second case it is assumed that the SNR tends to infinity, i.e. η → ∞. Again, we start by assuming ε > 0. Similarly to the previous case, an equivalent criterion function can be written as in (30), which also in this case converges uniformly in Z to

$$\bar{\ell}_2(Z) = \lim_{\eta \to \infty} \ell'(Z) = -N \log\det(Z) = \bar{\ell}_1(Z) .$$

To see that the convergence is uniform, consider the two terms in (33). Utilizing (34), the first term is now upper bounded by

$$\|m\|^2 \|R^{-1}\|^2 \|(I_N \otimes Z\eta)^{-1}\| \le \frac{\|m\|^2 \|R^{-1}\|^2}{\varepsilon\eta} ,$$

whereas the upper bound of the second term is again given by (35). Hence, we have

$$\sup_{Z \in \mathcal{Z}(\varepsilon)} |\ell'(Z) - \bar{\ell}_2(Z)| \le \frac{\|m\|^2 \|R^{-1}\|^2}{\varepsilon\eta} + MN \log\left( 1 + \frac{\|R^{-1}\|}{\varepsilon\eta} \right) ,$$

which obviously tends to zero as the SNR tends to infinity. The convergence is therefore uniform. The arguments following (32) then show that the asymptotically optimal linear transformation may once more be chosen as $W_{as} = I_M/\sqrt{M}$. This completes the derivation for the case of infinite SNR.

C. Case 3: Perfect Channel Knowledge

The third case concerns perfect channel knowledge. Forthe present and the next case, we can let ε = 0. The origi-nal constraints are therefore assumed. Parameter indepen-dent terms and factors in the limit function are removedby considering the equivalent criterion function

`′(Z) =(

`(Z) − log det(R) −m∗R−1m)

= m∗(

IMN + (IN ⊗ Zη)R)−1

R−1m/η

− log det(

IMN + R1/2(IN ⊗ Zη)R1/2)

−m∗R−1m/η . (36)

We start by showing that this function converges uniformlyin Z to the obviously continuous limit function

¯3(Z) = lim

‖R‖→0`′(Z) = −m∗(IN ⊗ Z)m . (37)

The Taylor series [23, p. 301],

$$(I - X)^{-1} = \sum_{k=0}^{\infty} X^k ,$$

valid if $\|X\| < 1$, is used for writing the first term in (36) as

$$m^* \Big(R^{-1} - (I_N \otimes Z\eta) + \sum_{k=2}^{\infty} \big(-(I_N \otimes Z\eta)R\big)^k R^{-1}\Big) m/\eta .$$
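The expansion can be checked numerically. The sketch below builds a random feasible $Z$ and a small Hermitian positive definite $R$ (both hypothetical test data, as are the values of `M`, `N`, `eta` and the truncation order `K`) and compares $(I_{MN} + (I_N \otimes Z\eta)R)^{-1}R^{-1}$ with the truncated series $R^{-1} - (I_N \otimes Z\eta) + \sum_{k=2}^{K} (-(I_N \otimes Z\eta)R)^k R^{-1}$.

```python
import numpy as np

rng = np.random.default_rng(1)
M, N, eta = 2, 2, 4.0                      # illustrative values (assumed)
n = M * N

# Feasible Z = W W^* with tr(Z) = 1
W = rng.standard_normal((M, M)) + 1j * rng.standard_normal((M, M))
W /= np.linalg.norm(W, "fro")
Z = W @ W.conj().T
A = np.kron(np.eye(N), eta * Z)            # I_N (x) Z*eta

# Hermitian positive definite R, scaled so that eta * ||R|| = 0.4 < 1
G = rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))
R = G @ G.conj().T + np.eye(n)
R *= 0.4 / (eta * np.linalg.norm(R, 2))
R_inv = np.linalg.inv(R)

exact = np.linalg.inv(np.eye(n) + A @ R) @ R_inv

# Truncated series: R^{-1} - A + sum_{k=2}^{K} (-A R)^k R^{-1}
K = 60
X = -A @ R
power = X @ X                              # starts at X^2
tail = np.zeros_like(exact)
for _ in range(2, K + 1):
    tail += power @ R_inv
    power = power @ X
series = R_inv - A + tail

rel_err = np.linalg.norm(exact - series, 2) / np.linalg.norm(exact, 2)
print(rel_err)
```

Because $\|(I_N \otimes Z\eta)R\| \le \eta\|R\| = 0.4$ here, the neglected terms decay geometrically and the relative error should be at the level of rounding noise.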

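By exploiting the triangle inequality and the formula for a geometric series, an upper bound of the infinite sum, for sufficiently small $\eta\|R\|$, is obtained as

$$\begin{aligned}
\Big\| \sum_{k=2}^{\infty} \big(-(I_N \otimes Z\eta)R\big)^k R^{-1} \Big\| &\le \sum_{k=2}^{\infty} \|I_N \otimes Z\eta\|^k \|R\|^{k-1} \\
&= \frac{\|I_N \otimes Z\eta\|^2 \|R\|}{1 - \|I_N \otimes Z\eta\| \|R\|} \\
&= \frac{\eta^2 \|Z\|^2 \|R\|}{1 - \eta\|Z\|\|R\|} \le \frac{\eta^2 \|R\|}{1 - \eta\|R\|} .
\end{aligned}$$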

For the last inequality, we used the fact that $\|Z\| \le 1$, which is due to the trace constraint on $Z$. Now, let $\lambda_k$ represent the $k$th eigenvalue of

$$R^{1/2}(I_N \otimes Z\eta)R^{1/2} .$$

Since it then holds that

$$\lambda_k \le \|R^{1/2}\|^2 \|I_N \otimes Z\eta\| = \eta\|R\|\|Z\| \le \eta\|R\| ,$$

the second term in (36) is upper-bounded by

$$MN \log(1 + \eta\|R\|)/\eta .$$

Finally, collecting the results for the first two terms yields

$$\sup_{Z \in \mathcal{Z}(0)} |\ell'(Z) - \bar{\ell}_3(Z)| \le \frac{\eta\|R\|\|m\|^2}{1 - \eta\|R\|} + MN \log(1 + \eta\|R\|)/\eta .$$

The right-hand side clearly tends to zero as $\|R\| \to 0$. Hence, the convergence is uniform.
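The uniform bound can be probed numerically. The following sketch evaluates the criterion in (36), read here with every term divided by $\eta$ (an assumption about the garbled source that is consistent with the bounds above), and the limit function (37) on a fixed set of random feasible $Z$, while $R$ is scaled towards zero. All test data (`m`, `R0`, the sizes and `eta`) are hypothetical.

```python
import numpy as np

def crit(Z, m, R, eta):
    """Equivalent criterion (36), with every term divided by eta."""
    n = R.shape[0]
    N = n // Z.shape[0]
    A = np.kron(np.eye(N), eta * Z)
    R_inv = np.linalg.inv(R)
    first = m.conj() @ np.linalg.solve(np.eye(n) + A @ R, R_inv @ m)
    # det(I + R^{1/2} A R^{1/2}) = det(I + A R), so no matrix square root is needed
    _, logdet = np.linalg.slogdet(np.eye(n) + A @ R)
    third = m.conj() @ R_inv @ m
    return ((first - third).real - logdet) / eta

def limit3(Z, m):
    """Limit function (37): -m^* (I_N (x) Z) m."""
    N = m.size // Z.shape[0]
    return -(m.conj() @ np.kron(np.eye(N), Z) @ m).real

def bound(m, R, eta, M, N):
    """Right-hand side of the uniform bound; requires eta * ||R|| < 1."""
    r = eta * np.linalg.norm(R, 2)
    return r * np.linalg.norm(m) ** 2 / (1 - r) + M * N * np.log1p(r) / eta

rng = np.random.default_rng(2)
M, N, eta = 2, 2, 2.0                       # illustrative values (assumed)
n = M * N
m = rng.standard_normal(n) + 1j * rng.standard_normal(n)
G = rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))
R0 = G @ G.conj().T + np.eye(n)
R0 /= np.linalg.norm(R0, 2)                 # normalise so that ||R0|| = 1

# Fixed set of random feasible points Z = W W^* with tr(Z) = 1
Zs = []
for _ in range(200):
    W = rng.standard_normal((M, M)) + 1j * rng.standard_normal((M, M))
    W /= np.linalg.norm(W, "fro")
    Zs.append(W @ W.conj().T)

gaps, bounds = [], []
for scale in (1e-1, 1e-2, 1e-3):
    R = scale * R0
    gaps.append(max(abs(crit(Z, m, R, eta) - limit3(Z, m)) for Z in Zs))
    bounds.append(bound(m, R, eta, M, N))
print(gaps, bounds)
```

The sampled gap should stay below the bound at each scale, and both should shrink as $\|R\|$ decreases.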

Changing the sign of the limit function in (37) and re-parameterizing using $Z = WW^*$ shows that the optimum of the limit function $\bar{\ell}_3(Z)$ is given by

$$W_{\mathrm{as}} = \arg\max_{W,\ \|W\|_F^2 = 1} m^* (I_N \otimes WW^*) m . \tag{38}$$

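To solve this, let $\Omega = mm^*$ and define

$$\Theta = \sum_{k=1}^{N} \Omega_k , \tag{39}$$

where $\Omega_k$ denotes the $k$th block of size $M \times M$ on the diagonal of $\Omega$. The cost function in the above optimization problem can then be written as

$$\begin{aligned}
m^* (I_N \otimes WW^*) m &= \mathrm{tr}\big((I_N \otimes WW^*)\Omega\big) \\
&= \mathrm{tr}(W^* \Theta W) \\
&= \big(\mathrm{vec}(\Theta^* W)\big)^* \mathrm{vec}(W) \\
&= \big(\mathrm{vec}(W)\big)^* (I_M \otimes \Theta)\, \mathrm{vec}(W) ,
\end{aligned} \tag{40}$$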

where the two last equalities are due to (4) and (5), respectively. The power constraint is written in the form $\|\mathrm{vec}(W)\| = 1$. Such an optimization problem is readily solved utilizing the EVD of $I_M \otimes \Theta$. For this purpose, let $\lambda_M$ denote the largest eigenvalue of $\Theta$ and recall the assumption that it is strictly larger than all the other eigenvalues, i.e. $\lambda_M$ is unique. It can easily be verified that the eigenvalues of $I_M \otimes \Theta$ are obtained by repeating the eigenvalues of $\Theta$ $M$ times. Hence, $\lambda_M$ is also the largest eigenvalue of $I_M \otimes \Theta$, with multiplicity $M$. The set of optimum solutions of (38) is therefore given by the eigenspace associated with $\lambda_M$. Introducing the complex valued scalars $\{\mu_k\}_{k=1}^{M}$, the solution can be written in the form

$$\mathrm{vec}(W_{\mathrm{as}}) = \mu_1 u_1 + \mu_2 u_2 + \cdots + \mu_M u_M , \tag{41}$$

where

$$u_1 = \begin{bmatrix} v_M \\ 0 \\ 0 \\ \vdots \\ 0 \end{bmatrix}, \quad
u_2 = \begin{bmatrix} 0 \\ v_M \\ 0 \\ \vdots \\ 0 \end{bmatrix}, \quad \cdots, \quad
u_M = \begin{bmatrix} 0 \\ \vdots \\ 0 \\ 0 \\ v_M \end{bmatrix}$$

and $v_M$ are the eigenvectors of $I_M \otimes \Theta$ and $\Theta$, respectively, corresponding to $\lambda_M$. Here, $0$ is an $M \times 1$ vector with all elements equal to zero. Using (41), all the solutions may also be expressed as

$$W_{\mathrm{as}} = \begin{bmatrix} \mu_1 v_M & \mu_2 v_M & \cdots & \mu_M v_M \end{bmatrix} ,$$

implying that $Z_{\mathrm{as}} = W_{\mathrm{as}} W_{\mathrm{as}}^* = v_M v_M^* \sum_{k=1}^{M} |\mu_k|^2$. Combining this with the power constraint $\|W\|_F^2 = 1$ means that $\sum_{k=1}^{M} |\mu_k|^2 = 1$, and hence $Z_{\mathrm{as}} = v_M v_M^*$. Consequently, regardless of the unit norm vector $\mathrm{vec}(W_{\mathrm{as}})$ chosen from the aforementioned eigenspace, it holds that $Z_{\mathrm{as}} = v_M v_M^*$, which is thus a unique minimum point of $\bar{\ell}_3(Z)$. Accordingly, the use of $\bar{\ell}_3(Z)$ in the asymptotic analysis is justified. Letting, for example, $\mu_1 = 1$, $\mu_2 = \cdots = \mu_M = 0$ and utilizing $Z_{\mathrm{as}} = W_{\mathrm{as}} W_{\mathrm{as}}^*$, an asymptotically optimum solution is given by

$$W_{\mathrm{as}} = \begin{bmatrix} v_M & 0 & \cdots & 0 \end{bmatrix} .$$

As previously indicated, the solution is not unique. For example, permuting the columns gives the same value of the cost function.
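The construction of $W_{\mathrm{as}}$ for this case can be sketched in a few lines. The code forms $\Omega = mm^*$, sums its diagonal blocks to obtain $\Theta$ as in (39), takes the dominant eigenvector $v_M$, and compares the cost in (38) at $W_{\mathrm{as}} = [v_M\ 0\ \cdots\ 0]$ with the cost at random unit-norm $W$. The vector `m`, the sizes and the function names are hypothetical, and `m` is assumed to be stacked as $N$ consecutive blocks of length $M$.

```python
import numpy as np

def theta_from_omega(Omega, M, N):
    """Theta in (39): sum of the N diagonal M x M blocks of Omega."""
    return sum(Omega[k * M:(k + 1) * M, k * M:(k + 1) * M] for k in range(N))

def beamformer(Theta):
    """W_as = [v_M 0 ... 0], with v_M the dominant unit eigenvector of Theta."""
    eigvals, eigvecs = np.linalg.eigh(Theta)    # eigenvalues in ascending order
    W = np.zeros_like(Theta, dtype=complex)
    W[:, 0] = eigvecs[:, -1]
    return W, eigvals[-1]

def cost(W, m, N):
    """Cost in (38): m^* (I_N (x) W W^*) m."""
    return (m.conj() @ np.kron(np.eye(N), W @ W.conj().T) @ m).real

rng = np.random.default_rng(3)
M, N = 3, 2                                     # illustrative sizes (assumed)
m = rng.standard_normal(M * N) + 1j * rng.standard_normal(M * N)

Omega = np.outer(m, m.conj())                   # Omega = m m^*
Theta = theta_from_omega(Omega, M, N)
W_as, lam_max = beamformer(Theta)
best = cost(W_as, m, N)                         # should equal lam_max

# Cost at random W with ||W||_F = 1, for comparison
random_costs = []
for _ in range(1000):
    W = rng.standard_normal((M, M)) + 1j * rng.standard_normal((M, M))
    W /= np.linalg.norm(W, "fro")
    random_costs.append(cost(W, m, N))
print(best, lam_max, max(random_costs))
```

Only the first column of `W_as` is nonzero, so all transmit power is placed in the direction $v_M$, which corresponds to conventional beamforming.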

D. Case 4: Zero SNR

In the fourth case, the SNR is assumed to tend to zero, i.e. $\eta \to 0$. The derivation is to a large extent similar to the previous case. The Taylor expansion,

$$\log\det(I + X) = \mathrm{tr}(X) + O(\|X\|^2) ,$$

where $O(\cdot)$ denotes big-O notation, is used to write the second term of (36) as

$$\mathrm{tr}\big(R^{1/2}(I_N \otimes Z)R^{1/2}\big) + O(\eta) .$$

Combining this with (37) results in the limit function

$$\begin{aligned}
\bar{\ell}_4(Z) &= \lim_{\eta \to 0} \ell'(Z) \\
&= -m^* (I_N \otimes Z) m - \mathrm{tr}\big(R^{1/2}(I_N \otimes Z)R^{1/2}\big) .
\end{aligned} \tag{42}$$
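The first-order expansion of the log-determinant can be illustrated numerically. In the sketch below, a random Hermitian positive semidefinite matrix `X` stands in for $R^{1/2}(I_N \otimes Z)R^{1/2}$ (a hypothetical stand-in, not data from the paper), and $\log\det(I + \eta X)/\eta$ is compared with $\mathrm{tr}(X)$ for decreasing $\eta$.

```python
import numpy as np

rng = np.random.default_rng(4)
n = 4
G = rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))
X = G @ G.conj().T                 # Hermitian PSD stand-in matrix
trace = np.trace(X).real
trace_sq = np.trace(X @ X).real    # tr(X^2), controls the second-order term

etas = [1e-1, 1e-2, 1e-3]
errors = []
for eta in etas:
    _, logdet = np.linalg.slogdet(np.eye(n) + eta * X)
    errors.append(abs(logdet / eta - trace))
print(errors)
```

For positive semidefinite `X`, the elementary inequality $x - \log(1 + x) \le x^2/2$ for $x \ge 0$, applied to each eigenvalue, gives an error of at most $\eta\,\mathrm{tr}(X^2)/2$, in line with the $O(\eta)$ remainder.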

It is now evident from

$$\sup_{Z \in \mathcal{Z}(0)} |\ell'(Z) - \bar{\ell}_4(Z)| \le \frac{\eta\|R\|\|m\|^2}{1 - \eta\|R\|} + |O(\eta)|$$

that the convergence is uniform. Thus, after changing the sign of $\bar{\ell}_4(Z)$ and parameterizing in terms of $W$, the cost function can be taken as

$$m^* (I_N \otimes WW^*) m + \mathrm{tr}\big(R^{1/2}(I_N \otimes WW^*)R^{1/2}\big) .$$

Using the relation $\mathrm{tr}(AB) = \mathrm{tr}(BA)$, this expression can be rewritten as

$$\mathrm{tr}\big((I_N \otimes WW^*)(mm^* + R)\big) = \mathrm{tr}\big((I_N \otimes WW^*)\Omega\big) , \tag{43}$$

where now $\Omega = mm^* + R$. Finally, due to the similarity between (43) and (40), the development from the previous case shows that an asymptotically optimum linear transformation is given by

$$W_{\mathrm{as}} = \begin{bmatrix} v_M & 0 & \cdots & 0 \end{bmatrix} ,$$

where $v_M$ is the eigenvector corresponding to the largest eigenvalue of $\Theta$. Here, $\Theta$ is again defined as in (39).
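A short numerical check of (43) and of the resulting solution is sketched below. The vector `m` and the covariance `R` are random hypothetical test data, `m` is assumed to be stacked as $N$ blocks of length $M$, and the Hermitian square root of `R` is computed from its eigendecomposition.

```python
import numpy as np

rng = np.random.default_rng(5)
M, N = 3, 2                                    # illustrative sizes (assumed)
n = M * N
m = rng.standard_normal(n) + 1j * rng.standard_normal(n)
G = rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))
R = G @ G.conj().T                             # Hermitian positive definite stand-in

Omega = np.outer(m, m.conj()) + R              # Omega = m m^* + R
Theta = sum(Omega[k * M:(k + 1) * M, k * M:(k + 1) * M] for k in range(N))

eigvals, eigvecs = np.linalg.eigh(Theta)       # eigenvalues in ascending order
v_M = eigvecs[:, -1]
W_as = np.zeros((M, M), dtype=complex)
W_as[:, 0] = v_M                               # W_as = [v_M 0 ... 0]

# Hermitian square root of R from its eigendecomposition
s, U = np.linalg.eigh(R)
R_half = (U * np.sqrt(s)) @ U.conj().T

B = np.kron(np.eye(N), W_as @ W_as.conj().T)   # I_N (x) W W^*
lhs = (m.conj() @ B @ m + np.trace(R_half @ B @ R_half)).real
rhs = np.trace(B @ Omega).real                 # right-hand side of (43)
print(lhs, rhs, eigvals[-1])
```

Both sides of (43) should agree, and at $W_{\mathrm{as}}$ they should equal the largest eigenvalue of $\Theta$.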

II. An Algorithm for a Simplified Scenario

In this appendix, the solution of the optimization problem defined by (20)-(23) is derived. It is easily seen that both the cost function and the feasibility set are convex. Thus, the solution is given by the KKT conditions. In order to simplify the development, the optimization problem is temporarily relaxed by omitting (23). For the remaining problem, the optimum is given by any $\lambda_k$ that satisfy the KKT conditions

\begin{align}
\sum_{k=1}^{M} \lambda_k &= 1 \tag{44} \\
\lambda_i &\ge 0 \tag{45} \\
-\frac{\alpha\eta\lambda_i}{(1 + \alpha\eta\lambda_i)^2} - \frac{\alpha\eta N}{1 + \alpha\eta\lambda_i} + \mu - \nu_i &= 0 \tag{46} \\
\nu_i &\ge 0 \tag{47} \\
\nu_i \lambda_i &= 0 , \tag{48}
\end{align}

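where $i = 1, \cdots, M$ and where $\mu$ and $\nu_i$ are Lagrange multipliers for the power constraint and the inequality constraints, respectively. We start by solving for $\nu_i$ in (46) and substituting into (47) and (48). Thus, the last three conditions reduce to

\begin{align}
\frac{\alpha\eta\lambda_i}{(1 + \alpha\eta\lambda_i)^2} + \frac{\alpha\eta N}{1 + \alpha\eta\lambda_i} &\le \mu \tag{49} \\
\lambda_i \left( \mu - \frac{\alpha\eta\lambda_i}{(1 + \alpha\eta\lambda_i)^2} - \frac{\alpha\eta N}{1 + \alpha\eta\lambda_i} \right) &= 0 . \tag{50}
\end{align}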

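First, assume that $\lambda_i > 0$. It follows that the second factor in (50) must be zero. Rewriting this condition as

$$\mu(1 + \alpha\eta\lambda_i)^2 - \alpha\eta N(1 + \alpha\eta\lambda_i) - \alpha\eta\lambda_i = 0$$

and solving for $\lambda_i$ gives

$$\lambda_i = \frac{\alpha\eta N + \sqrt{\alpha^2\eta^2 N^2 + 4\alpha\eta\lambda_i\mu}}{2\alpha\eta\mu} - \frac{1}{\alpha\eta} , \tag{51}$$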

where the positive root was picked due to (45). Note that (49) is now satisfied with equality. Hence, we have a valid solution as long as (51) gives a positive result and (44) is satisfied. On the other hand, for the case of a non-positive result, we let $\lambda_i = 0$. That this indeed satisfies the KKT conditions is seen by verifying that (49) is true. Since

$$\frac{\alpha\eta N + \sqrt{\alpha^2\eta^2 N^2 + 4\alpha\eta\lambda_i\mu}}{2\alpha\eta\mu} - \frac{1}{\alpha\eta} \le 0$$

and $\lambda_i = 0$ implies that

$$\mu \ge \alpha\eta\lambda_i + \alpha\eta N , \tag{52}$$

it is obvious that all the KKT conditions, with the possible exception of (44), are satisfied. Finally, also (44) can be handled by writing the optimum eigenvalues as

$$\lambda_i = \max\left\{ 0,\ \frac{\alpha\eta N + \sqrt{\alpha^2\eta^2 N^2 + 4\alpha\eta\lambda_i\mu}}{2\alpha\eta\mu} - \frac{1}{\alpha\eta} \right\} \tag{53}$$

and then solving for $\mu$ in (44). It is apparent that (53) gives eigenvalues that are sorted in ascending order. Therefore, the constraint that was initially omitted, i.e. (23), is automatically satisfied. Thus, (53) gives the optimum eigenvalues also for the original problem.
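The procedure of evaluating (53) and then solving for $\mu$ in (44) can be sketched as a one-dimensional root search. The code below is a sketch under stated assumptions, not the authors' implementation. In particular, it treats the $\lambda_i$ that appears under the square root in (53) as a known, fixed quantity supplied in the hypothetical array `gamma`; this reading is an assumption made here so that (53) is explicit in the optimization variable. Bisection is likewise a choice made for the example, motivated by the fact that the sum of the eigenvalues in (53) is decreasing in $\mu$.

```python
import numpy as np

def eigenvalues(mu, gamma, alpha, eta, N):
    """Evaluate (53) for a given multiplier mu > 0.

    gamma is an ASSUMED known input replacing the lambda_i under the root.
    """
    a = alpha * eta
    root = np.sqrt((a * N) ** 2 + 4.0 * a * gamma * mu)
    return np.maximum(0.0, (a * N + root) / (2.0 * a * mu) - 1.0 / a)

def solve_power_allocation(gamma, alpha, eta, N, tol=1e-12):
    """Find mu such that the eigenvalues in (53) sum to one, as in (44)."""
    gamma = np.asarray(gamma, dtype=float)
    a = alpha * eta
    hi = a * (gamma.max() + N)       # by (52), every eigenvalue is zero here
    lo = hi
    while eigenvalues(lo, gamma, alpha, eta, N).sum() < 1.0:
        lo /= 2.0                    # the sum grows without bound as mu -> 0
    for _ in range(200):             # bisection on the decreasing sum
        mid = 0.5 * (lo + hi)
        if eigenvalues(mid, gamma, alpha, eta, N).sum() > 1.0:
            lo = mid
        else:
            hi = mid
        if hi - lo <= tol * hi:
            break
    mu = 0.5 * (lo + hi)
    return eigenvalues(mu, gamma, alpha, eta, N), mu

gamma = np.array([0.1, 0.5, 2.0])    # hypothetical known values, ascending
lam, mu = solve_power_allocation(gamma, alpha=0.5, eta=4.0, N=2)
print(lam, lam.sum(), mu)
```

With `gamma` sorted in ascending order the returned eigenvalues are also ascending, in agreement with the remark about (23). When one entry of `gamma` dominates, the smaller eigenvalues are clipped to zero by the `max` in (53) and all power goes to a single eigenvalue.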

References

[1] J. H. Winters, “The diversity gain of transmit diversity in wireless systems with Rayleigh fading,” IEEE Transactions on Vehicular Technology, vol. 47, no. 1, pp. 119–123, Feb. 1998.

[2] Jiann-Ching Guey, Michael P. Fitz, Mark R. Bell, and Wen-Yi Kuo, “Signal design for transmitter diversity wireless communication systems over Rayleigh fading channels,” IEEE Transactions on Communications, vol. 47, no. 4, pp. 527–537, Apr. 1999.

[3] V. Tarokh, N. Seshadri, and A. R. Calderbank, “Space-time codes for high data rate wireless communication: Performance criterion and code construction,” IEEE Transactions on Information Theory, vol. 44, no. 2, pp. 744–765, Mar. 1998.

[4] V. Tarokh, H. Jafarkhani, and A. R. Calderbank, “Space-time block codes from orthogonal designs,” IEEE Transactions on Information Theory, vol. 45, no. 5, pp. 1456–1467, July 1999.

[5] Emre Telatar, “Capacity of multi-antenna Gaussian channels,” AT&T Bell Laboratories Tech. Memo., Mar. 1995.

[6] G. J. Foschini and M. J. Gans, “On limits of wireless communications in a fading environment when using multiple antennas,” Wireless Personal Communications, vol. 6, no. 3, pp. 311–335, Mar. 1998.

[7] Takeshi Hattori and Kenkichi Hirade, “Multitransmitter digital signal transmission by using offset frequency strategy in land-mobile radio telephone system,” IEEE Transactions on Vehicular Technology, vol. 27, no. 4, pp. 231–238, Nov. 1978.

[8] A. Hiroike, F. Adachi, and N. Nakajima, “Combined effects of phase sweeping transmitter diversity and channel coding,” in Proc. IEEE Vehicular Technology Conference, May 1992.

[9] V. Weerackody, “Characteristics of simulated fast fading indoor radio channel,” in Proc. IEEE Vehicular Technology Conference, 1993, pp. 231–235.

[10] A. Wittneben, “A new bandwidth efficient transmit antenna modulation diversity scheme for linear digital modulation,” in Proc. IEEE International Conference on Communications, May 1993, pp. 1630–1634.

[11] Gerard J. Foschini, “Layered space-time architecture for wireless communication in a fading environment when using multi-element antennas,” Bell Labs Technical Journal, pp. 41–59, 1996.

[12] Jiann-Ching Guey, Michael P. Fitz, and Wen-Yi Kuo, “Signal design for transmitter diversity wireless communication systems over Rayleigh fading channels,” in Proc. IEEE Vehicular Technology Conference, 1996, pp. 136–140.

[13] S. M. Alamouti, “A simple transmit diversity technique for wireless communications,” IEEE Journal on Selected Areas in Communications, vol. 16, no. 8, pp. 1451–1458, Oct. 1998.

[14] Gregory G. Raleigh and John M. Cioffi, “Spatio-temporal coding for wireless communication,” IEEE Transactions on Communications, vol. 46, no. 3, pp. 357–366, Mar. 1998.

[15] G. Jöngren and B. Ottersten, “Combining transmit antenna weights and orthogonal space-time block codes by utilizing side information,” in Proc. 33rd Asilomar Conference on Signals, Systems and Computers, Oct. 1999.

[16] George Jöngren, Mikael Skoglund, and Björn Ottersten, “Combining transmit beamforming and orthogonal space-time block codes by utilizing side information,” in Proc. First IEEE Sensor Array and Multichannel Signal Processing Workshop, Mar. 2000.

[17] A. Narula, M. J. Lopez, M. D. Trott, and G. W. Wornell, “Efficient use of side information in multiple-antenna data transmission over fading channels,” IEEE Journal on Selected Areas in Communications, vol. 16, no. 8, pp. 1423–1436, Oct. 1998.

[18] A. Wittneben, “Optimal predictive TX combining diversity in correlated fading for microcellular mobile radio applications,” in Proc. IEEE Global Telecommunications Conference, 1995, pp. 48–54.

[19] R. Negi, A. M. Tehrani, and J. Cioffi, “Adaptive antennas for space-time coding over block-time invariant multi-path fading channels,” in Proc. IEEE Vehicular Technology Conference, May 1999, pp. 70–74.

[20] 3rd Generation Partnership Project (3GPP), “Physical Channels and Mapping of Transport Channels onto Physical Channels (TS25.211 V2.4.0),” 1999.

[21] 3rd Generation Partnership Project (3GPP), “Physical Layer Procedures (3GPP RAN 25.214 V1.3.0),” 1999.

[22] W. C. Jakes, Microwave Mobile Communications, IEEE Press, 1994.

[23] R. A. Horn and C. R. Johnson, Matrix Analysis, Cambridge University Press, 1996.

[24] Alexander Graham, Kronecker Products and Matrix Calculus with Applications, Ellis Horwood Ltd, 1981.

[25] Stephan Baro, Gerhard Bauch, and Axel Hansmann, “Improved codes for space-time trellis-coded modulation,” IEEE Communications Letters, vol. 4, no. 1, pp. 20–22, Jan. 2000.

[26] George Jöngren and Mikael Skoglund, “Design of channel estimate dependent space-time codes based on quantized side information,” submitted to IEEE Transactions on Communications.

[27] S. Boyd and L. Vandenberghe, “Introduction to convex optimization with engineering applications,” Course Notes, http://www.stanford.edu/class/ee364, 1999.

[28] Lieven Vandenberghe and Stephen Boyd, “Semidefinite programming,” SIAM Review, vol. 38, no. 1, pp. 49–95, Mar. 1996.

[29] Mokhtar S. Bazaraa and Hanif D. Sherali, Nonlinear Programming: Theory and Algorithms, John Wiley & Sons, 2nd edition, 1993.

[30] Y. Nesterov and A. Nemirovskii, Interior-Point Polynomial Algorithms in Convex Programming, SIAM Publications, 1994.

[31] T. W. Anderson, “Asymptotic theory for principal component analysis,” Annals of Mathematical Statistics, vol. 34, pp. 122–148, Feb. 1963.

[32] Thomas M. Cover and Joy A. Thomas, Elements of Information Theory, John Wiley & Sons, 1991.

[33] Eugene Visotsky and Upamanyu Madhow, “Space-time precoding with imperfect feedback,” in Proc. IEEE International Symposium on Information Theory, June 2000, p. 312.

[34] Torsten Söderström and Petre Stoica, System Identification, Prentice Hall, 1989.