Design and implementation of a downlink MC-CDMA receiver

MINH QUANG NGUYEN

DESIGN AND IMPLEMENTATION OF A DOWNLINK MC-CDMA RECEIVER

Thesis presented to the Faculté des études supérieures of Université Laval

as part of the doctoral program in electrical engineering for the degree of Philosophiae Doctor (Ph. D.)

FACULTÉ DES SCIENCES ET DE GÉNIE

UNIVERSITÉ LAVAL

QUÉBEC

2011

©Minh Quang Nguyen, 2011

This thesis is dedicated to my beloved family members, who have supported me all the way since the beginning of my studies.


Acknowledgment

I would like to express my gratitude to my supervisor, Dr. Paul Fortier, and my co-supervisor, Dr. Sébastien Roy, whose expertise, understanding, patience, and scholarship support added considerably to my graduate experience. I am grateful to Dr. Jean-Yves Chouinard and Dr. Sébastien Roy for teaching the graduate-level courses.

I also thank Jean-François Beaumont of Defence Research and Development Canada - Ottawa (DRDC-O) for his suggestions and for providing materials used in this research. I would also like to thank the staff members, technicians, and my colleagues at the Laboratoire de Radiocommunications et de Traitement du Signal (LRTS), particularly Louis Dupont, Viet-Ha Pham, and Isabelle LaRoche, for our debates, discussions, and exchanges of knowledge and skills, which helped enrich my experience.

This research would not have been possible without the assistance of the Canadian Foundation for Innovation (CFI), CMC Microsystems, and the Microsystems Strategic Alliance of Québec (ReSMiQ), which helped fund the laboratory infrastructure (the "Complex Signal Processing and Virtual Component Laboratory") within which this research took place.

Finally, I would like to thank my family for the support they have provided throughout my life. In particular, I must acknowledge my wife and friends, without whose love and encouragement I would not have finished this thesis.

Résumé

This thesis presents a study of a complete downlink transmission system based on multi-carrier technology with code division multiple access (Multi-Carrier Code Division Multiple Access, MC-CDMA). The study includes synchronization and channel estimation for a downlink MC-CDMA system, as well as the FPGA implementation of a baseband downlink MC-CDMA receiver. MC-CDMA combines Orthogonal Frequency Division Multiplexing (OFDM) and Code Division Multiple Access (CDMA) with the aim of integrating the two technologies. The MC-CDMA system is designed to operate within the constraint of a 5 MHz frequency band for the indoor-to-outdoor/pedestrian and vehicular channel models described by the Third Generation Partnership Project (3GPP). The OFDM component of the MC-CDMA system was simulated in MATLAB in order to obtain basic parameters. Orthogonal Variable Spreading Factor (OVSF) codes of length 8 were chosen as spreading codes for our MC-CDMA system. This supports maximum transmission rates of up to 20.6 Mbps and 22.875 Mbps (uncoded data, full load of 8 users) for the indoor-to-outdoor/pedestrian and vehicular channels, respectively. An analytical study of the bit error rate expressions for MC-CDMA in a Rayleigh multipath channel was carried out in order to evaluate performance quickly and accurately. Decision-directed channel estimation techniques were studied in order to further improve the bit error rate performance of the downlink MC-CDMA system.
The decision-directed channel estimator using the linear minimum mean square error criterion with a channel correlation matrix of size 64 x 64 was chosen as a good trade-off between performance and complexity for FPGA implementation. A new training sequence was designed for the receiver in the indoor-to-outdoor/pedestrian configuration in order to obtain coarse estimates of the timing and of the fractional carrier frequency offset in the time domain. The fine estimates of the timing and of the carrier frequency offset were performed in the frequency domain with the aid of pilot subcarriers. A complete downlink MC-CDMA receiver for the indoor-to-outdoor/pedestrian channel with closed-loop time and frequency synchronization was simulated before proceeding to hardware implementation. The baseband downlink receiver for the indoor-to-outdoor/pedestrian channel was implemented on a development system manufactured by Nallatech that uses the Xilinx XtremeDSP circuit. A transmitter compatible with the receiver was also built. Functional tests of the receiver were carried out in a static wireless laboratory environment. A more dynamic test environment, including mobility of the transmitter, the receiver, or the scatterers, would have been desirable, but could not be realized given the inherent logistical difficulties. The bit error rates measured with different numbers of active users and different modulations are close to the computer simulations for an additive white Gaussian noise channel.


Abstract

This thesis presents a study of a complete downlink Multi-Carrier Code Division Multiple Access (MC-CDMA) system. The study includes synchronization and channel estimation for a downlink MC-CDMA system and the implementation of the baseband downlink MC-CDMA receiver on a Field Programmable Gate Array (FPGA) platform. MC-CDMA is a combination of Orthogonal Frequency Division Multiplexing (OFDM) and Code Division Multiple Access (CDMA) with the aim of integrating benefits from both technologies. The MC-CDMA system is designed to meet the 5 MHz channel bandwidth constraint for the Third Generation Partnership Project (3GPP) indoor-to-outdoor/pedestrian and vehicular channel models. The OFDM component of the MC-CDMA system was simulated using MATLAB software in order to obtain basic parameters for the MC-CDMA system. Orthogonal Variable Spreading Factor (OVSF) codes of length 8 were chosen as spreading codes for our MC-CDMA system. This supports maximum data rates of up to 20.6 Mbps and 22.875 Mbps (uncoded data, full load of 8 users) for the indoor-to-outdoor/pedestrian and the vehicular configurations, respectively. An analytical study of the Bit Error Rate (BER) expressions for MC-CDMA in a Rayleigh multipath fading channel was carried out in order to evaluate the performance quickly and accurately. Decision-directed channel estimation techniques were studied in order to further improve the BER performance of the downlink MC-CDMA system. The decision-directed overlap Linear Minimum Mean Square Error (LMMSE) channel estimator with a channel correlation sub-matrix of size 64 x 64 was chosen as a good trade-off between performance and complexity for FPGA implementation. A new training sequence was designed for the receiver with the indoor-to-outdoor/pedestrian configuration in order to estimate coarse timing and fractional Carrier Frequency Offset (CFO) in the time domain.
The remaining fine timing synchronization and integer CFO estimation were performed in the frequency domain with the aid of the pilot subcarriers. A complete MC-CDMA downlink receiver for the indoor-to-outdoor/pedestrian configuration was simulated with closed-loop timing and frequency synchronization prior to hardware implementation. The baseband downlink receiver for the indoor-to-outdoor/pedestrian configuration was implemented on a Xilinx XtremeDSP development platform manufactured by Nallatech. A corresponding transmitter was also implemented. Functional testing of the receiver was carried out in a static wireless laboratory environment. A more dynamic test environment, providing motion of transmitter, receiver and/or scatterers, would have been of interest, but was not pursued given the inherent logistic difficulties. The measured BER results with different numbers of active users and modulation schemes were in close agreement with the computer simulations over an Additive White Gaussian Noise (AWGN) channel.


Acronyms

16QAM 16-level Quadrature Amplitude Modulation

3G Third Generation

3GPP Third Generation Partnership Project

4G Fourth Generation

ACLR Adjacent Channel Leakage power Ratio

ADC Analog-to-Digital Converter

AGC Automatic Gain Control

AWGN Additive White Gaussian Noise

BER Bit Error Rate

BPSK Binary Phase Shift Keying

BRAN Broadband Radio Access Networks

CDMA Code Division Multiple Access

CM Complex Multiplication

CMAC Complex Multiply-ACcumulate

CORDIC Coordinate Rotation Digital Computer


CFO Carrier Frequency Offset

CFI Canadian Foundation for Innovation

CLB Configurable Logic Block

DAC Digital-to-Analog Converter

DC Direct Current

DCM Digital Clock Manager

DPLL Digital Phase-Locked-Loop

DSP Digital Signal Processing

DS-CDMA Direct-Sequence Code Division Multiple Access

DRDC-O Defence Research and Development Canada - Ottawa

EGC Equal Gain Combining

FFT Fast Fourier Transform

FIR Finite Impulse Response

FIFO First In First Out

FPGA Field Programmable Gate Array

FSM Finite State Machine

HIL Hardware-in-the-Loop

IEEE Institute of Electrical and Electronics Engineers

IFFT Inverse Fast Fourier Transform

IIR Infinite Impulse Response

ISI Inter-Symbol Interference

ICI Inter-Carrier Interference

IP Internet Protocol

ITU International Telecommunication Union


LDPC Low Density Parity Check Code

LOS Line-of-Sight

LTE Long Term Evolution

LAN Local Area Network

LFSR Linear Feedback Shift Register

LMMSE Linear Minimum Mean Square Error

LRTS Laboratoire de Radiocommunications et de Traitement du Signal

LS Least Squares

MAC Multiply-ACcumulate

MC-DS-CDMA Multi-Carrier Direct Sequence Code Division Multiple Access

MC-CDMA Multi-Carrier Code Division Multiple Access

MIMO Multiple Input Multiple Output

MT-CDMA Multi-Tone Code Division Multiple Access

MSE Mean-Square Error

MMSE Minimum Mean Square Error

MRC Maximum Ratio Combining

MUI Multiple User Interference

ML Maximum Likelihood

OFDM Orthogonal Frequency Division Multiplexing

ORC Orthogonal Restoring Combining

OVSF Orthogonal Variable Spreading Factor

PDF Probability Density Function

PN Pseudo Noise

PAPR Peak-to-Average Power Ratio

PI Proportional-Integral


P/S Parallel-to-Serial

PSAM Pilot-Symbol-Assisted Modulation

QPSK Quadrature Phase Shift Keying

QAM Quadrature Amplitude Modulation

RAM Random-Access Memory

ReSMiQ Microsystems Strategic Alliance of Québec

RF Radio Frequency

ROM Read Only Memory

RMS Root Mean Square

RTL Register Transfer Level

SER Symbol Error Rate

SF Spreading Factor

S/P Serial-to-Parallel

SNR Signal-to-Noise Ratio

SVD Singular-Value Decomposition

TS Technical Specification

UUT Unit Under Test

UARFCN Universal terrestrial radio access Absolute Radio Frequency Channel Number

VHDL Very-high-speed integrated circuit Hardware Description Language

WCDMA Wideband Code Division Multiple Access

WLAN Wireless Local Area Network

WiMAX Worldwide Interoperability for Microwave Access

XST Xilinx Synthesis Tool

ZF Zero Forcing

ZC Zadoff-Chu

List of Notations

$(\cdot)^*$ Complex conjugate

$(\cdot)^\dagger$ Conjugate transpose

$\alpha$ DC notch filter coefficient

$\angle$ Angle of a complex number

$\overline{\tau^2}$ Second moment of the power delay profile

$\bar{\tau}$ First moment of the power delay profile

H Refined channel frequency response vector

X I/Q amplitude mismatch

$\Delta f$ Subcarrier spacing

$\Delta n_{\text{coarse}}$ True coarse timing offset

$\Delta n_{\text{fine}}$ True fine timing offset

$\delta$ Pre-defined threshold

$\Delta\hat{n}$ Estimated timing offset

$\Delta\hat{n}_{\text{coarse}}$ Estimated coarse timing offset

$\Delta\hat{n}_{\text{fine}}$ Estimated fine timing offset

$\Delta\Phi(z)$ Steady-phase error

$\eta$ Damping factor

T Set of Gold codes

$\gamma(n)$ Correlation between the received signal and its delayed version


$\gamma_s$ Instantaneous SNR per symbol

$\hat{\Phi}(n)$ Estimated phase

$\hat{\varepsilon}$ Estimated Carrier Frequency Offset (CFO)

$\hat{\varepsilon}_{\text{int}}$ Estimated integer CFO

$\hat{\varepsilon}_{\text{frac}}$ Estimated fractional CFO

$\hat{H}_{k,l}$ Channel estimate at position $(k, l)$

A Constant depending on the modulation level

$\Lambda(n, \varepsilon)$ Log-likelihood function for the ML synchronization

Afc Euclidean distance

H Initial estimated channel frequency response

X Estimated transmit vector

X Respread data vector

$\mathbf{C}_{k,l}$ 2D channel estimator coefficients vector

$\mathbf{G}_8$ 8 x 8 orthogonal Gold code matrix

$\mathbf{G}_m$ 64 x 64 filter matrix for LMMSE channel estimator

$\hat{\mathbf{H}}_{\text{LMMSE}}$ Estimated LMMSE channel frequency response

$\mathbf{H}_{2n}$ Hadamard matrix

$\mathbf{I}$ Identity matrix

$\mathbf{R}_{HH}$ Channel auto-correlation

$\mathbf{R}_{HY}$ Cross-correlation of the channel response with the received signal

$\mathbf{R}_{YY}$ Auto-correlation of the received signal

$\mathbf{X}$ Transmit vector

$\mathbf{Y}$ Received vector

fi Set of first prime numbers

$\omega_n$ Natural frequency


$\oslash$ Hadamard (elementwise) division

$\bar{\gamma}_s$ Average SNR per symbol

$\bar{P}_e$ Average BER

$\theta$ I/Q phase mismatch

$\Phi(n)$ Input phase

$\phi(\varepsilon_{\text{frac}}, n)$ Phase error as a function of $\varepsilon_{\text{frac}}$ and the sample index $n$

$\rho$ Correction factor for the ML synchronization

$\sigma_\tau$ Square root of the second central moment of the power delay profile

$\sigma^2$ Variance of $H_{k,l}$

$\tau_k$ Time delay in sample periods for the $k$th path

$\tau_{\max}$ Maximum delay spread

H Decision-directed channel estimate vector

$\hat{\mathbf{H}}_{\text{LMMSE}}^{\text{overlap}}$ Estimated overlapped LMMSE channel frequency response

$\mathbf{H}_{k,l}$ Channel frequency responses at the nearest pilots close to position $(k, l)$

Mi (n) Average autocorrelation

$y(n)$ Frequency offset compensated signal

$\varepsilon_{\text{frac}}$ Fractional CFO

$\varepsilon_{\text{int}}$ Integer CFO

$\varphi(n)$ Energy compensation for correlated samples for the ML synchronization

t? Desired signal component

$f_{\gamma_{\min}}(\gamma_s)$ PDF of the minimum order statistic of $N$ i.i.d. random variables

£ Noise component

$A_n$ Scaling factor for CORDIC in vectoring mode

$B$ Channel bandwidth

$B_c$ Coherence bandwidth


$B_{\text{eff}}$ Bandwidth efficiency

$B_{\text{occ}}$ Occupied bandwidth

$c$ Velocity of light

$C_u$ Orthogonal spreading code for the $u$th user

$D$ Low-pass FIR interpolation filter length

$D_u$ Decision variable for the $u$th user

$E\{\cdot\}$ Expected value

$E_b$ Bit energy

$E_c$ Chip energy

$E_s$ Symbol energy

$F_s$ FFT sampling rate

$f_{\gamma_s}(\gamma_s)$ PDF of a random variable

$f_{D\max}$ Maximum Doppler shift

$G$ Frequency domain metric

$g(t - iT_s)$ Windowing function for the $i$th transmitted symbol

$G(z)$ Equivalent closed-loop transfer function

$G_{Pp+l,i}$ Frequency domain equalization gain factor at the $(Pp + l)$th subcarrier

$h(t, \tau)$ Channel impulse response

$H(z)$ First-order loop filter transfer function

$H_0(z^2)$ Upper-arm filter transfer function

$H_1(z^2)$ Lower-arm filter transfer function

$H_{\text{DC}}(z)$ DC notch filter transfer function

$H_{\text{HB}}(z)$ Half-band filter transfer function

$H_{\min}$ Minimum value of the set of fading values

$h_k(t)$ Equivalent low-pass response of the $k$th path


$H_l(n)$ Channel frequency response at the $l$th subcarrier

$H_{Pp+l,i}$ Channel frequency response at the $(Pp + l)$th subcarrier

$I(t)$ In-phase signal

$K$ Number of channel paths

$K(z)$ Accumulator transfer function

$k_c$ Cut-off frequency

$K_i$ Integral gain

$K_p$ Proportional gain

$L$ Spreading factor

$M$ Modulation level

$M_1(n)$ First timing metric

$M_2(n)$ Second timing metric

$M_{\text{seq}}$ Maximal possible length of a sequence

$N$ FFT size

$N_0$ AWGN power spectral density

$N_{\text{cp}}$ Cyclic prefix length

$N_{\text{pf}}$ Number of filter taps in the frequency domain

$N_{\text{pt}}$ Number of filter taps in the time domain

$N_{\text{ZC}}$ Length of Zadoff-Chu sequence

$N_c$ Total data and pilot subcarriers

$N_d$ Total data subcarriers

$N_f$ Pilot subcarrier spacing

$N_p$ Total pilot subcarriers

$N_s$ Short training symbol length

$N_{\text{grid}}$ Observation grid size


$N_{\text{tap}}$ Number of coefficients in 2D channel estimator

$P$ Parallel branches of Serial-to-Parallel (S/P) converter

$P_{\text{lower}}$ Lower bound on BER

$P_{\text{plt}}(x)$ Pilot tone generator polynomial

$P_{\text{scr}}(x)$ Bit scrambler/descrambler polynomial

$P_e$ Bit Error Rate (BER)

$Q(\cdot)$ Gaussian Q function

$Q(t)$ Quadrature signal

$R(u)$ Periodic normalized auto-correlation function of an m-sequence

$R(z)$ Loop transfer function

$R_b$ Actual bit rate

$R_s$ Actual symbol rate

$R_{ij}(v)$ Periodic normalized cross-correlation function between two m-sequences

$S$ Sub-vector length of the refined channel response vector

$S_{\text{train}}$ Short training sequence

$T$ Symbol duration

$T_b$ Bit duration

$T_c$ Coherence time

$T_c$ Chip duration

$T_g$ Guard interval

$T_s$ Effective symbol duration

$v$ Velocity of the mobile

$w(t)$ Additive White Gaussian Noise (AWGN)

$w_i(n)$ AWGN discrete-time representation on the $i$th symbol

$W_{i,l}$ AWGN on the $l$th subcarrier for the $i$th symbol


$x(t)$ Transmit signal

$x_i(n)$ Discrete-time representation for the $i$th transmitted symbol

$x_u(n)$ The $u$th root Zadoff-Chu sequence

$X_{i,l}$ Data at the $l$th subcarrier of the $i$th symbol

$y(t)$ Received signal

$y_i(n)$ Discrete-time representation for the $i$th received symbol

$Z$ Number of subbands of the refined channel response vector

Contents

Résumé
Abstract
Acronyms
List of Notations
Contents
List of Tables
List of Figures
Introduction
1 MC-CDMA system model
  1.1 Overview of multi-carrier modulation
  1.2 Channel estimation and synchronization in multi-carrier systems
    1.2.1 Time and frequency synchronization
    1.2.2 Channel estimation and equalization
  1.3 Overview of CDMA
    1.3.1 Spreading concept
    1.3.2 Spreading codes
  1.4 Fundamentals of MC-CDMA
    1.4.1 MC-CDMA transmitter model
    1.4.2 MC-CDMA receiver model
  1.5 Channel models
    1.5.1 Types of fading
    1.5.2 3GPP WCDMA channel models
  1.6 Parameters for downlink MC-CDMA systems
    1.6.1 Indoor-to-outdoor/pedestrian channel
    1.6.2 Vehicular channel
  1.7 Conclusion
2 Synchronization and channel estimation in MC-CDMA system
  2.1 Introduction
  2.2 Synchronization issues in MC-CDMA systems
    2.2.1 Preamble design for downlink MC-CDMA system
    2.2.2 Coarse timing synchronization
    2.2.3 Fractional carrier frequency offset estimation
    2.2.4 Fine timing offset synchronization
    2.2.5 Integer CFO synchronization
    2.2.6 Proportional-integral loop filter
  2.3 Decision-directed channel estimation for downlink MC-CDMA
    2.3.1 System description
    2.3.2 Decision-directed virtual pilot-based channel estimation
    2.3.3 Decision-directed iterative transform domain channel estimation
    2.3.4 Decision-directed overlap LMMSE channel estimator
  2.4 Conclusion
3 MC-CDMA downlink receiver implementation
  3.1 Target FPGA platform and design partitioning
  3.2 Proposed receiver architecture
    3.2.1 Digital front-end
    3.2.2 Digital AGC circuit implementation
    3.2.3 Serial CORDIC processor
    3.2.4 Pre-FFT timing and frequency synchronization
    3.2.5 FFT processor unit
    3.2.6 Reference pilot generator
    3.2.7 Pilot tone extractor
    3.2.8 Post-FFT timing and frequency synchronization
    3.2.9 Digital proportional-integral loop filter
    3.2.10 Channel estimator
    3.2.11 Channel equalizer
    3.2.12 Frequency domain despreader
    3.2.13 Bit demapper
    3.2.14 Bit descrambler
    3.2.15 Host computer/debug interface
    3.2.16 Implementation summary
  3.3 Decision-directed overlap LMMSE channel estimator
    3.3.1 Proposed LMMSE estimator implementation
    3.3.2 Hardware-in-the-Loop verification
  3.4 Conclusion
4 Lower bound for downlink BER performance of MC-CDMA
  4.1 Introduction
  4.2 Analytical lower bound for BER performance
  4.3 Conclusion
5 Simulation results and discussions
  5.1 Introduction
  5.2 OFDM system simulation results
    5.2.1 Results for the indoor-to-outdoor/pedestrian channel
    5.2.2 Results for the vehicular channel
  5.3 MC-CDMA systems simulation results
    5.3.1 Results for the indoor-to-outdoor/pedestrian channel
    5.3.2 Results for the vehicular channel
  5.4 Lower bound for downlink BER performance of the MC-CDMA system
  5.5 Timing and frequency synchronization simulation results
    5.5.1 Coarse timing simulation results
    5.5.2 Fractional CFO estimation simulation results
    5.5.3 Fine timing simulation results
    5.5.4 Integer CFO estimation simulation results
  5.6 Complete receiver simulation results
  5.7 Decision-directed channel estimation simulation results
  5.8 Conclusion
6 MC-CDMA downlink receiver testing
  6.1 Introduction
  6.2 Functional testing
  6.3 Receiver BER performance results
  6.4 Conclusion
7 Conclusions and Future Work
A RTL simulation results
B MC-CDMA transmitter implementation
C Channel estimator complexity
D Virtex-4 SX35 overview
  D.1 Configurable logic block
  D.2 Block RAM
  D.3 First In First Out (FIFO)
  D.4 DSP48
E RF front-end overview
  E.1 Description
  E.2 Specifications
Bibliography

List of Tables

1.1 Channel parameters
1.2 OFDM simulation parameters for the indoor-to-outdoor/pedestrian environment
1.3 Bandwidth efficiency of the OFDM system for the indoor-to-outdoor/pedestrian environment
1.4 MC-CDMA simulation parameters for the indoor-to-outdoor/pedestrian environment
1.5 Bandwidth efficiency of the uncoded MC-CDMA system for the indoor-to-outdoor/pedestrian environment
1.6 OFDM simulation parameters for the vehicular environment
1.7 Bandwidth efficiency of the OFDM system for the vehicular environment
1.8 MC-CDMA simulation parameters for the vehicular environment
1.9 Bandwidth efficiency of the MC-CDMA system for the vehicular environment
3.1 Half-band filter specifications
3.2 Polyphase decimation filter specifications
3.3 Device utilization summary for the digital front-end circuit
3.4 Device utilization summary for the digital AGC circuit
3.5 Device utilization summary for the serial CORDIC processor
3.6 Device utilization summary for the convolution circuit
3.7 Device utilization summary for the moving sum circuit
3.8 Device utilization summary for the fractional CFO estimator
3.9 Device utilization summary for the FFT processor
3.10 Device utilization summary for the pilot generator
3.11 Device utilization summary for the pilot tone extractor
3.12 Device utilization summary for the fine timing synchronization unit
3.13 Device utilization summary for the integer CFO estimator
3.14 Device utilization summary for the loop filter unit
3.15 Device utilization summary for the channel estimator
3.16 Device utilization summary for the channel equalizer unit
3.17 Device utilization summary for the despreader unit
3.18 Device utilization summary for the demapper
3.19 Device utilization summary for the descrambler
3.20 Device utilization summary for the host interface logic
3.21 Implementation results of crucial modules in the receiver
3.22 Device utilization summary for the overlap LMMSE estimator
6.1 BER performance of the receiver in a static wireless laboratory channel, 1 user
6.2 BER performance of the receiver in a static wireless laboratory channel, 4 users
6.3 BER performance of the receiver in a static wireless laboratory channel, 8 users
C.1 Channel estimator complexity

List of Figures

1 Evolutionary path of cellular technology [1]
1.1 Basic blocks of an OFDM transmitter
1.2 Orthogonal overlapping spectra for OFDM
1.3 Basic blocks of an OFDM receiver
1.4 Xilinx's pipelined streaming I/O architecture [2]
1.5 Xilinx's radix-4 burst I/O architecture [2]
1.6 IEEE 802.11a OFDM training structure [3]
1.7 Timing metric for double auto-correlation [4]
1.8 Block diagram of the timing and frequency estimator
1.9 Example of a pilot grid
1.10 A two-dimensional pilot grid
1.11 Example of a simple CDMA transmitter
1.12 Power spectrum of the spread signal versus the data signal
1.13 Example of LFSR with m = 5 [5]
1.14 Example of Gold codes generation [5]
1.15 MC-CDMA classification
1.16 MC-CDMA transmitter
1.17 MC-CDMA receiver
1.18 Small-scale fading classification
1.19 Indoor-to-outdoor/pedestrian channel power delay profile (3 km/h)
1.20 Vehicular channel models power delay profile (120 km/h)
1.21 Data and pilot subcarriers allocation
2.1 Simple frame format
2.2 Low PAPR training symbol with $N_c = 448$, $u = 197$ and $N_{\text{ZC}} = 443$
2.3 Structure of first-order digital loop filter
2.4 Simplified closed-loop frequency offset correction diagram
2.5 Linearized closed-loop frequency offset correction diagram
2.6 Downlink MC-CDMA block diagram
2.7 Proposed decision-directed virtual pilot channel estimator
2.8 Proposed iterative transform domain channel estimator
2.9 Proposed receiver with decision-directed LMMSE channel estimator
2.10 Decomposition of channel auto-correlation matrix $\mathbf{R}_{HH}$ by the overlap technique
3.1 Block diagram of the XtremeDSP development kit [6]
3.2 The partition of the design in the User FPGA
3.3 Clock and reset managers detail
3.4 Modified design flow
3.5 Implementation block diagram of the MC-CDMA receiver
3.6 Multistage decimation filter structure
3.7 Characteristics of the half-band filters
3.8 Polyphase partition for the half-band decimation filter
3.9 Polyphase partition for the half-band decimation filter with input down-samplers
3.10 Polyphase partition for the half-band decimation filter with input commutator
3.11 Polyphase half-band decimation filter structure
3.12 Characteristics of the polyphase decimation filter
3.13 Implementation block diagram of the polyphase decimation filter
3.14 Structure of first-order digital DC notch filter
3.15 First-order digital DC notch filter characteristics with $\alpha = 0.95$
3.16 I/Q mismatch corrector unit architecture
3.17 Digital AGC circuit architecture
3.18 Architecture for the CORDIC processing element
3.19 Architecture for the serial CORDIC
3.20 Architecture for the proposed convolution block
3.21 Direct implementation of the moving sum circuit
3.22 Architecture for the proposed moving sum circuit
3.23 State diagram for the peak detector
3.24 Architecture for the fractional CFO estimator
3.25 FFT processor architecture
3.26 State machine for the FFT processor
3.27 Pilot tone generator architecture
3.28 Simulation results for the pilot tone generator
3.29 Pilot tone extractor architecture
3.30 Fine timing synchronization unit architecture
3.31 Integer CFO estimator architecture
3.32 Channel estimator architecture
3.33 Channel equalizer architecture
3.34 Despreader unit architecture
3.35 Bit position in an M-QAM symbol
3.36 M-QAM bit demapping
3.37 Demapper architecture
3.38 Data descrambler architecture
3.39 Host interface logic module
3.40 Debug interface architecture
3.41 Detailed VHDL implementation diagram
3.42 MC-CDMA symbol timing
3.43 Proposed matrix-vector multiplication architecture
3.44 Matrix-vector multiplication timing
3.45 Overlap LMMSE estimator timing
3.46 Proposed overlap LMMSE estimator architecture
3.47 Hardware-in-the-loop verification block diagram
5.1 Simulation block diagram for the OFDM system
5.2 Performance of QPSK-OFDM over the indoor-to-outdoor/pedestrian channel
5.3 Performance of 16QAM-OFDM over the indoor-to-outdoor/pedestrian channel
5.4 Performance of 64QAM-OFDM over the indoor-to-outdoor/pedestrian channel
5.5 Performance of QPSK-OFDM system over the vehicular channel
5.6 Performance of 16QAM-OFDM system over the vehicular channel
5.7 Performance of 64QAM-OFDM system over the vehicular channel
5.8 Simulation block diagram for the downlink MC-CDMA system
5.9 Impact of the number of active users on the performance of the QPSK-MC-CDMA system
5.10 Performance of QPSK-MC-CDMA over the indoor-to-outdoor/pedestrian channel
5.11 Performance of 16QAM-MC-CDMA over the indoor-to-outdoor/pedestrian channel
5.12 Performance of 64QAM-MC-CDMA over the indoor-to-outdoor/pedestrian channel
5.13 Performance of QPSK-MC-CDMA over the vehicular channel
5.14 Performance of 16QAM-MC-CDMA over the vehicular channel
5.15 Performance of 64QAM-MC-CDMA over the vehicular channel
5.16 Lower bound on downlink BER performance
5.17 Simulation of coarse frame detection
5.18 Probability of correct frame boundary detection
5.19 RMS timing error of the coarse timing synchronizer
5.20 Proposed fractional CFO estimator performance
5.21 Probability of correct fine timing synchronization
5.22 RMS timing error of the fine timing synchronizer
5.23 Integer CFO correlator output at $E_b/N_0$ = 20 dB
5.24 Probability of correct integer CFO synchronization
5.25 RMS error of the integer CFO synchronizer
5.26 BER performance of the complete MC-CDMA receiver
5.27 BER performance of the complete QPSK-MC-CDMA receiver with different numbers of active users
5.28 BER performance of the complete 16QAM-MC-CDMA receiver with different numbers of active users
5.29 BER performance of the complete 64QAM-MC-CDMA receiver with different numbers of active users
5.30 BER performance of the system over an AWGN channel
5.31 BER performance versus virtual pilot selection thresholds ($E_b/N_0$ = 30 dB)
5.32 BER performance versus cut-off frequencies ($E_b/N_0$ = 30 dB)
5.33 Decision-directed virtual pilot versus iterative transform domain method over the indoor-to-outdoor/pedestrian channel
5.34 Decision-directed virtual pilot versus iterative transform domain method over the vehicular channel
5.35 BER performance comparison of the overlap LMMSE estimator (linear interp.)
5.36 BER performance comparison of the overlap LMMSE estimator (FIR interp.)
5.37 $E_b/N_0$ versus sub-matrix size (target BER = $10^{-3}$)
5.38 Complexity versus sub-matrix size (target BER = $10^{-3}$)
6.1 MC-CDMA system measurement setup
6.2 Photo of the receiver testbed
6.3 Transmitter control software
6.4 Receiver control software
6.5 Fixed indoor-to-outdoor office environment test scenario
6.6 Results at the output of the digital front-end unit (block 1)
6.7 Results at the output of the convolution unit (block 3)
6.8 Results at the output of the auto-correlator unit (block 4)
6.9 Results at the output of the peak detector unit (block 3)
6.10 Results at the output of the derotator unit (block 2)
6.11 Results at the output of the cyclic prefix removal unit (block 5)
6.12 Results at the output of the FFT processor unit (block 6)
6.13 Results at the output of the channel estimator unit (block 8)
6.14 Results at the output of the channel equalizer unit (block 9)
6.15 Results at the output of the despreader unit (block 11)
6.16 Result at the output of the demapper unit (block 12)
6.17 Measured BER performance under different modulation schemes
6.18 QPSK-MC-CDMA performance under different numbers of active users
6.19 16QAM-MC-CDMA performance under different numbers of active users
6.20 64QAM-MC-CDMA performance under different numbers of active users
A.1 Impulse and step response simulation of the first half-band decimation filter
A.2 Random input data simulation of the first half-band decimation filter
A.3 Impulse and step response simulation of the second half-band decimation filter
A.4 Random input data simulation of the second half-band decimation filter
A.5 Impulse response simulation of the polyphase decimation filter
A.6 Step response simulation of the polyphase decimation filter
A.7 Random input data simulation of the polyphase decimation filter
A.8 Simulation results for the convolution block
A.9 Simulation results for the moving sum and the peak detector
A.10 Simulation results for the auto-correlator
A.11 Vector mode simulation results for the serial CORDIC
A.12 Rotation mode simulation results for the serial CORDIC
A.13 Simulation results for the pilot tones generator
A.14 Simulation results for the 512-point Radix-2-Lite FFT core
A.15 Simulation results for the complete FFT processor
A.16 Simulation of the fine timing estimator unit with an input timing offset of 4 samples
A.17 Simulation of the integer CFO estimator with an input CFO of 3 subcarrier spacings
A.18 Simulation of the channel estimator and ZF equalizer
A.19 Simulation of the despreader
B.1 Simple MC-CDMA transmitter block diagram
B.2 Data scrambler
D.1 Arrangement of slices within the CLB [7]
D.2 Dual-port I/O ports [7]
D.3 Single-port I/O ports [7]
D.4 FIFO I/O ports [7]
D.5 DSP48 slice architecture [8]
E.1 RF front-end front panel [69]
E.2 MAX2829 specifications [69]

Introduction

Background

Since the beginning of the twenty-first century, the demand for high-speed wireless communication services has grown tremendously. Today, Third Generation (3G) cellular wireless services are popular in many countries around the world, although their initial deployment was slow. The 3G standard, called IMT-2000, was created by the International Telecommunication Union (ITU) to harmonize existing 3G systems worldwide and to provide global roaming. A 3G system must allow simultaneous use of speech and data services and provide peak data rates from at least several hundred kbps up to several Mbps, according to the original releases of the 3G interfaces: Wideband Code Division Multiple Access (WCDMA) and CDMA2000. A Fourth Generation (4G) system is expected to provide many high-speed data services such as Internet Protocol (IP) telephony, ultra-broadband Internet access, gaming services and streamed multimedia. Recently, pre-4G technologies such as mobile Worldwide Interoperability for Microwave Access (WiMAX) and first-release Third Generation Partnership Project (3GPP) Long Term Evolution (LTE) have become available on the market (see Figure 1). Both are based on OFDM technology, which offers high-speed data transmission and robustness to multipath fading without requiring powerful channel equalization. A number of the world's operators and vendors are already committed to LTE deployments and developments, making LTE the market leader in the upcoming evolution to 4G wireless communication systems [9]. During the development of the 4G standard, MC-CDMA was one of the candidates for 4G. In late 2009, LTE-Advanced was formally submitted to the ITU as a candidate 4G system. However, MC-CDMA remains an open research topic in terms of architecture, algorithms, and hardware implementation.

The MC-CDMA technique supports multiple users at high data rates in a spectrally efficient manner. It combines OFDM and CDMA with the aim of integrating the benefits of both technologies. Three popular variations of multicarrier CDMA have been proposed: (1) Multi-Carrier Code Division Multiple Access (MC-CDMA), (2) Multi-Carrier Direct Sequence Code Division Multiple Access (MC-DS-CDMA), and (3) Multi-Tone Code Division Multiple Access (MT-CDMA). Both MC-DS-CDMA and MT-CDMA employ time-domain spreading, while MC-CDMA employs frequency-domain spreading. Hence, MC-CDMA is capable of exploiting frequency diversity in an explicit manner, since the energy of a symbol is spread over several subcarriers [10]. MT-CDMA is capable of providing a significantly higher spreading factor than MC-DS-CDMA, resulting in a higher number of supported users. However, MT-CDMA suffers from Inter-Carrier Interference (ICI) due to the loss of orthogonality between subcarriers. Since MC-DS-CDMA is backward compatible with the existing IS-95 Direct Sequence Code Division Multiple Access (DS-CDMA) system, a specific variant of MC-DS-CDMA has been chosen as one of the 3G standards [10].

Figure 1: Evolutionary path of cellular technology [1].

Spreading codes in MC-CDMA systems are usually similar to the spreading codes used in CDMA systems, such as Pseudo Noise (PN) codes, Gold codes, orthogonal Gold codes, Walsh codes, OVSF codes, etc. [5, 10]. In our MC-CDMA system, OVSF codes of length 8 were used as the spreading codes. Since OFDM is the multicarrier component of the MC-CDMA system, a study of OFDM systems must be carried out prior to investigating MC-CDMA systems. Then, the downlink MC-CDMA system is simulated with the same channel models that were used for the OFDM system.

In OFDM systems, the orthogonality between subcarriers is much more sensitive to synchronization errors than in single-carrier systems. Such errors result in significant degradation of system performance. Many synchronization methods exist for OFDM systems, and they are mostly classified into two groups: preamble-based and cyclic-prefix-based [11-24]. The synchronization techniques for OFDM systems in [11, 12, 14] are adapted to the MC-CDMA system. Pilot-Symbol-Assisted Modulation (PSAM) based channel estimation with low-pass Finite Impulse Response (FIR) filtering is used for frequency-domain channel estimation [25, 26]. A decision-directed process is also investigated to improve the channel estimation accuracy. Virtual pilot-based and iterative transform-domain estimators, both aided by a decision-directed process, are studied for our system. An alternative approach based on the LMMSE estimator for PSAM OFDM systems [25, 27-30] is also studied in this thesis. A decision-directed overlap LMMSE channel estimation technique adapted from [30] is used to further improve the performance of the system.

Several MC-CDMA receiver designs with different implementation parameters have been introduced to meet the requirements of next-generation cellular mobile communication systems [31-33]. Tsai et al. in [31, 32] designed and implemented a 9.9 mW receiver with a 21.7 Mbps uncoded data rate (full load of 64 users) in a 5 MHz bandwidth for the 3GPP typical urban and bad urban channels [34]. Their design includes a synchronization algorithm, which tracks and reduces residual synchronization errors, and a channel estimation algorithm, which provides accurate channel state information with mobility at speeds up to 120 km/h. The entire MC-CDMA baseband receiver was implemented in 0.18 µm CMOS technology. Nours et al. in [33] provided an MC-CDMA design framework on a mixed Digital Signal Processing (DSP)/FPGA platform for the indoor Broadband Radio Access Networks (BRAN)-A channel [35]. They presented the system specifications and simulations, followed by implementation aspects on a heterogeneous platform combining DSP and FPGA.

The aim of this study is to design, simulate and implement a complete downlink MC-CDMA system that meets 3GPP's indoor-to-outdoor/pedestrian and vehicular channel bandwidth. The parameters of OFDM are first designed for the indoor-to-outdoor/pedestrian and vehicular channels in order to obtain basic parameters for MC-CDMA. An analytical study of the BER performance expressions for MC-CDMA in a Rayleigh multipath fading channel is also presented in order to evaluate the performance quickly and accurately. In order to improve the performance of MC-CDMA, decision-directed channel estimation methods are also studied in this thesis. The design of training sequences and of timing and frequency offset synchronization is necessary for the implementation of a complete MC-CDMA receiver. All of the floating-point algorithms of the receiver must be ported to fixed-point hardware architectures prior to implementation on an FPGA. Finally, the receiver is tested in a static laboratory wireless environment.


Contributions

The contributions of this thesis are the following:

• design and simulation of a complete downlink MC-CDMA system;

• a timing and frequency synchronization method for a downlink MC-CDMA system based on a new training sequence design method;

• a new decision-directed channel estimator for an MC-CDMA receiver;

• a lower bound on BER for a downlink M-QAM MC-CDMA system in a multipath fading channel;

• implementation of a complete baseband downlink receiver in an FPGA platform;

• testing of the receiver in a laboratory static wireless environment.

Thesis organization

The thesis is organized as follows. First, a literature review on multicarrier systems, spreading codes in CDMA systems, channel estimation and synchronization for OFDM and MC-CDMA systems is presented in Chapter 1. The proposed synchronization and channel estimation methods for the MC-CDMA downlink receiver are described in Chapter 2. The FPGA implementation of the downlink receiver for the indoor-to-outdoor/pedestrian configuration is detailed in Chapter 3. Analytical lower bound expressions for the downlink M-QAM MC-CDMA system are presented in Chapter 4. Simulation results of the channel estimation methods, timing and frequency synchronizations, and the complete MC-CDMA system are presented in Chapter 5. Chapter 6 presents the functional testing results and BER performance measurements over a static wireless laboratory channel. Finally, Chapter 7 concludes the thesis.


Chapter 1

MC-CDMA system model

1.1 Overview of multi-carrier modulation

In multicarrier modulation, the data stream is divided into $N$ substreams of lower data rate, each transmitted on its own subcarrier or subchannel. This can be seen as parallel transmission in the frequency domain. This scheme does not affect the total bandwidth of $B$ Hz. The subcarriers are spaced $\Delta f = B/N$ Hz apart, while the symbol duration $T = 1/\Delta f$ is increased by a factor of $N$ [5]. This leads to the key idea behind Orthogonal Frequency Division Multiplexing (OFDM): the orthogonality of the subcarriers allows simultaneous transmission on $N$ subcarriers without mutual interference.

Figure 1.1 illustrates the basic blocks of an OFDM transmitter. The input data is sent to a Serial-to-Parallel (S/P) converter (the S/P block). Then, the $N$ parallel outputs from the S/P block feed the inputs of the Inverse Fast Fourier Transform (IFFT) block in order to create an OFDM symbol. Since the subcarriers are orthogonal to each other, the OFDM symbol has overlapping sinc spectra centered at the subcarrier frequencies, as shown in Figure 1.2. Although the spectra overlap, the individual subcarriers can be separated and do not mutually interfere. After the IFFT has been computed, the $N$ complex numbers at the output of the IFFT block are Parallel-to-Serial (P/S) converted. The discrete-time IFFT output at the $n$th sample of the $i$th symbol is given by

$$x_i(n) = \frac{1}{\sqrt{N}} \sum_{l=0}^{N-1} X_{l,i} \exp\left(\frac{j2\pi nl}{N}\right), \quad n = 0, 1, \ldots, N-1, \tag{1.1}$$

where $X_{l,i}$ is the data symbol at the $l$th subcarrier of the $i$th OFDM symbol.

Figure 1.1: Basic blocks of an OFDM transmitter.

Figure 1.2: Orthogonal overlapping spectra for OFDM (horizontal axis: normalized frequency $fT$).

The cyclic prefix is inserted in order to combat the Inter-Symbol Interference (ISI) and Inter-Carrier Interference (ICI) caused by the multipath channel. The cyclic prefix is sometimes called the guard interval. In order to create the cyclic prefix, a complex vector of duration $T_g$ at the end of the symbol duration $T$ is copied and prepended to the signal block. The total OFDM symbol duration becomes $T_s = T + T_g$. In practice, the cyclic prefix is chosen to be longer than the maximum delay spread of the channel. The complex baseband representation of the transmitted signal in the time domain, $x(t)$, is expressed as

$$x(t) = \frac{1}{\sqrt{N}} \sum_{i=-\infty}^{\infty} \sum_{l=0}^{N-1} X_{l,i} \exp\left[j2\pi l \Delta f (t - T_g - iT_s)\right] g(t - iT_s), \tag{1.2}$$

where $g(t)$ is a rectangular windowing function on $t \in [0, T_s]$. The resulting signal is fed to the Digital-to-Analog Converter (DAC) and low-pass filtered for each real and imaginary stream. The output of the DAC is up-converted, sent through a bandpass filter, and then sent to the antenna for transmission.

Figure 1.3: Basic blocks of an OFDM receiver.

The signal is transmitted over a frequency-selective $K$-path fading channel whose impulse response is expressed as [36, 37]

$$h(t, \tau) = \sum_{k=0}^{K-1} h_k(t)\, \delta(t - \tau_k), \tag{1.3}$$

where $h_k(t)$ is the equivalent low-pass response of the $k$th path and $\tau_k$ is the time delay of the $k$th multipath component. We assume that these paths are uncorrelated with each other, that the total channel energy is normalized to one, and that the impulse response of the channel is quasi-static during one OFDM symbol period but time varying from one symbol to another.

At the receiver side, the received signal is the convolution of the transmitted signal and the channel impulse response. That is

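$$\begin{aligned} y(t) &= \sum_{k=0}^{K-1} h_k(t)\, x(t - \tau_k) + w(t) \\ &= \frac{1}{\sqrt{N}} \sum_{i=-\infty}^{\infty} \sum_{k=0}^{K-1} \sum_{l=0}^{N-1} h_k(t)\, X_{l,i} \exp\left[j2\pi l \Delta f (t - \tau_k - T_g - iT_s)\right] g(t - \tau_k - iT_s) + w(t), \end{aligned} \tag{1.4}$$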

where $w(t)$ is Additive White Gaussian Noise (AWGN). Figure 1.3 illustrates the basic blocks of an OFDM receiver. In the first step, the received signal is down-converted and fed to an Analog-to-Digital Converter (ADC). After removing the guard interval (cyclic prefix), the discrete-time representation of the $n$th received sample of the $i$th symbol is given by

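$$\begin{aligned} y_i(n) &= \sum_{k=0}^{K-1} h_k(n)\, x_i(n - \tau_k) + w_i(n) \\ &= \frac{1}{\sqrt{N}} \sum_{k=0}^{K-1} \sum_{l=0}^{N-1} h_k(n)\, X_{l,i} \exp\left(\frac{j2\pi (n - \tau_k) l}{N}\right) + w_i(n) \\ &= \frac{1}{\sqrt{N}} \sum_{l=0}^{N-1} \left[ \sum_{k=0}^{K-1} h_k(n) \exp\left(-\frac{j2\pi \tau_k l}{N}\right) \right] X_{l,i} \exp\left(\frac{j2\pi nl}{N}\right) + w_i(n) \\ &= \frac{1}{\sqrt{N}} \sum_{l=0}^{N-1} H_l(n)\, X_{l,i} \exp\left(\frac{j2\pi nl}{N}\right) + w_i(n), \end{aligned} \tag{1.5}$$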

where $\tau_k$ is the time delay in sample periods of the $k$th path and $H_l(n)$ is the channel frequency response at the $l$th subcarrier. Because the channel is assumed quasi-static within a symbol duration, $H_l(n)$ is assumed constant within a symbol duration, i.e. $H_l(n) \approx H_{l,i}$. As a result, the $n$th sample of the $i$th symbol in discrete time is rewritten as

$$y_i(n) = \frac{1}{\sqrt{N}} \sum_{l=0}^{N-1} H_{l,i}\, X_{l,i} \exp\left(\frac{j2\pi nl}{N}\right) + w_i(n). \tag{1.6}$$

The Fast Fourier Transform (FFT) block performs demodulation in order to obtain the transmitted symbols, with amplitude and phase corrupted by the channel and additive noise. The data symbol at the $l$th subcarrier of the $i$th symbol after FFT demodulation is expressed as

$$Y_{l,i} = \frac{1}{\sqrt{N}} \sum_{n=0}^{N-1} y_i(n) \exp\left(-\frac{j2\pi nl}{N}\right) = X_{l,i} H_{l,i} + W_{l,i}, \tag{1.7}$$

where $W_{l,i}$ is the AWGN on the $l$th subcarrier of the $i$th symbol. Finally, the output data stream is obtained by converting the output of the FFT block into a serial bit stream.
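The chain from (1.1) to (1.7) can be checked numerically. The following Python sketch is illustrative only: the sizes `N` and `CP`, the two-tap channel `taps` and the helper `dft` are assumptions chosen for the example, not parameters of the receiver described in this thesis. It assumes a unitary $1/\sqrt{N}$ scaling in both transforms and a noise-free channel, and it checks that each subcarrier sees a single complex gain when the cyclic prefix is at least as long as the channel memory.

```python
import cmath


def dft(x, inverse=False):
    """Unitary DFT (or inverse DFT) computed directly from the definition."""
    size = len(x)
    sign = 1 if inverse else -1
    scale = size ** 0.5
    return [
        sum(x[a] * cmath.exp(sign * 2j * cmath.pi * a * b / size) for a in range(size)) / scale
        for b in range(size)
    ]


N = 8                # number of subcarriers (assumed for the example)
CP = 2               # cyclic prefix length in samples (assumed)
taps = [0.8, 0.6j]   # hypothetical two-path channel with unit total energy

# QPSK-like data symbols X_l, one per subcarrier
X = [1 + 1j, 1 - 1j, -1 + 1j, -1 - 1j, 1 + 1j, -1 - 1j, 1 - 1j, -1 + 1j]

x = dft(X, inverse=True)   # IFFT modulation
tx = x[-CP:] + x           # copy the last CP samples to the front

# Noise-free multipath channel: linear convolution with the taps
rx = [
    sum(taps[k] * tx[n - k] for k in range(len(taps)) if n - k >= 0)
    for n in range(len(tx))
]

y = rx[CP:CP + N]          # remove the cyclic prefix
Y = dft(y)                 # FFT demodulation

# Channel frequency response H_l = sum_k h_k exp(-j 2 pi tau_k l / N), with tau_k = k
H = [
    sum(taps[k] * cmath.exp(-2j * cmath.pi * k * l / N) for k in range(len(taps)))
    for l in range(N)
]

max_error = max(abs(Y[l] - H[l] * X[l]) for l in range(N))
print(f"max |Y_l - H_l X_l| = {max_error:.2e}")
```

In this noise-free setting the printed error is at the level of floating-point rounding, which is the per-subcarrier relation that channel estimation and equalization rely on.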

As described above, data in OFDM are modulated using the IFFT. High-performance commercial FFT/IFFT cores are provided by companies such as Xilinx, Altera, or Actel. Such cores offer several architecture options that trade off core size against transform time. Thus, the use of an FFT/IFFT core is very efficient for the implementation of OFDM systems. Figure 1.4 (Figure 1 in [2]) illustrates the pipelined streaming I/O architecture provided by Xilinx. The pipelined architecture uses several radix-2 butterfly processing engines to offer continuous data processing. Another architecture, which uses fewer resources than the pipelined streaming I/O architecture, is shown in Figure 1.5 (Figure 2 in [2]). This architecture uses only one radix-4 butterfly engine and has two processes [2]: one loads and/or unloads the data, and the other calculates the transform. It therefore requires fewer resources than the pipelined streaming I/O architecture but has a longer transform time.

Figure 1.4: Xilinx's pipelined streaming I/O architecture [2].

Figure 1.5: Xilinx's radix-4 burst I/O architecture [2].

1.2 Channel estimation and synchronization in multi-carrier systems

1.2.1 Time and frequency synchronization

OFDM systems are much more sensitive to synchronization errors than single-carrier systems. In OFDM, orthogonality between subcarriers is preserved only if the receiver clock is synchronized to the transmitter clock and no frequency offset exists. Thus, synchronizing an OFDM signal requires estimating the symbol timing, i.e. where the symbol starts, and the carrier frequency offset. Many synchronization methods for multicarrier systems have been proposed in the last few years, and they are mostly classified into two groups: preamble-based and cyclic-prefix-based [12-24].

The authors in [15-17, 19-22, 24] use the periodicity of the cyclic prefix for timing synchronization. Such methods perform Maximum Likelihood (ML) estimation of the timing and Carrier Frequency Offset (CFO) by exploiting the periodicity of the cyclic prefix. Van de Beek et al. [24] introduced the log-likelihood function

$$\Lambda(n, \varepsilon) = |\gamma(n)| - \rho\, \Phi(n), \tag{1.8}$$

where

$$\gamma(n) = \sum_{k=n}^{n+N_p-1} y(k)\, y^*(k + N) \tag{1.9}$$

correlates the received sampled baseband signal with its delayed version and

$$\Phi(n) = \frac{1}{2} \sum_{k=n}^{n+N_p-1} \left( |y(k)|^2 + |y(k + N)|^2 \right) \tag{1.10}$$

compensates for the difference in energy in the correlated samples, where $N_p$ is the number of samples in the cyclic prefix, SNR is the signal-to-noise ratio, $(\cdot)^*$ denotes complex conjugate, and

$$\rho = \frac{\mathrm{SNR}}{\mathrm{SNR} + 1}. \tag{1.11}$$

The timing offset estimate is given by the maximum of the log-likelihood function,

$$\hat{n} = \arg\max_n \{\Lambda(n, \varepsilon)\}, \tag{1.12}$$

and the CFO estimate is given by

$$\hat{\varepsilon} = -\frac{1}{2\pi} \angle \gamma(\hat{n}), \tag{1.13}$$

where $\angle$ denotes the angle of a complex number.

Figure 1.6: IEEE 802.11a OFDM training structure [3].
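The estimator in (1.8) to (1.13) can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the receiver's implementation: the function name `cp_ml_sync`, the sizes `N_FFT` and `N_CP`, the offsets `OFFSET` and `EPS`, and the use of complex Gaussian samples as a stand-in for an IFFT output are all hypothetical choices. The channel is noise-free, so $\rho$ is set to 1, and the CFO is normalized to the subcarrier spacing.

```python
import cmath
import random


def cp_ml_sync(y, n_fft, n_cp, rho=1.0):
    """Return (timing estimate, CFO estimate) from the cyclic-prefix correlation."""
    best_n, best_metric, best_gamma = 0, float("-inf"), 0j
    for n in range(len(y) - n_fft - n_cp + 1):
        window = range(n, n + n_cp)
        # gamma(n): correlation between samples spaced n_fft apart, as in (1.9)
        gamma = sum(y[k] * y[k + n_fft].conjugate() for k in window)
        # Phi(n): energy term, as in (1.10)
        phi = 0.5 * sum(abs(y[k]) ** 2 + abs(y[k + n_fft]) ** 2 for k in window)
        metric = abs(gamma) - rho * phi
        if metric > best_metric:
            best_n, best_metric, best_gamma = n, metric, gamma
    eps_hat = -cmath.phase(best_gamma) / (2 * cmath.pi)
    return best_n, eps_hat


def cgauss():
    """One complex Gaussian sample (stand-in for a time-domain OFDM sample)."""
    return complex(random.gauss(0, 1), random.gauss(0, 1))


random.seed(1)
N_FFT, N_CP = 64, 16   # useful symbol length and cyclic prefix length (assumed)
OFFSET, EPS = 5, 0.2   # true timing offset in samples and normalized CFO (assumed)

body = [cgauss() for _ in range(N_FFT)]
symbol = body[-N_CP:] + body                       # prepend the cyclic prefix
stream = [cgauss() for _ in range(OFFSET)] + symbol + [cgauss() for _ in range(20)]

# Apply the carrier frequency offset as a progressive phase rotation
rx = [s * cmath.exp(2j * cmath.pi * EPS * k / N_FFT) for k, s in enumerate(stream)]

n_hat, eps_hat = cp_ml_sync(rx, N_FFT, N_CP)
print(n_hat, round(eps_hat, 6))
```

Here `n_hat` marks the first sample of the cyclic prefix. Because the phase of the correlation is only known modulo $2\pi$, this estimator resolves CFOs of magnitude below half a subcarrier spacing.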

In contrast to cyclic-prefix-based synchronization techniques, timing and frequency synchronization methods based on preamble symbols are suggested in [4, 11-14, 18, 23]. Methods based on the IEEE 802.11a preamble structure are presented in [4, 11, 12, 14, 18]. In the IEEE 802.11a standard, the preamble is placed at the start of every frame. The preamble consists of ten short training symbols with a duration of 0.8 µs each and two long training symbols with a duration of 3.2 µs each, as illustrated in Figure 1.6 [3]. In [12], the authors proposed an accurate coarse symbol timing synchronization method based on the short training symbols. This technique convolves a known short training symbol with the received sequence; the resulting convolution peak provides the expected coarse frame boundary. In [4, 14], an ML-based timing synchronization technique that uses the short training symbols employs a double auto-correlation to estimate the coarse frame boundary. In order to increase estimation accuracy, the authors present two normalized auto-correlation timing metrics $M_1$ and $M_2$. They are given by

$$M_1(n) = \sum_{k=0}^{N_s-1} y(n + k)\, y^*(n + k + N_s) \tag{1.14}$$

and

$$M_2(n) = \sum_{k=0}^{N_s-1} y(n + k)\, y^*(n + k + 2N_s), \tag{1.15}$$

where $N_s$ is the delay of one short training symbol. The second metric $M_2(n)$ is the correlation between the received signal and itself delayed by two short symbols, $2N_s$. A triangular-shaped timing metric is obtained by subtracting $|M_2(n)|$ from $|M_1(n)|$. The peak value of the difference $|M_1(n)| - |M_2(n)|$ indicates the start of the 9th short training symbol, as illustrated in Figure 1.7. That is, the timing estimate is given by

$$\hat{n} = \arg\max_n \{ |M_1(n)| - |M_2(n)| \}. \tag{1.16}$$

Figure 1.7: Timing metric for double auto-correlation [4].

The symbol timing estimate $\hat{n}$ could be earlier or later than the true timing. If $\hat{n}$ is earlier than the true timing, part of the cyclic prefix of the current symbol is taken as data, which causes no interference. If $\hat{n}$ is later than the true timing, part of the cyclic prefix of the next symbol is taken as data, which introduces ISI. However, this ISI can easily be avoided by shifting the estimated symbol timing earlier. Figure 1.8 illustrates the block diagram of the given timing and frequency synchronization algorithm. The fine timing synchronization is obtained by the conventional method of cross-correlating the received samples with the known long training symbols [4]. At the receiver side, the main difference between two consecutive short training symbols is the phase difference caused by the carrier frequency offset [14]. As a result, the CFO estimate obtained from $M_1$ can be expressed as

$$\hat{\varepsilon} = \frac{\angle M_1(\hat{n})}{\pi}, \tag{1.17}$$

where $\hat{\varepsilon}$ is the frequency estimate. In order to improve the accuracy of the frequency estimation, the first metric $M_1$ is averaged over four short training symbols [14]. The averaged auto-correlation is given by

$$\bar{M}(n) = \sum_{k=0}^{4N_s-1} y(n + k)\, y^*(n + k + N_s), \tag{1.18}$$

and the frequency estimate is

$$\hat{\varepsilon} = \frac{\angle \bar{M}(\hat{n})}{\pi}. \tag{1.19}$$

Finally, the frequency offset compensation is given by

$$\tilde{y}(n) = y(n) \exp\left(\frac{j\pi \hat{\varepsilon} n}{N_s}\right), \tag{1.20}$$

where $y(n)$ and $\tilde{y}(n)$ are the received sample and the frequency-offset-compensated sample, respectively.

Figure 1.8: Block diagram of the timing and frequency estimator.
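The double auto-correlation timing metric and the phase-based CFO correction can be illustrated with the short Python sketch below. It rests on several assumptions that are not taken from the thesis: the preamble is ten repetitions of a random short symbol surrounded by silence (so that the peak is unambiguous), the channel is noise-free, the timing metric is taken as $|M_1(n)| - |M_2(n)|$, and the measured phase, which accumulates over a lag of $N_s$ samples, is removed by rotating sample $n$ by $\pi\hat{\varepsilon}n/N_s$. The names `lag_corr`, `N_S`, `LEAD` and `PHI` are hypothetical.

```python
import cmath
import random


def lag_corr(y, n, lag, length):
    """Correlate `length` samples starting at n with the samples `lag` later."""
    return sum(y[n + k] * y[n + k + lag].conjugate() for k in range(length))


random.seed(7)
N_S = 16     # samples per short training symbol (assumed)
LEAD = 10    # silent samples before the preamble (assumed)
PHI = 0.01   # CFO-induced phase rotation per sample, in radians (assumed)

short = [complex(random.gauss(0, 1), random.gauss(0, 1)) for _ in range(N_S)]
clean = [0j] * LEAD + short * 10 + [0j] * (3 * N_S)   # ten short symbols, then silence
rx = [s * cmath.exp(1j * PHI * n) for n, s in enumerate(clean)]

# Timing metric |M1(n)| - |M2(n)| with lags of one and two short symbols
last_start = len(rx) - 3 * N_S
metric = [
    abs(lag_corr(rx, n, N_S, N_S)) - abs(lag_corr(rx, n, 2 * N_S, N_S))
    for n in range(last_start + 1)
]
n_hat = max(range(len(metric)), key=metric.__getitem__)

# CFO estimate from the phase of M1 at the timing estimate
eps_hat = cmath.phase(lag_corr(rx, n_hat, N_S, N_S)) / cmath.pi

# Derotate every sample to undo the estimated per-sample phase rotation
compensated = [v * cmath.exp(1j * cmath.pi * eps_hat * n / N_S) for n, v in enumerate(rx)]
residual = max(abs(a - b) for a, b in zip(compensated, clean))
print(n_hat, f"{residual:.2e}")
```

In this idealized setting the peak falls on the first sample of the 9th short symbol. With noise, or with a long training symbol following the preamble, the peak location becomes an estimate rather than an exact value, which is why a separate fine timing stage is used.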

1.2.2 Channel estimation and equalization

In OFDM systems, the overall system performance degrades in a frequency-selective fading channel, as subchannels may experience deep fades. Channel estimation and equalization are therefore critical to compensate for frequency selectivity. Channel estimation can be performed either by inserting pilot tones into all subcarriers of an OFDM symbol that is sent periodically in time, known as block-type pilot channel estimation, or by inserting pilot tones into a subset of the subcarriers of each OFDM symbol, known as comb-type pilot channel estimation [25, 26], as shown in Figures 1.9(a) and 1.9(b), respectively. Block-type pilot channel estimation has been developed under the assumption of a slow fading channel (i.e. the channel transfer function does not change very rapidly). Comb-type pilot channel estimation has been developed under the assumption that the channel changes from one OFDM block to the next. The comb-type technique estimates the channel at the pilot frequencies. Consequently, the frequency response of the channel at the frequencies where no pilot tones are located must be interpolated using techniques such as linear, spline, FFT, or low-pass filtering interpolation [26, 38]. Furthermore, if the multipath channel is time varying, interpolation in the time domain must track the variations of the channel.

Figure 1.9: Example of a pilot grid: (a) block-type, (b) comb-type.
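As an illustration of comb-type estimation, the Python sketch below computes least-squares estimates at the pilot subcarriers and fills in the remaining subcarriers by linear interpolation, one of the interpolation options listed above. The pilot spacing, the pilot value, the channel `h_true` and the function name `ls_comb_estimate` are assumptions made for the example. The channel is deliberately chosen to vary linearly across frequency and to be noise-free, so linear interpolation is exact here, which would not hold for a real multipath channel.

```python
from bisect import bisect_right


def ls_comb_estimate(rx, pilot_idx, pilot_val):
    """LS estimate at the pilots, then linear interpolation over all subcarriers."""
    h_pilots = [rx[p] / pilot_val for p in pilot_idx]
    estimate = []
    for l in range(len(rx)):
        # Index of the pilot pair that brackets subcarrier l
        i = min(max(bisect_right(pilot_idx, l) - 1, 0), len(pilot_idx) - 2)
        p0, p1 = pilot_idx[i], pilot_idx[i + 1]
        w = (l - p0) / (p1 - p0)
        estimate.append((1 - w) * h_pilots[i] + w * h_pilots[i + 1])
    return estimate


N_SUB = 13                    # number of subcarriers (assumed)
PILOTS = [0, 4, 8, 12]        # comb-type pilot positions (assumed spacing of 4)
PILOT_VALUE = 1 + 0j          # known pilot symbol (assumed)

# Hypothetical channel that varies linearly across frequency
h_true = [complex(1 + 0.05 * l, -0.02 * l) for l in range(N_SUB)]

# Transmitted symbols: pilots at the pilot positions, arbitrary data elsewhere
tx = [PILOT_VALUE if l in PILOTS else (1 - 1j) for l in range(N_SUB)]
rx = [h * x for h, x in zip(h_true, tx)]   # noise-free reception per subcarrier

h_est = ls_comb_estimate(rx, PILOTS, PILOT_VALUE)
worst = max(abs(a - b) for a, b in zip(h_est, h_true))
print(f"largest interpolation error = {worst:.2e}")
```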

The time-frequency structure of the OFDM signal makes a two-dimensional pilot grid especially attractive for channel measurement and estimation as mentioned in [5, 10, 39]. An example of such a two-dimensional grid is shown in Figure 1.10. The authors in [39] generalize the formulation of the 2D discrete Minimum Mean Square Error (MMSE) estimation problem and its solution.

Figure 1.10: A two-dimensional pilot grid.

Assume that the Least Square (LS) estimate of the channel frequency response at a pilot location, $\tilde{H}_{k',l'}$, is given by

$$\tilde{H}_{k',l'} = H_{k',l'} + W_{k',l'}, \quad \forall (k', l') \in \mathcal{P}, \tag{1.21}$$

where $0 \le k' \le K - 1$ and $0 \le l' \le L - 1$ are the frequency and time indices of the pilot location, $K$ is the total number of subcarriers, $L$ is the total number of symbols, $\mathcal{P}$ is the set of pilot locations, the number of observations is $N_{grid} = |\mathcal{P}| < KL$, $H_{k',l'}$ is the actual channel frequency response at the pilot location, and $W_{k',l'}$ is a complex AWGN sample. The linear estimate $\hat{H}_{k,l}$ of the channel at a specific data location $(k, l)$ is given by [39]

$$\hat{H}_{k,l} = \sum_{(k',l') \in \mathcal{P}} c_{k,l;k',l'}\, \tilde{H}_{k',l'}, \tag{1.22}$$

where $c_{k,l;k',l'}$ is the complex-valued filter coefficient, which depends on the position $(k, l)$ of the channel frequency response to be estimated as well as on the nearest pilot location $(k', l') \in \mathcal{P}$. The above equation can be rewritten in vector notation as

$$\hat{H}_{k,l} = \mathbf{c}_{k,l}^T \tilde{\mathbf{H}}_{k,l}, \tag{1.23}$$

where $\mathbf{c}_{k,l}$ is an $N_{tap} \times 1$ vector of estimator coefficients defined by

$$\mathbf{c}_{k,l} = \left[ c_{k,l;k'_1,l'_1},\; c_{k,l;k'_2,l'_2},\; \ldots,\; c_{k,l;k'_{N_{tap}},l'_{N_{tap}}} \right]^T \tag{1.24}$$

and $\tilde{\mathbf{H}}_{k,l}$ is an $N_{tap} \times 1$ vector containing the channel frequency responses at the nearest pilot locations used to estimate the channel at position $(k, l)$. It is defined by

$$\tilde{\mathbf{H}}_{k,l} = \left[ \tilde{H}_{k'_1,l'_1},\; \tilde{H}_{k'_2,l'_2},\; \ldots,\; \tilde{H}_{k'_{N_{tap}},l'_{N_{tap}}} \right]^T. \tag{1.25}$$

The authors in [39] stated that the optimal filter order is $N_{tap} = N_{grid}$. The Mean Square Error (MSE) for a given location $(k, l)$ is defined as

$$\mathrm{MSE}(\mathbf{c}_{k,l}) = E\left\{ \left| H_{k,l} - \hat{H}_{k,l} \right|^2 \right\}, \tag{1.26}$$

where $E\{\cdot\}$ denotes expectation and $H_{k,l}$ is the actual channel response at position $(k, l)$. The optimal filter coefficients are obtained by minimizing (1.26); that is, a 2D Wiener filter is obtained by applying the orthogonal projection theorem [39]:

$$E\left\{ \left( H_{k,l} - \hat{H}_{k,l} \right) \tilde{H}^*_{k'',l''} \right\} = 0, \quad \forall\; 0 \le k'' \le K - 1,\; 0 \le l'' \le L - 1, \tag{1.27}$$

where $k''$ and $l''$ are the frequency and time indices of another nearest pilot location. The authors assume that the Wiener filter is physically realizable, i.e. the coefficients $c_{k,l;k',l'}$ exist. The 2D discrete Wiener-Hopf equation is obtained as [39]

$$E\left\{ H_{k,l}\, \tilde{H}^*_{k'',l''} \right\} = \sum_{(k',l') \in \mathcal{P}} c^{opt}_{k,l;k',l'}\, E\left\{ \tilde{H}_{k',l'}\, \tilde{H}^*_{k'',l''} \right\}, \quad \forall (k'', l'') \in \mathcal{P}, \tag{1.28}$$


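where $c^{opt}_{k,l;k',l'}$ is the set of optimum filter coefficients. The cross-correlation between the channel frequency response at the actual location $(k, l)$ and the channel frequency response at pilot location $(k'', l'')$ is given by

$$r_{k-k'',l-l''} = E\left\{ H_{k,l}\, \tilde{H}^*_{k'',l''} \right\}, \tag{1.29}$$

and the auto-correlation between the channel frequency responses at two nearest pilot locations $(k', l')$ and $(k'', l'')$ is

$$R_{k'-k'',l'-l''} = E\left\{ \tilde{H}_{k',l'}\, \tilde{H}^*_{k'',l''} \right\}. \tag{1.30}$$

Inserting (1.29) and (1.30) into (1.28) yields

$$r_{k-k'',l-l''} = \sum_{(k',l') \in \mathcal{P}} c^{opt}_{k,l;k',l'}\, R_{k'-k'',l'-l''}, \tag{1.31}$$

which, in vector notation, becomes

$$\mathbf{r}_{k,l}^T = \left( \mathbf{c}_{k,l}^{opt} \right)^T \mathbf{R}_{HH}, \tag{1.32}$$

where $\mathbf{R}_{HH}$ is an $N_{tap} \times N_{tap}$ channel auto-correlation matrix and $\mathbf{r}_{k,l}$ is an $N_{tap} \times 1$ cross-correlation vector. Thus, the optimal filter coefficient vector is given by

$$\left( \mathbf{c}_{k,l}^{opt} \right)^T = \mathbf{r}_{k,l}^T\, \mathbf{R}_{HH}^{-1}. \tag{1.33}$$

Inserting (1.23) into (1.26) yields the MSE [39]

$$\begin{aligned} \mathrm{MSE}(\mathbf{c}_{k,l}) &= \sigma_H^2 - \mathbf{r}_{k,l}^T \mathbf{c}_{k,l}^* - \mathbf{c}_{k,l}^T \mathbf{r}_{k,l}^* + \mathbf{c}_{k,l}^T \mathbf{R}_{HH} \mathbf{c}_{k,l}^* \\ &= \sigma_H^2 - \mathbf{r}_{k,l}^T \left( \mathbf{c}_{k,l}^T \right)^H - \mathbf{c}_{k,l}^T \left( \mathbf{r}_{k,l}^T \right)^H + \mathbf{c}_{k,l}^T \mathbf{R}_{HH} \left( \mathbf{c}_{k,l}^T \right)^H, \end{aligned} \tag{1.34}$$

where $(\cdot)^*$ denotes complex conjugation, $(\cdot)^H$ denotes the conjugate transpose, and $\sigma_H^2$ is the variance of $H_{k,l}$.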

Instead of performing 2D filtering, we can perform suboptimal 2 × 1D filtering without a significant loss in performance [39]. First, filtering in the time domain is performed, followed by filtering in the frequency domain. The algorithm can be summarized as follows.

Channel estimation in the time domain

• At a given $k$th subcarrier, search for the $N_{pt}$ nearest pilot locations $(k', l')$ and $(k'', l'')$ with respect to the actual location $(k, l)$, where $N_{pt}$ is the number of filter taps in the time domain. "Nearest" is understood in terms of a weighted distance from the actual position to the pilot symbol position, such as $|l - l'| f_{D,\max} T_s + |k - k'| \tau_{\max} \Delta f$ [39], where $f_{D,\max}$ is the maximum Doppler shift, $T_s$ is the symbol duration, and $\tau_{\max}$ is the maximum delay spread.

• Obtain the channel response at the nearest pilot positions.

• Compute the auto-correlation matrix $\mathbf{R}_{HH}$ of size $N_{pt} \times N_{pt}$ between the nearest pilot positions.

• Compute the cross-correlation vector $\mathbf{r}_{k,l}$ of size $N_{pt} \times 1$ between the actual position $(k, l)$ and the nearest pilot locations.

• Compute the filter coefficient set $\mathbf{c}_{k,l}$ using (1.33).

• Compute the channel estimate $\hat{H}_{k,l}$ at the actual position using (1.23).

Channel estimation in the frequency domain

• At a given $l$th symbol, search for the $N_{pf}$ nearest pilot locations $(k', l')$ and $(k'', l'')$ with respect to the actual location $(k, l)$, where $N_{pf}$ is the number of filter taps in the frequency domain.

• Obtain the channel response at the nearest pilot positions (using the result from the time domain estimation).

• Compute the auto-correlation matrix $\mathbf{R}_{HH}$ of size $N_{pf} \times N_{pf}$ between the nearest pilot positions.

• Compute the cross-correlation vector $\mathbf{r}_{k,l}$ of size $N_{pf} \times 1$ between the actual position $(k, l)$ and the nearest pilot locations.

• Compute the filter coefficient set $\mathbf{c}_{k,l}$ using (1.33).

• Finally, compute the channel estimate $\hat{H}_{k,l}$ at the actual position using (1.23).
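The coefficient computation shared by both 1D stages, i.e. solving (1.33) for the filter taps, can be sketched in Python as follows. The correlation model `corr`, its parameter `RHO`, the pilot positions and the helper names are hypothetical: a real-valued, exponentially decaying correlation is assumed purely to keep the example small, whereas the actual correlation depends on the delay and Doppler profile of the channel. The noise variance is added to the diagonal of the pilot auto-correlation matrix because the pilot observations are noisy.

```python
def solve(a, b):
    """Solve the linear system a x = b by Gauss-Jordan elimination with pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda i: abs(m[i][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for row in range(n):
            if row != col:
                factor = m[row][col] / m[col][col]
                m[row] = [u - factor * v for u, v in zip(m[row], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


RHO = 0.9  # hypothetical correlation between adjacent subcarriers


def corr(delta):
    """Assumed real-valued correlation model (illustrative only)."""
    return RHO ** abs(delta)


def wiener_coefficients(k, pilots, noise_var):
    """Return (coefficients, cross-correlation vector) for position k."""
    r_hh = [[corr(p - q) + (noise_var if p == q else 0.0) for q in pilots] for p in pilots]
    r = [corr(k - p) for p in pilots]
    # R_HH is real and symmetric here, so c^T = r^T R^-1 is equivalent to R c = r
    return solve(r_hh, r), r


pilots = [0, 4, 8]   # nearest pilot positions along one dimension (assumed)

# Noise-free case, estimating exactly at the middle pilot position
c_clean, r_clean = wiener_coefficients(4, pilots, noise_var=0.0)
mse_clean = corr(0) - sum(ri * ci for ri, ci in zip(r_clean, c_clean))

# Noisy case at the same position: the filter now also weights the outer pilots
c_noisy, r_noisy = wiener_coefficients(4, pilots, noise_var=0.1)
mse_noisy = corr(0) - sum(ri * ci for ri, ci in zip(r_noisy, c_noisy))

print([round(v, 4) for v in c_clean], [round(v, 4) for v in c_noisy])
```

Without noise the filter simply selects the pilot located at the position of interest. With noise it spreads the weight over neighbouring pilots, which averages the noise at the cost of a nonzero residual MSE.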

Besides the channel estimation techniques above, there exist several Linear Minimum Mean Square Error (LMMSE) estimators for Pilot-Symbol-Assisted Modulation (PSAM) OFDM systems. Golovins in [27] described the LMMSE estimator for PSAM OFDM systems in more detail. Hsieh in [25] studied a simplified LMMSE estimator in conjunction with conventional interpolation methods. Edfors in [29] introduced an optimal low-rank approximation using Singular Value Decomposition (SVD), but did not address the complexity of the SVD algorithm. Partitioning the channel correlation matrix into small sub-matrices in order to reduce the complexity of the LMMSE estimator was studied in [28-30]. Noh in [28] investigated non-overlap and overlap techniques for partitioning the channel correlation matrix into small sub-matrices. Mehlführer in [30] used only one small sub-matrix in order to realize the approximate LMMSE estimator.


[Figure: the data stream is multiplied by the PN code (spreading) and then upconverted to RF]

Figure 1.11: Example of a simple CDMA transmitter.

Furthermore, a traditional decision-directed process could be coupled with the Least Square (LS) channel estimates in order to improve the estimation accuracy [40, 41]. Zhu in [40] applied a decision-directed process to the LS channel estimates to blindly select virtual pilot tones to further improve the estimation performance over a static channel. Wei in [41] used a decision-directed process in conjunction with an iterative transform-domain-based LS channel estimation.

1.3 Overview of CDMA

1.3.1 Spreading concept

Code Division Multiple Access (CDMA) is a multiple access technique where different users share the same frequency band at the same time. Figure 1.11 illustrates an example of a simple CDMA transmission scheme. The heart of CDMA is the spread spectrum technique, which uses a higher-rate signature pulse to expand the signal bandwidth far beyond what is necessary for a given data rate [5]. Spreading is obtained by multiplying the baseband data by a spreading sequence of pseudo-random alternating positive and negative pulses, sometimes called the Pseudo Noise (PN) or code signal, before transmission. In CDMA, the Spreading Factor (SF) is defined as the ratio of the information bit duration to the chip duration

$$SF = \frac{T_b}{T_c}, \tag{1.35}$$

where $T_b$ and $T_c$ are the bit duration and the chip duration, respectively. This leads to an increase of the bandwidth by the spreading factor, as shown in Figure 1.12.
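As a minimal sketch of the spreading concept, the following Python fragment spreads a bit stream with a bipolar code of length $SF = 8$ and recovers it by correlation. The code values in `pn` and the function names are illustrative assumptions, not taken from the thesis.

```python
import numpy as np

def spread(bits, pn):
    """Multiply each data symbol by the whole PN code (one chip per code element)."""
    symbols = 1 - 2 * np.asarray(bits)          # map 0 -> +1, 1 -> -1
    return np.repeat(symbols, len(pn)) * np.tile(pn, len(bits))

def despread(chips, pn):
    """Correlate each block of SF chips with the PN code and decide on the sign."""
    corr = chips.reshape(-1, len(pn)) @ pn
    return (corr < 0).astype(int)

pn = np.array([1, -1, 1, 1, -1, 1, -1, -1])     # hypothetical code, SF = 8
bits = [0, 1, 1, 0, 1]
chips = spread(bits, pn)
recovered = despread(chips, pn)
```

Each bit is carried by $SF$ chips, so the chip rate, and hence the occupied bandwidth, is $SF$ times that of the data signal.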


[Figure: power spectral density versus frequency, showing the narrow data signal and the wider, lower spread signal]

Figure 1.12: Power spectrum of the spread signal versus the data signal.

1.3.2 Spreading codes

Pseudo-noise sequences

A spreading code is mainly characterized by its auto-correlation and cross-correlation functions. The rate of the spreading code is called the chip rate. A well-known technique to generate codes with good auto-correlation properties can be implemented using a Linear Feedback Shift Register (LFSR). A register of length $m$ can be configured with appropriate feedback paths to produce a sequence of "0"s and "1"s having the maximal possible length $M_{seq} = 2^m - 1$, called a maximal-length sequence or m-sequence. In [5], the authors show that a linear feedback shift register of length $m$ produces an m-sequence if and only if the corresponding generating polynomial of degree $m$ is primitive. Let us consider a register of length $m = 5$ with the weights given in the brackets as shown in Figure 1.13 (Figure 5.13 in [5]). It possesses $2^m = 32$ different states. The register generates a sequence of "0"s and "1"s whose period cannot exceed the maximal length $M_{seq} = 2^m - 1 = 31$. This example exhibits the following properties:

• An m-sequence contains $\frac{1}{2}(M_{seq} + 1)$ ones and $\frac{1}{2}(M_{seq} + 1) - 1$ zeros. Thus, for large $M_{seq}$, the numbers of ones and zeros are nearly equal.

• There is 1 run of ones of length $m$, 1 run of zeros of length $m - 1$, and $2^{m-r-2}$ runs of zeros and $2^{m-r-2}$ runs of ones of each length $r$, where $r = 1, 2, 3, \ldots, m - 2$.

Consider a sequence $c$ obtained from an m-sequence by mapping each "0" to "+1" and each "1" to "-1". The periodic discrete normalized auto-correlation function is given by [5]

$$R(\nu) = \frac{1}{M_{seq}} \sum_{k=0}^{M_{seq}-1} c(k)\, c(k+\nu), \tag{1.36}$$

and the periodic normalized cross-correlation function between two periodic codes $c_i$ and $c_j$ is

$$R_{ij}(\nu) = \frac{1}{M_{seq}} \sum_{k=0}^{M_{seq}-1} c_i(k)\, c_j(k+\nu). \tag{1.37}$$

[Figure: one period of the LFSR output, from the first bit to the last bit, with the run lengths marked: 1 1 1 1 1 0 0 1 1 0 1 0 0 1 0 0 0 0 1 0 1 0 1 1 1 0 1 1 0 0 0]

Figure 1.13: Example of LFSR with m = 5 [5].

It can be shown that the auto-correlation function of m-sequences is only two-valued: $R(0) = 1$ and $R(\nu) = -\frac{1}{M_{seq}}$ for $\nu \neq 0$. Hence, for large $M_{seq}$, m-sequences have a nearly optimal auto-correlation function but do not necessarily have low cross-correlation among themselves. The peak values of the cross-correlation function decrease slowly with increasing sequence length [5]. Therefore, another method, described below, is used to derive codes with low cross-correlation from m-sequences.
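The balance and auto-correlation properties can be checked numerically. The sketch below assumes the feedback recurrence $a(n) = a(n-2) \oplus a(n-5)$ with an all-ones initial state; this wiring is an assumption chosen because it reproduces the 31-bit period printed in Figure 1.13, not a statement about the exact register drawn in [5]. The helper names are hypothetical.

```python
def lfsr(taps, state, length):
    """Generate `length` bits; each new bit is the XOR of the bits `t` steps back."""
    seq = list(state)
    while len(seq) < length:
        bit = 0
        for t in taps:
            bit ^= seq[-t]
        seq.append(bit)
    return seq[:length]

def autocorr_sum(c, shift):
    """Unnormalized periodic auto-correlation of a bipolar sequence."""
    n = len(c)
    return sum(c[k] * c[(k + shift) % n] for k in range(n))

M_SEQ = 31
bits = lfsr((2, 5), [1, 1, 1, 1, 1], M_SEQ)   # assumed taps, all-ones start
chips = [1 - 2 * b for b in bits]             # map 0 -> +1, 1 -> -1

ones, zeros = sum(bits), M_SEQ - sum(bits)
off_peak = {autocorr_sum(chips, v) for v in range(1, M_SEQ)}
```

Dividing the sums by $M_{seq}$ gives the normalized values: 1 at zero shift and $-1/31$ at every other shift.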

Gold codes

A set of Gold codes of length $M_{seq}$ can be obtained by combining specific pairs of m-sequences $c$ and $c'$ [5]. This set of Gold codes $\Upsilon$ is given by $c$, $c'$ and the modulo-2 sums of $c$ with all $M_{seq}$ different cyclically shifted versions of $c'$; thus it contains $M_{seq} + 2$ elements. The generation of a set of Gold codes can be expressed as

$$\Upsilon = \{ c_0, c_1, \ldots, c_{M_{seq}}, c_{M_{seq}+1} \}, \tag{1.38}$$

where

$$c_0 = c, \quad c_{M_{seq}+1} = c', \quad c_p = c \oplus c'(p), \quad p = 1, 2, \ldots, M_{seq}, \tag{1.39}$$

and where $c'(p)$ is the m-sequence given by the binary representation of $p$ as the initial setting for the shift registers. Gold codes exhibit lower peak cross-correlations than m-sequences and hence they differentiate among different users more distinctively. Figure 1.14 shows an example of the generation of Gold codes of length $M_{seq} = 7$ with the shift registers corresponding to the following primitive polynomials of degree 3:

$$P_1(x) = x^3 + x^2 + 1, \tag{1.40}$$

$$P_2(x) = x^3 + x + 1. \tag{1.41}$$

[Figure: two 3-stage shift registers whose outputs are added modulo 2. One register outputs c = 1 1 1 0 1 0 0; the 8 initial settings of the other yield c0 = 1 1 1 0 0 1 0 0, c1 = 1 1 0 1 1 1 1 0, c2 = 1 0 1 0 1 0 1 0, c3 = 1 0 0 1 0 0 0 0, c4 = 0 1 1 1 1 0 0 0, c5 = 0 1 0 0 0 0 1 0, c6 = 0 0 1 1 0 1 1 0, c7 = 0 0 0 0 1 1 0 0]

Figure 1.14: Example of Gold codes generation [5].
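A small numerical sketch of the Gold code construction for $M_{seq} = 7$ follows. The two feedback recurrences used here stand in for the two degree-3 primitive polynomials; the mapping between a polynomial and the register wiring is a convention and is an assumption of this sketch, as are the helper names.

```python
def lfsr(taps, state, length):
    """Generate `length` bits; each new bit is the XOR of the bits `t` steps back."""
    seq = list(state)
    while len(seq) < length:
        bit = 0
        for t in taps:
            bit ^= seq[-t]
        seq.append(bit)
    return seq[:length]

def cyclic_shift(seq, p):
    return seq[p:] + seq[:p]

def xor(a, b):
    return [x ^ y for x, y in zip(a, b)]

def corr_sum(a, b, shift):
    """Unnormalized periodic cross-correlation after mapping 0 -> +1, 1 -> -1."""
    n = len(a)
    return sum((1 - 2 * a[k]) * (1 - 2 * b[(k + shift) % n]) for k in range(n))

M_SEQ = 7
c = lfsr((1, 3), [1, 1, 1], M_SEQ)         # first m-sequence: 1 1 1 0 1 0 0
c_prime = lfsr((2, 3), [1, 1, 1], M_SEQ)   # second m-sequence of the pair

# c, c' and the modulo-2 sums of c with every cyclic shift of c'.
gold = [c, c_prime] + [xor(c, cyclic_shift(c_prime, p % M_SEQ))
                       for p in range(1, M_SEQ + 1)]

# Cross-correlation values over all pairs of distinct codes and all shifts.
values = {corr_sum(a, b, s)
          for i, a in enumerate(gold) for b in gold[i + 1:]
          for s in range(M_SEQ)}
```

The set contains $M_{seq} + 2 = 9$ codes, and the unnormalized cross-correlation takes only the three values $-5$, $-1$ and $3$.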

Kasami codes

The Kasami codes are derived from m-sequences, and the peak values of their cross-correlation functions are smaller than for Gold codes [5]. The set of Kasami codes is constructed similarly to the set of Gold codes by taking $c_0$ and the modulo-2 sums of $c_0$ with all $2^k - 1$ cyclically shifted versions of $c'$.


Walsh codes

Walsh codes are generated from a Hadamard matrix. The Hadamard matrix is defined by $H_1 = 0$ and the recursive relation [10]

$$H_{2n} = \begin{bmatrix} H_n & H_n \\ H_n & \overline{H}_n \end{bmatrix}, \tag{1.42}$$

for $n = 2^q$, where $q$ is an integer and $\overline{H}_n$ denotes the binary complement of $H_n$. Each row or column of the Hadamard matrix is a Walsh code of length $n$. Note that every row of $H_{2n}$ is orthogonal to all other rows.
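The recursion can be illustrated with a short sketch that builds the binary Hadamard matrix and checks the orthogonality of its rows. It assumes that the lower-right block is the binary complement of $H_n$; the function name is hypothetical.

```python
import numpy as np

def hadamard_binary(n):
    """Binary (0/1) Hadamard matrix of order n, with n a power of two."""
    H = np.zeros((1, 1), dtype=int)              # H_1 = 0
    while H.shape[0] < n:
        # Lower-right block is the binary complement of H (assumed form).
        H = np.block([[H, H], [H, 1 - H]])
    return H

W = hadamard_binary(8)        # each row is a Walsh code of length 8
bipolar = 1 - 2 * W           # map 0 -> +1, 1 -> -1
gram = bipolar @ bipolar.T    # inner products between all pairs of rows
```

The Gram matrix equals 8 times the identity, i.e. distinct Walsh codes have zero cross-correlation at zero shift.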

Orthogonal Gold codes

The orthogonal Gold codes are obtained by simple zero-padding of the original Gold codes. Let us consider the generation of Gold codes as in Figure 1.14. We set the initial values of the upper LFSR to "111" and the lower LFSR to "001". We obtain eight orthogonal Gold codes by adding a "0" at the tails, which are given below [10]

$$G_8 = \begin{bmatrix}
1 & 0 & 1 & 1 & 1 & 0 & 0 & 0 \\
0 & 1 & 0 & 1 & 0 & 0 & 0 & 0 \\
1 & 1 & 0 & 0 & 1 & 1 & 0 & 0 \\
0 & 0 & 1 & 0 & 0 & 1 & 0 & 0 \\
1 & 0 & 0 & 0 & 0 & 0 & 1 & 0 \\
0 & 1 & 1 & 0 & 1 & 0 & 1 & 0 \\
1 & 1 & 1 & 1 & 0 & 1 & 1 & 0 \\
0 & 0 & 0 & 1 & 1 & 1 & 1 & 0
\end{bmatrix}. \tag{1.43}$$

The rows of $G_8$ form an orthogonal Gold code set of length 8. We can easily verify that the cross-correlation between the rows is zero after replacing "0" and "1" with "1" and "-1", respectively.
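The verification mentioned above can be carried out directly. The sketch below enters the 64 entries of (1.43) row by row, applies the bipolar mapping and computes all pairwise inner products.

```python
import numpy as np

G8 = np.array([
    [1, 0, 1, 1, 1, 0, 0, 0],
    [0, 1, 0, 1, 0, 0, 0, 0],
    [1, 1, 0, 0, 1, 1, 0, 0],
    [0, 0, 1, 0, 0, 1, 0, 0],
    [1, 0, 0, 0, 0, 0, 1, 0],
    [0, 1, 1, 0, 1, 0, 1, 0],
    [1, 1, 1, 1, 0, 1, 1, 0],
    [0, 0, 0, 1, 1, 1, 1, 0],
])

bipolar = 1 - 2 * G8              # replace "0" with +1 and "1" with -1
gram = bipolar @ bipolar.T        # cross-correlation between every pair of rows
is_orthogonal = np.array_equal(gram, 8 * np.eye(8, dtype=int))
```

All off-diagonal entries of `gram` are zero, which confirms that the eight zero-padded codes are mutually orthogonal at zero shift.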

Zadoff-Chu sequences

The $u$th root Zadoff-Chu (ZC) sequence of length $N_{ZC}$ is defined as [42]

$$x_u(n) = \exp\left(-j\,\frac{\pi u n (n+1)}{N_{ZC}}\right), \quad 0 \le n < N_{ZC}. \tag{1.44}$$

ZC sequences have the following properties [42]:


• Periodicity when $N_{ZC}$ is prime:

$$x_u(n) = x_u(n + N_{ZC}). \tag{1.45}$$

• The DFT of the $u$th root ZC sequence of length $N_{ZC}$ is a conjugated, time-scaled version of $x_u(n)$, multiplied by a constant factor.

• Zero auto-correlation of a prime-length ZC sequence with a cyclically shifted version of itself.

• Constant magnitude ($\sqrt{N_{ZC}}$) of the cross-correlation between two prime-length ZC sequences.
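The correlation properties listed above can be checked numerically with a short sketch. The length $N_{ZC} = 61$ and the roots 5 and 7 are arbitrary illustrative choices, and the correlations are left unnormalized; the function names are hypothetical.

```python
import numpy as np

def zadoff_chu(u, n_zc):
    """u-th root Zadoff-Chu sequence of odd length n_zc."""
    n = np.arange(n_zc)
    return np.exp(-1j * np.pi * u * n * (n + 1) / n_zc)

def cyclic_corr(a, b, shift):
    """Unnormalized correlation of a with b cyclically advanced by `shift`."""
    return np.sum(a * np.conj(np.roll(b, -shift)))

N_ZC = 61                              # prime length (illustrative choice)
x = zadoff_chu(5, N_ZC)
y = zadoff_chu(7, N_ZC)

auto = np.array([abs(cyclic_corr(x, x, s)) for s in range(N_ZC)])
cross = np.array([abs(cyclic_corr(x, y, s)) for s in range(N_ZC)])
```

Here `auto` is $N_{ZC}$ at zero shift and numerically zero elsewhere, while every entry of `cross` equals $\sqrt{N_{ZC}}$ up to rounding.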

1.4 Fundamentals of MC-CDMA

MC-CDMA, a digital modulation and multiple access scheme [10, 43, 44], is a combination of OFDM and CDMA which draws benefits from both. Such a combination is not unique and three different types of MC-CDMA can be distinguished, namely Multi-Carrier Direct Sequence Code Division Multiple Access (MC-DS-CDMA), Multi-Carrier Code Division Multiple Access (MC-CDMA) and Multi-Tone Code Division Multiple Access (MT-CDMA). MC-DS-CDMA spreads data symbols in the time domain. In this scheme, the same spreading code is used for all subcarriers as illustrated in Figure 1.15(a). In contrast, a data symbol in MC-CDMA is spread in the frequency domain, i.e. each chip of the spreading code corresponds to a subcarrier as illustrated in Figure 1.15(b). A fraction of the symbol, corresponding to a chip of the spreading code, is transmitted through a different subcarrier. Furthermore, symbols are modulated on many subcarriers to introduce frequency diversity instead of using only one carrier as in CDMA. Thus, MC-CDMA is more robust against deep frequency-selective fading than DS-CDMA [45]. MT-CDMA spreads the data streams using a given spreading code in the time domain, similarly to MC-DS-CDMA, but the time-domain spreading is performed after the IFFT stage as shown in Figure 1.15(c). As a result, the spectra of the subcarriers no longer satisfy the orthogonality condition, while the processing gain within a given bandwidth is increased [10]. Therefore, MT-CDMA allows longer spreading codes, and hence more users, than MC-DS-CDMA, but suffers from Inter-Carrier Interference (ICI).


[Figure: block diagrams of the three schemes, built from S/P conversion, spreading and IFFT stages: (a) MC-DS-CDMA, (b) MC-CDMA, (c) MT-CDMA]

Figure 1.15: MC-CDMA classification.

1.4.1 MC-CDMA transmitter model

Consider an MC-CDMA transmitter for the $u$th user with $N$ subcarriers modulated using M-QAM as shown in Figure 1.16. The M-QAM input data symbols are first S/P converted to $P$ parallel branches $X_0^u, X_1^u, \ldots, X_{P-1}^u$. The purpose of the S/P converter is to slow down the symbol rate in order to ensure frequency non-selective fading on each resulting subcarrier [43]. Each branch is spread in the frequency domain with an orthogonal spreading code $C^u$ of length $L$. The spread symbols are mapped onto $PL = N$ subcarriers and an IFFT is performed to convert them to a time-domain signal. The spread data sequences are then converted back to a serial data sequence. Next, a cyclic prefix is inserted between the symbols to combat the ISI and ICI caused by multipath fading. Finally, a windowing function $g(t - iT_s)$ is applied to the signal before DAC and upconversion for transmission.

[Figure: M-QAM symbol input, S/P conversion, spreading with code $C$, IFFT, CP insertion and windowing, output $x^u(t)$]

Figure 1.16: MC-CDMA transmitter.

1.4.2 MC-CDMA receiver model

Consider the MC-CDMA receiver for the $u$th user as shown in Figure 1.17. The received signal is first downconverted. Then, the cyclic prefix is removed and the remaining samples are S/P converted to $N$ parallel branches. The FFT block performs demodulation in order to obtain the transmitted symbols, with amplitude and phase corrupted by the channel and the additive noise. Each corrupted subcarrier is equalized and despread using the spreading code of the desired $u$th user to combine the received signal energy scattered in the frequency domain. The equalization gain factor can be obtained using one of the well-known methods such as Equal Gain Combining (EGC), Maximum Ratio Combining (MRC), or Orthogonal Restoring Combining (ORC), sometimes called the Zero Forcing (ZF) method. The output data is then P/S converted to obtain the $u$th user data stream.

[Figure: received signal, CP removal, S/P conversion, FFT, despreading, M-QAM symbol output]

Figure 1.17: MC-CDMA receiver.
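The transmitter and receiver models can be illustrated with a noiseless baseband sketch. Several choices below are assumptions made for the example and are not taken from the thesis: Walsh codes of length $L = 8$, $P = 4$ branches, QPSK symbols, a three-tap static channel, perfect channel knowledge, ORC (zero-forcing) equalization, and a mapping in which the chips of branch $p$ occupy adjacent subcarriers. Windowing and RF stages are omitted.

```python
import numpy as np

L, P, CP = 8, 4, 8        # code length, parallel branches, cyclic prefix (assumed)
N = P * L                 # number of subcarriers

def walsh(n):
    """Bipolar Walsh-Hadamard matrix; each row is one spreading code."""
    H = np.array([[1]])
    while H.shape[0] < n:
        H = np.block([[H, H], [H, -H]])
    return H

CODES = walsh(L)

def transmit(symbols, users):
    """symbols[i] holds the P data symbols of users[i]; returns one symbol with CP."""
    X = np.zeros(N, dtype=complex)
    for s, u in zip(symbols, users):
        # Frequency-domain spreading: branch p fills subcarriers p*L .. p*L+L-1.
        X += np.kron(s, CODES[u])
    x = np.fft.ifft(X)
    return np.concatenate([x[-CP:], x])      # prepend the cyclic prefix

def receive(rx, user, H):
    """Remove the CP, demodulate, equalize (ORC/ZF) and despread for one user."""
    Y = np.fft.fft(rx[CP:CP + N])
    Z = Y / H
    return Z.reshape(P, L) @ CODES[user] / L

rng = np.random.default_rng(0)
users = [0, 3, 5]                             # three active users (assumed)
symbols = (rng.choice([-1, 1], (3, P)) + 1j * rng.choice([-1, 1], (3, P))) / np.sqrt(2)

h = np.array([1.0, 0.5, 0.2])                 # hypothetical channel, shorter than the CP
rx = np.convolve(transmit(symbols, users), h)
H = np.fft.fft(h, N)                          # channel frequency response
recovered = [receive(rx, u, H) for u in users]
```

Because ORC restores the orthogonality of the codes, each user's symbols are recovered exactly in the absence of noise; with noise, ORC amplifies it on weak subcarriers, which is why EGC or MRC may be preferred in practice.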

1.5 Channel models

1.5.1 Types of fading

In mobile radio communications, the received signal consists of reflected, diffracted, and scattered versions of the transmitted signal. Multipath fading is caused by interference between several versions of the transmitted signal that travel along different paths and arrive at the receiver at different times. Multipath fading creates small-scale fading effects. Mobile radio propagation in general is classified into two types: large-scale and small-scale fading [37]. In large-scale fading, the average received signal power is attenuated due to motion over large areas. This phenomenon is due to prominent terrain features (hills, forests, buildings, ...). Small-scale fading refers to the rapid fluctuations of the amplitude of the signal over a short travel distance, i.e. small changes in the spatial separation between the transmitter and the receiver. Small-scale fading is classified based on two conditions: delay spread and Doppler spread [37]. The classification of the fading types in small-scale fading is illustrated in Figure 1.18.


[Figure: small-scale fading is split into two branches.
Delay spread based: flat fading (signal BW < coherence BW, delay spread < symbol period) and frequency-selective fading (signal BW > coherence BW, delay spread > symbol period).
Doppler spread based: fast fading (high Doppler spread, coherence time < symbol period, channel variations faster than baseband signal variations) and slow fading (low Doppler spread, coherence time > symbol period, channel variations slower than baseband signal variations).]

Figure 1.18: Small-scale fading classification.

In the next section, parameters for specific mobile multipath channels will be presented. The parameters are derived from the power delay profile which is generally represented as a plot of relative power versus excess delay with respect to a fixed time reference.

1.5.2 3GPP WCDMA channel models

In the simulation of the MC-CDMA systems, we will use the simple channel models proposed in the Third Generation Partnership Project (3GPP) Technical Specification (TS) 25.101 v2.1.0. The 3GPP models were modified from the International Telecommunication Union (ITU) models in order to use them for performance measurements in multipath fading channels. In TS 25.101 v2.1.0, there are propagation condition models for indoor (Case 1), indoor-to-outdoor/pedestrian (Case 2), and for vehicular (Case 3) environments. All paths have a classical Doppler spectrum. Because the indoor-to-outdoor/pedestrian propagation model has a weaker average power for the second path than the indoor model at the same relative delay, the indoor-to-outdoor/pedestrian model is used in this work as a worst case for a slow fading channel. The propagation conditions for indoor-to-outdoor/pedestrian and for vehicular environments with low delay spread are depicted in figures 1.19 and 1.20 (Case 2 and Case 3 in TS 25.101 v2.1.0) [46]. In our work, a wireless channel bandwidth of 5 MHz will be used in order to be consistent with the Wideband Code Division Multiple Access (WCDMA) bandwidth allocation. The carrier frequency is designated by the Universal terrestrial radio access Absolute Radio Frequency Channel Number (UARFCN) [46] and is equal to 2160 MHz.


[Figure: relative power versus excess delay $\tau$ (ns), with paths at 0 dB (0 ns), -12.5 dB (244 ns) and -24.7 dB (488 ns)]

Figure 1.19: Indoor-to-outdoor/pedestrian channel power delay profile (3 km/h).

Indoor-to-outdoor/pedestrian channel

In Figure 1.19, the first arrow represents the direct Line-of-Sight (LOS) signal, which is the strongest one and the reference level. The second arrow represents the first multipath signal, which is 12.5 dB weaker than the LOS signal and arrives 244 ns later. The third arrow represents the second multipath signal, which is 24.7 dB weaker than the LOS and arrives 488 ns later. Using this channel model, the following parameters are computed: maximum delay spread, mean excess delay, second moment of power delay profile, RMS delay spread, coherence bandwidth, and maximum Doppler shift.

The maximum delay spread of the channel is the maximum relative delay of the weakest multipath component as compared to the first arriving component in the power delay profile, sometimes called the maximum excess delay. This is an important parameter that is used to determine the guard interval for OFDM systems in Section 1.6. Therefore, the maximum delay spread of this channel model is

$$\tau_{max} = 0.488\ \mu\text{s}. \tag{1.46}$$

The mean excess delay is the first moment of the power delay profile and is defined as [37]

$$\bar{\tau} = \frac{\sum_k P(\tau_k)\,\tau_k}{\sum_k P(\tau_k)} = 0.0145\ \mu\text{s}, \tag{1.47}$$

where $P(\tau_k)$ is the average power (in linear power units) and $\tau_k$ is the relative delay in seconds. The RMS delay spread is the square root of the second central moment of the power delay profile and is defined as [37]

$$\sigma_\tau = \sqrt{\overline{\tau^2} - (\bar{\tau})^2}, \tag{1.48}$$

where $\overline{\tau^2}$ is the second moment of the power delay profile and is given by

$$\overline{\tau^2} = \frac{\sum_k P(\tau_k)\,\tau_k^2}{\sum_k P(\tau_k)} = 0.0039\ \mu\text{s}^2. \tag{1.49}$$

Therefore, the RMS delay spread is

$$\sigma_\tau = \sqrt{0.0039 - 0.0145^2} = 0.0609\ \mu\text{s}. \tag{1.50}$$

[Figure: relative power $P(\tau)$ versus excess delay $\tau$ (ns), with paths at 0, 244, 488, 732, 976, 1220, 1464 and 1708 ns and relative powers of 0, -2.4, -6.5, -9.4, -12.7, -13.3, -15.4 and -25.4 dB]

Figure 1.20: Vehicular channel models power delay profile (120 km/h).

The coherence bandwidth, $B_c$, is a statistical measure of the range of frequencies over which the channel can be considered "flat". If the coherence bandwidth is defined as the bandwidth over which the frequency correlation function is above 0.9 [37], then

$$B_c \approx \frac{1}{50\,\sigma_\tau}. \tag{1.51}$$

The coherence bandwidth is also an important parameter to determine subcarrier spacing for OFDM systems. In the indoor-to-outdoor case, the coherence bandwidth is approximately

$$B_c \approx \frac{1}{50 \times 0.0609 \times 10^{-3}} = 328.4\ \text{kHz}, \tag{1.52}$$

where $\sigma_\tau$ is expressed in ms so that the result is in kHz.

The Doppler spread is a measure of the spectral broadening caused by the temporal rate of change of the mobile radio channel and is defined as the range of frequencies over which the received Doppler spectrum is essentially non-zero. The Doppler spectrum has a frequency range of $f_c \pm f_{Dmax}$, where $f_{Dmax}$ is the maximum Doppler shift and is given by [37]

$$f_{Dmax} = \frac{v f_c}{c}, \tag{1.53}$$

where $v$ is the velocity of the mobile (3 km/h), $f_c$ is the carrier frequency ($f_c = 2160$ MHz is set according to [46]), and $c$ is the velocity of light. Therefore, we have

$$f_{Dmax} = \frac{3 \times 1000 \times 2160 \times 10^6}{3600 \times 3 \times 10^8} = 6\ \text{Hz}. \tag{1.54}$$

The coherence time, $T_c$, is the time-domain dual of the Doppler spread. It characterizes the time-varying nature of the frequency dispersiveness of the channel and is defined as [37]

$$T_c = \frac{0.423}{f_{Dmax}} = \frac{0.423}{6} = 0.0705\ \text{s}. \tag{1.55}$$

The coherence time will be used in the simulations to determine how many symbols can be transmitted while the channel remains constant.

Vehicular channel

For the vehicular case, the same parameters are essentially recalculated as for the previous case, but based on the power delay profile for the vehicular environment. These parameters will be used to determine the system performance under this environment. Figure 1.20 shows a representation of the power delay profile. Notice that this profile contains more multipath signals than the previous case (Figure 1.19). Therefore, the vehicular case is a much more severe environment and system performance is expected to be degraded. Using this channel model, the following parameters are obtained.

Maximum delay spread:

$$\tau_{max} = 1.708\ \mu\text{s}. \tag{1.56}$$

Mean excess delay:

$$\bar{\tau} = \frac{\sum_k P(\tau_k)\,\tau_k}{\sum_k P(\tau_k)} = 0.2396\ \mu\text{s}. \tag{1.57}$$

RMS delay spread:

$$\sigma_\tau = \sqrt{\overline{\tau^2} - (\bar{\tau})^2} = 0.3298\ \mu\text{s}, \tag{1.58}$$

where the second moment of the power delay profile is

$$\overline{\tau^2} = \frac{\sum_k P(\tau_k)\,\tau_k^2}{\sum_k P(\tau_k)} = 0.1662\ \mu\text{s}^2. \tag{1.59}$$

Coherence bandwidth:

$$B_c \approx \frac{1}{50\,\sigma_\tau} = \frac{1}{50 \times 0.3298 \times 10^{-3}} = 60.64\ \text{kHz}. \tag{1.60}$$

Maximum Doppler shift, assuming a vehicular speed of 120 km/h:

$$f_{Dmax} = \frac{120 \times 1000 \times 2160 \times 10^6}{3600 \times 3 \times 10^8} = 240\ \text{Hz}. \tag{1.61}$$

Coherence time:

$$T_c = \frac{0.423}{f_{Dmax}} = \frac{0.423}{240} = 0.0018\ \text{s}. \tag{1.62}$$

Table 1.1 summarizes some important parameters of 3GPP channels.

Table 1.1: Channel parameters.

| Parameter | Indoor-to-outdoor/pedestrian | Vehicular |
|---|---|---|
| Maximum delay spread (µs) | 0.488 | 1.708 |
| Mean excess delay (µs) | 0.0145 | 0.2396 |
| RMS delay spread (µs) | 0.0609 | 0.3298 |
| Coherence bandwidth (kHz) | 328.4 | 60.64 |
| Coherence time (s) | 0.0705 | 0.0018 |
| Maximum Doppler shift (Hz) | 6 | 240 |
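The channel parameters of both environments can be reproduced from the power delay profiles with a short script. This is a sketch: the function name and dictionary keys are hypothetical, and the pairing of path powers and delays for the vehicular profile is read from Figure 1.20. Small differences in the last digit with respect to Table 1.1 are expected because the thesis rounds intermediate results.

```python
import numpy as np

def channel_parameters(delays_ns, powers_db, speed_kmh, fc_hz=2160e6):
    """Delay and Doppler statistics of a tapped power delay profile."""
    tau = np.asarray(delays_ns) * 1e-3            # delays in microseconds
    p = 10 ** (np.asarray(powers_db) / 10)        # linear power units
    mean = np.sum(p * tau) / np.sum(p)            # mean excess delay
    second = np.sum(p * tau ** 2) / np.sum(p)     # second moment
    rms = np.sqrt(second - mean ** 2)             # RMS delay spread
    f_dmax = (speed_kmh / 3.6) * fc_hz / 3e8      # maximum Doppler shift
    return {
        "tau_max_us": tau.max(),
        "mean_us": mean,
        "rms_us": rms,
        "bc_khz": 1.0 / (50 * rms * 1e-6) / 1e3,  # 0.9-correlation coherence bandwidth
        "f_dmax_hz": f_dmax,
        "tc_s": 0.423 / f_dmax,                   # coherence time
    }

pedestrian = channel_parameters([0, 244, 488], [0, -12.5, -24.7], speed_kmh=3)
vehicular = channel_parameters(
    [0, 244, 488, 732, 976, 1220, 1464, 1708],
    [0, -2.4, -6.5, -9.4, -12.7, -13.3, -15.4, -25.4],
    speed_kmh=120,
)
```

For example, `pedestrian["rms_us"]` is about 0.0609 and `vehicular["f_dmax_hz"]` is 240, in line with Table 1.1.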

1.6 Parameters for downlink MC-CDMA systems

In order to simulate an MC-CDMA system, we must find the simulation parameters for the system given a channel environment. Since the OFDM system is a special case of the MC-CDMA system with spreading factor SF = 1, simulations of OFDM systems are essential in order to obtain basic parameters for MC-CDMA systems.

As we saw in Section 1.1, the main advantage of OFDM resides in the conversion of a frequency-selective fading channel into a number of flat-fading sub-channels (narrow-band signals). As a result, the bandwidth of an individual sub-channel (the subcarrier spacing), $\Delta f$, should be much less than the coherence bandwidth $B_c$ to ensure that the channel frequency response within the signal bandwidth is approximately constant, i.e. flat fading [37, 47]. Given a channel bandwidth $B = 5$ MHz, the FFT sampling rate is set to $F_s = 5$ MHz. In the following, we compute the simulation parameters for the OFDM system for the indoor-to-outdoor/pedestrian and vehicular environments and then apply them to the MC-CDMA system.


1.6.1 Indoor-to-outdoor/pedestrian channel

Considering an FFT length of 64, the subcarrier spacing is

$$\Delta f = \frac{F_s}{N} = \frac{5 \times 10^3}{64} = 78.125\ \text{kHz}. \tag{1.63}$$

We can see that the subcarrier spacing is about one quarter of the coherence bandwidth $B_c = 328.4$ kHz, which is sufficient to consider the channel frequency response within the signal bandwidth to be approximately constant. Increasing the FFT length above 64 leads to hardware resource overhead and added system complexity. Therefore, the choice of an FFT length of 64 is appropriate in this case. Hence, the effective symbol duration (FFT interval duration) is

$$T = \frac{1}{\Delta f} = 12.8\ \mu\text{s}. \tag{1.64}$$

In [48], the author states that, as a rule of thumb, the guard time interval $T_g$ should be at least 4 times the maximum delay spread $\tau_{max}$. Using this rule, the guard time in this channel environment is $T_g = 4 \times 488$ ns $\approx 2$ µs. However, in practical systems, the guard time interval is often taken to be 25% of the effective symbol duration. This duration of the guard interval implies an SNR loss of about 1 dB. For example, the guard interval in the IEEE 802.11a wireless Local Area Network (LAN) standard is 0.8 µs, which is 25% of the effective symbol duration of 3.2 µs [3]. Therefore, the guard interval is fixed at 25% of the effective symbol duration. That is

$$T_g = \frac{T}{4} = 3.2\ \mu\text{s}. \tag{1.65}$$

The resulting guard interval is more than 4 times the maximum delay spread ($\tau_{max} = 0.488$ µs). Thus, the OFDM symbol duration can be expressed as

$$T_s = T + T_g = 12.8\ \mu\text{s} + 3.2\ \mu\text{s} = 16\ \mu\text{s}. \tag{1.66}$$

The symbol rate is equal to the total number of subcarriers divided by the OFDM symbol duration. Because of the insertion of pilot tones into the OFDM symbol, the actual symbol rate can be expressed as

$$R_s = \frac{N_d}{T_s}, \tag{1.67}$$

where $N_d$ is the number of data subcarriers. Then, the maximum bit rate is

$$R_b = R_s \log_2 M, \tag{1.68}$$

where $M \in \{4, 16, 64\}$ is the Quadrature Amplitude Modulation (QAM) modulation level. The occupied bandwidth is defined as the total bandwidth used by the system, i.e. the bandwidth is measured from the left-most pilot subcarrier to the right-most pilot subcarrier as illustrated in Figure 1.21. That is

$$B_{occ} = (N_d + N_p + 1) \times \Delta f, \tag{1.69}$$

where $N_p$ is the number of pilot subcarriers per symbol. Note that we have to count the additional DC subcarrier. Therefore, the bandwidth efficiency can be expressed as

$$B_{eff} = \frac{R_b}{B_{occ}}\ \text{bits/s/Hz}. \tag{1.70}$$

[Figure: subcarrier index axis from -28 to 28 with the DC subcarrier at the center and pilot tones inserted at regular intervals among the data subcarriers]

Figure 1.21: Data and pilot subcarriers allocation.

Pilot tones are periodically inserted into several dedicated subcarriers of each OFDM symbol (comb-type pilot) as shown in Figure 1.21. The spacing between pilot tones in the frequency domain is denoted by $N_f$. Given the normalized channel bandwidth $\tau_{max}\Delta f$, the sampling theorem states that [39]

$$\tau_{max}\,\Delta f\,N_f \le 0.5. \tag{1.71}$$

Thus, the pilot spacing is given by

$$N_f \le \frac{0.5}{\tau_{max}\,\Delta f} \approx 13. \tag{1.72}$$

Table 1.2 shows the simulation parameters for the OFDM system over the indoor-to-outdoor/pedestrian channel environment with pilot tone spacings $N_f = 8$ and $N_f = 12$. Table 1.3 shows the bandwidth efficiency of the OFDM system for this channel.

In order to extend to the MC-CDMA system, the spreading factor is assumed to be $SF = 8$, leading to an FFT length for the MC-CDMA system equal to 512 with the same channel bandwidth $B = 5$ MHz and FFT sampling rate $F_s = 5$ MHz as for the OFDM system. The remaining parameters are set to exactly the same values as for the OFDM system above. The parameters for the uncoded MC-CDMA system are summarized in Table 1.4. We can see that the data rate of the MC-CDMA system is around 8 times lower than that of the OFDM system. In contrast, the MC-CDMA system can support up to 8 users, while the OFDM system can support only one user. Table 1.5 shows the bandwidth efficiency for the MC-CDMA system for this channel.
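The parameter calculations of this section can be collected in a small helper, shown here as a sketch. The function names and dictionary keys are hypothetical; dividing the symbol rate by the spreading factor for MC-CDMA is an assumption inferred from the symbol rates reported in Table 1.4, and the guard interval is taken as 25% of the effective symbol duration as above.

```python
import math

def multicarrier_parameters(fs_hz, nfft, n_data, n_pilot, sf=1, guard_fraction=0.25):
    """Timing, rate and bandwidth figures for an OFDM (sf=1) or MC-CDMA system."""
    delta_f = fs_hz / nfft                        # subcarrier spacing
    t_eff = 1.0 / delta_f                         # effective symbol duration
    t_g = guard_fraction * t_eff                  # guard interval
    t_s = t_eff + t_g                             # total symbol duration
    r_s = n_data / sf / t_s                       # actual symbol rate
    b_occ = (n_data + n_pilot + 1) * delta_f      # occupied bandwidth, DC included
    return {"delta_f": delta_f, "t_eff": t_eff, "t_g": t_g,
            "t_s": t_s, "r_s": r_s, "b_occ": b_occ}

def bit_rate(r_s, m):
    """Maximum bit rate for an M-QAM constellation."""
    return r_s * math.log2(m)

def max_pilot_spacing(tau_max_s, delta_f_hz):
    """Upper bound on the pilot spacing from the sampling theorem."""
    return 0.5 / (tau_max_s * delta_f_hz)

ofdm = multicarrier_parameters(5e6, 64, n_data=48, n_pilot=8)
mc_cdma = multicarrier_parameters(5e6, 512, n_data=440, n_pilot=8, sf=8)

ofdm_qpsk_eff = bit_rate(ofdm["r_s"], 4) / ofdm["b_occ"]
mc_cdma_qpsk_eff = bit_rate(mc_cdma["r_s"], 4) / mc_cdma["b_occ"]
spacing_bound = max_pilot_spacing(0.488e-6, ofdm["delta_f"])
```

With these inputs the OFDM case gives a symbol rate of 3 MSps and a QPSK efficiency of about 1.35 bits/s/Hz, and the MC-CDMA case gives 429.6875 kSps and about 0.196 bits/s/Hz, matching the first columns of Tables 1.2 to 1.5.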


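Table 1.2: OFDM simulation parameters for the indoor-to-outdoor/pedestrian environment.

| Parameter | Pilot spacing 8 | Pilot spacing 12 |
|---|---|---|
| Available bandwidth | 5 MHz | 5 MHz |
| FFT sampling rate | 5 MHz | 5 MHz |
| FFT size | 64 | 64 |
| Effective symbol duration | 12.8 µs | 12.8 µs |
| Guard time duration | 3.2 µs | 3.2 µs |
| OFDM symbol duration | 16 µs | 16 µs |
| Subcarrier spacing | 78.125 kHz | 78.125 kHz |
| Number of pilot tones | 8 | 6 |
| Number of data subcarriers | 48 | 54 |
| Number of subcarriers | 56 | 60 |
| Occupied bandwidth | 4.45 MHz | 4.76 MHz |
| Actual symbol rate | 3 MSps | 3.375 MSps |

Table 1.3: Bandwidth efficiency of the OFDM system for the indoor-to-outdoor/pedestrian environment.

| Modulation | Bit rate (pilot spacing 8) | Efficiency (pilot spacing 8) | Bit rate (pilot spacing 12) | Efficiency (pilot spacing 12) |
|---|---|---|---|---|
| QPSK | 6 Mbps | 1.35 bits/s/Hz | 6.75 Mbps | 1.41 bits/s/Hz |
| 16QAM | 12 Mbps | | | 2.82 bits/s/Hz |
| 64QAM | 18 Mbps | | | 4.25 bits/s/Hz |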

1.6.2 Vehicular channel

Likewise, the simulation parameters of the OFDM and MC-CDMA systems for the vehicular channel are calculated in the same way as for the indoor-to-outdoor/pedestrian channel. Given the same channel bandwidth $B = 5$ MHz, the FFT sampling rate is set to $F_s = 5$ MHz. Considering an FFT length of 256, the following values are obtained for the parameters.

Subcarrier spacing:

$$\Delta f = \frac{F_s}{N} = \frac{5 \times 10^3}{256} = 19.5313\ \text{kHz}. \tag{1.73}$$


Table 1.4: MC-CDMA simulation parameters for the indoor-to-outdoor/pedestrian environment.

| Parameter | Pilot spacing 64 | Pilot spacing 94 |
|---|---|---|
| Available bandwidth | 5 MHz | 5 MHz |
| FFT sampling rate | 5 MHz | 5 MHz |
| Spreading factor | 8 | 8 |
| Spreading codes | OVSF codes | OVSF codes |
| FFT size | 512 | 512 |
| Subcarrier spacing | 9.765 kHz | 9.765 kHz |
| Effective symbol duration | 102.4 µs | 102.4 µs |
| Guard time duration | 25.6 µs | 25.6 µs |
| MC-CDMA symbol duration | 128 µs | 128 µs |
| Number of pilot subcarriers | 8 | 6 |
| Number of data subcarriers | 440 | 464 |
| Number of subcarriers | 448 | 470 |
| Occupied bandwidth | 4.38 MHz | 4.59 MHz |
| Actual symbol rate | 429.6875 kSps | 453.125 kSps |

Table 1.5: Bandwidth efficiency of the uncoded MC-CDMA system for the indoor-to-outdoor/pedestrian environment.

| Modulation | Bit rate (pilot spacing 64) | Efficiency (pilot spacing 64) | Bit rate (pilot spacing 94) | Efficiency (pilot spacing 94) |
|---|---|---|---|---|
| QPSK | 859.375 kbps | 0.196 bits/s/Hz | 906.25 kbps | 0.197 bits/s/Hz |
| 16QAM | 1718.75 kbps | 0.391 bits/s/Hz | 1812.5 kbps | 0.394 bits/s/Hz |
| 64QAM | 2578.125 kbps | 0.587 bits/s/Hz | 2718.75 kbps | 0.591 bits/s/Hz |

We can see that the subcarrier spacing is smaller than the vehicular channel coherence bandwidth $B_c = 60.64$ kHz. The symbol duration is given by

$$T = \frac{1}{\Delta f} = 51.2\ \mu\text{s}. \tag{1.74}$$

Guard interval:

$$T_g = \frac{T}{4} = 12.8\ \mu\text{s}. \tag{1.75}$$

OFDM symbol duration:

$$T_s = T + T_g = 51.2\ \mu\text{s} + 12.8\ \mu\text{s} = 64\ \mu\text{s}. \tag{1.76}$$


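Table 1.6: OFDM simulation parameters for the vehicular environment.

| Parameter | Pilot spacing 8 | Pilot spacing 12 |
|---|---|---|
| Available bandwidth | 5 MHz | 5 MHz |
| FFT sampling rate | 5 MHz | 5 MHz |
| FFT size | 256 | 256 |
| Effective symbol duration | 51.2 µs | 51.2 µs |
| Guard time duration | 12.8 µs | 12.8 µs |
| OFDM symbol duration | 64 µs | 64 µs |
| Subcarrier spacing | 19.5313 kHz | 19.5313 kHz |
| Number of pilot tones | 32 | 22 |
| Number of data subcarriers | 216 | 230 |
| Number of subcarriers | 248 | 252 |
| Occupied bandwidth | 4.86 MHz | 4.94 MHz |
| Actual symbol rate | 3.375 MSps | 3.59 MSps |

Table 1.7: Bandwidth efficiency of the OFDM system for the vehicular environment.

| Modulation | Bit rate (pilot spacing 8) | Efficiency (pilot spacing 8) | Bit rate (pilot spacing 12) | Efficiency (pilot spacing 12) |
|---|---|---|---|---|
| QPSK | 6.75 Mbps | | | 1.45 bits/s/Hz |
| 16QAM | 13.5 Mbps | | | 2.9 bits/s/Hz |
| 64QAM | 20.25 Mbps | | | 4.35 bits/s/Hz |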

Pilot tone spacing:

$$N_f \le \frac{0.5}{\tau_{max}\,\Delta f} \approx 15. \tag{1.77}$$

Table 1.6 summarizes the simulation parameters for the OFDM system over the vehicular environment with pilot tone spacings $N_f = 8$ and $N_f = 12$. These pilot tone spacings are used in order to compare the performance of this system with the previous case (indoor-to-outdoor/pedestrian). Table 1.7 shows the bandwidth efficiency of the OFDM system for this channel. Applying a spreading factor $SF = 8$, $B = 5$ MHz and an FFT sampling rate $F_s = 5$ MHz for the MC-CDMA system, the FFT length of the MC-CDMA system in this case is equal to $256 \times 8 = 2048$. The parameters for this MC-CDMA system are summarized in Table 1.8. The bandwidth efficiency of the system with various modulation schemes is detailed in Table 1.9.


Table 1.8: MC-CDMA simulation parameters for the vehicular environment.

| Parameter | Pilot spacing 64 | Pilot spacing 94 |
|---|---|---|
| Available bandwidth | 5 MHz | 5 MHz |
| FFT sampling rate | 5 MHz | 5 MHz |
| Spreading codes | OVSF codes | OVSF codes |
| FFT size | 2048 | 2048 |
| Subcarrier spacing | 2.4414 kHz | 2.4414 kHz |
| Effective symbol duration | 409.6 µs | 409.6 µs |
| Guard time duration | 102.4 µs | 102.4 µs |
| MC-CDMA symbol duration | 512 µs | 512 µs |
| Number of pilot subcarriers | 32 | 22 |
| Number of data subcarriers | 1952 | 1952 |
| Number of subcarriers | 1984 | 1974 |
| Occupied bandwidth | 4.85 MHz | 4.77 MHz |
| Actual symbol rate | 476.5625 kSps | 476.5625 kSps |

Table 1.9: Bandwidth efficiency of the MC-CDMA system for the vehicular environment.

| Modulation | Bit rate (pilot spacing 64) | Efficiency (pilot spacing 64) | Bit rate (pilot spacing 94) | Efficiency (pilot spacing 94) |
|---|---|---|---|---|
| QPSK | 953.125 kbps | 0.196 bits/s/Hz | 953.125 kbps | 0.2 bits/s/Hz |
| 16QAM | 1906.25 kbps | 0.393 bits/s/Hz | 1906.25 kbps | 0.4 bits/s/Hz |
| 64QAM | 2859.375 kbps | 0.59 bits/s/Hz | 2859.375 kbps | 0.6 bits/s/Hz |


1.7 Conclusion

In this chapter, OFDM, CDMA and MC-CDMA techniques and their related issues were reviewed. Proposed design parameters for MC-CDMA systems with specific channel models, i.e. 3GPP indoor-to-outdoor/pedestrian and vehicular, were computed. In the scope of this thesis, the proposed system uses a spreading code length of 8, which supports a maximum of 8 users over a 5 MHz channel bandwidth. The following chapters will use these parameters for further study on the MC-CDMA system.

Chapter 2

Synchronization and channel estimation in MC-CDMA system

2.1 Introduction

This chapter presents timing/frequency synchronization techniques and channel estimation methods for our MC-CDMA system. Timing/frequency synchronization is usually classified into two types: pre-FFT and post-FFT synchronization. Pre-FFT synchronization estimates the timing/frequency offset before FFT demodulation. This synchronization process operates in the time domain and uses the preamble or the cyclic prefix (guard interval). Synchronization techniques based on the cyclic prefix exploit the periodicity of the cyclic prefix in the OFDM signal in order to find Maximum Likelihood (ML) estimates of the time and frequency offsets, as presented in [15-17, 19-22, 24]. The advantage of cyclic-prefix-based synchronization is that no additional training data symbols are needed, which preserves transmission efficiency. The downside of cyclic-prefix-based synchronization is that the cyclic prefix is usually corrupted by ISI. As a result, the estimation accuracy is degraded in multipath channels with large delay spread. In contrast, preamble-based synchronization techniques require training data at the beginning of each data frame [4, 11-14, 18, 23]. This reduces transmission efficiency but provides better estimation accuracy than cyclic-prefix-based methods. Post-FFT synchronization, sometimes referred to as frequency-domain synchronization, estimates the timing/frequency offset after FFT demodulation. The authors in [49, 50] proposed an efficient technique to estimate timing/frequency offset information from the pilot symbols. They presented a timing offset estimator based on the time-shifting property of the Fourier transform, while the frequency offset was estimated by correlating the received pilot tones with a time-shifted version of the transmitted pilot tones.

Besides synchronization issues, channel estimation techniques are also considered in this chapter. It is well known that a decision-directed process is widely used with LS channel estimates in order to improve estimation accuracy [40, 41]. Zhu in [40] applied a decision-directed process to the LS channel estimates to blindly select virtual pilot tones to further improve the estimation performance over a static channel. Wei in [41] used a decision-directed process in conjunction with iterative transform-domain-based LS channel estimation. Edfors in [29] proposed the Linear Minimum Mean Square Error (LMMSE) channel estimator for OFDM systems using the frequency correlation of the channel, and stated that the LMMSE estimator could be applied to Pilot-Symbol-Assisted Modulation (PSAM) OFDM systems. Golovins in [27] described the LMMSE estimator for PSAM OFDM systems in more detail. Edfors also introduced an optimal low-rank approximation using Singular-Value Decomposition (SVD) but did not discuss the complexity of the SVD algorithm [29]. Partitioning the channel correlation matrix into small sub-matrices in order to reduce the complexity of the LMMSE estimator was studied in [28-30]. Noh in [28] investigated non-overlap and overlap techniques for partitioning the channel correlation matrix into small sub-matrices. Mehlführer in [30] used only one small sub-matrix in order to realize the approximate LMMSE estimator. Hsieh in [25] studied a simplified LMMSE estimator combined with conventional interpolation methods.

Based on the above discussion, a proposed method to design low Peak-to-Average Power Ratio (PAPR) short training sequences for our MC-CDMA system using Zadoff-Chu (ZC) sequences is presented in this chapter. The preamble structure in the system is similar to the typical WLAN system [51]. This preamble structure naturally leads to a new joint pre-FFT coarse timing and fractional Carrier Frequency Offset (CFO) synchronization method for our system. Low-complexity frequency-domain fine timing and integer CFO synchronization methods for OFDM systems are also adopted [49, 50]. In order to perform closed-loop frequency tracking, a classical Proportional-Integral (PI) loop filter design is also presented based on a Digital Phase-Locked-Loop (DPLL) in [52]. For channel estimation, there exist many avenues to adapt channel estimation techniques from OFDM systems to MC-CDMA systems. As a result, three channel estimation techniques aided by a decision-directed process are presented in this chapter. The first technique uses the selection of so-called virtual pilots for channel estimation, the second technique is based on the iterative transform domain, while the third is based on a low-complexity overlap LMMSE estimator.

Figure 2.1: Simple frame format. (The frame consists of one preamble symbol of duration $T_s$, made up of the short training symbols $t_1$ to $t_{10}$, followed by 5 data symbols.)

2.2 Synchronization issues in MC-CDMA systems

2.2.1 Preamble design for downlink MC-CDMA system

The proposed preamble for downlink MC-CDMA consists of only one MC-CDMA symbol. This symbol is partitioned into ten short symbols, similar to the short training symbol structure in 802.11a. The only difference between the proposed preamble and the preamble structure in 802.11a is the absence of a long training sequence. A long training sequence is not required because the channel estimation, the fine timing estimation and the integer CFO estimation are performed in the frequency domain. The preamble is periodically transmitted in the time domain in order to help the receiver detect the transmitted frame, establish Automatic Gain Control (AGC), and estimate the fractional CFO, as illustrated in Figure 2.1. In this figure, a simple frame format consists of a training symbol (preamble) and 5 data symbols. $T_s$ denotes the duration of an MC-CDMA symbol and $t_1$ to $t_{10}$ are the repeated short training symbols within the preamble duration. Each short training symbol $s_{\mathrm{train}}$ consists of $N_s$ samples and satisfies the following criteria: good correlation properties and a low PAPR, sometimes referred to as crest factor (2.1 dB is typical of WLANs [51]). An efficient method to design the training sequences for MC-CDMA systems based on ZC sequences is embodied in the following algorithm:

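Require: Total number of data and pilot subcarriers $N_c$

1: Find the set $\Omega$ of the first prime numbers of $2^{\lceil \log_2 N_c \rceil + 1}$
2: Find the largest prime number $N_{ZC}$ such that $N_{ZC} < N_c$
3: for $i = 0$ to $\mathrm{length}(\Omega) - 1$ do
4:   $u_i \leftarrow \Omega_i$
5:   Compute the $u_i$th root ZC sequence of length $N_c$:
     $X_i(n) = \exp\left(-j \pi u_i \frac{n(n+1)}{N_{ZC}}\right) , \quad 0 \le n \le N_c - 1$
6:   Zero padding at the edges of $X_i$ to the desired FFT length $N$
7:   $x_i \leftarrow \mathrm{IFFT}(X_i)$
8:   $s_{\mathrm{train}} \leftarrow$ the first $N_s$ samples of $x_i$
9:   $p_i \leftarrow [s_{\mathrm{train}}, s_{\mathrm{train}}, \ldots, s_{\mathrm{train}}]$ (10 times)
10:  $\mathrm{PAPR}_i \leftarrow \max\{|p_i|^2\} \Big/ \frac{1}{10 N_s} \sum_{j=0}^{10 N_s - 1} |p_i(j)|^2$
11: end for
12: Return $p_i \mid \min\{\mathrm{PAPR}_i\}$

Figure 2.2: Low PAPR training symbol with $N_c = 448$, $u = 197$ and $N_{ZC} = 443$. (Plot of the real part of the training symbol versus the sample index.)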

This method allows designers to build a low PAPR preamble for any multi-carrier communication system. For example, given the receiver configuration for the indoor-to-outdoor/pedestrian channel with $N_c = 448$, the algorithm provides a training sequence with PAPR = 2.44 dB at $u = 197$ and $N_{ZC} = 443$. The resulting training symbol, of length $10N_s = 640$ samples, is illustrated in Figure 2.2.
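The search over ZC roots described by the algorithm above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the design code used for this work: the subcarrier mapping in `short_training_symbol`, the candidate root list and the toy sizes ($N_c = 56$, $N = 64$, $N_s = 8$) are hypothetical, so the sketch is not expected to reproduce the 2.44 dB figure quoted for $N_c = 448$.

```python
import cmath
import math


def is_prime(n):
    return n >= 2 and all(n % d for d in range(2, math.isqrt(n) + 1))


def largest_prime_below(n):
    return max(p for p in range(2, n) if is_prime(p))


def zc_sequence(u, n_zc, length):
    """Root-u Zadoff-Chu sequence (odd n_zc) evaluated for n = 0..length-1."""
    return [cmath.exp(-1j * math.pi * u * n * (n + 1) / n_zc) for n in range(length)]


def papr_db(x):
    power = [abs(v) ** 2 for v in x]
    return 10 * math.log10(max(power) / (sum(power) / len(power)))


def short_training_symbol(spectrum, n_fft, n_s):
    """First n_s samples of the n_fft-point IDFT of `spectrum`.

    Assumption of this sketch: spectrum[k] sits on subcarrier k + 1 and all
    other subcarriers are zero.
    """
    return [sum(x_k * cmath.exp(2j * math.pi * (k + 1) * t / n_fft)
                for k, x_k in enumerate(spectrum)) / n_fft
            for t in range(n_s)]


def design_preamble(n_c, n_fft, n_s, roots):
    """Return (papr_db, root, preamble) with the lowest PAPR among `roots`."""
    n_zc = largest_prime_below(n_c)
    best = None
    for u in roots:
        s_train = short_training_symbol(zc_sequence(u, n_zc, n_c), n_fft, n_s)
        preamble = s_train * 10              # ten repeated short symbols
        candidate = (papr_db(preamble), u, preamble)
        if best is None or candidate[0] < best[0]:
            best = candidate
    return best


# Toy configuration (hypothetical sizes, much smaller than Nc = 448).
roots = [p for p in range(2, 53) if is_prime(p)]
best_papr, best_root, preamble = design_preamble(n_c=56, n_fft=64, n_s=8, roots=roots)
```

Only the first $N_s$ output samples of the IDFT are computed, since the remaining samples are discarded by step 8 of the algorithm.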

2.2.2 Coarse timing synchronization

In the MC-CDMA system, the timing offset is decomposed into a coarse and a fine timing offset:

$$\Delta n = \Delta n_{\mathrm{coarse}} + \Delta n_{\mathrm{fine}} , \tag{2.1}$$

where $\Delta n_{\mathrm{coarse}}$ and $\Delta n_{\mathrm{fine}}$ are the coarse timing offset and the fine timing offset, respectively. The coarse timing detection for the proposed system derives from a high-accuracy coarse frame detection technique for WLAN systems proposed in [11, 12]. Training-symbol-based coarse frame detection finds the frame boundary by correlating the received samples with one short training symbol. The timing metric $M_1$ is computed by correlating the received sequence with the known short training symbol $s_{\mathrm{train}}$ of length $N_s$ (equivalently, by convolving it with the conjugated, time-reversed training symbol). The metric $M_1$ is given by

$$M_1(n) = \sum_{k=0}^{N_s - 1} y(n+k)\, s^*_{\mathrm{train}}(k) , \quad \forall n \in (-\infty, \infty) , \tag{2.2}$$

where $y(n)$ is the $n$th received sample, $s_{\mathrm{train}}(k)$ is the $k$th sample in one short training symbol, $N_s$ is the total number of samples in one short training symbol, and $(\cdot)^*$ denotes the complex conjugate.

To detect the frame boundary, a moving sum is performed on the correlator output (2.2) over the 10 consecutive repeated short training symbols $t_1$ to $t_{10}$ in order to obtain a second timing metric $M_2$:

$$M_2(n) = \sum_{k=0}^{9} \left| M_1(n - kN_s) \right| , \quad \forall n \in (-\infty, \infty) . \tag{2.3}$$

Finally, let $\Delta\hat{n}_{\mathrm{coarse}}$ be the estimated coarse timing offset. It is obtained by finding the peak of the metric $M_2$:

$$\Delta\hat{n}_{\mathrm{coarse}} = \arg\max_n \left\{ |M_2(n)| \right\} . \tag{2.4}$$
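The two metrics (2.2) and (2.3) and the peak search (2.4) can be illustrated with the following Python sketch. The test frame (a random unit-modulus short symbol, 37 leading empty samples, light noise) is entirely made up for the example, and the function name `coarse_timing` is hypothetical.

```python
import cmath
import math
import random


def coarse_timing(y, s_train, repeats=10):
    """Return the index of the peak of M2 and the metric itself."""
    n_s = len(s_train)
    n_lags = len(y) - n_s + 1
    # |M1(n)| with M1(n) = sum_k y(n + k) * conj(s_train(k))        (2.2)
    m1_abs = [abs(sum(y[n + k] * s_train[k].conjugate() for k in range(n_s)))
              for n in range(n_lags)]
    # M2(n) = sum_{k=0}^{repeats-1} |M1(n - k * n_s)|                (2.3)
    m2 = [sum(m1_abs[n - k * n_s] for k in range(repeats) if n - k * n_s >= 0)
          for n in range(n_lags)]
    peak = max(range(n_lags), key=m2.__getitem__)                   # (2.4)
    return peak, m2


# Synthetic test frame (all values below are made up for the illustration).
rng = random.Random(0)
n_s, offset = 64, 37


def unit_phasor():
    return cmath.exp(2j * math.pi * rng.random())


s_train = [unit_phasor() for _ in range(n_s)]
frame = [0j] * offset + s_train * 10 + [unit_phasor() for _ in range(2 * n_s)]
y = [v + complex(rng.gauss(0, 0.05), rng.gauss(0, 0.05)) for v in frame]

peak, _ = coarse_timing(y, s_train)
frame_start = peak - 9 * n_s   # the peak sits at the start of the last short symbol
```

Note that the peak of $M_2$ occurs when all ten correlation peaks of $M_1$ are aligned in the moving sum, i.e. at the start of the tenth short training symbol, so the start of the preamble lies $9N_s$ samples earlier.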

2.2.3 Fractional carrier frequency offset estimation

There are two types of CFO estimators: fractional CFO and integer CFO estimators. The CFO is decomposed as follows:

$$\varepsilon = \varepsilon_{\mathrm{int}} + \varepsilon_{\mathrm{frac}} , \tag{2.5}$$

where $\varepsilon_{\mathrm{int}}$ and $\varepsilon_{\mathrm{frac}}$ are the integer and fractional CFO, respectively. The fractional CFO estimator can track small changes of the CFO in fractional values of the subcarrier spacing, while the integer CFO estimator can only track the CFO in multiples of the subcarrier spacing. When the fractional CFO, $\varepsilon_{\mathrm{frac}}$, is present in the system, each received sample $y(n)$ is affected by the same frequency offset. As a result, the phase error at the $n$th sample is a function of $\varepsilon_{\mathrm{frac}}$ and the sample index $n$ [10]. This is given by

$$\Phi(\varepsilon_{\mathrm{frac}}, n) = 2\pi \varepsilon_{\mathrm{frac}}\, n\, T_s , \tag{2.6}$$

where $T_s$ is the sampling period. The phase difference between $y(n)$ and its delayed version $y(n + N_d)$ is likewise given by a function $\Psi$ of $\varepsilon_{\mathrm{frac}}$ and the time delay $N_d$. That is

$$\Psi(\varepsilon_{\mathrm{frac}}, N_d) = 2\pi \varepsilon_{\mathrm{frac}} N_d T_s . \tag{2.7}$$

Averaging is often used in order to estimate the phase difference accurately. Averaging over 4 short training symbols is suggested in [14] to further improve the estimation accuracy. Hence, the averaged auto-correlation is given by

$$M_3(n) = \sum_{k=0}^{4N_s - 1} y(n+k)\, y^*(n+k+N_s) , \quad \forall n \in (-\infty, \infty) . \tag{2.8}$$

The phase difference is given by

$$\Psi(\varepsilon_{\mathrm{frac}}, N_s) = \angle M_3(n) = 2\pi \varepsilon_{\mathrm{frac}} N_s T_s = 2\pi \varepsilon_{\mathrm{frac}} \frac{N_s}{N \Delta f} , \tag{2.9}$$

where $\angle$ denotes the angle of a complex number, and $N$ and $\Delta f$ are the FFT size and the subcarrier spacing, respectively. The frequency offset estimate at the estimated coarse timing instant is given by

$$\hat{\varepsilon}_{\mathrm{frac}} = \frac{N \Delta f}{2\pi N_s} \angle M_3(\Delta\hat{n}_{\mathrm{coarse}}) . \tag{2.10}$$

Because the receiver will be implemented with the indoor-to-outdoor/pedestrian configuration, the following example is given for this configuration. Since the angle in (2.10) is confined to $(-\pi, \pi)$, the detectable frequency offset range is $|\varepsilon_{\mathrm{frac}}| < N\Delta f / (2N_s)$. With $N = 512$ and $N_s = 64$, this gives

$$-4\Delta f < \varepsilon_{\mathrm{frac}} < 4\Delta f . \tag{2.11}$$

Roy in [11] proposed an efficient CFO estimation method that extracts the CFO information from (2.3). His method not only reduces the complexity but also produces an accurate estimate of the fractional CFO within a range of $\pm\Delta f$. This leads to a novel approach that estimates a wider fractional CFO range by exploiting the first timing metric in (2.2). In fact, the CFO information can be found at every $M_1(kN_s)$, where $k = 0, 1, \ldots, 9$. As a result, the phase difference of the metric $M_1$ between the coarse timing estimate instant $\Delta\hat{n}_{\mathrm{coarse}}$ and the instant $N_s$ samples earlier is given by

$$\Psi(\varepsilon_{\mathrm{frac}}, N_s) = \angle \left\{ M_1(\Delta\hat{n}_{\mathrm{coarse}})\, M_1^*(\Delta\hat{n}_{\mathrm{coarse}} - N_s) \right\} = 2\pi \varepsilon_{\mathrm{frac}} N_s T_s = 2\pi \varepsilon_{\mathrm{frac}} \frac{N_s}{N \Delta f} . \tag{2.12}$$

The fractional CFO estimate is given by

$$\hat{\varepsilon}_{\mathrm{frac}} = \frac{N \Delta f}{2\pi N_s} \angle \left\{ M_1(\Delta\hat{n}_{\mathrm{coarse}})\, M_1^*(\Delta\hat{n}_{\mathrm{coarse}} - N_s) \right\} . \tag{2.13}$$

Inserting $N = 512$ and $N_s = 64$ for the indoor-to-outdoor/pedestrian configuration into (2.13), the detectable fractional CFO range is

$$-4\Delta f < \varepsilon_{\mathrm{frac}} < 4\Delta f . \tag{2.14}$$

It is clearly seen that the proposed method estimates the same detectable CFO range as in (2.11) without an additional average auto-correlation as in [14].
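A minimal Python sketch of the estimator (2.13) is given below, with the CFO expressed in units of the subcarrier spacing (i.e. $\Delta f = 1$). The noise-free test signal, the random short symbol and the assumption that the coarse timing instant is already known are simplifications made for the illustration only.

```python
import cmath
import math
import random


def m1(y, s_train, n):
    """Cross-correlation metric (2.2) evaluated at lag n."""
    return sum(y[n + k] * s_train[k].conjugate() for k in range(len(s_train)))


def fractional_cfo(y, s_train, n_coarse, n_fft):
    """Estimate (2.13), expressed in units of the subcarrier spacing."""
    n_s = len(s_train)
    product = m1(y, s_train, n_coarse) * m1(y, s_train, n_coarse - n_s).conjugate()
    return n_fft / (2 * math.pi * n_s) * cmath.phase(product)


# Noise-free toy preamble with a known CFO of 1.3 subcarrier spacings.
rng = random.Random(1)
n_fft, n_s, offset, cfo = 512, 64, 20, 1.3
s_train = [cmath.exp(2j * math.pi * rng.random()) for _ in range(n_s)]
clean = [0j] * offset + s_train * 10
y = [v * cmath.exp(2j * math.pi * cfo * n / n_fft) for n, v in enumerate(clean)]

n_coarse = offset + 9 * n_s          # peak position that (2.4) would report
estimate = fractional_cfo(y, s_train, n_coarse, n_fft)
```

Only two values of $M_1$ are needed, both of which are already available from the coarse timing stage, which is why no additional auto-correlation is required.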

2.2.4 Fine timing offset synchronization

The fine timing synchronizer works in the frequency domain by finding the phase difference between two adjacent pilot subcarriers, as proposed by the authors in [49]. This method is based on the time-shifting property of the Fourier transform. Let $\Delta n_{\mathrm{fine}}$ be the remaining timing offset left by the coarse timing synchronizer. Then the data at the $k$th subcarrier in the frequency domain is multiplied by $\exp\left(-j 2\pi k \Delta n_{\mathrm{fine}} / N\right)$. As a result, the phase difference of the channel frequency responses between two adjacent pilot subcarriers for the $i$th symbol is carried by

$$G = H_{mN_f, i}\, H^*_{(m-1)N_f, i} , \quad m = 1, \ldots, N_p - 1 , \tag{2.15}$$

where $N_p$ is the total number of pilot subcarriers and $N_f$ is the pilot tone spacing. Averaging over several pilot subcarriers is performed in order to estimate the fine timing offset accurately, i.e.

$$G = \frac{1}{N_p} \sum_{m=1}^{N_p - 1} H_{mN_f, i}\, H^*_{(m-1)N_f, i} . \tag{2.16}$$

The resulting fine timing offset is given by

$$\Delta\hat{n}_{\mathrm{fine}} = -\frac{N}{2\pi N_f} \angle G , \tag{2.17}$$

where $\angle$ denotes the angle of a complex number and $N$ is the FFT size. Since the angle is confined to $(-\pi, \pi)$, the timing tracking range is up to $\pm N/(2N_f) = \pm 4$ samples with FFT size $N = 512$ and pilot spacing $N_f = 64$ for the indoor-to-outdoor/pedestrian receiver configuration.
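The averaging of adjacent pilot products and the conversion of the resulting angle into a timing offset can be sketched as follows. The flat unit channel in `rotated_pilots` is an assumption made to keep the example self-checking. The leading minus sign follows from the $\exp(-j2\pi k \Delta n_{\mathrm{fine}}/N)$ rotation stated above, so that a positive offset yields a positive estimate.

```python
import cmath
import math


def fine_timing(h_pilots, n_fft, n_f):
    """Fine timing offset in samples from pilot-subcarrier channel estimates."""
    g = sum(h_pilots[m] * h_pilots[m - 1].conjugate()
            for m in range(1, len(h_pilots))) / len(h_pilots)   # averaged pilot products
    return -n_fft / (2 * math.pi * n_f) * cmath.phase(g)        # angle to samples


def rotated_pilots(delay, n_fft, n_f, n_p):
    """Flat unit channel observed with a residual timing offset of `delay` samples."""
    return [cmath.exp(-2j * math.pi * m * n_f * delay / n_fft) for m in range(n_p)]


n_fft, n_f, n_p = 512, 64, 7
estimate = fine_timing(rotated_pilots(3, n_fft, n_f, n_p), n_fft, n_f)
```

With $N = 512$ and $N_f = 64$, offsets beyond $\pm 4$ samples wrap around, which is why the coarse timing stage must first bring the residual offset within this range.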


2.2.5 Integer CFO synchronization

The purpose of the integer CFO estimator is to find the CFO in multiples of the subcarrier spacing so that the receiver can estimate the coarse frequency offset. The integer CFO estimator runs only once, when the receiver starts up to acquire the signal from the transmitter. It performs multiple frequency-domain correlations between the received pilot symbols and shifted versions of the transmitted pilot tones, similar to the method in [50]. The known transmitted pilot tones at the receiver are defined as

$$\mathbf{P} = [P_0, \mathbf{0}, P_1, \ldots, P_{N_p - 1}, \mathbf{0}, P_{N_p}] , \tag{2.18}$$

where $P_m = X_{mN_f}$, $m = 0, 1, \ldots, N_p - 1$ is the $m$th transmitted pilot tone at the $(mN_f)$th subcarrier, $N_f$ is the pilot spacing, $\mathbf{0}$ represents $N_f - 1$ data subcarriers, and $N_p$ is the total number of pilot subcarriers per symbol. The sum of the complex conjugate cross-correlations of two adjacent pilot subcarriers for the $i$th symbol is given by

$$G(n) = \sum_{k=0}^{N_d + N_p - 1} \left( Y_{k,i} P_{k+n,i} \right) \left( Y_{k+N_f,i} P_{k+N_f+n,i} \right)^* , \tag{2.19}$$

where $n \in (-\varepsilon_{\mathrm{int}}, \varepsilon_{\mathrm{int}})$ is the shift index for the known transmitted pilot vector, $N_d$ is the number of data subcarriers, and $Y_{k,i}$ is the $k$th received subcarrier of the $i$th symbol. A magnitude comparison is performed to find the maximum of the metric $G(n)$ at the end of each correlation process. Therefore, the integer CFO is estimated by

$$\hat{\varepsilon}_{\mathrm{int}} = \arg\max_n \left\{ |G(n)|^2 \right\} . \tag{2.20}$$
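The shift search of (2.19) and (2.20) can be illustrated with the sketch below. Several simplifications are assumptions of the example rather than properties of the receiver: a flat noise-free channel, circular subcarrier indexing, unit-power QPSK data, random BPSK pilots, and the convention that an integer CFO of `true_shift` moves the content of subcarrier `k + true_shift` onto subcarrier `k`.

```python
import math
import random


def integer_cfo(y, pilots, n_f, max_shift):
    """Return the shift n that maximises |G(n)|^2 in (2.19)-(2.20)."""
    n_sc = len(y)

    def metric(shift):
        g = 0j
        for k in range(n_sc):
            a = y[k] * pilots[(k + shift) % n_sc]
            b = y[(k + n_f) % n_sc] * pilots[(k + n_f + shift) % n_sc]
            g += a * b.conjugate()
        return abs(g) ** 2

    return max(range(-max_shift, max_shift + 1), key=metric)


rng = random.Random(2)
n_sc, n_f, true_shift = 128, 8, 2
qpsk = [complex(a, b) / math.sqrt(2) for a in (1, -1) for b in (1, -1)]
# Pilot vector P: BPSK pilots every n_f subcarriers, zeros on data subcarriers.
pilots = [rng.choice([-1, 1]) if k % n_f == 0 else 0 for k in range(n_sc)]
x = [pilots[k] if k % n_f == 0 else rng.choice(qpsk) for k in range(n_sc)]
y = [x[(k + true_shift) % n_sc] for k in range(n_sc)]   # spectrum shifted by the CFO
estimate = integer_cfo(y, pilots, n_f, max_shift=3)
```

At the correct shift every pilot product adds coherently, whereas at any other shift the masked terms are products of random data symbols and add incoherently.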

2.2.6 Proportional-integral loop filter

Figure 2.3 shows the loop filter structure, which is based on the classical first-order loop filter in [52]. Its transfer function is given by

$$H(z) = K_p + \frac{K_i z^{-1}}{1 - z^{-1}} , \tag{2.21}$$

where $K_p$ and $K_i$ are the proportional gain and the integral gain, respectively. The loop filter $H(z)$ by itself is not stable because of a pole on the unit circle in the $z$-plane. However, the loop filter is used as a component of a closed-loop frequency offset correction circuit, as shown in Figure 2.4. The corresponding linearized equivalent model of this closed loop is illustrated in Figure 2.5. The overall transfer function is then

$$G(z) = \frac{\hat{\Phi}(z)}{\Phi(z)} = \frac{U(z)}{1 + U(z)} , \tag{2.22}$$

Figure 2.3: Structure of first-order digital loop filter.

where $\Phi(z)$ and $\hat{\Phi}(z)$ are the $z$-transforms of the input phase $\Phi(n)$ and the estimated phase $\hat{\Phi}(n)$ at the $n$th sample, respectively. The input phase at the $n$th sample is given by

$$\Phi(n) = 2\pi \varepsilon_{\mathrm{frac}}\, n\, T_s = \Delta\omega\, n\, T_s , \tag{2.23}$$

where $\varepsilon_{\mathrm{frac}}$ is the frequency offset and $T_s$ is the sampling period, and

$$U(z) = K(z) H(z) , \tag{2.24}$$

where $K(z) = \frac{z^{-1}}{1 - z^{-1}}$ is the transfer function of the phase accumulator. The phase error transfer function of the loop in Figure 2.5 is

$$R(z) = \frac{\Delta\Phi(z)}{\Phi(z)} = \frac{1}{1 + U(z)} , \tag{2.25}$$

where $\Delta\Phi(z)$ is the phase error due to the input phase $\Phi(z)$. Inserting $K(z)$ and $H(z)$ into (2.22) and (2.25), these transfer functions are rewritten as follows:

$$G(z) = \frac{K_p (z - 1) + K_i}{(z - 1)^2 + K_p (z - 1) + K_i} , \tag{2.26}$$

and

$$R(z) = \frac{(z - 1)^2}{(z - 1)^2 + K_p (z - 1) + K_i} . \tag{2.27}$$

From (2.27), the phase error response is derived as

$$\Delta\Phi(z) = \frac{(z - 1)^2}{(z - 1)^2 + K_p (z - 1) + K_i}\, \Phi(z) , \tag{2.28}$$

where $\Phi(z) = \frac{\Delta\omega T_s z}{(z - 1)^2}$. As a result, the phase error becomes

$$\Delta\Phi(z) = \frac{\Delta\omega T_s z}{(z - 1)^2 + K_p (z - 1) + K_i} . \tag{2.29}$$


Figure 2.4: Simplified closed-loop frequency offset correction diagram.

Figure 2.5: Linearized closed-loop frequency offset correction diagram.

Applying the final value theorem of the $z$-transform to $\Delta\Phi(z)$ [53], we find that the steady-state phase error is zero:

$$\Delta\Phi_{ss} = \lim_{z \to 1} (z - 1) \Delta\Phi(z) = \lim_{z \to 1} \frac{\Delta\omega T_s z (z - 1)}{(z - 1)^2 + K_p (z - 1) + K_i} = 0 . \tag{2.30}$$

For the overall loop to be stable, the poles of $G(z)$ must lie inside the unit circle in the $z$-plane. The condition for stability is derived directly from [52], i.e.

$$2K_p - 4 < K_i < K_p , \quad K_i > 0 . \tag{2.31}$$

The coefficients $K_p$ and $K_i$ are similar to $C_2$ and $C_1$ in [52], which are defined as

$$K_p = 2 \eta \omega_n T_s , \tag{2.32}$$

and

$$K_i = (\omega_n T_s)^2 , \tag{2.33}$$

where $\omega_n = 2\pi f_n$ is the natural frequency ($\omega_n T_s \ll 1$) and $0 < \eta < 1$ is the damping factor.

The $K_p$ and $K_i$ parameters each have a distinct impact on the system response, i.e. on the settling time and the tracking performance of the frequency loop. A large proportional gain $K_p$ usually results in a faster response, but too high a $K_p$ value makes the system overshoot and maintain a steady oscillating error. In contrast, the integral gain $K_i$ determines how quickly the system reaches the desired steady state. If $K_i$ is too high, the system can also overshoot, again resulting in oscillations. In order to obtain a good response, the $K_p$ and $K_i$ parameters must be well tuned for the specific system under control. Making the parameters digitally programmable is therefore the simplest way to tune the response of the system. Furthermore, the frequency resolution of the phase accumulator which drives the phase derotator also impacts the response of the system. If the frequency resolution is too coarse (i.e. the frequency step is too large), it results in a steady oscillating error. Thus, the frequency resolution must be carefully selected. For example, given a sampling rate of 5 MHz and a 23-bit phase accumulator, the resulting frequency resolution is $5\ \mathrm{MHz} / 2^{23} \approx 0.596$ Hz. With this frequency resolution, the phase derotator can accurately compensate the CFO of the incoming samples.
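The behaviour of the linearized loop of Figure 2.5 can be explored with a short simulation. The sketch below is an illustration under assumptions: it models the loop filter as a proportional branch plus a delayed integrator and the phase accumulator as a delayed integrator, which is one realization consistent with the characteristic polynomial $(z-1)^2 + K_p(z-1) + K_i$. The damping factor 0.707, the normalized natural frequency 0.05 and the input ramp are arbitrary example values.

```python
def loop_gains(damping, wn_ts):
    """K_p and K_i for a damping factor and a normalised natural frequency."""
    return 2 * damping * wn_ts, wn_ts ** 2


def is_stable(kp, ki):
    """Stability region 2*K_p - 4 < K_i < K_p with K_i > 0."""
    return 2 * kp - 4 < ki < kp and ki > 0


def phase_errors(phase_in, kp, ki):
    """Run the linearised loop and return the phase error at every sample."""
    phi_hat = 0.0      # phase accumulator output
    integ = 0.0        # integral branch of the loop filter
    prev_err = 0.0
    errors = []
    for phi in phase_in:
        err = phi - phi_hat
        integ += ki * prev_err
        phi_hat += kp * err + integ    # accumulator update, visible at the next sample
        prev_err = err
        errors.append(err)
    return errors


kp, ki = loop_gains(damping=0.707, wn_ts=0.05)
ramp = [0.01 * n for n in range(2000)]          # constant frequency offset
errors = phase_errors(ramp, kp, ki)
resolution_hz = 5e6 / 2 ** 23                   # 23-bit accumulator at 5 MHz
```

For a constant frequency offset (a phase ramp), the error decays to zero, in agreement with the final value theorem result, and `resolution_hz` evaluates to approximately 0.596 Hz.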

2.3 Decision-directed channel estimation for downlink MC-CDMA

2.3.1 System description

The block diagram for the MC-CDMA system is shown in Figure 2.6. The data bit stream for the desired user is quadrature modulated and then spread over $N_d$ subcarriers in the frequency domain using an Orthogonal Variable Spreading Factor (OVSF) sequence of length $L$. $N_p$ BPSK-modulated pilot subcarriers are uniformly inserted among the spread symbols for channel estimation at the receiver. The $N_c = N_d + N_p$ spread symbols and pilots are modulated using OFDM [43]. A cyclic prefix of length $N_{cp}$ is appended at the beginning of the time-domain symbol to combat the ISI and ICI caused by multipath fading.

Figure 2.6: Downlink MC-CDMA block diagram.

At the receiver, after removing the cyclic prefix and OFDM demodulation, the received symbol on the $k$th subcarrier is represented by¹

$$Y_k = X_k H_k + W_k , \quad k = 0, 1, \ldots, N_c - 1 , \tag{2.34}$$

where $X_k$, $H_k$ and $W_k$ are the transmitted symbol, the channel frequency response and the complex zero-mean additive Gaussian noise with variance $\sigma_n^2$ at the $k$th subcarrier, respectively. The channel frequency response at the pilot subcarriers is estimated as

$$\hat{H}_{mN_f} = \frac{Y_{mN_f}}{X_{mN_f}} , \quad m = 0, 1, \ldots, N_p - 1 , \tag{2.35}$$

where $X_{mN_f}$ is the known pilot symbol at the $(mN_f)$th subcarrier and $N_f$ is the spacing between two consecutive pilot subcarriers. The channel frequency response at the $(mN_f + l)$th data subcarrier is estimated using a low-pass sinc interpolation method [54]

$$\hat{H}_{mN_f + l} = \sum_{d = -\lfloor (D-1)/2 \rfloor}^{\lfloor D/2 \rfloor} f_{l,d}\, \hat{H}_{(m+d)N_f} , \tag{2.36}$$

where $\lfloor \cdot \rfloor$ is the floor function, $l = 1, 2, \ldots, N_f - 1$ is the data subcarrier index between two consecutive pilot subcarriers and $D$ is the filter length. The sinc interpolation coefficient $f_{l,d}$ is computed using the following [54]

$$f_{l,d} = \mathrm{sinc}\left( -\frac{l}{N_f} + d \right) , \tag{2.37}$$

where $d = -\lfloor (D-1)/2 \rfloor, \ldots, \lfloor D/2 \rfloor$ and $l = 1, 2, \ldots, N_f - 1$.

A conventional zero-forcing (ZF) equalization method is applied to suppress the channel effects on the $N_d$ received data subcarriers. The estimated transmit vector is given by

$$\hat{\mathbf{X}} = \mathbf{Y} \oslash \hat{\mathbf{H}} , \tag{2.38}$$

where $\oslash$ is the Hadamard (element-wise) division.

¹ For the sake of simplicity, the symbol time index $i$ is ignored without changing the meaning of the following equations.

Figure 2.7: Proposed decision-directed virtual pilot channel estimator.

Frequency-domain despreading and combining are performed on the equalized data symbols using the desired user spreading code. Finally, the desired user data bits are obtained by demapping the Gray-encoded data symbols into data bits.

2.3.2 Decision-directed virtual pilot-based channel estimation

This method was originally used for OFDM systems [40]. The idea is to find virtual pilot subcarriers blindly in order to improve the channel estimation quality by using both real pilots and virtual pilots at the interpolation stage. This approach can be adapted to our proposed downlink MC-CDMA system. The only difference between the proposed method and the method in [40] is that the virtual pilots must be selected on the re-spread recovered data symbols prior to interpolation in order to obtain the desired channel response $\tilde{\mathbf{H}}$, as illustrated in Figure 2.7. The proposed channel estimation algorithm for downlink MC-CDMA systems can be summarized as follows.

Require: Initial channel response vector $\hat{\mathbf{H}}$.

1: for each $k$th subcarrier do
2:   Perform ZF equalization to obtain $\hat{X}_k$ using (2.38).
3:   Despread $\hat{X}_k$ using the desired user spreading code and demap the symbol to bits.
4:   Remap the decoded bits into a symbol and then re-spread it using the desired user spreading code to obtain the remodulated data $\tilde{X}_k$.
5:   Compute the Euclidean distance between $\hat{X}_k$ and $\tilde{X}_k$ at the $k$th data subcarrier. The distance is given by

     $$\Delta_k = \Re\left[ \hat{X}_k - \tilde{X}_k \right]^2 + \Im\left[ \hat{X}_k - \tilde{X}_k \right]^2 , \tag{2.39}$$

     where $\Re[\cdot]$ and $\Im[\cdot]$ are the real and imaginary parts of a complex number, respectively.
6:   Select the $k$th data subcarrier to be a virtual pilot subcarrier if its Euclidean distance $\Delta_k$ is less than or equal to a predefined threshold $\delta$. The predefined threshold is determined by simulations, i.e. given a target SNR, the system has to be simulated for various values of $\delta$ in order to find the lowest BER.
7: end for
8: Refine the channel estimation by interpolating the other data subcarriers using both real and virtual pilot subcarriers to obtain the final channel response vector $\tilde{\mathbf{H}}$.
9: return $\tilde{\mathbf{H}}$.

Figure 2.8: Proposed iterative transform domain channel estimator.
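Steps 2 to 6 of the algorithm above can be sketched for a single spread symbol as follows. The sketch makes several simplifying assumptions that are not part of the proposed receiver: one user, one QPSK symbol spread over `len(code)` subcarriers, a hypothetical $\pm 1$ spreading code, no noise, and a hard decision standing in for the demapping and remapping of bits. The function returns the channel values $Y_k / \tilde{X}_k$ at the selected virtual pilots, which would then feed the interpolation of step 8.

```python
import math


def select_virtual_pilots(y, h_init, code, threshold):
    """Return {subcarrier index: refined channel value} for the virtual pilots."""
    x_eq = [y_k / h_k for y_k, h_k in zip(y, h_init)]                  # ZF equalization
    d = sum(x_k * c_k for x_k, c_k in zip(x_eq, code)) / len(code)     # despread
    d_hard = complex(math.copysign(1, d.real), math.copysign(1, d.imag)) / math.sqrt(2)
    x_re = [d_hard * c_k for c_k in code]                              # re-spread
    selected = {}
    for k, (x_hat, x_tilde) in enumerate(zip(x_eq, x_re)):
        diff = x_hat - x_tilde
        if diff.real ** 2 + diff.imag ** 2 <= threshold:               # distance test
            selected[k] = y[k] / x_tilde
    return selected


code = [1, -1, 1, -1, -1, 1, -1, 1]                    # hypothetical +/-1 spreading code
h_true = [complex(1.0 - 0.05 * k, 0.1 * k) for k in range(8)]
symbol = complex(1, 1) / math.sqrt(2)
y = [h * symbol * c for h, c in zip(h_true, code)]
h_init = list(h_true)
h_init[5] *= 2                                         # one badly estimated subcarrier
virtual = select_virtual_pilots(y, h_init, code, threshold=0.05)
```

In this toy case the subcarrier whose initial channel estimate is off by a factor of two fails the distance test and is excluded, while the remaining subcarriers become virtual pilots.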

2.3.3 Decision-directed iterative transform domain channel estimation

This channel estimation technique is adapted from the transform domain processing technique for OFDM systems in [38, 41, 55]. The decision-directed iterative LS channel estimation approach in [41] is also adapted to the proposed downlink MC-CDMA system with some modifications. The initial channel information $\hat{\mathbf{H}}$ is obtained by interpolation as in (2.36). The remodulated data symbols must be re-spread in the frequency domain using the same spreading code as the one used in the transmitter, as illustrated in the shaded blocks of Figure 2.8. The refined decision-directed channel information is given by

$$\tilde{\mathbf{H}} = \mathbf{Y} \oslash \tilde{\mathbf{X}} , \tag{2.40}$$

where $\tilde{\mathbf{X}}$ is the re-spread data symbol vector and $\oslash$ is the Hadamard (element-wise) division. The refined channel response $\tilde{\mathbf{H}}$ is then used by an iterative transform domain estimation technique in order to estimate the final channel response vector $\bar{\mathbf{H}}$. The iterative algorithm applied to the refined channel response is similar to the algorithm in [55].

Require: Refined channel response vector $\tilde{\mathbf{H}}$.

1: for each $k$th subcarrier do
2:   Perform an $N$-point FFT on $\tilde{H}_k$ to obtain the transform domain $G_k$, i.e.

     $$G_k = \sum_{l=0}^{N-1} \tilde{H}_l \exp\left( -j \frac{2\pi k l}{N} \right) , \quad k = 0, 1, \ldots, N - 1 . \tag{2.41}$$

3:   Perform low-pass filtering on $G_k$ with the "cut-off" frequency $k_c$ chosen so that the samples in the "high frequency" region are zero, i.e.

     $$\hat{G}_k = \begin{cases} G_k , & 0 \le k \le k_c - 1 \\ 0 , & k_c \le k \le N - k_c - 1 \\ G_k , & N - k_c \le k \le N - 1 . \end{cases} \tag{2.42}$$

4:   Perform an $N$-point IFFT on $\hat{G}_k$ to convert back to the frequency domain, i.e.

     $$\bar{H}_k = \frac{1}{N} \sum_{l=0}^{N-1} \hat{G}_l \exp\left( j \frac{2\pi k l}{N} \right) , \quad k = 0, 1, \ldots, N - 1 . \tag{2.43}$$

     The channel frequency response at the pilot subcarriers must be replaced by the initial channel frequency response at the pilot subcarriers in $\hat{\mathbf{H}}$.
5:   Stop if the maximum absolute difference between $\bar{H}_k$ and $\tilde{H}_k$ is below a pre-defined threshold $\delta$, i.e.

     $$\max_k \left| \bar{H}_k - \tilde{H}_k \right| < \delta . \tag{2.44}$$

     Otherwise, return to step 2 and take $\bar{H}_k$ as the new initial channel frequency response.
6: end for
7: return $\bar{\mathbf{H}}$.
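The iteration above (transform, low-pass filter, inverse transform, pilot replacement, convergence test) can be sketched in Python as follows. This is only an illustration: a direct $O(N^2)$ DFT is used instead of an FFT to keep the example dependency-free, the inverse transform is normalised by $1/N$, and the 3-tap channel, noise level, pilot spacing of 8, $k_c = 4$ and threshold are made-up toy values.

```python
import cmath
import math
import random


def dft(x, sign):
    """Unnormalised DFT (sign = -1) or inverse-kernel sum (sign = +1)."""
    n = len(x)
    return [sum(x[l] * cmath.exp(sign * 2j * math.pi * k * l / n) for l in range(n))
            for k in range(n)]


def transform_domain_refine(h_refined, pilot_idx, h_pilots, k_c, delta, max_iter=200):
    """Iterate filtering and pilot replacement until the update falls below delta."""
    n = len(h_refined)
    h = list(h_refined)
    for iteration in range(1, max_iter + 1):
        g = dft(h, -1)                                        # to the transform domain
        g = [g_k if k < k_c or k >= n - k_c else 0j           # zero the middle region
             for k, g_k in enumerate(g)]
        h_new = [v / n for v in dft(g, +1)]                   # back to the frequency domain
        for k, h_p in zip(pilot_idx, h_pilots):
            h_new[k] = h_p                                    # keep the initial pilot estimates
        change = max(abs(a - b) for a, b in zip(h_new, h))
        h = h_new
        if change < delta:
            break
    return h, iteration


def mse(a, b):
    return sum(abs(u - v) ** 2 for u, v in zip(a, b)) / len(a)


rng = random.Random(3)
n, taps = 32, [1.0, 0.5, 0.25j]
h_true = [sum(t * cmath.exp(-2j * math.pi * k * d / n) for d, t in enumerate(taps))
          for k in range(n)]
noisy = [v + complex(rng.gauss(0, 0.1), rng.gauss(0, 0.1)) for v in h_true]
pilot_idx = list(range(0, n, 8))
h_est, iterations = transform_domain_refine(
    noisy, pilot_idx, [h_true[k] for k in pilot_idx], k_c=4, delta=1e-4)
```

Because the transform of a frequency response concentrates the channel energy near both ends of the transform-domain vector, the filter keeps the first and last $k_c$ samples and zeroes the middle region, which is where most of the noise is removed.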



2.3.4 Decision-directed overlap LMMSE channel estimator

Conventional LMMSE channel estimator

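The LMMSE channel estimator minimizes the Mean-Square Error (MSE) between the actual and the estimated channel. It is obtained by applying the Wiener-Hopf equation [28]

$$\hat{\mathbf{H}}_{\mathrm{LMMSE}} = \mathbf{R}_{HY} \mathbf{R}_{YY}^{-1} \mathbf{Y} , \tag{2.45}$$

where $\mathbf{Y}$ is the received symbol vector. The cross-correlation matrix $\mathbf{R}_{HY}$ is given by

$$\mathbf{R}_{HY} = E\left\{ \mathbf{H} \mathbf{Y}^{\dagger} \right\} = \mathbf{R}_{HH} \mathbf{X}^{\dagger} , \tag{2.46}$$

where $(\cdot)^{\dagger}$ denotes the conjugate transpose. The auto-correlation matrix of the received signal $\mathbf{Y}$ is

$$\mathbf{R}_{YY} = E\left\{ \mathbf{Y} \mathbf{Y}^{\dagger} \right\} = \mathbf{X} \mathbf{R}_{HH} \mathbf{X}^{\dagger} + \sigma_n^2 \mathbf{I} , \tag{2.47}$$

where $\mathbf{R}_{HH} = E\left\{ \mathbf{H} \mathbf{H}^{\dagger} \right\}$ is the channel auto-correlation matrix and $\mathbf{I}$ is the identity matrix. Inserting (2.46) and (2.47) into (2.45), the Wiener-Hopf solution is expressed as [29]

$$\hat{\mathbf{H}}_{\mathrm{LMMSE}} = \mathbf{R}_{HH} \left( \mathbf{R}_{HH} + \sigma_n^2 (\mathbf{X} \mathbf{X}^{\dagger})^{-1} \right)^{-1} \hat{\mathbf{H}} \approx \mathbf{R}_{HH} \left( \mathbf{R}_{HH} + \frac{\beta}{\mathrm{SNR}} \mathbf{I} \right)^{-1} \hat{\mathbf{H}} , \tag{2.48}$$

where $\hat{\mathbf{H}}$ is the LS channel estimate, $\beta = E\left\{ |X_k|^2 \right\} E\left\{ |1/X_k|^2 \right\}$ is a constant depending on the modulation type and SNR is the target signal-to-noise ratio.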

Proposed channel estimator

An application of the decision-directed overlapped LMMSE channel estimation technique to a downlink MC-CDMA receiver over the indoor-to-outdoor/pedestrian channel is proposed. The proposed technique computes the LS channel estimate by interpolating the channel responses from dedicated pilot subcarriers instead of using a dedicated pilot symbol (block-type pilot) as in a conventional LMMSE estimator. Thus, it reduces the transmission overhead in multicarrier systems. With the aid of a decision-directed process, the following overlapped LMMSE estimator not only has a low complexity but also improves the BER performance.


Figure 2.9: Proposed receiver with decision-directed LMMSE channel estimator.

The LMMSE estimator requires knowledge of the target SNR and of the channel frequency auto-correlation matrix $\mathbf{R}_{HH}$. In practice, the target SNR and the auto-correlation matrix are known beforehand. Therefore, the matrix $\mathbf{R}_{HH} \left( \mathbf{R}_{HH} + \frac{\beta}{\mathrm{SNR}} \mathbf{I} \right)^{-1}$ needs to be calculated only once [29]. A uniform channel power-delay profile is assumed in this system. The elements of the channel auto-correlation matrix $\mathbf{R}_{HH}$ are expressed as [29]:

$$r_{m,n} = \begin{cases} \dfrac{1 - \exp\left( -j 2\pi N_{cp} \frac{m - n}{N} \right)}{j 2\pi N_{cp} \frac{m - n}{N}} , & \text{if } m \ne n \\ 1 , & \text{if } m = n \end{cases} \tag{2.49}$$

where $m, n = 0, \ldots, N_c - 1$ and $N_{cp}$ is the cyclic prefix length. The block diagram of the proposed receiver is illustrated in Figure 2.9. The initial channel information $\hat{\mathbf{H}}$ is obtained by interpolating the channel frequency response at the pilot subcarriers using a low-pass FIR interpolation technique. The remodulated data symbols must be re-spread in the frequency domain using the same spreading code as the one used at the transmitter. The refined decision-directed channel vector $\tilde{\mathbf{H}}$ in (2.40) is partitioned into small sub-vectors

$$\tilde{\mathbf{H}} = \left[ \tilde{\mathbf{H}}_1^T, \tilde{\mathbf{H}}_2^T, \ldots, \tilde{\mathbf{H}}_Z^T \right]^T , \tag{2.50}$$

where $[\cdot]^T$ denotes transpose, $\tilde{\mathbf{H}}_m = \left[ \tilde{H}_{(m-1)S+1}, \ldots, \tilde{H}_{(m-1)S+S} \right]$ is the $m$th sub-vector of the refined channel information $\tilde{\mathbf{H}}$, $m = 1, \ldots, Z$ is the subband index, $Z = N_c / S$ is the number of subbands and $S$ is the sub-vector length, chosen such that [28]

$$S = \frac{B_c}{\Delta f} , \tag{2.51}$$

where $B_c$ is the coherence bandwidth and $\Delta f$ is the spacing between subcarriers.
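The auto-correlation matrix for a uniform power-delay profile only depends on the index difference $m - n$, the cyclic prefix length and the FFT size, so it can be tabulated once. The following Python sketch builds it as a list of lists. The sizes used ($N_c = 16$, $N_{cp} = 4$, $N = 64$) are toy values chosen for the example, not the parameters of the receiver.

```python
import cmath
import math


def channel_correlation(n_c, n_cp, n_fft):
    """Channel auto-correlation matrix for a uniform power-delay profile."""

    def r(m, n):
        if m == n:
            return 1 + 0j
        theta = 2 * math.pi * n_cp * (m - n) / n_fft
        return (1 - cmath.exp(-1j * theta)) / (1j * theta)

    return [[r(m, n) for n in range(n_c)] for m in range(n_c)]


r_hh = channel_correlation(n_c=16, n_cp=4, n_fft=64)   # toy sizes
```

The resulting matrix is Hermitian with a unit diagonal, and the magnitude of its entries decreases away from the diagonal, which is the property exploited when the matrix is partitioned into sub-matrices.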

Figure 2.10: Decomposition of the channel auto-correlation matrix $\mathbf{R}_{HH}$ by the overlap technique.

The correlation components outside the coherence bandwidth are relatively low. Therefore, the elements outside the diagonal blocks of the channel auto-correlation matrix have little effect on the performance of the estimator, and they can be safely ignored [28]. As a result, the channel auto-correlation matrix is reduced to the approximate form

$$\mathbf{R}_{HH} \approx \begin{bmatrix} \mathbf{R}_{H_1 H_1} & \mathbf{0} & \cdots & \mathbf{0} \\ \mathbf{0} & \mathbf{R}_{H_m H_m} & \cdots & \mathbf{0} \\ \vdots & \vdots & \ddots & \vdots \\ \mathbf{0} & \mathbf{0} & \cdots & \mathbf{R}_{H_Z H_Z} \end{bmatrix} , \tag{2.52}$$

where $\mathbf{R}_{H_m H_m}$ is an $S \times S$ sub-matrix on the diagonal of $\mathbf{R}_{HH}$ [28]. The authors in [28] showed that this approximation suffers from an increased MSE at the edges of each subband created by the partitioning of the refined channel vector in (2.50), because the subcarriers at the edges use less correlated channel statistics than the other subcarriers. Therefore, the authors proposed an overlap technique where the channel vector in (2.50) is partitioned into overlapping subbands with an overlap length of $S/2$. As a result, the number of subbands is increased from $Z = N_c / S$ to $Z = 2N_c / S - 1$. The resulting sub-matrices in the approximate channel auto-correlation matrix overlap with an overlap length of $S/2$, as illustrated in Figure 2.10. The $m$th LMMSE channel estimation vector is given by

$$\hat{\mathbf{H}}_{\mathrm{LMMSE},m} = \mathbf{R}_{H_m H_m} \left( \mathbf{R}_{H_m H_m} + \frac{\beta}{\mathrm{SNR}} \mathbf{I} \right)^{-1} \tilde{\mathbf{H}}_m . \tag{2.53}$$


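The LMMSE channel estimation vector of part A is given by

$$\hat{\mathbf{H}}_{\mathrm{LMMSE}\ A,m} = \hat{\mathbf{H}}_{\mathrm{LMMSE},m} , \quad m = 2 \left\lfloor \frac{n - 1}{S} \right\rfloor + 1 , \tag{2.54}$$

where $1 \le n \le N_c$. The LMMSE channel estimation vector of part B is given by

$$\hat{\mathbf{H}}_{\mathrm{LMMSE}\ B,m} = \hat{\mathbf{H}}_{\mathrm{LMMSE},m} , \quad m = 2 \left\lfloor \frac{n - 1}{S} + \frac{1}{2} \right\rfloor , \tag{2.55}$$

where $S/2 + 1 \le n \le N_c - S/2$. Since the sub-matrix $\mathbf{R}_{H_m H_m}$ is independent of the subband index $m$, the filter matrix $\mathbf{R}_{H_m H_m} \left( \mathbf{R}_{H_m H_m} + \frac{\beta}{\mathrm{SNR}} \mathbf{I}_S \right)^{-1}$ in (2.53) needs to be calculated only once [30]. The final LMMSE channel vector is obtained by combining the LMMSE vectors of part A and part B as follows:

$$\hat{\mathbf{H}}_{\mathrm{LMMSE\ overlap}} = \left[ \hat{\mathbf{H}}^T_{\mathrm{LMMSE},1}, \hat{\mathbf{H}}^T_{\mathrm{LMMSE},2}, \ldots, \hat{\mathbf{H}}^T_{\mathrm{LMMSE},Z} \right]^T , \tag{2.56}$$

where

$$\hat{\mathbf{H}}_{\mathrm{LMMSE},m} = \begin{cases} \hat{\mathbf{H}}_{\mathrm{LMMSE}\ A,m}\left[ 1 : \tfrac{3}{4}S \right] , & m = 1 \\ \hat{\mathbf{H}}_{\mathrm{LMMSE}\ B,m}\left[ \tfrac{1}{4}S + 1 : \tfrac{3}{4}S \right] , & m = 2, 4, \ldots, Z - 1 \\ \hat{\mathbf{H}}_{\mathrm{LMMSE}\ A,m}\left[ \tfrac{1}{4}S + 1 : \tfrac{3}{4}S \right] , & m = 3, 5, \ldots, Z - 2 \\ \hat{\mathbf{H}}_{\mathrm{LMMSE}\ A,m}\left[ \tfrac{1}{4}S + 1 : S \right] , & m = Z . \end{cases} \tag{2.57}$$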
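The way the overlapping subband estimates are trimmed and concatenated can be checked with a small Python sketch. It assumes that subband $m$ starts $S/2$ subcarriers after subband $m - 1$, that $S$ is a multiple of 4, and it uses zero-based slicing instead of the one-based index ranges of the text. The stand-in "estimates" simply carry their own subcarrier indices so that the coverage can be verified.

```python
def stitch_overlap(subband_estimates, s):
    """Keep the central part of each overlapping sub-band and concatenate."""
    z = len(subband_estimates)
    combined = []
    for m, estimate in enumerate(subband_estimates, start=1):
        first = 0 if m == 1 else s // 4          # drop the leading edge except for m = 1
        last = s if m == z else 3 * s // 4       # drop the trailing edge except for m = Z
        combined.extend(estimate[first:last])
    return combined


n_c, s = 64, 16
z = 2 * n_c // s - 1
# Stand-in "estimates": each sub-band simply carries its own subcarrier indices.
subbands = [list(range((m - 1) * s // 2, (m - 1) * s // 2 + s)) for m in range(1, z + 1)]
combined = stitch_overlap(subbands, s)
```

Every subcarrier appears exactly once in the combined vector, and each one is taken from the subband in which it lies furthest from an edge.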

2.4 Conclusion

In this chapter, a low PAPR training sequence with a structure typical of WLANs (e.g. 802.11a) was designed using the well-known Zadoff-Chu sequences [42]. A joint coarse timing and fractional CFO estimation method was also presented based on this preamble structure. The proposed coarse timing estimator allows accurate detection of the frame boundary, while the fractional CFO estimator can estimate a wide range of frequency offsets, up to ±4 subcarrier spacings, and requires no additional complexity compared to the one in [14]. Additional fine timing and integer CFO estimators were also adapted from the pilot-based frequency-domain timing and frequency estimators for OFDM systems in order to improve the performance of the system. Three techniques for decision-directed pilot-based channel estimation for downlink MC-CDMA systems were also presented in this chapter. The first technique uses the selection of virtual pilots for the channel estimation, the second technique is based on the iterative transform domain, while the third is based on a low-complexity overlap LMMSE estimator. Computer simulation and performance analysis of the proposed timing/frequency synchronization scheme and the three channel estimation methods will be presented in Chapter 5.

Chapter 3

MC-CDMA downlink receiver implementation

3.1 Target FPGA platform and design partitioning

For the hardware implementation of our MC-CDMA system, we have used a development kit manufactured by Nallatech. The XtremeDSP development kit features three Xilinx FPGAs: a Virtex-4 User FPGA, a Virtex-II FPGA for clock management and a Spartan-II Interface FPGA. Figure 3.1 illustrates the FPGAs and other components and their basic interactions with one another. The User FPGA and Clock FPGA can be configured via USB/PCI interfacing. More information about the Xtreme Digital Signal Processing (DSP) development kit is available in the manufacturer's datasheet.

We partitioned the design in the User FPGA into several modules, which are shown graphically in Figure 3.2. A host interface communication logic module performs the data exchange between the host computer and the other modules within the user design. The Clock manager manages the various clock sources from the external oscillators and the feedback clock from the Clock FPGA. The Reset manager manages both synchronous and asynchronous reset signals from multiple sources. The clock and reset manager modules are detailed in Figure 3.3. The receiver module consists of multiple sub-modules which are mapped to appropriate register addresses and will be explained in more detail later.

Figure 3.1: Block diagram of the Xtreme DSP development kit [6]. (The diagram shows the Spartan-II interface FPGA, the Virtex-4 (XC4VSX35-10FF668) main User FPGA, the Virtex-II (XC2V80-4CS144) user Clock FPGA, the ADCs and DACs with their MCX inputs and outputs, and the PCI, USB and JTAG interfaces.)

Figure 3.2: The partition of the design in the User FPGA.

Figure 3.3: Clock and reset managers detail.

In Figure 3.3, the clock source for the receiver design is SYS_CLK, a deskewed version of the internal feedback clock CLK3_FB from the Clock FPGA, which allows data going to and from the User FPGA to be clocked on the same clock edge as the data in the DACs and ADCs. The clock source for the Clock FPGA is provided by the programmable oscillator, namely CLKA. The clock source for the host interface logic is PCI_CLK, a deskewed version of the external clock source CLKB provided by the second programmable oscillator. The available operating frequencies of the programmable oscillators are as follows: 20 MHz, 25 MHz, 30 MHz, 33.33 MHz, 40 MHz, 45 MHz, 50 MHz, 60 MHz, 66.66 MHz, 70 MHz, 75 MHz, 80 MHz, 90 MHz, 100 MHz and 120 MHz. Both programmable oscillators are controlled via the user control software in the host computer. In this design, clock frequencies of 80 MHz and 33.33 MHz are used for the system clock (SYS_CLK) and the host interface clock (PCI_CLK), respectively. The Reset manager block in the User FPGA manages the reset signals from multiple sources, such as the hardware reset signal HW_RESET, the software reset signal SW_RESET, and the Digital Clock Manager (DCM) locked signals in both the User and Clock FPGAs. The software reset source comes from the User software in the host computer via the host interface logic (shaded block in Figure 3.3). The Reset manager generates the internal reset signal INT_RESET for the user design from either the asynchronous HW_RESET source or the synchronous SW_RESET source. It also generates the reset

Chapter 3. MC-CDMA downlink receiver implementation 61

Design entry

Design synthesis

Design implementation

Device programming

Design verification

Behavioral simulation

Functional simulation

Static timing analysis

Timing simulation

In circuit verification

Fixed-point model verification

Figure 3.4: Modified design flow.

signal CLK_FPGA_RESET for the Clock FPGA on the development kit and uses the DCM locked signal RESET_FB to generate the reset signal for the DCM in the User FPGA.

In the implementation, all of the floating-point DSP algorithms of the system must be converted to hardware architectures using fixed-point arithmetic, with the aid of the MATLAB Fixed-Point Toolbox. The dynamic range, quantization noise, overflow and saturation behavior of the fixed-point arithmetic must be considered in order to find an appropriate trade-off between performance and implementation cost. After fixed-point modeling, the fixed-point models are mapped to a Register Transfer Level (RTL) implementation using the Very-high-speed integrated circuit Hardware Description Language (VHDL). During hardware implementation, each DSP algorithm follows a modified design flow based on Xilinx's ISE software [56], as illustrated in Figure 3.4, in order to meet the system specifications.
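As an illustration of the quantization and saturation effects mentioned above, the following minimal Python sketch rounds a value to a signed fixed-point grid and saturates on overflow. The 16-bit word length with 15 fractional bits and the function name `quantize` are assumptions chosen for illustration; they are not taken from the fixed-point models of this design.

```python
def quantize(value, word_length=16, frac_bits=15):
    """Round `value` to a signed fixed-point grid and saturate on overflow."""
    scale = 1 << frac_bits
    max_code = (1 << (word_length - 1)) - 1
    min_code = -(1 << (word_length - 1))
    code = int(round(value * scale))
    code = max(min_code, min(max_code, code))  # saturate instead of wrapping
    return code / scale


if __name__ == "__main__":
    for v in (0.3, 0.999999, 1.5, -2.0):
        q = quantize(v)
        print(f"{v:>9} -> {q:.6f} (error {q - v:+.2e})")
```

Values inside the representable range incur at most half a least significant bit of rounding error, while values outside it are clipped to the nearest representable extreme.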


Figure 3.5: Implementation block diagram of the MC-CDMA receiver.

3.2 Proposed receiver architecture

Figure 3.5 illustrates the block diagram of the receiver. The baseband downlink MC-CDMA receiver works at a system clock rate of 80 MHz and consists of the following blocks.

• Digital front-end: decimates the input data from the 14-bit ADC devices, sampled at 40 Msps, down to a sampling rate of 5 Msps. This structure also compensates for DC offset, amplitude and phase mismatch for both in-phase and quadrature (I&Q) rails.

• Automatic Gain Control (AGC): maintains the average power of the received signal automatically within a desired operation range by adjusting the gain of the RF front-end circuit.

• Phase derotator: compensates for the carrier frequency offset (CFO) between the transmitter and the receiver.

• Fractional CFO estimator: estimates the fractional part of the carrier frequency offset, expressed as a fraction of the subcarrier spacing.

• Coarse frame detection: detects the frame boundary of the received signal.

• CP removal/FFT window: removes the cyclic prefix samples and aligns the FFT window.

• FFT processor: performs the fast Fourier transform of the aligned samples.


• Phase accumulator: accumulates the filtered phase offsets.

• Loop filter: smooths the estimated phase offsets.

• Fine timing detection: detects fine sample offset left by the coarse frame detection circuit.

• Integer CFO estimator: estimates the integer part of the carrier frequency offset, expressed in multiples of the subcarrier spacing.

• Pilot extractor: extracts pilot subcarriers embedded in the MC-CDMA symbol.

• Reference pilot generator: generates the reference pilot tones.

• Channel estimator: interpolates the channel frequency response given the information from the pilot subcarriers.

• Equalizer: equalizes the distorted signals.

• Data extractor: extracts the useful data subcarriers.

• Despreader: despreads and combines the signal energy scattered in the frequency domain using the desired user spreading code.

• Demapper: demodulates the despread symbols to bits.

• Descrambler: descrambles the demodulated bit sequences.

• First In First Out (FIFO): transfers the detected bit sequences asynchronously to the host computer.

3.2.1 Digital front-end

The digital front-end circuit decimates the input data from the ADC devices, sampled at 40 Msps, down to a sampling rate of 5 Msps using a polyphase decimation filter structure [57]. The architecture of the digital front-end is detailed in Figure 3.6. As shown in this figure, the digital front-end is constructed from two cascaded half-band decimation filters, a polyphase decimation filter, a DC notch filter and an I/Q mismatch corrector.


[Figure 3.6 signal path: ADC output (40 Msps), half-band filter (20 Msps), half-band filter (10 Msps), polyphase filter (5 Msps), DC notch filter (5 Msps), I/Q mismatch corrector (5 Msps).]

Figure 3.6: Multistage decimation filter structure.

Table 3.1: Half-band filter specifications.

| Parameter | Value |
|---|---|
| Normalized passband frequency | 0.25 |
| Passband ripple (dB) | 0.001 |
| Normalized stopband frequency | 0.5 |
| Stopband attenuation (dB) | 60 |
| Filter taps | 19 |

Half-band filters

The two half-band filters have the same characteristics, and each is followed by a factor-of-2 downsampler. The polyphase filter is also followed by a factor-of-2 downsampler. The combination of the three stages reduces the output sample rate by a factor of 8 compared to the input sample rate. The half-band filters act as anti-aliasing filters with a 60 dB stopband rejection. They satisfy the minimum Adjacent Channel Leakage power Ratio (ACLR) requirements in TS 25.101 v2.1.0 [46]. The half-band filter specifications are detailed in Table 3.1. Figure 3.7 shows the characteristics of the half-band filters, obtained using the Filter Design Toolbox in MATLAB. The normalized frequencies 0 and 1 correspond to the frequencies 0 and $F_s/2$, where $F_s$ is the sampling frequency. In the half-band filters, only one tap out of every two is non-zero, except for the center tap. Furthermore, since the coefficients are symmetric, the number of distinct non-zero coefficients is 6. This property makes the structure well suited to multirate filters and attractive for FPGA implementations.

In Figure 3.6, the first half-band decimation filter is followed by a factor-of-2 downsampler. The implementation of this combination can therefore exploit the properties of the half-band structure, namely that one tap out of two is zero (except for the center tap) and that the taps are symmetric. Since the system clock is chosen to be 80 MHz (twice the 40 Msps input sample rate), temporal multiplexing is also exploited to save FPGA resources. Since the zero-valued filter coefficients do not contribute to the filter output, there is no need to compute the products at these taps.


(a) Magnitude response. (b) Phase response. (c) Impulse response. (d) Step response.

Figure 3.7: Characteristics of the half-band filters.

We define the z-transform of the half-band filter structure, representing a set of delayed filter coefficients, as

$$H_{HB}(z) = \sum_{n=0}^{18} h(n) z^{-n} . \tag{3.1}$$

Keeping only the non-zero taps, the polyphase partition of (3.1), which represents the filter as the sum of successively delayed subfilters whose coefficients are separated by the decimation factor of 2, is given by

$$\begin{aligned} H_{HB}(z) ={}& h(0) + h(2)z^{-2} + h(4)z^{-4} + h(6)z^{-6} + h(8)z^{-8} + h(9)z^{-9} + h(10)z^{-10} \\ & + h(12)z^{-12} + h(14)z^{-14} + h(16)z^{-16} + h(18)z^{-18} . \end{aligned} \tag{3.2}$$

We define $H_0(z^2)$ and $H_1(z^2)$ as the sets of even and odd taps, respectively, and we have

$$\begin{aligned} H_0(z^2) ={}& h(0) + h(2)z^{-2} + h(4)z^{-4} + h(6)z^{-6} + h(8)z^{-8} + h(10)z^{-10} \\ & + h(12)z^{-12} + h(14)z^{-14} + h(16)z^{-16} + h(18)z^{-18} \\ ={}& h(0)\left(1 + z^{-18}\right) + h(2)\left(z^{-2} + z^{-16}\right) + h(4)\left(z^{-4} + z^{-14}\right) \\ & + h(6)\left(z^{-6} + z^{-12}\right) + h(8)\left(z^{-8} + z^{-10}\right) , \end{aligned} \tag{3.3}$$

and

$$H_1(z^2) = h(9) z^{-8} . \tag{3.4}$$

Finally, (3.5) is a compact representation of (3.2):

$$H_{HB}(z) = H_0(z^2) + z^{-1} H_1(z^2) . \tag{3.5}$$

Figure 3.8: Polyphase partition for the half-band decimation filter.

Figure 3.9: Polyphase partition for the half-band decimation filter with input downsamplers.

These definitions naturally lead to the 2-arm polyphase structure shown in Figure 3.8. In this figure, the output downsampler retains every second sample. Applying the equivalent conversions in [57, p. 750], a downsampler by 2 is pulled to the input side of each subfilter as shown in Figure 3.9, thus yielding 4 clock cycles per output sample for the filtering operations (2 clock cycles per input sample). The interaction of the delay lines in each arm with the synchronous downsamplers can be understood as an input commutator that feeds successive samples to successive inputs of the subfilters, as illustrated in Figure 3.10. It is also noteworthy that the upper subfilter $H_0(z)$ can exploit tap symmetry in order to obtain a folded structure [57]. This leads to the structure of the half-band decimation filter in Figure 3.11. Since there are only 5 distinct taps in the upper arm and 4 clock cycles per subfilter input sample, it is possible to implement the upper arm with 2 time-multiplexed embedded multipliers. The lower arm has only one tap, the center tap, whose coefficient is equal to 0.5. Thus, the lower arm is implemented by shifting the input sample right by one bit instead of using a multiplier. A small 8 x 16-bit ROM bank stores the 5 tap coefficients of the upper arm, which are quantized to 16-bit values (unused address locations are set to zero).

Figure 3.10: Polyphase partition for the half-band decimation filter with input commutator.

Figure 3.11: Polyphase half-band decimation filter structure.
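The equivalence between filtering at the full rate followed by downsampling and the folded two-arm polyphase form can be checked numerically. The sketch below is a minimal Python illustration under stated assumptions: the 19 coefficients come from a Hamming-windowed half-band prototype chosen only because it has the same zero-tap pattern and symmetry (it is not the filter of Table 3.1 and does not meet its 60 dB specification), and all function names are hypothetical.

```python
import math


def halfband_taps(num_taps=19):
    """Hamming-windowed half-band prototype (illustrative coefficients only)."""
    mid = (num_taps - 1) // 2
    taps = []
    for n in range(num_taps):
        k = n - mid
        if k == 0:
            taps.append(0.5)                      # center tap
        elif k % 2 == 0:
            taps.append(0.0)                      # structurally zero taps
        else:
            window = 0.54 - 0.46 * math.cos(2 * math.pi * n / (num_taps - 1))
            taps.append(window * math.sin(math.pi * k / 2) / (math.pi * k))
    return taps


def sample(seq, index):
    """Return seq[index], or 0 outside the recorded samples."""
    return seq[index] if 0 <= index < len(seq) else 0.0


def decimate_direct(x, h):
    """Reference: full-rate FIR filter, then keep every second output."""
    return [sum(h[k] * sample(x, 2 * m - k) for k in range(len(h)))
            for m in range(len(x) // 2)]


def decimate_polyphase(x, h):
    """Two-arm polyphase form with a folded upper arm (19-tap case)."""
    even, odd = x[0::2], x[1::2]                  # input commutator
    out = []
    for m in range(len(x) // 2):
        upper = sum(h[2 * j] * (sample(even, m - j) + sample(even, m - 9 + j))
                    for j in range(5))            # 5 distinct coefficients
        lower = 0.5 * sample(odd, m - 5)          # center tap: a 1-bit shift
        out.append(upper + lower)
    return out
```

Both functions produce the same output sequence up to floating-point rounding, but the polyphase form needs only 5 multiplications per output sample in the upper arm and none in the lower arm.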


Table 3.2: Polyphase decimation filter specifications.

| Parameter | Value |
|---|---|
| Normalized passband frequency | 0.45 |
| Passband ripple (dB) | 0.01 |
| Normalized stopband frequency | 0.5 |
| Stopband attenuation (dB) | 80 |
| Filter taps | 128 |

Polyphase decimation filter

The polyphase decimation filter is also designed based on the minimum ACLR requirements. The specifications are detailed in Table 3.2 and its characteristics are shown graphically in Figure 3.12. The resulting polyphase filter has 128 symmetrical taps; the number of effective taps is therefore 64 per arm. The required output sample rate of the polyphase decimation filter is 5 Msps, which leaves 16 clock cycles per output sample for the filtering operation, so a single time-multiplexed multiplier cannot compute 64 taps within one output sample duration. Thus, we partition each arm into 4 shorter sub-filter banks of 16 taps each. This leads to the implementation structure of the polyphase decimation filter in Figure 3.13, which uses 4 embedded multipliers per arm.
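The multiplier counts quoted for the decimation filters follow from a simple cycle budget: the number of clock cycles available per output sample bounds how many taps one time-multiplexed multiplier can serve. A minimal Python sketch of this arithmetic is given below; the function name `multipliers_needed` is hypothetical, and the model assumes one multiplication per multiplier per clock cycle.

```python
def multipliers_needed(taps, clock_hz, output_rate_hz):
    """Multipliers required when each one is reused every clock cycle."""
    cycles_per_output = clock_hz // output_rate_hz
    return -(-taps // cycles_per_output)          # ceiling division


if __name__ == "__main__":
    clock = 80_000_000
    # Polyphase decimation filter arm: 64 taps, 5 Msps output -> 4
    print(multipliers_needed(64, clock, 5_000_000))
    # Folded upper arm of the first half-band filter: 5 taps, 20 Msps output -> 2
    print(multipliers_needed(5, clock, 20_000_000))
```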

DC offset and I/Q mismatch corrector

In order to remove the DC offset of the received signal, a first-order digital DC notch filter was implemented in the digital front-end circuit. This filter is a special case of a second-order band-stop filter [57] with the notch frequency $\omega_0 = 0$. The transfer function of the DC notch filter becomes

$$H_{DC}(z) = \frac{1 - z^{-1}}{1 - \alpha z^{-1}} , \tag{3.6}$$

where the filter coefficient $\alpha$ can be varied from 0.95 to 0.99, so that the filter is always stable, and is controllable via the user control software in the host computer. This leads to a single-stage IIR filter as illustrated in Figure 3.14. The frequency and phase responses of the filter with $\alpha = 0.95$ are illustrated in Figures 3.15(a) and 3.15(b), respectively.
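The transfer function of the first-order DC notch filter, $(1 - z^{-1})/(1 - \alpha z^{-1})$, corresponds to the difference equation $y[n] = x[n] - x[n-1] + \alpha\, y[n-1]$. The following minimal Python sketch evaluates it in floating point; the function name `dc_notch` is hypothetical, and the fixed-point word lengths of the actual hardware are not modeled.

```python
def dc_notch(samples, alpha=0.95):
    """First-order DC notch: y[n] = x[n] - x[n-1] + alpha * y[n-1]."""
    out = []
    prev_x = 0.0
    prev_y = 0.0
    for x in samples:
        y = x - prev_x + alpha * prev_y
        out.append(y)
        prev_x, prev_y = x, y
    return out
```

A constant input decays geometrically to zero at a rate set by $\alpha$, while a component at half the sample rate passes with a gain of $2/(1+\alpha)$, so values of $\alpha$ closer to 1 give a narrower notch at the cost of a longer settling time.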

(a) Magnitude response. (b) Phase response. (c) Impulse response. (d) Step response.

Figure 3.12: Characteristics of the polyphase decimation filter.

Figure 3.13: Implementation block diagram of the polyphase decimation filter.

Figure 3.14: Structure of the first-order digital DC notch filter.

(a) Frequency response. (b) Phase response.

Figure 3.15: First-order digital DC notch filter characteristics with $\alpha = 0.95$.

In practical receivers, the phase and amplitude responses of the I and Q branches are never exactly the same. This causes frequency translation in the I and Q branches. Therefore, it is necessary to perform I/Q mismatch correction at the end of the front-end unit. To illustrate this, consider a simple form of the I/Q mismatched signal:

$$I(t) = \chi \cos(\omega t) , \tag{3.7}$$

$$Q(t) = \sin(\omega t + \phi) , \tag{3.8}$$

where $\chi$ and $\phi$ are the amplitude and phase mismatch, respectively. Ideally, $\chi = 1$ and $\phi = 0$, but in practical cases, $\chi$ and $\phi$ always differ from the ideal values. A simple scheme to correct a small amount of phase and amplitude mismatch, derived from [58], is described by the following equations:

$$I'(t) = \chi \cos(\omega t)\,\frac{1}{\chi} = \cos(\omega t) , \tag{3.9}$$

$$Q'(t) = -\tan(\phi)\,\chi \cos(\omega t)\,\frac{1}{\chi} + \frac{\sin(\omega t + \phi)}{\cos(\phi)} = \sin(\omega t) . \tag{3.10}$$

Using this principle, the architecture of the I/Q mismatch corrector is illustrated in Figure 3.16. This architecture exploits temporal multiplexing, using a real multiplier to perform the phase and amplitude correction for both the I and Q branches. In this figure, the ROM blocks contain the precomputed values of the inverse of the amplitude mismatch, $1/\chi$, the tangent of the phase mismatch, $\tan(\phi)$, and the inverse of the cosine of the phase mismatch, $1/\cos(\phi)$, so that the values of $\chi$ and $\phi$ can be adjusted by the user control software in the host computer. The FPGA resource utilization for the digital front-end circuit is given in Table 3.3. In this table, the logic slices, the FIFO16s, the BRAM16s and the DSP48s are dedicated resources in the Virtex-4 SX35 chip, which are described briefly in Appendix D. Timing analysis results show that the critical path is 5.841 ns, i.e. the maximum clock frequency is 171.208 MHz.

Figure 3.16: I/Q mismatch corrector unit architecture.

Table 3.3: Device utilization summary for the digital front-end circuit.

| Logic utilization | Total | Used | Utilization |
|---|---|---|---|
| Number of slices | 15360 | 1020 | 6 % |
| Number of FIFO16s/BRAM16s | 192 | 34 | 17 % |
| Number of DSP48s | 192 | 24 | 12 % |
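The correction principle can be checked numerically from the mismatch model $I(t) = \chi\cos(\omega t)$, $Q(t) = \sin(\omega t + \phi)$: since $\sin(\omega t + \phi)/\cos\phi = \sin(\omega t) + \tan\phi\,\cos(\omega t)$, scaling I by $1/\chi$ and removing the $\tan\phi$ leakage term from the scaled Q recovers a quadrature pair. The following Python sketch is a floating-point illustration only; the function name `correct_iq` is hypothetical, and the three factors are computed on the fly instead of being read from ROM tables.

```python
import math


def correct_iq(i_sample, q_sample, chi, phi):
    """Undo amplitude mismatch chi on I and phase mismatch phi on Q."""
    inv_chi = 1.0 / chi                    # stands in for the 1/chi table
    tan_phi = math.tan(phi)                # stands in for the tan(phi) table
    inv_cos_phi = 1.0 / math.cos(phi)      # stands in for the 1/cos(phi) table
    i_corr = i_sample * inv_chi
    q_corr = q_sample * inv_cos_phi - tan_phi * i_corr
    return i_corr, q_corr


if __name__ == "__main__":
    chi, phi, wt = 1.1, 0.05, 0.7
    i_rx = chi * math.cos(wt)
    q_rx = math.sin(wt + phi)
    print(correct_iq(i_rx, q_rx, chi, phi), (math.cos(wt), math.sin(wt)))
```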

3.2.2 Digital AGC circuit implementation

The automatic gain control (AGC) circuit is implemented as illustrated in Figure 3.17. The power approximation block calculates $|\mathrm{Re}| + |\mathrm{Im}|$ of each decimated complex sample from the digital front-end as an approximation of the signal power. The accumulator and scaling circuit averages the approximated power of the signal. The power error (with respect to the power threshold in Figure 3.17) is fed to a loop filter, which tracks the variation of the power error. The characteristics of the loop filter affect the settling time and tracking behavior of the AGC loop. The loop filter will be detailed in the loop filter implementation section. The RF front-end controller converts the filtered power error signal to the appropriate power control word for the RF front-end. The FPGA resource utilization for the digital AGC circuit is given in Table 3.4. Timing analysis results show that the critical path is 3.552 ns, i.e. the maximum clock frequency is 281.524 MHz.

Figure 3.17: Digital AGC circuit architecture.

Table 3.4: Device utilization summary for the digital AGC circuit.

| Logic utilization | Total | Used | Utilization |
|---|---|---|---|
| Number of slices | 15360 | 300 | 1 % |
| Number of FIFO16s/BRAM16s | 192 | 1 | < 1 % |
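The behavior of such a loop can be sketched in a few lines of Python. The sketch below is a simplified model under several assumptions that are not taken from the design: the RF front-end is modeled as an ideal multiplicative gain, the loop filter is reduced to a pure integrator with a hypothetical `loop_gain` parameter, and the function names are illustrative.

```python
def approx_power(sample):
    """|Re| + |Im| approximation used in place of a true magnitude."""
    return abs(sample.real) + abs(sample.imag)


def agc_update(block, gain, threshold, loop_gain=0.5):
    """One AGC iteration: average the metric, then integrate the error."""
    average = sum(approx_power(gain * s) for s in block) / len(block)
    error = threshold - average
    return gain + loop_gain * error


if __name__ == "__main__":
    gain = 1.0
    block = [0.5 + 0j] * 64            # constant-amplitude test input
    for _ in range(20):
        gain = agc_update(block, gain, threshold=1.0)
    print(gain)                        # approaches threshold / amplitude
```

With this model a larger `loop_gain` shortens the settling time but makes the loop react more strongly to short-term power fluctuations, which is the trade-off controlled by the loop filter.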

3.2.3 Serial CORDIC processor

In Figure 3.5, the phase derotator rotates the phase of the complex input signal. An efficient approach to rotating the phase of a complex signal is a Coordinate Rotation Digital Computer (CORDIC) in rotation mode. The CORDIC processor supports two basic operation modes: vectoring and rotation [59]. In the vectoring mode, the phase and magnitude of a complex signal $x_0 + j y_0$ are computed iteratively as described in the following equations:

$$\begin{aligned} x_{i+1} &= x_i - \beta_i y_i 2^{-i} , \\ y_{i+1} &= y_i + \beta_i x_i 2^{-i} , \\ z_{i+1} &= z_i - \beta_i \tan^{-1}\left(2^{-i}\right) , \end{aligned} \tag{3.11}$$

where

$$\beta_i = \begin{cases} 1 & y_i < 0 \\ -1 & \text{otherwise} . \end{cases} \tag{3.12}$$

The output after $n$ iterations is then

$$\begin{aligned} x_n &= \frac{1}{A_n}\sqrt{x_0^2 + y_0^2} , \\ y_n &= 0 , \\ z_n &= z_0 + \tan^{-1}\left(\frac{y_0}{x_0}\right) , \end{aligned} \tag{3.13}$$

where $A_n$ is a scaling factor given by

$$A_n = \frac{1}{\prod_{i=0}^{n-1} \sqrt{1 + 2^{-2i}}} . \tag{3.14}$$

When the number of iterations is large, $A_n \approx 0.6073$. In this mode, the input vector is limited to quadrants I and IV of the complex plane. Therefore, a coarse rotation must first be performed in order to rotate an input vector from quadrant II or III into quadrant I or IV, respectively. The coarse rotation is performed as follows:

$$x_c = \begin{cases} x_0 & x_0 \ge 0 \\ y_0 & x_0 < 0,\ y_0 \ge 0 \\ -y_0 & x_0 < 0,\ y_0 < 0 \end{cases} \qquad y_c = \begin{cases} y_0 & x_0 \ge 0 \\ -x_0 & x_0 < 0,\ y_0 \ge 0 \\ x_0 & x_0 < 0,\ y_0 < 0 \end{cases} \qquad z_c = \begin{cases} 0 & x_0 \ge 0 \\ \pi/2 & x_0 < 0,\ y_0 \ge 0 \\ -\pi/2 & x_0 < 0,\ y_0 < 0 \end{cases} \tag{3.15}$$

In rotation mode, a complex signal is rotated iteratively as described in the following equations:

$$\begin{aligned} x_{i+1} &= x_i - \beta_i y_i 2^{-i} , \\ y_{i+1} &= y_i + \beta_i x_i 2^{-i} , \\ z_{i+1} &= z_i - \beta_i \tan^{-1}\left(2^{-i}\right) , \end{aligned} \tag{3.16}$$

where

$$\beta_i = \begin{cases} -1 & z_i < 0 \\ 1 & \text{otherwise} . \end{cases} \tag{3.17}$$

The output after $n$ iterations is then

$$\begin{aligned} x_n &= \frac{1}{A_n}\left(x_0 \cos z_0 - y_0 \sin z_0\right) , \\ y_n &= \frac{1}{A_n}\left(y_0 \cos z_0 + x_0 \sin z_0\right) , \\ z_n &= 0 . \end{aligned} \tag{3.18}$$

Figure 3.18: Architecture for the CORDIC processing element.

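Since the rotation angles are limited to the range $[-\pi/2, \pi/2]$ (quadrants I and IV of the complex plane), a coarse rotation of the input must be performed in order to extend the range of the rotation angle to between $-\pi$ and $\pi$. Thus, an initial coarse rotation is performed as follows:

$$x_c = \begin{cases} -y_0 & z_0 > \pi/2 \\ y_0 & z_0 < -\pi/2 \\ x_0 & \text{otherwise} \end{cases} \qquad y_c = \begin{cases} x_0 & z_0 > \pi/2 \\ -x_0 & z_0 < -\pi/2 \\ y_0 & \text{otherwise} \end{cases} \qquad z_c = \begin{cases} z_0 - \pi/2 & z_0 > \pi/2 \\ z_0 + \pi/2 & z_0 < -\pi/2 \\ z_0 & \text{otherwise} \end{cases} \tag{3.19}$$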


The heart of the CORDIC processor unit is the processing element shown in Figure 3.18. The OP signal in Figure 3.18 allows the processing element to work in either vectoring or rotation mode. The ROM_in signal is the pre-computed value of the arctangent in those equations and is stored in a small 16-word ROM for 16-bit input/output data widths. The multiplications by $2^{-i}$ are reduced to simple programmable barrel shift operations. Figure 3.19 shows the architecture of a cost-effective implementation of the CORDIC processor. This architecture exploits temporal multiplexing and uses only one CORDIC processing element. Thus, it requires 16 clock cycles to complete an operation. The FPGA resource utilization for the serial CORDIC processor is shown in Table 3.5. Timing analysis results show that the critical path is 6.983 ns, i.e. the maximum clock frequency is 143.202 MHz.
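The iterations of both modes can be modeled with a single loop that reuses one update step 16 times, mirroring the serial architecture. The Python sketch below is a floating-point illustration, not the 16-bit fixed-point datapath: the function name `cordic` and the constant names are hypothetical, the coarse rotation stage is omitted (inputs are assumed to lie within the convergence range), and the final multiplication by the scaling factor of about 0.6073 is an assumed model of the scaling stage.

```python
import math

ITERATIONS = 16
ATAN_TABLE = [math.atan(2.0 ** -i) for i in range(ITERATIONS)]  # 16-word table

SCALE = 1.0
for _i in range(ITERATIONS):
    SCALE /= math.sqrt(1.0 + 2.0 ** (-2 * _i))                   # about 0.6073


def cordic(x, y, z, vectoring):
    """Run 16 CORDIC iterations, then remove the accumulated gain."""
    for i in range(ITERATIONS):
        if vectoring:
            beta = 1.0 if y < 0 else -1.0     # drive y towards zero
        else:
            beta = -1.0 if z < 0 else 1.0     # drive z towards zero
        shift = 2.0 ** -i                     # a barrel shift in hardware
        x, y, z = (x - beta * y * shift,
                   y + beta * x * shift,
                   z - beta * ATAN_TABLE[i])
    return x * SCALE, y * SCALE, z


if __name__ == "__main__":
    print(cordic(1.0, 0.0, 0.5, vectoring=False))   # close to (cos 0.5, sin 0.5, 0)
    print(cordic(3.0, 4.0, 0.0, vectoring=True))    # close to (5, 0, atan2(4, 3))
```

After 16 iterations the residual angle is bounded by $\tan^{-1}(2^{-15})$, about $3 \times 10^{-5}$ rad, which sets the accuracy of both modes in this floating-point model.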

Figure 3.19: Architecture for the serial CORDIC.

Table 3.5: Device utilization summary for the serial CORDIC processor.

| Logic utilization | Total | Used | Utilization |
|---|---|---|---|
| Number of slices | 15360 | 252 | 1 % |
| Number of DSP48s | 192 | 2 | 1 % |


Figure 3.20: Architecture for the proposed convolution block.

3.2.4 Pre-FFT timing and frequency synchronization

Coarse frame detector implementation

In this receiver design, a training symbol structure typical of wireless LANs (e.g. 802.11) is periodically transmitted in the time domain in order to help the receiver detect the transmitted frame and track the Carrier Frequency Offset (CFO), as illustrated in Figure 2.1. A direct FIR filter implementation of the convolution in (2.2) (the first timing metric) requires considerable logic resources, since the number of complex multipliers in the direct implementation is proportional to the length of the short training symbol $N_s$ (in this case, $N_s = 64$ samples). Since the system clock of 80 MHz is 16 times faster than the data rate, temporal multiplexing is exploited to save FPGA resources. This leads to the convolution block architecture shown in Figure 3.20.

The proposed architecture uses only 4 complex multipliers, each consisting of 3 real multipliers. The accumulators are implemented by exploiting the advanced features of the Virtex-4 DSP48 blocks, without any extra logic. The delay elements are implemented using dual-port block RAM in the FPGA. Both the input and the output of the block are truncated to 16-bit precision to maintain the same dynamic range and precision. The FPGA resource utilization for the convolution circuit is given in Table 3.6. Furthermore, timing analysis results show that the critical path is 5.136 ns, i.e. the maximum clock frequency is 194.708 MHz.

Table 3.6: Device utilization summary for the convolution circuit.

| Logic utilization | Total | Used | Utilization |
|---|---|---|---|
| Number of slices | 15360 | 814 | 5 % |
| Number of FIFO16s/BRAM16s | 192 | 10 | 5 % |
| Number of DSP48s | 192 | 12 | 6 % |

Figure 3.21: Direct implementation of the moving sum circuit.

Figure 3.22: Architecture for the proposed moving sum circuit.
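A complex multiplication can be carried out with 3 real multiplications instead of 4 by sharing one product between the real and imaginary parts. The Python sketch below shows one common decomposition; it is an assumption that the convolution block uses this particular form, and the function name `complex_mult3` is hypothetical.

```python
def complex_mult3(a, b, c, d):
    """Compute (a + jb)(c + jd) using three real multiplications."""
    k1 = c * (a + b)
    k2 = a * (d - c)
    k3 = b * (c + d)
    return k1 - k3, k1 + k2        # (real part, imaginary part)


if __name__ == "__main__":
    print(complex_mult3(3, 4, 5, -2))   # (3 + 4j) * (5 - 2j) = 23 + 14j
```

The saving of one multiplier comes at the cost of three extra additions, a favorable trade on devices where embedded multipliers are scarcer than adders.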

A direct implementation of the moving sum circuit (the second timing metric in (2.3)) can be realized as a special case of an FIR filter with unit coefficients and a tap delay duration of $N_s$, as shown in Figure 3.21. An efficient implementation of such a moving sum is proposed in Figure 3.22. This implementation uses only one real adder and one real subtracter. The delay elements in the proposed architecture use single-port block RAMs. The FPGA resource utilization for the moving sum block is given in Table 3.7. Furthermore, timing analysis results show that the critical path is 5.483 ns, i.e. the maximum clock frequency is 182.387 MHz.

Table 3.7: Device utilization summary for the moving sum circuit.

| Logic utilization | Total | Used | Utilization |
|---|---|---|---|
| Number of slices | 15360 | 64 | < 1 % |
| Number of FIFO16s/BRAM16s | 192 | 2 | 1 % |

Figure 3.23: State diagram for the peak detector.
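The saving offered by the recursive form of a moving sum is that each new output is the previous output plus the newest sample minus the sample leaving the window, so only one addition and one subtraction are needed regardless of the window length. A minimal Python sketch follows; the function name `moving_sum` is hypothetical, and a `deque` stands in for the block RAM delay line.

```python
from collections import deque


def moving_sum(samples, window=64):
    """Recursive moving sum: add the newest sample, subtract the oldest."""
    delay_line = deque([0] * window, maxlen=window)
    acc = 0
    out = []
    for x in samples:
        oldest = delay_line[0]
        delay_line.append(x)       # the oldest entry drops out automatically
        acc += x - oldest
        out.append(acc)
    return out
```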

The peak detector detects the coarse timing by finding the sample position having a maximum value of the second timing metric in (2.3). The peak detector circuit is simply implemented using a Finite State Machine (FSM) with a search window length of Ns samples. When the input signal amplitude is larger than a predefined threshold, the peak position and new threshold are updated. The search window counter is reset and the operation continues for the incoming samples. The state diagram of the peak detector is shown in Figure 3.23.
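One possible reading of this state machine is sketched below in Python. The exact state transitions of Figure 3.23 are not reproduced here, so the behavior shown is an assumption: the detector idles until the metric reaches the preset threshold, then raises the threshold to each new maximum while restarting the window counter, and reports the stored peak position once `window` samples pass without a larger value. The function name `detect_peaks` is hypothetical.

```python
def detect_peaks(metric, preset_threshold, window=64):
    """Return the sample positions reported as frame starts."""
    peaks = []
    threshold = preset_threshold
    searching = False
    peak_pos = None
    counter = 0
    for n, value in enumerate(metric):
        if value >= threshold:
            peak_pos = n               # new candidate peak
            threshold = value          # only larger values can replace it
            counter = 0                # restart the search window
            searching = True
        elif searching:
            counter += 1
            if counter == window:      # window expired: peak is confirmed
                peaks.append(peak_pos)
                threshold = preset_threshold
                searching = False
    return peaks
```

Raising the threshold to the running maximum means that only the largest value within the search window is retained, so a slowly rising timing metric produces a single detection at its peak.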


Figure 3.24: Architecture for the fractional CFO estimator.

Table 3.8: Device utilization summary for the fractional CFO estimator.

| Logic utilization | Total | Used | Utilization |
|---|---|---|---|
| Number of slices | 15360 | 129 | < 1 % |
| Number of FIFO16s/BRAM16s | 192 | 1 | < 1 % |
| Number of DSP48s | 192 | 3 | 1 % |

Fractional CFO estimator implementation

A conventional CFO estimation method, as described in Section 2.2.3, is used for the implementation of the fractional CFO estimator with the short training symbol length $N_s = 64$. Figure 3.24 shows the proposed architecture for the fractional CFO estimator. This architecture uses only a complex multiplier, a complex subtracter and a complex adder, which are efficiently implemented using DSP48 blocks. The conjugate block inverts the sign (2's complement negation) of the imaginary part of the input. The delay elements of the moving average circuit are implemented using a 256-word single-port block RAM, i.e. the average is taken over 4 short training symbols. The FPGA resource utilization for the fractional CFO estimator is shown in Table 3.8. Timing analysis results show that the critical path is 4.354 ns, i.e. the maximum clock frequency is 229.676 MHz.
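The method of Section 2.2.3 is not restated here, so the Python sketch below assumes the standard delay-and-correlate form: each sample is multiplied by the conjugate of the sample one training-symbol period earlier, the products are accumulated over 256 samples, and the phase of the sum is converted to a frequency offset in units of the subcarrier spacing. The function name and the normalization by an FFT size of 512 are assumptions made for illustration.

```python
import cmath
import math


def estimate_fractional_cfo(samples, period=64, fft_size=512, span=256):
    """Delay-and-correlate CFO estimate, in subcarrier spacings."""
    acc = 0j
    for n in range(span):
        acc += samples[n + period] * samples[n].conjugate()
    return cmath.phase(acc) * fft_size / (2 * math.pi * period)


if __name__ == "__main__":
    # Periodic training signal (period 64) with a CFO of 0.3 subcarrier spacings.
    base = [cmath.exp(2j * math.pi * ((k * k) % 64) / 64) for k in range(64)]
    clean = base * 6
    rx = [s * cmath.exp(2j * math.pi * 0.3 * n / 512) for n, s in enumerate(clean)]
    print(estimate_fractional_cfo(rx))
```

With these parameters the phase of the correlation is unambiguous only for offsets smaller in magnitude than `fft_size / (2 * period)` subcarrier spacings; larger offsets wrap around and must be resolved by other means.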

3.2.5 FFT processor unit

The FFT core provided by Xilinx computes a 512-point fast Fourier transform. The FFT core supports several architectures: Pipelined, Radix-4 burst I/O, Radix-2 burst I/O, and Radix-2-Lite burst I/O [2]. Given the system clock rate of 80 MHz and the data rate of 5 Msps, the Radix-2-Lite burst I/O architecture takes 70.787 µs to transform 512 data points. This is less than the MC-CDMA symbol duration of 128 µs. Therefore, we choose this architecture for our design in order to save logic resources while maintaining the performance.

Figure 3.25: FFT processor architecture.

Since the actual input sample rate of the FFT core is 80 Msps, it is necessary to synchronize the 5 Msps output sample rate of the frequency offset correction unit with the input of the FFT core. Therefore, a simple 512 x 32-bit FIFO is used to perform this task, as shown in Figure 3.25. Similarly, the output sample rate of the FFT core must be synchronized with the connecting module. Furthermore, since the output sample indices of the FFT core are in natural order, output sample rate synchronization and frequency bin reordering are performed jointly, as shown in this figure. In Figure 3.25, the FFT core is controlled according to the timing requirements in the FFT core document [2]. Thus, only simple control logic is necessary for the implementation, as shown in Figure 3.26. The FFT core first resets all output pins, internal registers, counters and state variables to their initial values. All pending load processes, transform calculations and unload processes stop and are reinitialized. Once the FFT core is ready, the input data is loaded into the internal RAM and the process begins. The FFT core starts unloading the results when the 'Done' flag is asserted. The output data samples are then fed to the output buffer while the frequency bins are reordered, as shown in Figure 3.25. The output buffer is simply implemented using a 512 x 32-bit block RAM with the write addresses in natural order and the read addresses beginning from the middle of the RAM (i.e. the initial read address is 256 and counts up in natural order, resulting in the output frequency bins being symmetrical around DC). The FPGA resource utilization for the FFT processor as illustrated in Figure 3.25 is given in Table 3.9. Timing analysis results show that the critical path is 2.911 ns, i.e. the maximum clock frequency is 343.536 MHz.

Figure 3.26: State machine for the FFT processor.

Table 3.9: Device utilization summary for the FFT processor.

| Logic utilization | Total | Used | Utilization |
|---|---|---|---|
| Number of slices | 15360 | 632 | 4 % |
| Number of FIFO16s/BRAM16s | 192 | 4 | 2 % |
| Number of DSP48s | 192 | 4 | 2 % |
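The reordering performed by the output buffer amounts to reading a buffer written in natural order from its midpoint, with the read address wrapping around at the end of the RAM. The following Python sketch models this addressing; the function name `reorder_bins` is hypothetical, and the modulo operation is an assumed model of a 9-bit address counter that wraps from 511 to 0.

```python
def reorder_bins(fft_output):
    """Read a naturally ordered buffer starting from its midpoint."""
    size = len(fft_output)
    start = size // 2                   # initial read address (256 for 512 bins)
    return [fft_output[(start + k) % size] for k in range(size)]


if __name__ == "__main__":
    reordered = reorder_bins(list(range(512)))
    print(reordered[0], reordered[256], reordered[511])   # 256 0 255
```

After reordering, the DC bin (write address 0) sits at the center of the output sequence, with the negative-frequency bins before it and the positive-frequency bins after it.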

3.2.6 Reference pilot generator

Figure 3.27 shows the pilot tones generator architecture which is used for channel estimation, fine timing offset synchronization and integer CFO detection. The pilot


Pilot tones

DFF 9

DFF 8

DFF 4

DFF 3

DFF * 2

DFF 1

DFF 0 "

. BPSK modulator

DFF 9

DFF 8

DFF 4

DFF 3

DFF * 2

DFF 1

DFF 0 "

. BPSK modulator

" i

Figure 3.27: Pilot tone generator architecture.


Figure 3.28: Simulation results for the pilot tone generator.

Table 3.10: Device utilization summary for the pilot generator.

| Logic utilization | Total | Used | Utilization |
|---|---|---|---|
| Number of slices | 15360 | 8 | < 1% |

tones are generated from a maximum-length binary sequence using the polynomial $P_{plt}(x) = x^{10} + x^3 + 1$. In our design, the output sequence of the generator is BPSK modulated and the generator is reset at every new received frame. Figure 3.28 illustrates the simulation results with the initial values of the registers randomly set to '0100111011'. Note that an all-zero value is forbidden, since the generator would end up in a lockup state (i.e. it would stop generating new values). The FPGA resource utilization for the pilot tone generator is given in Table 3.10. Timing analysis results show that the critical path is 0.986 ns (i.e. the maximum clock frequency is 1013.787 MHz).
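A behavioural sketch of such a generator is given below. It is not the VHDL of the thesis: the Fibonacci structure, the position of the feedback taps within the register, the output stage and the BPSK mapping (bit 0 to +1, bit 1 to -1) are assumptions made for illustration, and the names `lfsr_sequence` and `bpsk` are hypothetical. Only the register length, the polynomial degrees and the initial value come from the text.

```python
def lfsr_sequence(init_bits, taps, count):
    """Fibonacci LFSR returning `count` output bits.

    init_bits: string of '0'/'1', one character per register stage.
    taps: 1-based stage numbers that are XORed to form the feedback bit.
    """
    state = [int(b) for b in init_bits]
    if not any(state):
        raise ValueError("an all-zero state locks the generator up")
    out = []
    for _ in range(count):
        out.append(state[-1])            # serial output from the last stage
        feedback = 0
        for t in taps:
            feedback ^= state[t - 1]
        state = [feedback] + state[:-1]  # shift the register by one stage
    return out


def bpsk(bits, amplitude=1):
    """Map bit 0 to +amplitude and bit 1 to -amplitude (assumed convention)."""
    return [amplitude if b == 0 else -amplitude for b in bits]


# Two full periods of the degree-10 sequence, starting from '0100111011'.
bits = lfsr_sequence("0100111011", taps=(10, 3), count=2 * 1023)
pilots = bpsk(bits)
```

Because the generator is maximum length, the sequence repeats every 2^10 - 1 = 1023 bits whatever the non-zero initial value is; only the starting phase changes.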

3.2.7 Pilot tone extractor

The pilot tone extractor unit in Figure 3.29 exploits temporal multiplexing of the FFT output buffer between data subcarriers and pilot tone subcarriers. It is implemented in conjunction with the frequency bins reordering inside the FFT processor unit. The



Figure 3.29: Pilot tone extractor architecture.

Table 3.11: Device utilization summary for the pilot tone extractor.

| Logic utilization | Total | Used | Utilization |
|---|---|---|---|
| Number of slices | 15360 | 35 | < 1% |

pilot tone subcarrier indexes are precomputed and stored in a small ROM. The FPGA resource utilization for the pilot tone extractor unit, excluding the FFT processor and buffers, is given in Table 3.11. Timing analysis results show that the critical path is 2.155 ns, i.e. the maximum clock frequency is 464.113 MHz.

3.2.8 Post-FFT timing and frequency synchronization

Fine timing synchronization implementation

Figure 3.30 shows the architecture of the fine timing synchronization unit discussed in Section 2.2.4. The average phase difference between two adjacent pilot tones, $H_{mN_f,i}$ and $H^*_{(m-1)N_f,i}$, for the $i$th symbol is computed by a Complex Multiply-ACcumulate (CMAC) unit using the dedicated DSP48 blocks. The timing offset estimate is then obtained by reusing the CORDIC processor in vectoring mode. The FPGA resource utilization for the fine timing synchronization unit is given in Table 3.12. Timing analysis results



Figure 3.30: Fine timing synchronization unit architecture.

Table 3.12: Device utilization summary for the fine timing synchronization unit.

| Logic utilization | Total | Used | Utilization |
|---|---|---|---|
| Number of slices | 15360 | 310 | 1% |


show that the critical path is 4.422 ns, i.e. the maximum clock frequency is 226.124 MHz.
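The data flow of this unit (conjugate, complex multiply-accumulate, phase extraction, scaling) can be modelled as follows. This is a floating-point sketch under stated assumptions: a timing offset of d samples is assumed to rotate subcarrier k by exp(-j2πkd/N), the pilot spacing and FFT size in the usage lines are hypothetical values, and `cmath.phase` stands in for the CORDIC in vectoring mode. The estimator of Section 2.2.4 may use a different sign or scaling convention.

```python
import cmath
import math


def fine_timing_offset(pilots, pilot_spacing, fft_size):
    """Estimate a timing offset (in samples) from adjacent pilot tones.

    pilots: channel estimates at the pilot subcarriers, in frequency order.
    pilot_spacing: distance between pilots in subcarriers (assumed N_f).
    fft_size: number of subcarriers N.
    """
    acc = 0j
    for m in range(1, len(pilots)):
        # CMAC: accumulate H[m] * conj(H[m-1]) over all adjacent pairs.
        acc += pilots[m] * pilots[m - 1].conjugate()
    phase = cmath.phase(acc)  # role played by the CORDIC in vectoring mode
    # Scaling stage: convert the average phase step into samples.
    return -phase * fft_size / (2 * math.pi * pilot_spacing)


# Synthetic check with hypothetical parameters: N = 512, N_f = 8, d = 3.
N, NF, D = 512, 8, 3
test_pilots = [cmath.exp(-2j * math.pi * (m * NF) * D / N) for m in range(32)]
estimate = fine_timing_offset(test_pilots, NF, N)
```

Accumulating the products before taking the phase averages out noise on individual pilots, which is why a single phase computation per symbol is sufficient.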

Integer CFO estimator implementation

The architecture of the proposed integer CFO estimator mentioned in Section 2.2.5 is illustrated in Figure 3.31. A Complex Multiply-ACcumulate (CMAC) unit using the dedicated DSP48 blocks efficiently computes multiple cross-correlations of the received pilot tones with shifted versions of the reference pilot tones. A simple comparator is used as a peak detector to find the maximum amplitude among the correlation results. The peak detector is synchronized with the control logic block that drives the reference pilot tone generator. At each correlation process, the control


Figure 3.31: Integer CFO estimator architecture.

Table 3.13: Device utilization summary for the integer CFO estimator.

| Logic utilization | Total | Used | Utilization |
|---|---|---|---|
| Number of slices | 15360 | 272 | 1% |
| Number of DSP48s | 192 | 3 | 1% |

logic block shifts the initial register of the pilot generator to the left or right by one bit in order to generate the positively or negatively shifted version of the reference pilot tones. The integer CFO is thus obtained directly as the number of shifts applied at the reference pilot generator when the correlation peak occurs. The FPGA resource utilization for the integer CFO estimator is given in Table 3.13. Timing analysis results show that the critical path is 4.422 ns, i.e. the maximum clock frequency is 226.124 MHz.
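The correlate-and-compare principle can be sketched as below. The sketch makes several assumptions that are not taken from the thesis: the shifted references are produced by circularly shifting a stored list rather than by re-seeding the generator, a short length-31 m-sequence replaces the actual pilot sequence, and the squared magnitude is used as the peak metric. The names `m_sequence_31` and `integer_cfo` are hypothetical.

```python
import cmath


def m_sequence_31():
    """Length-31 maximum-length sequence (x^5 + x^2 + 1), mapped to +/-1."""
    bits = [1, 0, 0, 0, 0]
    while len(bits) < 31:
        bits.append(bits[-5] ^ bits[-3])
    return [1 - 2 * b for b in bits]


def integer_cfo(received, reference, max_shift):
    """Return the reference shift whose cross-correlation magnitude peaks."""
    n = len(reference)
    best_shift, best_metric = 0, -1.0
    for shift in range(-max_shift, max_shift + 1):
        acc = 0j
        for k in range(n):
            # CMAC: received pilot times conjugate of the shifted reference.
            acc += received[k] * complex(reference[(k - shift) % n]).conjugate()
        metric = acc.real ** 2 + acc.imag ** 2   # squared magnitude
        if metric > best_metric:                 # comparator as peak detector
            best_shift, best_metric = shift, metric
    return best_shift


ref = m_sequence_31()
# Received pilots: reference displaced by 2 positions, with a common phase.
rx = [ref[(k - 2) % 31] * cmath.exp(0.7j) for k in range(31)]
detected = integer_cfo(rx, ref, max_shift=4)
```

Using the magnitude rather than the real part of the correlation makes the decision insensitive to a common phase rotation of the received pilots.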

3.2.9 Digital proportional-integral loop filter

Implementation of the digital Proportional-Integral (PI) loop filter illustrated in Figure 2.3 is straightforward. A multiplierless design is exploited in this implementation


Table 3.14: Device utilization summary for the loop filter unit.

| Logic utilization | Total | Used | Utilization |
|---|---|---|---|
| Number of slices | 15360 | 100 | < 1% |



Figure 3.32: Channel estimator architecture.

by choosing the coefficients $K_i$ (integral) and $K_p$ (proportional) to be powers of 2. Furthermore, the two coefficients are programmable so that the frequency/timing tracking performance of the receiver can easily be tuned. The FPGA resource utilization for the loop filter unit is given in Table 3.14. Timing analysis results show that the critical path is 2.476 ns, i.e. the maximum clock frequency is 403.910 MHz.
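A minimal integer model of a multiplierless PI filter is sketched below. It assumes the textbook PI structure (proportional branch plus accumulated integral branch) and gains of the form 2^-k realized as arithmetic right shifts; the exact structure of Figure 2.3, the word lengths and the class name `ShiftPILoopFilter` are assumptions.

```python
class ShiftPILoopFilter:
    """PI loop filter whose gains are powers of two (shifts, no multipliers)."""

    def __init__(self, kp_shift, ki_shift):
        self.kp_shift = kp_shift   # Kp = 2 ** -kp_shift (programmable)
        self.ki_shift = ki_shift   # Ki = 2 ** -ki_shift (programmable)
        self.integrator = 0

    def update(self, error):
        """Process one integer error sample and return the filter output."""
        # Integral branch: scale by Ki with a shift, then accumulate.
        self.integrator += error >> self.ki_shift
        # Proportional branch: scale by Kp with a shift, add the integral.
        return (error >> self.kp_shift) + self.integrator


loop = ShiftPILoopFilter(kp_shift=2, ki_shift=4)   # Kp = 1/4, Ki = 1/16
outputs = [loop.update(e) for e in (64, 64, -64)]
```

Python's `>>` on negative integers is an arithmetic shift, which matches the behaviour of a sign-extending shift on 2's complement data in hardware.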

3.2.10 Channel estimator

Since the reference pilot tone amplitude is assumed to be one, the estimated channel response at the $(mN_f)$th subcarrier, $H_{mN_f,i}$, $m = 0, 1, \ldots, N_p - 1$, for the $i$th symbol is obtained by a simple multiplication between the received pilot and the corresponding reference pilot. The channel responses at the remaining (non-pilot) subcarriers are interpolated using a serial FIR filter that implements the sinc interpolation technique mentioned in Section 2.3.1. The proposed channel estimator architecture is illustrated in Figure 3.32. In this figure, a serial architecture for the FIR filter exploits the temporal multiplexing technique to save silicon area while maintaining performance. The filter coefficients are precomputed and stored in a small ROM.

The FPGA resource utilization for the channel estimator unit, excluding the pilot generator, is given in Table 3.15. Timing analysis results show that the critical path is 3.744 ns, i.e. the maximum clock frequency is 267.087 MHz.

Table 3.15: Device utilization summary for the channel estimator.

| Logic utilization | Total | Used | Utilization |
|---|---|---|---|
| Number of slices | 15360 | 150 | < 1% |



Figure 3.33: Channel equalizer architecture.

3.2.11 Channel equalizer

The ZF equalizer performs a complex division of the data subcarriers from the FFT processor by the corresponding channel response estimated by the channel estimator. A low-complexity complex division architecture is proposed in Figure 3.33. The equalizer exploits temporal multiplexing and overclocking techniques to save resources. Consider a complex division described as follows:

$$\frac{a + bj}{c + dj} = \frac{ac + bd}{c^2 + d^2} + \frac{bc - ad}{c^2 + d^2}\,j. \tag{3.20}$$

The divisor term $c^2 + d^2$ is obtained by simply performing a Multiply-Accumulate (MAC) operation on the real and imaginary components of the input signal. The resulting divisor feeds the real divider, whose dividend input is multiplexed between the real ($ac + bd$) and imaginary ($bc - ad$) outputs of the complex multiplier. As a result, the output of the divider must be demultiplexed in order to separate the real and imaginary components of the resulting signal. The real divider block in this figure uses Xilinx's Divider Generator [60] with the minimum resource setting. The FPGA resource


Table 3.16: Device utilization summary for the channel equalizer unit.

| Logic utilization | Total | Used | Utilization |
|---|---|---|---|
| Number of slices | 15360 | 547 | 3% |


utilization for the channel equalizer unit is given in Table 3.16. Timing analysis results show that the critical path is 3.093 ns, i.e. the maximum clock frequency is 323.326 MHz.
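The sharing of one real divider between the two output components in (3.20) can be illustrated with the following floating-point sketch. The function name `complex_divide_shared` and the `real_divide` callback (a stand-in for the divider core) are hypothetical, and the sequential loop only mimics the time multiplexing of the dividend input.

```python
def complex_divide_shared(a, b, c, d, real_divide=lambda num, den: num / den):
    """Compute (a + bj) / (c + dj) using a single real divider, as in (3.20)."""
    # Complex multiplier: (a + bj) * conj(c + dj).
    real_num = a * c + b * d
    imag_num = b * c - a * d
    # MAC on the divisor components: c*c + d*d.
    divisor = c * c + d * d
    # One real divider; its dividend is multiplexed between the two
    # numerators and its output is demultiplexed into real/imaginary parts.
    quotients = [real_divide(dividend, divisor)
                 for dividend in (real_num, imag_num)]
    return complex(quotients[0], quotients[1])


result = complex_divide_shared(1.0, 2.0, 3.0, 4.0)   # (1 + 2j) / (3 + 4j)
```

The two divisions are issued back to back on the same divider, which is why the hardware divider has to run faster than the sample rate (the overclocking mentioned in the text).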


Figure 3.34: Despreader unit architecture.

Table 3.17: Device utilization summary for the despreader unit.

| Logic utilization | Total | Used | Utilization |
|---|---|---|---|
| Number of slices | 15360 | 119 | < 1% |

3.2.12 Frequency domain despreader

The despreader performs despreading in the frequency domain by accumulating the sign-controlled data samples from the equalizer, as shown in Figure 3.34. The despreader unit allows the user to select one of the available spreading codes, which are precomputed and stored in a small ROM. In this design, there are 8 selectable spreading codes. The FPGA resource utilization for the despreader unit is given in Table 3.17. Timing analysis results show that the critical path is 4.233 ns, i.e. the maximum clock frequency is 236.259 MHz.
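The sign-controlled accumulation can be sketched as follows. The thesis does not state which code family is stored in the ROM; Walsh-Hadamard codes of length 8 are assumed here purely for illustration, and the names `walsh_codes` and `despread` are hypothetical.

```python
def walsh_codes(length):
    """Sylvester construction of `length` orthogonal +/-1 codes (assumed family)."""
    codes = [[1]]
    while len(codes) < length:
        codes = ([row + row for row in codes]
                 + [row + [-x for x in row] for row in codes])
    return codes


def despread(samples, code):
    """Accumulate sign-controlled samples over one spreading code period."""
    acc = 0j
    for sample, chip in zip(samples, code):
        # A +1 chip passes the sample; a -1 chip selects its negated
        # (2's complement) version before accumulation.
        acc += sample if chip > 0 else -sample
    return acc


codes = walsh_codes(8)                  # plays the role of the code ROM
s_a, s_b = 1 + 1j, -3 + 2j              # symbols of two users
chips = [s_a * codes[3][k] + s_b * codes[5][k] for k in range(8)]
recovered = despread(chips, codes[3])   # code select = 3
```

No multiplier is needed because the chips only take the values +1 and -1, so each product reduces to a conditional negation.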


[Diagram: in-phase (i) and quadrature (q) bit positions within the QPSK, 16-QAM and 64-QAM symbol words.]

Figure 3.35: Bit positions in an M-QAM symbol.

[Per-axis constellation diagrams with decision boundaries; for 16-QAM, the in-phase/quadrature levels -3d, -d, d and 3d carry the bit pairs 11, 10, 00 and 01, respectively.]

(a) QPSK bit demapping. (b) 16-QAM bit demapping. (c) 64-QAM bit demapping.

Figure 3.36: M-QAM bit demapping.

3.2.13 Bit demapper

The in-phase and quadrature bit positions of QPSK, 16-QAM and 64-QAM symbols are defined in Figure 3.35. The demapper is therefore easily implemented using the bit-by-bit demapping scheme proposed in [54].

The data symbols after despreading are normalized and Gray decoded. The bit-by-bit demapping schemes for QPSK, 16-QAM and 64-QAM symbols are illustrated in Figures 3.36(a), 3.36(b) and 3.36(c), respectively. In these figures, the decision region boundaries are represented by the vertical dashed lines. As an example, consider the 16-QAM modulation in Figure 3.36(b). If the in-phase part of the despread signal has a positive value (its sign bit is zero), the decoded symbol has two possibilities: "00" or "01". If the in-phase value is greater than or equal to 2d, where 2d is the Euclidean distance between two adjacent symbols along the in-phase/quadrature axis, the decoded symbol is "01"; otherwise, the decoded symbol is "00". Demodulation of the quadrature part is identical to that of the in-phase part. A proposed architecture for a complete demodulator unit is illustrated in Figure 3.37. The modulation selector signal



Figure 3.37: Demapper architecture.

Table 3.18: Device utilization summary for the demapper.

| Logic utilization | Total | Used | Utilization |
|---|---|---|---|
| Number of slices | 15360 | 94 | < 1% |

in this figure allows the user to select the modulation scheme to be demodulated via user software in the host computer. The FPGA resource utilization for the demapper is given in Table 3.18. Timing analysis results show that the critical path is 1.822 ns, i.e. the maximum clock frequency is 548.802 MHz.
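The 16-QAM case of the bit-by-bit scheme can be written down directly from the levels annotated in Figure 3.36(b) (11, 10, 00 and 01 at -3d, -d, d and 3d). The sketch below is illustrative only: the function names are hypothetical, and returning the in-phase and quadrature bit pairs as a tuple sidesteps the bit ordering of Figure 3.35, which is not reproduced here.

```python
def demap_16qam_axis(value, d):
    """Demap one axis (in-phase or quadrature) of a 16-QAM symbol.

    The first bit is the sign bit; the second bit tells whether the
    magnitude lies beyond the decision boundary at 2d.
    """
    sign_bit = 1 if value < 0 else 0
    outer_bit = 1 if abs(value) >= 2 * d else 0
    return f"{sign_bit}{outer_bit}"


def demap_16qam(symbol, d):
    """Return the (in-phase bits, quadrature bits) pair of a complex symbol."""
    return demap_16qam_axis(symbol.real, d), demap_16qam_axis(symbol.imag, d)


ideal_levels = [demap_16qam_axis(v, 1.0) for v in (-3, -1, 1, 3)]
noisy_symbol = demap_16qam(complex(2.6, -0.7), 1.0)
```

Each bit depends on a single comparison, so the hardware needs only a sign test and one magnitude comparator per axis.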

3.2.14 Bit descrambler

The random bit stream is generated from a maximum-length binary sequence using the polynomial $P_{src}(x) = x^8 + x^2 + 1$, similar to the descrambler in [61]. Figure 3.38 shows the architecture of the data descrambler unit. In our design, the output sequence of the generator is exclusive-ORed with the demodulated bit stream to produce the descrambled bit stream. The initial values of the registers are arbitrarily set to '11111111'. Note that an all-zero value is forbidden, since the generator would end up in a lockup state (i.e. it would stop generating new values). The FPGA resource utilization for the descrambler is given in Table 3.19. Timing analysis results show that the critical path is 0.885 ns (i.e. the maximum clock frequency is 1130.199 MHz).



Figure 3.38: Data descrambler architecture.

Table 3.19: Device utilization summary for the descrambler.

| Logic utilization | Total | Used | Utilization |
|---|---|---|---|
| Number of slices | 15360 | 7 | < 1% |

3.2.15 Host computer/debug interface

The host interface control logic module was implemented based on the timing requirements detailed in [6, p. 116] and runs at the PCI bus clock (PCI_CLK) speed of 33.33 MHz, as shown in Figure 3.39. A simple FSM determines whether an address or data is being sent from the Interface FPGA [6], using the handshaking signals address/data strobe (AS_DSn), empty (EMPTY) and busy (BUSY) and a 32-bit ADIO bus. Once the address and data are completely decoded, they are assigned to the appropriate output ports, namely ADDRESS and DATA. The other handshaking outputs, read strobe (RD_STROBE) and write strobe (WR_STROBE), are used to control the Register map block. The host interface communication logic uses only a few slices in the User FPGA, as shown in Table 3.20.

Figure 3.40 shows the debug interface, which allows the designer to observe the internal data path of an individual unit or of a combination of several units inside the receiver. For example, observable data paths include the outputs of the front-end unit, convolution unit, peak detector unit, etc. An additional 8-bit debug register is defined in the host interface unit; it can handle up to 128 data paths, which are selectable via the user control software. The observed data are fed to the on-board DAC chips so that they can be monitored with an external oscilloscope. Since the debug interface unit costs very few extra logic resources, its FPGA resource utilization is not reported here.



Figure 3.39: Host interface logic module.

Table 3.20: Device utilization summary for the host interface logic.

| Logic utilization | Total | Used | Utilization |
|---|---|---|---|
| Number of slices | 15360 | 19 | < 1% |


Figure 3.40: Debug interface architecture.

3.2.16 Implementation summary

Xilinx's synthesis tool reported that the maximum clock frequency of the complete receiver was 149 MHz, and Xilinx's XPower software reported a total power consumption of 889 mW. Table 3.21 summarizes the implementation results of the crucial modules in the receiver. Each module is listed in terms of logic slices, block RAMs,


Table 3.21: Implementation results of crucial modules in the receiver.

| Parameter | Value |
|---|---|
| System clock (MHz) | 80 |
| Sampling rate (MHz) | 5 |
| Spreading code length | 8 |
| Receiver configuration | indoor-to-outdoor |
| Total slices | 15360 |
| Total block RAMs | 192 |
| Total multipliers | 192 |

| Module | Slices | % | RAMs | % | Multipliers | % |
|---|---|---|---|---|---|---|
| Digital front-end | 1020 | 6 | 34 | 17 | 24 | 12 |
| AGC | 300 | 2 | 1 | 1* | 0 | 0 |
| Coarse frame detection | 814 | 5 | 10 | 5 | 12 | 6 |
| Fractional CFO estimator | 129 | 1* | 1 | 1* | 3 | 1 |
| CORDIC | 252 | 1 | 0 | 0 | 2 | 1 |
| FFT processor | 632 | 4 | 3 | 1 | 4 | 2 |
| Channel estimator | 150 | 1* | 1 | 1* | 2 | 1 |
| Channel equalizer | 547 | 3 | 0 | 0 | 4 | 2 |
| Pilot generator | 8 | 1* | 0 | 0 | 0 | 0 |
| Loop filter | 100 | 1* | 0 | 0 | 0 | 0 |
| Fine timing detection | 310 | 2 | 0 | 0 | 3 | 1 |
| Integer CFO estimator | 272 | 1 | 0 | 0 | 3 | 1 |
| Despreader | 119 | 1* | 0 | 0 | 0 | 0 |
| Demapper | 94 | 1* | 0 | 0 | 0 | 0 |
| Descrambler | 7 | 1* | 0 | 0 | 0 | 0 |

\* less than 1%

hardware multipliers, each also expressed as a percentage of the target FPGA device. The detailed implementation block diagram of the receiver, sketched from the synthesized top-level RTL schematic, is shown in Figure 3.41; it reveals the internal data paths used for measurement purposes.



Figure 3.41: Detailed VHDL implementation diagram.


3.3 Decision-directed overlap LMMSE channel estimator

3.3.1 Proposed LMMSE estimator implementation

In this section, a channel autocorrelation submatrix $R_{H_m H_m}$ of size 64 x 64 is chosen as a tradeoff between performance and complexity for an efficient implementation of the decision-directed overlapped LMMSE estimator on a Xilinx Virtex-II Multimedia development platform with the Virtex-II XC2V2000 device [62]. The LMMSE estimation block in Figure 2.9 requires the most processing power, since it performs complex matrix-vector multiplications. Let the filter matrix of size 64 x 64 be

$$G_m = R_{H_m H_m}\left(R_{H_m H_m} + \frac{A}{\mathrm{SNR}}\,I\right)^{-1} = \begin{bmatrix} G_m(1,1) & G_m(1,2) & \cdots & G_m(1,64) \\ G_m(2,1) & G_m(2,2) & \cdots & G_m(2,64) \\ \vdots & \vdots & \ddots & \vdots \\ G_m(64,1) & G_m(64,2) & \cdots & G_m(64,64) \end{bmatrix} \tag{3.21}$$

where $I$ is the identity matrix of size 64 x 64, $A$ is a constant that depends on the modulation, and SNR is the target SNR. $R_{H_m H_m}$, $A$ and SNR are known and precomputed beforehand. The $m$th LMMSE channel estimation vector in (2.53) can be rewritten in matrix form as

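$$H^{\mathrm{LMMSE}}_m = \begin{bmatrix} G_m(1,1) & G_m(1,2) & \cdots & G_m(1,64) \\ G_m(2,1) & G_m(2,2) & \cdots & G_m(2,64) \\ \vdots & \vdots & \ddots & \vdots \\ G_m(64,1) & G_m(64,2) & \cdots & G_m(64,64) \end{bmatrix} \begin{bmatrix} H_m(1) \\ H_m(2) \\ \vdots \\ H_m(64) \end{bmatrix} \tag{3.22}$$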

[Timing diagram: signals CLK, VALID and DATA; consecutive data samples are 16 clock cycles apart.]

Figure 3.42: MC-CDMA symbol timing.


Figure 3.43: Proposed matrix-vector multiplication architecture.

Suppose that each input sample $H_m(i)$, $i = 1, 2, \ldots, 64$, of the estimator lasts 16 clock cycles, as shown in Figure 3.42. In this case, a submatrix size $S = 64$ is a good tradeoff between BER performance and complexity. A straightforward implementation of (3.22) requires a total of 64 multiplications, i.e. 64 clock cycles, per element of $H^{\mathrm{LMMSE}}_m$. However, this violates the symbol timing requirement. Splitting the 64 complex multiplications into 4 parallel processing streams of 16 complex multiplications each solves this issue. The resulting matrix-vector multiplication architecture is shown in Figure 3.43. This architecture requires 4 complex multipliers, 4 complex adders, some block RAMs (BRAMs) and some registers. In the proposed architecture, the filter matrix $G_m$ of size 64 x 64 words is partitioned into four blocks of 16 x 64 words, i.e. four 1024-word blocks that map to reconfigurable on-chip ROMs. When the first sample of vector $H_m$ arrives at the estimator, the 64 multiplications $G_m(1,1)H_m(1)$, $G_m(2,1)H_m(1)$, ..., $G_m(64,1)H_m(1)$ have to be performed within one symbol duration, i.e. 1 multiplication/cycle x 4 streams x 16 clock cycles/symbol. The RAM-based complex accumulator accumulates these multiplication results and stores the values in memory (16 words deep per RAM) so that they can be added to the multiplication results of the next incoming sample of vector $H_m$. For example, $\sum_{i=1}^{64} G_m(1,i)H_m(i)$ is stored in the first cell of RAM0, $\sum_{i=1}^{64} G_m(2,i)H_m(i)$ is stored in the second cell of RAM0, $\sum_{i=1}^{64} G_m(17,i)H_m(i)$ is stored in the first cell of RAM1, and so on. After receiving 64 input symbols (in 64 x 16 clock cycles), 64 output symbols are unloaded by


[Timing diagram: signals CLK, INPUT, OUTPUT and STATE; each PROCESSING phase of 64 x 16 cycles is followed by a 64-cycle UNLOAD phase.]

Figure 3.44: Matrix-vector multiplication timing.

[Timing diagram: units LMMSE A and LMMSE B each unload 64-cycle bursts; the demultiplexed OUTPUT consists of consecutive segments of 48, 32, 32, 32, 32, 32 and 48 cycles.]

Figure 3.45: Overlap LMMSE estimator timing.

demultiplexing all 16 words stored in each RAM and flushing their contents to zero values in order to be ready for the next incoming data, as illustrated in Figure 3.44.
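The column-wise, four-stream schedule described above can be checked against a direct matrix-vector product with a short behavioural model. The loop nesting below mirrors the description (one input sample per 16 clock cycles, four multipliers in parallel, one 16-word accumulator RAM per stream); the function name `matvec_streaming` and the test data are hypothetical, and no pipeline latency is modelled.

```python
def matvec_streaming(G, h, streams=4):
    """Compute G @ h column by column with `streams` parallel accumulators."""
    size = len(h)
    depth = size // streams                  # 16 words per accumulator RAM
    rams = [[0j] * depth for _ in range(streams)]
    for col, sample in enumerate(h):         # a new input sample arrives
        for cycle in range(depth):           # 16 clock cycles per sample
            for s in range(streams):         # 4 multipliers work in parallel
                row = s * depth + cycle      # ROM s holds rows 16s .. 16s+15
                rams[s][cycle] += G[row][col] * sample
    # Unload phase: read the RAMs in row order (they would then be cleared).
    return [rams[s][cycle] for s in range(streams) for cycle in range(depth)]


# Small integer-valued test data so that results can be compared exactly.
G = [[complex((r * 7 + c * 3) % 11 - 5, (r + c) % 5 - 2) for c in range(64)]
     for r in range(64)]
h = [complex(c % 4 - 1, (c * 3) % 7 - 3) for c in range(64)]
streamed = matvec_streaming(G, h)
direct = [sum(G[r][c] * h[c] for c in range(64)) for r in range(64)]
```

Processing one column of $G_m$ per input sample is what allows the products to be formed as soon as each sample arrives, instead of waiting for the whole input vector.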

Final matrix-vector multiplication results are obtained by overlapping the results of two matrix-vector multiplication units. The final overlap LMMSE estimator architecture is shown in Figure 3.46. The double-buffering controller does two things: it buffers the input vector $H$ to bank A while scheduling the output samples on bank B, and vice versa. During the overlapping operation, the scheduling process supplies the $H_m$ samples to the upper matrix-vector unit (part A) and a version delayed by $S/2 = 32$ samples to the lower matrix-vector unit (part B). As a result, the final output samples are obtained by demultiplexing the output samples of the two matrix-vector multiplication units, as illustrated in Figure 3.45. An additional FIFO is inserted at the output of the overlap LMMSE estimator in order to restore an output symbol timing similar to the input timing for subsequent processing.
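One way to read the overlap schedule is sketched below. It is an interpretation, not the thesis's stated rule: the segment lengths annotated in Figure 3.45 (48, 32, ..., 32, 48 cycles) suggest that each 64-sample block contributes its central 32 outputs, while the first and last blocks also contribute their outer quarter. The function name `overlap_filter`, the `block_filter` callback (standing in for one matrix-vector unit) and the 256-sample input are assumptions.

```python
def overlap_filter(h, block_filter, size=64):
    """Filter `h` in half-overlapping blocks and keep each block's centre."""
    half, quarter = size // 2, size // 4
    starts = list(range(0, len(h) - size + 1, half))   # blocks offset by S/2
    out = [None] * len(h)
    for b, start in enumerate(starts):
        y = block_filter(h[start:start + size])
        # Edge blocks also keep their outer quarter (assumed behaviour).
        lo = 0 if b == 0 else quarter
        hi = size if b == len(starts) - 1 else size - quarter
        for k in range(lo, hi):
            out[start + k] = y[k]
    return out


calls = []

def identity_block(block):
    """Stand-in for a matrix-vector unit: records the call, returns its input."""
    calls.append(len(block))
    return list(block)


samples = list(range(256))
merged = overlap_filter(samples, identity_block)
```

With 256 input samples this produces seven blocks whose kept segments have lengths 48, 32, 32, 32, 32, 32 and 48, alternating between the two units, which matches the figure annotation.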

The estimator in Figure 3.46 was implemented on the Virtex-II XC2V2000 device. The Xilinx Synthesis Tool (XST) reported that the overlap LMMSE estimator design achieved a maximum clock frequency of 199.124 MHz. The resource consumption of the proposed estimator is summarized in Table 3.22.



Figure 3.46: Proposed overlap LMMSE estimator architecture.

Table 3.22: Device utilization summary for the overlap LMMSE estimator.

| Logic utilization | Total | Used | Utilization |
|---|---|---|---|
| Number of slices | 10752 | 1983 | 19% |
| Number of FIFO16s/BRAM16s | 56 | 36 | 64% |
| Number of MULT18x18s | 56 | 24 | 43% |

3.3.2 Hardware-in-the-Loop verification

Verification of the proposed overlap LMMSE estimator is based on the Hardware-in-the-Loop (HIL) technique. Because of the limited resources available in the FPGA device, only the feedback path of the receiver in Figure 2.9 (shaded blocks) was implemented for HIL verification. The other blocks were implemented in MATLAB, and the required data samples, i.e. the signals labeled "FFT out", "Bit in" and "Bit out" in Figure 3.47, were transferred to and from the HIL testbed via an RS232 interface. The block diagram of the HIL testbed is detailed in Figure 3.47. The HIL verification design achieved a maximum clock frequency of 160.09 MHz. Data exchange between the control software on the host computer and the HIL test design was fixed at 115.2 kbaud. The control software simply sends the FFT samples and bit samples generated by MATLAB, then waits for the resulting bit stream from the Unit Under Test (UUT). Experiments indicated that the results obtained from the UUT in the FPGA device were identical to those obtained in MATLAB simulations, as expected.


Figure 3.47: Hardware-in-the-loop verification block diagram.

3.4 Conclusion

This chapter presented the implementation of a complete baseband MC-CDMA downlink receiver on an FPGA platform for an indoor-to-outdoor/pedestrian environment. The receiver was implemented modularly in native VHDL so that the code can be reused as much as possible for future enhancements. The implementation also exploited temporal multiplexing techniques in order to minimize the use of logic gates, embedded multipliers and embedded memory blocks while maintaining the same performance as the traditional implementation technique. Polyphase filtering was chosen for the decimation filters in the digital front-end circuit in order to maximize hardware utilization, resulting in low overall complexity. The receiver used an 802.11-like preamble for acquiring and tracking the symbol timing and carrier frequency offset. A nearly optimum sinc interpolation method was used in the pilot-assisted modulation channel estimation technique. This algorithm reduces hardware complexity significantly by exploiting the temporal multiplexing technique. Furthermore, the CORDIC algorithm was used to compute the carrier frequency offset and to compensate for it. A serial CORDIC architecture was implemented in this receiver because of its low complexity. The lowest-complexity radix-2 FFT core from Xilinx was used in this design. Internal controllable signals were mapped to registers in order to allow the user to change their values at run time. The MC-CDMA receiver for the indoor-to-outdoor channel configuration used less than 40% of the resources of the Virtex-4 SX35 device and achieved a maximum clock frequency of 149 MHz.

A decision-directed LMMSE estimator with a submatrix size of 64 x 64 was chosen as a good tradeoff between performance and complexity for efficient implementation on an FPGA platform, i.e. a Xilinx development board with the Virtex-II XC2V2000 device. The implementation of the proposed estimator achieved a maximum clock frequency of 199.24 MHz. Hardware-in-the-loop verification was also performed to check the implementation results against computer simulations in MATLAB. Experiments showed that the results obtained from the FPGA chip were identical to those from MATLAB simulations.

Chapter 4

Lower bound for downlink BER performance of MC-CDMA

4.1 Introduction

In this chapter, an analytical study of the BER performance of an MC-CDMA system in a Rayleigh multipath fading channel is presented. It is difficult to find an exact BER expression for an MC-CDMA system in a multipath fading environment, since it depends on a large number of independent fading subcarriers and on Multiple User Interference (MUI). The authors of [5, 10, 43] used a numerical simulation approach to study the performance of MC-CDMA. Zhang and Guan [63] used an approximation method involving a multidimensional integral to simplify the BER expression. They also presented a lower bound on the BER performance of Binary Phase Shift Keying (BPSK) MC-CDMA systems with Orthogonal Restoring Combining (ORC).

The work of Zhang and Guan is extended herein to a general MC-CDMA scheme in order to find an analytical lower bound expression for the downlink BER performance with coherent detection for M-QAM in a multipath fading channel.

4.2 Analytical lower bound for BER performance

Consider an MC-CDMA transmitter for the $u$th user with $N$ M-QAM modulated subcarriers over a bandwidth of $B$ Hz, as shown in Fig. 1.16. Adjacent subcarriers are spaced $\Delta f = B/N$ Hz apart and the effective symbol duration is $T = 1/\Delta f$. The M-QAM input data symbols for the $i$th MC-CDMA symbol are serial-to-parallel converted to $P$ parallel branches $X^u_{0,i}, X^u_{1,i}, \ldots, X^u_{P-1,i}$. The purpose of the serial-to-parallel converter is to slow down the symbol rate in order to ensure frequency non-selective fading on each resulting subcarrier [43]. Each branch is spread in the frequency domain with an orthogonal spreading code of length $L$. The spread symbols are mapped onto $N = PL$ subcarriers and an IFFT is performed to convert them to a time-domain signal. The discrete time-domain IFFT output of the $i$th MC-CDMA symbol at the $n$th sample is given by

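$$x^u_i(n) = \sqrt{\frac{E_c}{N}} \sum_{p=0}^{P-1} \sum_{l=0}^{L-1} X^u_{p,i}\, C^u_{l,i} \exp\left\{\frac{j2\pi(Pp+l)n}{N}\right\}, \quad n = 0, 1, \ldots, N-1, \tag{4.1}$$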

where $C^u_{l,i}$ is the $l$th chip of the $u$th user's spreading code for the $i$th symbol, and $E_c = E_s/L$ is the chip energy, where $E_s$ is the symbol energy before spreading. The discrete time-domain sequences are converted back to a serial data sequence. Then, a cyclic prefix of duration $T_g$ is inserted between the symbols to combat the Inter-Symbol Interference (ISI) and Inter-Carrier Interference (ICI) caused by multipath fading. This extends the total symbol duration to $T_s = T + T_g$. Finally, a windowing function $g(t - iT_s)$ is applied to the signal before digital-to-analog conversion and upconversion for transmission. The complex baseband representation of the transmitted signal $x^u(t)$ for the $u$th user is expressed as

$$x^u(t) = \sqrt{\frac{E_c}{T}} \sum_{i=-\infty}^{\infty} \sum_{p=0}^{P-1} \sum_{l=0}^{L-1} X^u_{p,i}\, C^u_{l,i} \exp\left\{j2\pi(Pp+l)\Delta f\,(t - T_g - iT_s)\right\} g(t - iT_s). \tag{4.2}$$

The MC-CDMA receiver for the $u$th user is shown in Fig. 1.17. The complex baseband representation of the received signal is expressed as

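$$\begin{aligned} y(t) &= \sum_{k=0}^{K-1} \sum_{u=0}^{U-1} h_k(t)\, x^u(t - \tau_k) + w(t) \\ &= \sqrt{\frac{E_c}{T}} \sum_{i=-\infty}^{\infty} \sum_{k=0}^{K-1} \sum_{u=0}^{U-1} \sum_{p=0}^{P-1} \sum_{l=0}^{L-1} h_k(t)\, X^u_{p,i}\, C^u_{l,i} \\ &\quad \times \exp\left\{j2\pi(Pp+l)\Delta f\,(t - \tau_k - T_g - iT_s)\right\} g(t - \tau_k - iT_s) + w(t), \end{aligned} \tag{4.3}$$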

where $h_k(t)$ is the equivalent lowpass response of the $k$th path, $\tau_k$ is the time delay of the $k$th multipath component, and $w(t)$ is the complex baseband Additive White Gaussian Noise (AWGN) with zero mean and power spectral density $N_0$. The cyclic prefix is first removed and the remaining samples are serial-to-parallel converted. The discrete-time representation for the $u$th user and the $i$th symbol is given by

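$$\begin{aligned} y_i(n) &= \sum_{k=0}^{K-1} \sum_{u=0}^{U-1} h_k(n)\, x^u_i(n - T_k) + w_i(n) \\ &= \sqrt{\frac{E_c}{N}} \sum_{k=0}^{K-1} \sum_{u=0}^{U-1} \sum_{p=0}^{P-1} \sum_{l=0}^{L-1} h_k(n)\, X^u_{p,i}\, C^u_{l,i} \exp\left\{\frac{j2\pi(Pp+l)(n - T_k)}{N}\right\} + w_i(n) \\ &= \sqrt{\frac{E_c}{N}} \sum_{u=0}^{U-1} \sum_{p=0}^{P-1} \sum_{l=0}^{L-1} \left[\sum_{k=0}^{K-1} h_k(n) \exp\left\{\frac{-j2\pi(Pp+l)T_k}{N}\right\}\right] X^u_{p,i}\, C^u_{l,i} \exp\left\{\frac{j2\pi(Pp+l)n}{N}\right\} + w_i(n) \\ &= \sqrt{\frac{E_c}{N}} \sum_{u=0}^{U-1} \sum_{p=0}^{P-1} \sum_{l=0}^{L-1} H_{Pp+l,i}(n)\, X^u_{p,i}\, C^u_{l,i} \exp\left\{\frac{j2\pi(Pp+l)n}{N}\right\} + w_i(n), \end{aligned} \tag{4.4}$$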

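where $\tau_k$ is the time delay in number of samples for the $k$th path, and $H_{Pp+l,i}(n)$ is the channel frequency response at the $(Pp+l)$th subcarrier. Because the channel is assumed quasi-static within a symbol duration, $H_{Pp+l,i}(n)$ is assumed constant within a symbol duration, i.e. $H_{Pp+l,i}(n) \approx H_{Pp+l,i}$. As a result, the $n$th sample of the $i$th symbol in discrete time is rewritten as

$$\begin{aligned}
y_i(n) &= \sqrt{\frac{E_c}{N}} \sum_{u=0}^{U-1} \sum_{p=0}^{P-1} \sum_{l=0}^{L-1} H_{Pp+l,i}\, X_{p,i}^u C_{l,i}^u \exp\left\{\frac{j2\pi(Pp+l)n}{N}\right\} + w_i(n) \\
&= \sqrt{\frac{E_c}{N}} \sum_{p=0}^{P-1} \sum_{l=0}^{L-1} H_{Pp+l,i}\, X_{p,i}^u C_{l,i}^u \exp\left\{\frac{j2\pi(Pp+l)n}{N}\right\} \\
&\quad + \sqrt{\frac{E_c}{N}} \sum_{m=0, m \neq u}^{U-1} \sum_{p=0}^{P-1} \sum_{l=0}^{L-1} H_{Pp+l,i}\, X_{p,i}^m C_{l,i}^m \exp\left\{\frac{j2\pi(Pp+l)n}{N}\right\} + w_i(n).
\end{aligned} \tag{4.5}$$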

At the receiver, the FFT block performs demodulation in order to obtain the transmitted symbols with the amplitude and phase corrupted by the channel response and the additive noise. The demodulated received signal at the $(Pp+l)$th subcarrier of the $i$th symbol after the FFT demodulation is expressed as

$$Y_{Pp+l,i} = \sqrt{E_c}\, H_{Pp+l,i} X_{p,i}^u C_{l,i}^u + \sqrt{E_c} \sum_{m=0, m \neq u}^{U-1} H_{Pp+l,i} X_{p,i}^m C_{l,i}^m + W_{Pp+l,i}, \tag{4.6}$$

where the first term is the desired user signal, the second term is the MUI signal, and $W_{Pp+l,i}$ is the complex baseband AWGN at the $(Pp+l)$th subcarrier. Let $G_{Pp+l,i}$ be the frequency domain equalization gain factor at the $(Pp+l)$th subcarrier and assume that the phase shift caused by the channel can be estimated perfectly. The decision variable $\hat{X}_{p,i}^u$ for the $i$th symbol at the $p$th branch after equalization and despreading for the $u$th user can be expressed as

$$\begin{aligned}
\hat{X}_{p,i}^u &= \sqrt{E_c}\, X_{p,i}^u \sum_{l=0}^{L-1} G_{Pp+l,i} H_{Pp+l,i} C_{l,i}^u C_{l,i}^u \\
&\quad + \sum_{m=0, m \neq u}^{U-1} \sqrt{E_c}\, X_{p,i}^m \sum_{l=0}^{L-1} G_{Pp+l,i} H_{Pp+l,i} C_{l,i}^m C_{l,i}^u \\
&\quad + \sum_{l=0}^{L-1} G_{Pp+l,i} C_{l,i}^u W_{Pp+l,i}.
\end{aligned} \tag{4.7}$$

In the case of Zero Forcing (ZF) equalization, sometimes referred to as ORC, the gain factor $G_{Pp+l,i}$ is given by (assuming perfect channel estimation)

$$G_{Pp+l,i} = \frac{1}{H_{Pp+l,i}}. \tag{4.8}$$

The corresponding decision variable at the $p$th branch for the $u$th user is expressed as¹

$$D_p^u = L\sqrt{E_c}\, X_p^u + \sum_{l=0}^{L-1} \frac{1}{H_{Pp+l}} C_l^u W_{Pp+l}. \tag{4.9}$$

Therefore, the decision variable at the output of the parallel-to-serial converter for the $u$th user is

$$D^u = L\sqrt{E_c}\, X^u + \sum_{l=0}^{L-1} \frac{1}{H_{Pp+l}} C_l^u W_{Pp+l}. \tag{4.10}$$

We denote $d$ as the desired signal component

$$d = L\sqrt{E_c}\, X^u, \tag{4.11}$$

and $\xi$ is the noise component given by

$$\xi = \sum_{l=0}^{L-1} \frac{1}{H_{Pp+l}} C_l^u W_{Pp+l}. \tag{4.12}$$
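A small numerical sketch can make the effect of ZF equalization followed by despreading in (4.7) to (4.10) concrete. It is an illustration only: the Walsh-Hadamard codes, the randomly drawn channel and the function names are assumptions made here, and noise is left out so that only the signal and MUI terms are visible.

```python
import math
import random


def walsh_hadamard(order):
    """Rows of a Sylvester-type Hadamard matrix; order must be a power of 2."""
    rows = [[1]]
    while len(rows) < order:
        rows = [r + r for r in rows] + [r + [-c for c in r] for r in rows]
    return rows


def downlink_chips(user_symbols, codes, channel, chip_energy=1.0):
    """Noise-free received chips on the L subcarriers of one branch.

    All users share the same downlink channel, so subcarrier l carries
    sqrt(Ec) * H_l * sum over users of X^m * C^m_l.
    """
    amp = math.sqrt(chip_energy)
    return [
        amp * channel[l] * sum(x * code[l] for x, code in zip(user_symbols, codes))
        for l in range(len(channel))
    ]


def zf_despread(received, channel, code):
    """ZF gain G_l = 1 / H_l on each subcarrier, then despreading with the code."""
    return sum(c * y / h for c, y, h in zip(code, received, channel))


def noise_enhancement(channel):
    """Sum of 1 / |H_l|^2, the factor that scales the noise power after ZF."""
    return sum(1.0 / abs(h) ** 2 for h in channel)
```

For orthogonal codes the despread output for the desired user equals $L\sqrt{E_c}$ times its symbol regardless of how many other users are active, while `noise_enhancement` shows why a single deep fade can dominate the noise term $\xi$.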

We assume that the average power of the data symbol before spreading is $E\{|X^u|^2\} = 1$, where $E\{\cdot\}$ denotes the expected value, and $E\{|C_l^u|^2\} = 1$. Therefore, we have

$$E\{d^2\} = L E_s, \tag{4.13}$$

and

$$E\{\xi^2\} = \sum_{l=0}^{L-1} \frac{1}{H_{Pp+l}^2} E\{W_{Pp+l}^2\} = \frac{N_0}{2} \sum_{l=0}^{L-1} \frac{1}{H_{Pp+l}^2}. \tag{4.14}$$

¹ For the sake of simplicity, the symbol time index $i$ is removed.


The BER conditioned on the set of $L$ subcarrier fading values $\{H_{Pp+l}\}$ is then given by [63]

$$P_e = Q\left(\sqrt{\frac{2E_b}{N_0} \cdot \frac{L}{\sum_{l=0}^{L-1} H_{Pp+l}^{-2}}}\right), \tag{4.15}$$

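where $Q(\cdot)$ is the Gaussian Q function and $E_b$ is the bit energy. The Q function is defined as [5]

$$Q(x) = \frac{1}{2} \operatorname{erfc}\left(\frac{x}{\sqrt{2}}\right), \quad x \geq 0, \tag{4.16}$$

where $\operatorname{erfc}(x)$ is the complementary error function and is defined as

$$\operatorname{erfc}(x) = \frac{2}{\sqrt{\pi}} \int_x^{\infty} \exp(-u^2)\, du, \tag{4.17}$$

and its polar representation is defined as [5]

$$\operatorname{erfc}(x) = \frac{2}{\pi} \int_0^{\pi/2} \exp\left(-\frac{x^2}{\sin^2\theta}\right) d\theta. \tag{4.18}$$

Thus, (4.15) can be rewritten as follows:

$$P_e = \frac{1}{2} \operatorname{erfc}\left(\sqrt{\frac{E_b}{N_0} \cdot \frac{L}{\sum_{l=0}^{L-1} H_{Pp+l}^{-2}}}\right). \tag{4.19}$$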

It is interesting to note that (4.19) becomes exactly the BER expression for BPSK in an AWGN channel [5] if there is no fading effect, i.e. $L / \sum_{l=0}^{L-1} H_{Pp+l}^{-2} = 1$. Therefore, this approach can be extended to higher modulation levels such as QPSK, 16QAM, 64QAM and higher. In fact, the SER for a square $M$-QAM ($M = 2^k$, $k$ is even) signal constellation in terms of $E_s/N_0$ in an AWGN channel is approximated by [64]

$$P_e = 4\left(1 - \frac{1}{\sqrt{M}}\right) Q\left(\sqrt{\frac{3E_s}{(M-1)N_0}}\right), \tag{4.20}$$

where $E_s = E_b \log_2(M)$ is the symbol energy. Substituting the $Q(\cdot)$ function in (4.20) by (4.16), the SER can be rewritten as follows:

$$P_e = 2\left(1 - \frac{1}{\sqrt{M}}\right) \operatorname{erfc}\left(\sqrt{\frac{3E_s}{2(M-1)N_0}}\right). \tag{4.21}$$

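Inserting $L / \sum_{l=0}^{L-1} H_{Pp+l}^{-2}$ into (4.21), it follows that the symbol error rate (SER) conditioned on the set of subcarrier fading values $\{H_{Pp+l}\}$ for square $M$-QAM is given by

$$P_e = 2\left(1 - \frac{1}{\sqrt{M}}\right) \operatorname{erfc}\left(\sqrt{\frac{3E_s}{2(M-1)N_0} \cdot \frac{L}{\sum_{l=0}^{L-1} H_{Pp+l}^{-2}}}\right). \tag{4.22}$$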
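As a quick numerical illustration of the conditional SER in (4.22), the sketch below evaluates it for a given set of fading amplitudes and compares it with the AWGN expression (4.21). The function names and the sample fading values are hypothetical; `math.erfc` is the standard-library complementary error function.

```python
import math


def awgn_ser_square_qam(M, es_over_n0):
    """AWGN SER approximation for square M-QAM, linear Es/N0."""
    arg = math.sqrt(3.0 * es_over_n0 / (2.0 * (M - 1)))
    return 2.0 * (1.0 - 1.0 / math.sqrt(M)) * math.erfc(arg)


def conditional_ser_square_qam(M, es_over_n0, fades):
    """SER conditioned on the L fading amplitudes of one branch.

    The Es/N0 is scaled by L / sum(H_l ** -2), the factor left by ZF
    equalization and despreading over the L subcarriers.
    """
    L = len(fades)
    factor = L / sum(1.0 / (h * h) for h in fades)
    arg = math.sqrt(3.0 * es_over_n0 * factor / (2.0 * (M - 1)))
    return 2.0 * (1.0 - 1.0 / math.sqrt(M)) * math.erfc(arg)
```

With all fading amplitudes equal to one the two functions agree, and replacing a single amplitude by a small value raises the conditional SER sharply, which is the behaviour that the minimum fading value captures in the bound derived next.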


To compute the average SER, $P_e$ must be statistically averaged over the joint distribution function of the fading values $f(H_0, H_1, \ldots, H_{L-1})$ for $L$ subcarriers [63], i.e.

$$\bar{P}_e = \int_0^{\infty} \cdots \int_0^{\infty} P_e\, f(H_0, H_1, \ldots, H_{L-1})\, dH_0\, dH_1 \cdots dH_{L-1}. \tag{4.23}$$

The $H_{Pp+l}$ are identically distributed Rayleigh random variables with Probability Density Function (PDF) given by

/*Pp+i(oO = | e x p ( | Q , * > 0 , (4.24)

where $\bar{\gamma}_s$ is the average Signal-to-Noise Ratio (SNR) per symbol. We define $H_{\min}$ as the minimum value of the set of $L$ fading values $\{H_{Pp+l}\}$ for the $p$th branch, i.e.

$$H_{\min} = \min\{H_{Pp}, H_{Pp+1}, \ldots, H_{Pp+L-1}\}. \tag{4.25}$$

Therefore we have:

$$\sum_{l=0}^{L-1} \frac{1}{(H_{Pp+l})^2} \geq \frac{1}{H_{\min}^2}. \tag{4.26}$$

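Substituting the above in (4.23) yields the lower bound expression

$$\bar{P}_e \geq P_{\text{lower}} = 2\left(1 - \frac{1}{\sqrt{M}}\right) \int_0^{\infty} \operatorname{erfc}\left(\sqrt{\frac{3 L E_s H_{\min}^2}{2(M-1)N_0}}\right) f_{H_{\min}}(H_{\min})\, dH_{\min}, \tag{4.27}$$

where $f_{H_{\min}}(H_{\min})$ is the PDF of the random variable $H_{\min}$. Since each fading value $H_{Pp+l}$ is Rayleigh distributed, the corresponding instantaneous SNR per symbol $\gamma = \frac{E_s}{N_0} H_{Pp+l}^2$ is distributed according to an exponential distribution given by

$$f_{\gamma}(\gamma) = \frac{1}{\bar{\gamma}_s} \exp\left(-\frac{\gamma}{\bar{\gamma}_s}\right), \quad \gamma \geq 0. \tag{4.28}$$

The PDF of the minimum order statistic of $L$ i.i.d. exponentially distributed random variables, here $\gamma_s = \frac{E_s}{N_0} H_{\min}^2$, is given by [65]

$$f_{\gamma_s}(\gamma_s) = L\left[1 - F_{\gamma}(\gamma_s)\right]^{L-1} f_{\gamma}(\gamma_s), \tag{4.29}$$

where $F_{\gamma}(\gamma_s)$ is the cumulative distribution function (CDF) of the variable $\gamma$ and is given by

$$F_{\gamma}(\gamma_s) = \int_0^{\gamma_s} f_{\gamma}(\alpha)\, d\alpha = 1 - \exp\left(-\frac{\gamma_s}{\bar{\gamma}_s}\right). \tag{4.30}$$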

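Inserting (4.28) and (4.30) into (4.29), we have

$$f_{\gamma_s}(\gamma_s) = \frac{L}{\bar{\gamma}_s} \exp\left(-\frac{L\gamma_s}{\bar{\gamma}_s}\right), \quad \gamma_s \geq 0. \tag{4.31}$$

Thus the lower bound can be rewritten as

$$P_{\text{lower}} = 2\left(1 - \frac{1}{\sqrt{M}}\right) \int_0^{\infty} \operatorname{erfc}\left(\sqrt{\frac{3L\gamma_s}{2(M-1)}}\right) f_{\gamma_s}(\gamma_s)\, d\gamma_s. \tag{4.32}$$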
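The order-statistic result in (4.31) says that the minimum of $L$ i.i.d. exponential SNR values is again exponential, with its mean reduced by a factor of $L$. A short Monte Carlo sketch can be used to check this numerically; the function names, the seed and the sample sizes are arbitrary choices made for this illustration.

```python
import math
import random


def min_snr_samples(mean_snr, L, trials, seed=0):
    """Draw `trials` realizations of the minimum of L i.i.d. exponential SNRs."""
    rng = random.Random(seed)
    rate = 1.0 / mean_snr  # random.expovariate takes the rate, not the mean
    return [min(rng.expovariate(rate) for _ in range(L)) for _ in range(trials)]


def min_snr_pdf(gamma, mean_snr, L):
    """PDF of the minimum: (L / mean) * exp(-L * gamma / mean) for gamma >= 0."""
    if gamma < 0:
        return 0.0
    return (L / mean_snr) * math.exp(-L * gamma / mean_snr)


def min_snr_survival(gamma, mean_snr, L):
    """Probability that the minimum exceeds gamma."""
    return math.exp(-L * gamma / mean_snr)
```

Comparing the sample mean of `min_snr_samples` with `mean_snr / L`, or the empirical fraction of samples above a threshold with `min_snr_survival`, gives a simple sanity check of the distribution used in (4.32).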

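Inserting (4.18) and (4.31) into (4.32), the lower bound of the average SER is expressed as

$$P_{\text{lower}} = \frac{4}{\pi}\left(1 - \frac{1}{\sqrt{M}}\right) \int_0^{\pi/2} \int_0^{\infty} \exp\left(-\frac{3L\gamma_s}{2(M-1)\sin^2\theta}\right) \frac{L}{\bar{\gamma}_s} \exp\left(-\frac{L\gamma_s}{\bar{\gamma}_s}\right) d\gamma_s\, d\theta. \tag{4.33}$$

The inner integral is the moment generating function $M_{\gamma_s}(s)$ of $\gamma_s$ [65], given by

$$\begin{aligned}
M_{\gamma_s}(s) &= \int_0^{\infty} \exp(s\gamma_s)\, f_{\gamma_s}(\gamma_s)\, d\gamma_s \\
&= \frac{L}{\bar{\gamma}_s} \int_0^{\infty} \exp(s\gamma_s) \exp\left(-\frac{L\gamma_s}{\bar{\gamma}_s}\right) d\gamma_s \\
&= \frac{L}{\bar{\gamma}_s} \int_0^{\infty} \exp\left(\left(s - \frac{L}{\bar{\gamma}_s}\right)\gamma_s\right) d\gamma_s \\
&= \frac{L}{L - s\bar{\gamma}_s},
\end{aligned} \tag{4.34}$$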

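where $s = -\frac{3L}{2(M-1)\sin^2\theta}$. Substituting (4.34) into (4.33) gives a single integral form of the lower bound

$$\begin{aligned}
P_{\text{lower}} &= \frac{4}{\pi}\left(1 - \frac{1}{\sqrt{M}}\right) \int_0^{\pi/2} \frac{2(M-1)\sin^2\theta}{3\bar{\gamma}_s + 2(M-1)\sin^2\theta}\, d\theta \\
&= \frac{4}{\pi}\left(1 - \frac{1}{\sqrt{M}}\right) \int_0^{\pi/2} \left[1 - \frac{3\bar{\gamma}_s}{3\bar{\gamma}_s + 2(M-1)\sin^2\theta}\right] d\theta \\
&= 2\left(1 - \frac{1}{\sqrt{M}}\right) \left[1 - \frac{2}{\pi} \int_0^{\pi/2} \frac{3\bar{\gamma}_s}{3\bar{\gamma}_s + 2(M-1)\sin^2\theta}\, d\theta\right].
\end{aligned} \tag{4.35}$$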

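Making use of the definite integral in [66, p. 87], the lower bound is reduced to

$$\begin{aligned}
P_{\text{lower}} &= 2\left(1 - \frac{1}{\sqrt{M}}\right) - \frac{12\bar{\gamma}_s}{\pi}\left(1 - \frac{1}{\sqrt{M}}\right) \frac{1}{\sqrt{3\bar{\gamma}_s\left[2(M-1) + 3\bar{\gamma}_s\right]}} \left. \tan^{-1}\left(\sqrt{\frac{3\bar{\gamma}_s + 2(M-1)}{3\bar{\gamma}_s}}\, \tan\theta\right) \right|_0^{\pi/2} \\
&= 2\left(1 - \frac{1}{\sqrt{M}}\right) \left(1 - \sqrt{\frac{3\bar{\gamma}_s}{2(M-1) + 3\bar{\gamma}_s}}\right),
\end{aligned} \tag{4.36}$$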


where $\sqrt{\frac{3\bar{\gamma}_s + 2(M-1)}{3\bar{\gamma}_s}} \approx 1$ for large values of $\bar{\gamma}_s$. The lower bound of the average BER for $M$-QAM with Gray code encoding is then given by

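$$\begin{aligned}
P_{\text{lower}}^{\text{sq } M\text{-QAM}} &= \frac{P_{\text{lower}}}{\log_2 M} = \frac{2}{\log_2(M)}\left(1 - \frac{1}{\sqrt{M}}\right) \left(1 - \sqrt{\frac{3\bar{\gamma}_s}{2(M-1) + 3\bar{\gamma}_s}}\right) \\
&= \frac{2}{\log_2(M)}\left(1 - \frac{1}{\sqrt{M}}\right) \left(1 - \sqrt{\frac{3\bar{\gamma}\log_2 M}{2(M-1) + 3\bar{\gamma}\log_2 M}}\right),
\end{aligned} \tag{4.37}$$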

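where $\bar{\gamma}$ is the average SNR per bit and $\bar{\gamma}_s = \bar{\gamma}\log_2 M$. For QPSK ($M = 4$) modulation, the BER lower bound expression is given by

$$P_{\text{lower}}^{\text{QPSK}} = \frac{1}{2}\left(1 - \sqrt{\frac{\bar{\gamma}}{1 + \bar{\gamma}}}\right). \tag{4.38}$$

This BER expression is the same as the BER for BPSK in [63], i.e. it reduces to the BER formula for the single-carrier flat Rayleigh fading channel case [36]. Similarly, the BER lower bounds for 16QAM, 64QAM and 256QAM are obtained as follows:

$$P_{\text{lower}}^{\text{16QAM}} = \frac{3}{8}\left(1 - \sqrt{\frac{2\bar{\gamma}}{5 + 2\bar{\gamma}}}\right), \tag{4.39}$$

$$P_{\text{lower}}^{\text{64QAM}} = \frac{7}{24}\left(1 - \sqrt{\frac{\bar{\gamma}}{7 + \bar{\gamma}}}\right), \tag{4.40}$$

$$P_{\text{lower}}^{\text{256QAM}} = \frac{15}{64}\left(1 - \sqrt{\frac{8\bar{\gamma}}{170 + 8\bar{\gamma}}}\right). \tag{4.41}$$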

When an odd-$k$ $M$-QAM ($M = 2^k$, $k$ is odd) signal constellation is considered, the lower bound is computed in the same way as for a square $M$-QAM signal constellation, except that a tight upper-bound approximation of the SER in terms of $E_s/N_0$ in an AWGN channel is used [36]:

$$P_e \leq 4\, Q\left(\sqrt{\frac{3E_s}{(M-1)N_0}}\right). \tag{4.42}$$

Note that this expression differs from (4.20) by only a factor, which is a function of $M$ in the latter. Therefore, the BER lower bound in this case is

$$P_{\text{lower}}^{\text{odd-}k\ M\text{-QAM}} = \frac{2}{\log_2 M}\left(1 - \sqrt{\frac{3\bar{\gamma}\log_2 M}{2(M-1) + 3\bar{\gamma}\log_2 M}}\right). \tag{4.43}$$

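For example, the BER lower bounds for 32QAM and 128QAM are obtained as follows:

$$P_{\text{lower}}^{\text{32QAM}} = \frac{2}{5}\left(1 - \sqrt{\frac{15\bar{\gamma}}{62 + 15\bar{\gamma}}}\right), \tag{4.44}$$

$$P_{\text{lower}}^{\text{128QAM}} = \frac{2}{7}\left(1 - \sqrt{\frac{21\bar{\gamma}}{254 + 21\bar{\gamma}}}\right). \tag{4.45}$$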
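The closed-form bounds can be checked against the single-integral form (4.35) with a few lines of Python. This is a sketch under stated assumptions: the function names are hypothetical, the SNR per bit is passed as a linear ratio rather than in dB, and the integral is evaluated with a plain midpoint rule.

```python
import math


def _leading_factor(M, k):
    """2 * (1 - 1/sqrt(M)) for even k (square QAM), 2 for odd k."""
    return 2.0 * (1.0 - 1.0 / math.sqrt(M)) if k % 2 == 0 else 2.0


def ber_lower_bound(M, snr_per_bit):
    """Closed-form BER lower bound for M-QAM with M = 2**k (linear SNR per bit)."""
    k = int(round(math.log2(M)))
    if 2 ** k != M:
        raise ValueError("M must be a power of 2")
    g = 3.0 * snr_per_bit * k  # 3 * average SNR per symbol
    root = math.sqrt(g / (2.0 * (M - 1) + g))
    return _leading_factor(M, k) * (1.0 - root) / k


def ber_lower_bound_numeric(M, snr_per_bit, steps=2000):
    """Same bound obtained by midpoint-rule integration over theta in [0, pi/2]."""
    k = int(round(math.log2(M)))
    a = 3.0 * snr_per_bit * k
    b = 2.0 * (M - 1)
    h = (math.pi / 2.0) / steps
    total = 0.0
    for i in range(steps):
        s2 = math.sin((i + 0.5) * h) ** 2
        total += b * s2 / (a + b * s2)
    # (2 / pi) * leading factor * integral, converted from SER to BER
    return (2.0 / math.pi) * _leading_factor(M, k) * total * h / k
```

Calling `ber_lower_bound(16, 10.0)`, for instance, evaluates the 16QAM expression at an average SNR per bit of 10 dB, since a linear ratio of 10 corresponds to 10 dB.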

As we have used the approximated BER expressions for both square and odd-$k$ $M$-QAM in an AWGN channel in this chapter, the resulting BER bounds are not strict lower bounds.

4.3 Conclusion

In this chapter, we have presented an extension of the work of Zhang and Guan [63] in order to compute a lower bound for the BER performance of a downlink MC-CDMA system with M-QAM modulation. We used the same one-dimensional integration approach for computing the lower bound expressions for M-QAM modulation. Computer simulation results of the M-QAM downlink MC-CDMA systems will be presented in Chapter 5 in order to verify these lower bound expressions.

Chapter 5

Simulation results and discussions

5.1 Introduction

This chapter presents computer simulation results obtained with MATLAB for our MC-CDMA system. As mentioned in Section 1.6, the parameters of the OFDM system provide a basis for MC-CDMA system design and simulations. Therefore, before proceeding with the simulation of the MC-CDMA system, we present the simulation results of the OFDM system under two channel models, indoor-to-outdoor/pedestrian and vehicular; we then present the simulation results for the downlink MC-CDMA system under the same channel models. The receiver with the indoor-to-outdoor/pedestrian configuration is chosen for implementation on an FPGA platform due to its low resource consumption. The synchronization issues of the selected receiver configuration are also simulated and presented in this chapter. Finally, the complete receiver is simulated and compared with the same receiver under perfect synchronization conditions.

Furthermore, simulations of the decision-directed-based channel estimation methods for our MC-CDMA system are also presented for the indoor-to-outdoor/pedestrian channel in order to compare the performance of the decision-directed-based channel estimators with the conventional low-pass FIR channel estimator used in the FPGA implementation in Section 3.2.10.

[Block diagram: bit in, modulator, pilot insertion, IFFT, P/S, multipath channel, cyclic prefix removal, S/P, FFT, channel estimator, equalizer, demodulator, bit out.]

Figure 5.1: Simulation block diagram for the OFDM system.

5.2 OFDM system simulation results

Simulations are run to determine and analyze the performance of the systems under various configurations, assuming no timing or frequency synchronization errors. The Bit Error Rate (BER) as a function of the SNR is plotted for QPSK-, 16QAM-, and 64QAM-OFDM modulations over the indoor-to-outdoor/pedestrian and vehicular channels. The Rayleigh random variables used to realize the multipath fading channel are generated with Young and Beaulieu's method [67]. The comb-type channel estimation uses the spline (built-in MATLAB function) and low-pass FIR interpolation [54] methods with different pilot tone spacings. The simulation block diagram of the OFDM system is illustrated in Figure 5.1.
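The comb-type estimation step can be sketched as follows: least-squares estimates are formed at the pilot subcarriers and then interpolated across the data subcarriers. In this sketch, piecewise-linear interpolation stands in for the spline and low-pass FIR interpolators used in the simulations, purely to keep the example short; the function names and the pilot layout are assumptions made for illustration.

```python
import bisect


def ls_pilot_estimates(received, pilot_symbols, pilot_positions):
    """Least-squares channel estimates at the pilot subcarriers: H = Y / X."""
    return [received[k] / x for k, x in zip(pilot_positions, pilot_symbols)]


def interpolate_linear(pilot_positions, pilot_estimates, num_subcarriers):
    """Piecewise-linear interpolation of the pilot estimates over all subcarriers.

    Subcarriers outside the first and last pilot hold the nearest pilot estimate.
    pilot_positions must be sorted in increasing order.
    """
    estimate = []
    last = len(pilot_positions) - 1
    for k in range(num_subcarriers):
        if k <= pilot_positions[0]:
            estimate.append(pilot_estimates[0])
        elif k >= pilot_positions[last]:
            estimate.append(pilot_estimates[last])
        else:
            # Index of the closest pilot at or below subcarrier k
            j = bisect.bisect_right(pilot_positions, k) - 1
            k0, k1 = pilot_positions[j], pilot_positions[j + 1]
            w = (k - k0) / (k1 - k0)
            estimate.append((1 - w) * pilot_estimates[j] + w * pilot_estimates[j + 1])
    return estimate
```

A larger pilot spacing leaves fewer estimates for the interpolator to work with, which is the mechanism behind the differences between the pilot spacings discussed in the following sections.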

5.2.1 Results for the indoor-to-outdoor/pedestrian channel

Figures 5.2, 5.3 and 5.4 show the BER as a function of Eb/N0 for QPSK-, 16QAM- and 64QAM-OFDM over the indoor-to-outdoor/pedestrian channel. The solid curves with diamond markers represent the performance for the system with perfect knowledge of the channel information. The performance of the system with perfect knowledge of the channel information is used as a benchmark for all BER vs. Eb/N0 curves. The solid curves with upward-pointing triangle markers and circle markers represent performance of the system with pilot spacing Nf = 8 and Nf = 12, respectively.


(a) Spline interpolation. (b) Low-pass FIR interpolation.

Figure 5.2: Performance of QPSK-OFDM over the indoor-to-outdoor/pedestrian channel.

(a) Spline interpolation. (b) Low-pass FIR interpolation.

Figure 5.3: Performance of 16QAM-OFDM over the indoor-to-outdoor/pedestrian channel.

Consider the performance of QPSK-OFDM in figures 5.2(a) and 5.2(b). The BER performance with the spline interpolation technique is always better than with low-pass FIR interpolation. Similar observations can be made in figures 5.3 and 5.4 for 16QAM- and 64QAM-OFDM, respectively. This is because spline interpolation provides better accuracy than the low-pass FIR interpolation technique. However, it is complex to implement in hardware. There is a gap of about 2.5 dB between the system with perfect channel knowledge and the system with channel estimators. The degradation comes from the power loss due to pilot insertion and from the additive noise on the pilot subcarriers, which makes the channel estimation less accurate.

Figure 5.4: Performance of 64QAM-OFDM over the indoor-to-outdoor/pedestrian channel.

As a result, the spline interpolation mentioned here is used for comparison purposes only. It can be seen that the BER curves with Nf = 8 and Nf = 12 are slightly different in both figures 5.2(a) and 5.2(b). This can be explained in two ways: first, a smaller pilot spacing leads to more accurate channel estimation; second, the number of taps for the FIR interpolation method in the Nf = 12 case is smaller than in the Nf = 8 case, leading to lower channel estimation accuracy. As we can see in Table 1.2, the maximum number of pilot tones per OFDM symbol is 8 for Nf = 8 and 6 for Nf = 12. Therefore, we set the number of taps to 8 for Nf = 8 and to 6 for Nf = 12. It is clear that the latter will have lower interpolation accuracy.

Figures 5.2(a) and 5.2(b) also show that the differences between the perfect estimation curve and the spline and low-pass FIR curves for Nf = 8 at low SNR values are constant, about 2.5 dB and 2.6 dB, respectively. Similarly for 16QAM-OFDM, the differences are constant, about 2.6 dB and 2.7 dB, as shown in figures 5.3(a) and 5.3(b), respectively. For 64QAM-OFDM, the gaps are slightly larger: they are constant, about 2.8 dB and 3.5 dB at low SNR values, as shown in figures 5.4(a) and 5.4(b), respectively. At high SNR values, the performance of low-pass FIR interpolation is worse than that of spline interpolation because of the low number of filter taps. As a result, the low-pass FIR method suffers from the deep fades of the channel. For example, consider a BER of $10^{-3}$ with Nf = 8 in Figure 5.2(a). The difference between spline and low-pass FIR interpolation is about 1.25 dB. In order to obtain a performance similar to the spline interpolation case, the number of taps of the FIR filter must be increased [54]. However, increasing the number of taps means increasing the number of pilot tones per OFDM symbol, i.e. shortening the pilot tone spacing, which leads to less efficient data transmission and a more complex interpolator. Therefore, we must consider the trade-off between BER performance and transmission complexity.

(a) Spline interpolation. (b) Low-pass FIR interpolation.

Figure 5.5: Performance of QPSK-OFDM system over the vehicular channel.

In addition, we can see that as the modulation level increases, the BER performance degrades at the same Eb/N0 value, i.e. for a given BER value, a higher-order modulation scheme requires a larger signal power. For example, consider a BER of $10^{-2}$ with low-pass FIR interpolation and Nf = 8. One can observe that QPSK-, 16QAM-, and 64QAM-OFDM require an Eb/N0 of about 16.8 dB, 21.2 dB, and more than 30 dB, respectively. Furthermore, since low-pass FIR interpolation is much less complex to implement in digital circuits than spline interpolation, it is reasonable to use the simulation results of low-pass FIR interpolation as a reference for FPGA implementations.

5.2.2 Results for the vehicular channel

Figures 5.5, 5.6 and 5.7 show the performance of QPSK-, 16QAM- and 64QAM-OFDM over the vehicular channel. Each figure again has two panels, one for spline and one for low-pass FIR interpolation, and the marker definitions are the same as for the indoor-to-outdoor/pedestrian channel. In these figures, the number of taps used for low-pass FIR interpolation is set to 8 for both Nf = 8 and Nf = 12. As a result, the channel estimation accuracy of the low-pass FIR interpolation technique depends only on the pilot tone spacing. No significant difference in BER performance between the curves with Nf = 8 and Nf = 12 can be seen, because both pilot spacings satisfy the sampling theorem and the channel estimator in this configuration has more raw channel information than in the indoor-to-outdoor/pedestrian configuration. This leads to less estimation error at the subcarriers at the edges of the OFDM symbol. Compared to the spline estimator, the low-pass FIR estimator also suffers from the deep fades of the channel at high Eb/N0 values, as in the indoor-to-outdoor/pedestrian case.

(a) Spline interpolation. (b) Low-pass FIR interpolation.

Figure 5.6: Performance of 16QAM-OFDM system over the vehicular channel.

(a) Spline interpolation. (b) Low-pass FIR interpolation.

Figure 5.7: Performance of 64QAM-OFDM system over the vehicular channel.

Consider the performance of QPSK-OFDM shown in figures 5.5(a) and 5.5(b). The differences between the perfect estimation curve and the spline and low-pass FIR curves at low SNR values are about 2.4 dB and 2.5 dB, respectively. For 16QAM-OFDM, the gaps are about 2.5 dB and 2.7 dB for spline and low-pass FIR interpolation at low SNR values, as shown in figures 5.6(a) and 5.6(b), respectively. For 64QAM-OFDM, the gaps are slightly larger: about 2.6 dB and 3.7 dB at low SNR values, as shown in figures 5.7(a) and 5.7(b), respectively.

Observations similar to those presented for the indoor-to-outdoor/pedestrian environment can be made for the BER performance differences between spline and low-pass FIR interpolation in the vehicular environment. For example, the difference between the two methods is about 2.4 dB at a BER of $10^{-3}$ in Figure 5.5, while the differences are about 3 dB at a BER of $3 \times 10^{-2}$ for 16QAM-OFDM and 6.5 dB at a BER of $3 \times 10^{-2}$ for 64QAM-OFDM, as shown in figures 5.6 and 5.7, respectively.

The impact of modulation on the BER performance of the OFDM system is also worth mentioning for this channel. Consider a BER of $10^{-2}$ with low-pass FIR interpolation and a pilot spacing Nf = 8. QPSK-, 16QAM-, and 64QAM-OFDM require an Eb/N0 of about 17.3 dB, 21.7 dB, and more than 30 dB, respectively. Since the vehicular channel is much more severe, BER performance in this case is worse than for the indoor-to-outdoor/pedestrian channel, as expected. The system requires more transmit power in order to obtain the same BER performance as over the indoor-to-outdoor/pedestrian channel.

5.3 MC-CDMA systems simulation results

The MC-CDMA system is simulated in a similar manner to the OFDM system in Section 5.2, i.e. with no timing or frequency synchronization errors and with different channel conditions, channel estimation methods and pilot tone spacing values. The only difference is that the MC-CDMA system is simulated under different numbers of active users (Nu = 1, 4, 8), as illustrated in Figure 5.8. In this figure, user 1 is assumed to be the desired user; the other user signals are considered as Multiple User Interference (MUI) components with respect to the desired user's signal. Because orthogonal spreading codes are used in conjunction with Zero Forcing (ZF) equalization, i.e. Orthogonal Restoring Combining (ORC), the MUI components are theoretically eliminated in the downlink communication scheme, as mentioned in Section 4.2. Simulation results of the QPSK-MC-CDMA system with different numbers of active users (Nu = 1, 4, 8) in Figure 5.9 illustrate the impact of the number of users on the BER performance versus Eb/N0 of the desired user.

[Block diagram: bit in (users 1 to 8), modulator, spread, pilot insertion, IFFT, P/S, multipath channel, cyclic prefix removal, S/P, FFT, channel estimator, despread, demodulator, bit out.]

Figure 5.8: Simulation block diagram for the downlink MC-CDMA system.

(a) Indoor-to-outdoor/pedestrian. (b) Vehicular. Curves: perfect; 1 user (no interferer); 4 users (3 interferers); 8 users (7 interferers). Horizontal axis: Eb/N0 (dB).

Figure 5.9: Impact of the number of active users on the performance of the QPSK-MC-CDMA system.

Figure 5.10: Performance of QPSK-MC-CDMA over the indoor-to-outdoor/pedestrian channel.

We can see that the BER performance of the desired user is the same for a single user (no interferer), 4 active users (desired user + 3 interferers) and 8 active users (desired user + 7 interferers), as expected given orthogonal codes and the absence of synchronization errors. Therefore, the remaining simulation results in the following sections are provided for a single user for the sake of simplicity.

5.3.1 Results for the indoor-to-outdoor/pedestrian channel

Figures 5.10, 5.11 and 5.12 show the BER as a function of Eb/N0 for QPSK-, 16QAM- and 64QAM-MC-CDMA over the indoor-to-outdoor/pedestrian channel. The solid curves with diamond markers represent the performance of the system with perfect knowledge of the channel information. The solid curves with upward-pointing triangle markers and circle markers represent the performance of the systems with pilot spacing Nf = 64 and Nf = 94, respectively.

We can see that the BER performance of the systems with the spline interpolation method is always better than with the low-pass FIR interpolation method. The effect of pilot spacing on the BER performance of MC-CDMA is similar to that of OFDM. The BER performance of the system for a pilot spacing of Nf = 64 is always better than for a pilot spacing of Nf = 94, especially at high SNR values. The differences are about 1.5 dB at a BER of $10^{-3}$, 2.5 dB at a BER of $2 \times 10^{-3}$, and 2.7 dB at a BER of $5 \times 10^{-3}$ for QPSK-, 16QAM-, and 64QAM-MC-CDMA, respectively. The differences between spline and low-pass FIR interpolation are not significant for either Nf = 64 or Nf = 94 because the modulated data symbol is spread in the frequency domain, i.e. the same modulated data symbol is transmitted over several different subcarriers (subchannels). As a result, the MC-CDMA receiver gains frequency diversity over the OFDM system by combining the equalized data from different subcarriers in order to obtain the transmitted symbol.

Figure 5.11: Performance of 16QAM-MC-CDMA over the indoor-to-outdoor/pedestrian channel.

Figure 5.12: Performance of 64QAM-MC-CDMA over the indoor-to-outdoor/pedestrian channel.

Figure 5.13: Performance of QPSK-MC-CDMA over the vehicular channel.

Figure 5.14: Performance of 16QAM-MC-CDMA over the vehicular channel.

The impact of modulation schemes is similar to the results obtained for OFDM systems. Consider a BER of $10^{-2}$ with low-pass FIR interpolation and pilot spacing Nf = 64. In this context, QPSK-, 16QAM-, and 64QAM-MC-CDMA require Eb/N0 values of about 17.3 dB, 20.1 dB, and 24 dB, respectively.



Figure 5.15: Performance of 64QAM-MC-CDMA over the vehicular channel.

5.3.2 Results for the vehicular channel

Simulation results of MC-CDMA over the vehicular channel are presented in figures 5.13, 5.14 and 5.15. We can see that there are small BER performance gaps compared to the corresponding results obtained for the indoor-to-outdoor/pedestrian channel. This is because the vehicular channel has a longer delay spread and a larger Doppler shift than the indoor-to-outdoor/pedestrian channel. No significant differences in BER performance can be seen between pilot spacings of 64 and 94 subcarriers for all modulations because the channel estimator in this configuration has a large amount of raw channel information at the pilot subcarriers for the interpolation process, i.e. 32 pilot subcarriers per symbol (22 pilot subcarriers per symbol in the Nf = 94 case) versus 8 pilot subcarriers (6 pilot subcarriers for Nf = 94) per symbol for the indoor-to-outdoor/pedestrian configuration (see tables 1.4 and 1.8).

The impact of modulation on BER performance is similar to the one observed for the indoor-to-outdoor/pedestrian channel. Consider a BER of $10^{-2}$ with low-pass FIR interpolation and pilot spacing Nf = 64. In this context, QPSK-, 16QAM-, and 64QAM-MC-CDMA systems require Eb/N0 values of about 17.5 dB, 21 dB, and 24.5 dB, respectively.


(a) Square M-QAM MC-CDMA. (b) Odd-k M-QAM MC-CDMA. Legend entries: 32-QAM sim, 128-QAM sim, 32-QAM bound, 128-QAM bound.

Figure 5.16: Lower bound on downlink BER performance.

5.4 Lower bound for downlink BER performance of the MC-CDMA system

As mentioned in Section 4.2, perfect channel knowledge is assumed at the receiver and there are no significant differences in BER performance for different numbers of users because the MUI component is completely eliminated in this downlink MC-CDMA scheme. Thus, computer simulations are performed for a single user under conditions similar to those in Section 5.3, i.e. no timing synchronization error and no CFO. The analytical BER lower bound and the computer simulation results for QPSK-, 16QAM-, 64QAM- and 256QAM-MC-CDMA are shown in Figure 5.16(a). In this figure, the dashed and solid curves represent the lower bound and the simulation results, respectively. The solid curves for QPSK-, 16QAM-, and 64QAM-MC-CDMA in Figure 5.16(a) are similar to the performance of the system with perfect channel estimation presented in Section 5.3. For odd-k signal constellations, 32QAM- and 128QAM-MC-CDMA are simulated as shown in Figure 5.16(b). As we can see, the lower bound is very close to the simulation results. Therefore, the lower bound expressions in (4.37) and (4.43) can be used to compute the performance of the downlink M-QAM MC-CDMA scheme quickly and accurately.

(a) Eb/N0 = 5 dB. (b) Eb/N0 = 20 dB.

Figure 5.17: Simulation of coarse frame detection.

5.5 Timing and frequency synchronization simulation results

This section presents the computer simulation results of pre-FFT and post-FFT synchronization. Pre-FFT synchronization consists of the coarse timing and fractional Carrier Frequency Offset (CFO) synchronization, which are performed in the time domain. In post-FFT synchronization, the fine timing and integer CFO estimations are performed in the frequency domain. For post-FFT synchronization to work effectively in the frequency domain, the coarse timing and frequency synchronizations in the time domain must be performed first. Since the receiver for the indoor-to-outdoor/pedestrian configuration was chosen for implementation on an FPGA platform, simulation results are only presented for this channel model.

5.5.1 Coarse timing simulation results

Given the preamble design in Section 2.2.2, computer simulations of the proposed coarse frame detection technique, which is similar to the method in [11, 12], are illustrated in figures 5.17(a) and 5.17(b) for SNR values of 5 dB and 20 dB, respectively, over the indoor-to-outdoor/pedestrian channel. We can see that the peak of the second timing metric (2.3) is always located at the first sample of the 9th short training symbol for both low and high SNR conditions. As a result, the start-of-data position in a frame in Figure 2.1 is obtained by simply adding Na = 64 samples to the resulting peak position of the second timing metric.

(a) Eb/N0 = 5 dB. (b) Eb/N0 = 20 dB. Curves: proposed method; method in [14]. Horizontal axis: CFO (normalized to Δf).

Figure 5.18: Probability of correct frame boundary detection.

Figure 5.19: RMS timing error of the coarse timing synchronizer.

In order to evaluate the performance of the proposed timing estimator, it is simulated over a wide range of CFO values at low and high SNR conditions. The following parameters are used in the simulations:

• Sampling frequency: fs = 5 MHz

• Channel model: indoor-to-outdoor/pedestrian

• Input fractional CFO range: [−10Δf, 10Δf]

• Number of frames per CFO point: 10000

Timing synchronization error statistics at Eb/N0 = 5 dB and Eb/N0 = 20 dB versus CFO are shown in figures 5.18(a) and 5.18(b), respectively. In these figures, the solid curves represent the proposed estimator while the dashed curves represent the reference method in [14]. The simulation results show that the proposed synchronization technique accurately detects the frame boundary using the proposed preamble over a wide CFO range, CFO ∈ [−4Δf, 4Δf], at both Eb/N0 = 5 dB and Eb/N0 = 20 dB, while the method in [14] shows poor synchronization performance using the same preamble. The drawback of the proposed timing synchronization is that the initial CFO of the received signal must be limited to within ±4Δf for the coarse timing estimator to work properly. Therefore, integer CFO correction must be used in order to bring the CFO into the effective operating range of the coarse timing estimator. The Root Mean Square (RMS) estimation error of the proposed method for various SNR values is shown in Figure 5.19. We can see that the proposed method performs much better than the reference method. Therefore, we can conclude that the coarse timing synchronization methods in [11, 12] can be adapted to our MC-CDMA system.

5.5.2 Fractional CFO estimation simulation results

As mentioned in Section 2.2.3, the proposed method not only provides the same CFO estimation range, [−4Δf, 4Δf], as the technique in [14], but also significantly reduces the complexity by eliminating the auto-correlation process, which removes a complex multiplier, a 256-word RAM, and an accumulator. The following parameters were used in the simulations:

• Channel model: indoor-to-outdoor/pedestrian

• Input fractional CFO range: [−5Δf, 5Δf]

• Number of frames per CFO point: 10000

Figures 5.20(a) and 5.20(b) compare the proposed CFO estimation scheme with the methods in [11, 14]. In these figures, the solid curves represent the proposed CFO estimator while the dashed and dash-dot curves represent the methods in [11] and [14], respectively. We can clearly see that the method in [11] only estimates the CFO within ±Δf, as mentioned in Section 2.2.3, while the proposed method is able to estimate a wide range of CFO at both low and high SNR values, as expected. At Eb/N0 = 5 dB, the proposed method shows some errors when the input CFO is near ±4Δf because of the phase ambiguity of the argument function (2.13). Furthermore, its estimate is not as smooth as that of the method in [14] because the CFO estimates are not averaged. Therefore, an optional simple moving average circuit at the output of the estimator could be used in order to obtain more accurate results.

[Figure: (a) Eb/N0 = 5 dB. (b) Eb/N0 = 20 dB. Curves: proposed method, method in [11], method in [14]. x-axis: CFO offset (normalized to Δf).]

Figure 5.20: Proposed fractional CFO estimator performance.
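The phase ambiguity near ±4Δf and the effect of the optional moving average can be illustrated with a minimal Python sketch. It assumes, purely for illustration, that the estimator maps a correlation phase in (−π, π] linearly onto [−4Δf, 4Δf]; this mapping, the function names and the window length are assumptions and are not taken from (2.13).

```python
import cmath
import math

CFO_RANGE = 4.0  # assumed one-sided estimation range, in units of the subcarrier spacing


def estimate_cfo(true_cfo):
    """Noise-free estimate from the argument of a unit phasor (hypothetical model)."""
    phasor = cmath.exp(1j * math.pi * true_cfo / CFO_RANGE)
    return cmath.phase(phasor) * CFO_RANGE / math.pi


def moving_average(values, window):
    """Simple moving average over the most recent `window` estimates."""
    averaged = []
    for i in range(len(values)):
        chunk = values[max(0, i - window + 1): i + 1]
        averaged.append(sum(chunk) / len(chunk))
    return averaged


in_range = estimate_cfo(3.0)   # inside the range: recovered correctly
wrapped = estimate_cfo(4.1)    # just outside the range: the phase wraps to the opposite sign
smoothed = moving_average([1.9, 2.1, 2.0, 1.8, 2.2], window=4)
```

In this model an input just beyond the range edge is reported with the opposite sign, which is why noisy estimates close to ±4Δf are the most error-prone, while the moving average only reduces the fluctuation of successive estimates and does not extend the range.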

5.5.3 Fine timing simulation results

Simulation results of the frequency-domain fine timing synchronization for our MC-CDMA receiver mentioned in Section 2.2.4 are discussed here. As for the pre-FFT synchronization (coarse timing and fractional CFO synchronization), simulations of fine timing are performed at both low and high SNR values in order to verify the performance of this synchronization technique, which is adapted from OFDM systems [49] and used in our MC-CDMA system. In the simulations, the fine timing synchronization operates after the coarse timing synchronization is done, so that the orthogonality between the subcarriers is not effectively lost. The following parameters were used in the simulations:

• Sampling frequency: fa = 5 MHz

• Channel model: indoor-to-outdoor/pedestrian

• Input timing offset: [−4, 4] samples

• Number of frames per simulation: 10000

[Figure: (a) Eb/N0 = 5 dB. (b) Eb/N0 = 20 dB. x-axis: Sample.]

Figure 5.21: Probability of correct fine timing synchronization.

[Figure: curves for full pilot subcarriers and half pilot subcarriers.]

Figure 5.22: RMS timing error of the fine timing synchronizer.

Figures 5.21(a) and 5.21(b) show fine timing synchronization statistics for both low and high SNR values. These results show that the probability of a correct timing estimate at an input timing offset of 4 samples is very low due to the phase ambiguity of the argument function (2.17). Therefore, the actual fine timing offset tracking range is limited to [−4, 3] samples. From an implementation point of view, the fine timing estimation accuracy depends on the number of pilot tones used in the average correlation function (2.16). Figure 5.22 shows the RMS error of the estimator in the case of full pilot subcarriers (8 pilot subcarriers for the indoor-to-outdoor/pedestrian configuration) and half pilot subcarriers. We can see that there is an RMS error floor at high SNR values because of the deep fades of the channel. Depending on the target BER of a specific application, the number of pilot subcarriers for the fine timing estimator must be carefully chosen in order to reach an appropriate trade-off between accuracy and complexity.

[Figure: (a) Input CFO = 6Δf. (b) Input CFO = −6Δf. x-axis: CFO (normalized to Δf).]

Figure 5.23: Integer CFO correlator output at Eb/N0 = 20 dB.
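The limited tracking range is what one would expect from a phase-based estimator that compares adjacent pilot tones: with an FFT size of 512 and a pilot spacing of Nf = 64, a timing offset of d samples rotates the phase by 2π·64·d/512 = πd/4 between neighbouring pilots, which is unambiguous only for |d| < 4. The Python sketch below assumes this generic form. The ideal channel, the sign convention and the function names are illustrative assumptions and are not the exact functions (2.16) and (2.17).

```python
import cmath
import math

N_FFT = 512                          # FFT size
PILOT_SPACING = 64                   # pilot spacing Nf
N_PILOTS = N_FFT // PILOT_SPACING    # 8 pilot tones


def pilot_responses(offset):
    """Ideal-channel pilot responses for a timing offset of `offset` samples (assumed model)."""
    return [cmath.exp(-2j * math.pi * p * PILOT_SPACING * offset / N_FFT)
            for p in range(N_PILOTS)]


def estimate_timing(pilots):
    """Average the correlation of adjacent pilots and convert its phase to samples."""
    corr = sum(pilots[p].conjugate() * pilots[p + 1] for p in range(len(pilots) - 1))
    return -cmath.phase(corr) * N_FFT / (2 * math.pi * PILOT_SPACING)


for d in (-3, 0, 3, 5):
    print(d, round(estimate_timing(pilot_responses(d)), 3))
```

In this model an offset of 5 samples is reported as −3 because the phase wraps around, and an offset of exactly 4 samples sits on the wrap boundary, where any noise decides the sign of the estimate.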

5.5.4 Integer CFO estimation simulation results

The frequency-domain integer CFO estimator of Section 2.2.5 was simulated over a wide range of input integer CFO values in order to evaluate the performance of this estimator, which is derived from the OFDM system in [50]. In the presence of an integer CFO in the received signal, the orthogonality between the subcarriers is significantly lost. Assuming that the integer CFO is within [−4Δf, 8Δf], the time-domain coarse timing synchronization must be performed first so that the receiver is able to detect a significantly disturbed signal during its acquisition process. The integer CFO estimator can then effectively estimate the integer CFO. In the simulations of the integer CFO estimator, it is therefore reasonable to assume that the data frame is correctly detected by the coarse timing synchronizer. The following parameters are used in the simulations:

• Input integer CFO range: [−10Δf, 10Δf]

• Pilot spacing: Nf = 64

• Number of frames per CFO: 10000

[Figure: (a) Eb/N0 = 5 dB. (b) Eb/N0 = 20 dB. x-axis: CFO (normalized to Δf).]

Figure 5.24: Probability of correct integer CFO synchronization.

[Figure: curves for full pilot subcarriers and half pilot subcarriers. x-axis: Eb/N0 (dB).]

Figure 5.25: RMS error of the integer CFO synchronizer.

The cross-correlation metrics of the estimator for input CFOs of ±6Δf are illustrated in figures 5.23(a) and 5.23(b). In these figures, the correlation peaks indicate exactly the amount of integer CFO introduced in the input samples. The probabilities of correct integer CFO estimation at the two SNR values are shown in figures 5.24(a) and 5.24(b), using all pilot subcarriers, i.e. 8 pilot subcarriers with a pilot spacing of Nf = 64 subcarriers. It can be seen that this method accurately estimates the integer CFO at Eb/N0 = 20 dB. At Eb/N0 = 5 dB, the probability of correct estimation of the integer CFO is about 0.8 because of the significant amount of additive noise. The RMS error of the estimator with different numbers of pilot subcarriers versus SNR is shown in Figure 5.25. It can be seen that the estimator with full pilots (8 pilot subcarriers) gains 4 dB in terms of Eb/N0 compared to using half of the pilot subcarriers. Therefore, we can conclude that the frequency-domain integer CFO estimation method for the OFDM system is adaptable to our MC-CDMA system.
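The idea of locating the integer CFO from a correlation peak can be sketched in Python as follows. This is a simplified, noise-free illustration and not the estimator of [50]: the pilot symbols, the half-amplitude placeholder data and the function names are assumptions chosen to keep the example deterministic.

```python
import cmath

N_FFT = 512
PILOT_SPACING = 64
PILOT_BINS = list(range(0, N_FFT, PILOT_SPACING))   # 8 pilot positions
# Hypothetical known pilot symbols (unit magnitude).
PILOTS = [cmath.exp(0.5j * cmath.pi * p) for p in range(len(PILOT_BINS))]


def make_spectrum():
    """Transmitted spectrum: unit-amplitude pilots, half-amplitude placeholder data elsewhere."""
    spectrum = [0.5 * cmath.exp(0.7j * k) for k in range(N_FFT)]
    for bin_index, pilot in zip(PILOT_BINS, PILOTS):
        spectrum[bin_index] = pilot
    return spectrum


def apply_integer_cfo(spectrum, shift):
    """An integer CFO of `shift` subcarriers cyclically shifts the FFT output."""
    return [spectrum[(k - shift) % N_FFT] for k in range(N_FFT)]


def estimate_integer_cfo(received, max_shift=10):
    """Return the candidate shift with the largest pilot cross-correlation magnitude."""
    def metric(candidate):
        return abs(sum(p.conjugate() * received[(b + candidate) % N_FFT]
                       for b, p in zip(PILOT_BINS, PILOTS)))
    return max(range(-max_shift, max_shift + 1), key=metric)


estimates = [estimate_integer_cfo(apply_integer_cfo(make_spectrum(), s)) for s in (6, -6)]
```

At the correct candidate the eight pilot products add coherently, whereas at any other candidate within the search range the correlator only sees data subcarriers, so the peak position gives the integer CFO. Noise and channel fading, which are ignored here, lower the margin between the peak and the other candidates.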

5.6 Complete receiver simulation results

With all the synchronization aspects having previously been studied, a complete MC-CDMA receiver for the indoor-to-outdoor/pedestrian configuration has been simulated with the closed-loop timing/frequency synchronizations and the comb-type low-pass FIR channel estimation method. The following parameters are used in these simulations:



• Modulation scheme: QPSK, 16QAM, 64QAM

• Pilot spacing: Nf = 64 subcarriers

• Spreading code: length-8 OVSF code

• Normalized natural frequency: fn/fa = 0.05

• Loop filter damping factor: n = 0.707

• Loop filter proportional gain: Kp = 0.8884

• Loop filter integral gain: K = 0.3948


• Randomly generated input CFO: [4Δf, 8Δf]

[Figure: curves for QPSK, 16QAM and 64QAM, each with perfect synchronization and with synchronization. x-axis: Eb/N0 (dB).]

Figure 5.26: BER performance of the complete MC-CDMA receiver.

[Figure: curves for QPSK with perfect synchronization and QPSK with synchronization for 1, 4 and 8 users. x-axis: Eb/N0 (dB).]

Figure 5.27: BER performance of the complete QPSK-MC-CDMA receiver with different numbers of active users.
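The listed loop filter gains are numerically consistent with a second-order loop in which Kp = 4ζωn and Ki = 4ωn², where ωn = 2π·fn/fa is the normalized natural frequency in radians per sample and ζ is the damping factor. The factor of 4 is inferred here from the listed numbers alone and is an assumption, not a formula quoted from the thesis; the short Python check below only verifies the arithmetic.

```python
import math

NORMALIZED_NATURAL_FREQ = 0.05   # fn/fa, from the parameter list
DAMPING = 0.707                  # loop filter damping factor, from the parameter list
GAIN_SCALE = 4.0                 # assumed scale factor, inferred from the listed gains

omega_n = 2 * math.pi * NORMALIZED_NATURAL_FREQ
kp = GAIN_SCALE * DAMPING * omega_n   # proportional gain
ki = GAIN_SCALE * omega_n ** 2        # integral gain

print(f"Kp = {kp:.4f}, Ki = {ki:.4f}")
```

Rounded to four decimals, these expressions reproduce the values 0.8884 and 0.3948 given in the list.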

In the simulations, we assume that the input CFO is limited to [−4Δf, 8Δf] so that the receiver can first achieve coarse synchronization with the transmitted frame, which allows the integer CFO estimator to work effectively. All pilot subcarriers, i.e. 8 pilot subcarriers, are used in both post-FFT timing and frequency synchronizations. After that, the receiver performs the fine timing and fractional CFO estimation tracking loops during the next frames in order to bring the receiver to the steady state. Once the receiver reaches the steady state, the BER counter counts the number of bit errors and computes the BER. The BER performance of the complete receiver for a single user was compared to the performance of the receiver under perfect synchronization conditions similar to those mentioned in Section 5.3, i.e. with no timing or frequency synchronization errors, as shown in Figure 5.26. We can see that the BER performance of the receiver with QPSK modulation (solid diamond curve) is almost identical to that obtained with perfect synchronization (dashed diamond curve). There is a small difference at Eb/N0 = 30 dB due to the small residual CFO and timing synchronization errors at the deep fades of the channel that were mentioned earlier. Similar observations can be made for the BER performance of 16QAM and 64QAM modulation. The effect of multiple active users on the BER performance of the complete receiver is also presented in figures 5.27 to 5.29. The simulation results show that the BER performance is consistent with the case of almost no MUI component, as expected.

Figure 5.28: BER performance of the complete 16QAM-MC-CDMA receiver with different numbers of active users.

[Figure: curves for 64QAM with perfect synchronization and 64QAM with synchronization for 1, 4 and 8 users.]

Figure 5.29: BER performance of the complete 64QAM-MC-CDMA receiver with different numbers of active users.

[Figure: curves for QPSK, 16QAM and 64QAM, each with perfect synchronization and with synchronization.]

Figure 5.30: BER performance of the system over an AWGN channel.
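The BER counting step can be sketched in Python as below. The frame representation (pairs of transmitted and received bit lists) and the function names are assumptions made for this illustration; the simulator's actual data structures are not described in this section.

```python
def count_bit_errors(sent_bits, received_bits):
    """Number of positions where the received bit differs from the transmitted bit."""
    if len(sent_bits) != len(received_bits):
        raise ValueError("bit sequences must have the same length")
    return sum(1 for s, r in zip(sent_bits, received_bits) if s != r)


def bit_error_rate(frames):
    """Accumulate errors over (sent, received) frame pairs and return errors / total bits."""
    errors = 0
    total = 0
    for sent, received in frames:
        errors += count_bit_errors(sent, received)
        total += len(sent)
    return errors / total if total else 0.0


steady_state_frames = [
    ([0, 1, 1, 0], [0, 1, 0, 0]),   # one bit error
    ([1, 1, 0, 0], [1, 1, 0, 0]),   # error-free frame
]
ber = bit_error_rate(steady_state_frames)
```

Only frames received after the tracking loops have settled would be passed to such a counter, so that acquisition transients do not bias the measured BER.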

Additional simulation results for the single-user system over an AWGN channel are also shown in Figure 5.30 (multiple active users were not simulated because there were no noticeable BER differences). They can be compared with the measurement results of the receiver implemented in FPGA (Chapter 6) because the real receiver was tested in a static indoor laboratory condition similar to that of an AWGN channel. As shown in this figure, there are no noticeable differences between the BER performance of the system with perfect synchronization and that of the system with synchronization.


[Figure: x-axis: Threshold.]

Figure 5.31: BER performance versus virtual pilot selection thresholds (Eb/N0 = 30 dB).

5.7 Decision-directed channel estimation simulation results

In this section, simulations of the decision-directed channel estimators for the downlink MC-CDMA system mentioned in Section 2.3 are presented. For the sake of simplicity, only the QPSK-MC-CDMA system is simulated for channel estimator performance comparison purposes. The predefined virtual pilot selection threshold in the decision-directed virtual pilot based channel estimation scheme is obtained by numerical simulation. For example, given a target Eb/N0 value of 30 dB, the system in Figure 2.7 is simulated with various values of the virtual pilot selection threshold, as shown in Figure 5.31. Simulation results indicate that a reasonable threshold value is around 0.07. In [38], the cut-off frequency in the transform domain channel estimation is dynamically selected at each simulation loop. However, in order to reduce the processing power at the receiver, the cut-off frequency can be computed beforehand using simulations. Figure 5.32 shows the BER performance of the system with the decision-directed transform domain algorithm at a target Eb/N0 = 30 dB for various cut-off frequencies and numbers of iterations. We can see that the BER performance improves as the number of iterations increases. It can be shown that a good value for the cut-off frequency is usually larger than the channel length; in this figure, for example, the cut-off value is larger than 10 samples from DC. For higher values of the cut-off frequency, no significant improvement can be seen as the number of transform domain iterations increases. Therefore, we fix the number of iterations to 4 for the following simulation results in order to study the trade-off between performance and complexity.

[Figure: x-axis: cut-off frequency.]

Figure 5.32: BER performance versus cut-off frequencies (Eb/N0 = 30 dB).

[Figure: (a) Pilot spacing Nf = 64. (b) Pilot spacing Nf = 94. Curves: perfect, linear interp., decision-directed iterative FFT.]

Figure 5.33: Decision-directed virtual pilot versus iterative transform domain method over the indoor-to-outdoor/pedestrian channel.
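The role of the cut-off can be illustrated with a small Python sketch of transform domain low-pass filtering. It is a simplified stand-in, not the algorithm of [38]: it assumes the transform is an inverse DFT of the frequency response, keeps only the first `cutoff` coefficients, and omits the decision-directed iterations. The 16-point size, the two-tap channel and the disturbance are hypothetical.

```python
import cmath
import math


def dft(x, inverse=False):
    """Direct O(N^2) DFT, adequate for a small illustrative example."""
    n = len(x)
    sign = 1 if inverse else -1
    out = [sum(x[m] * cmath.exp(sign * 2j * math.pi * k * m / n) for m in range(n))
           for k in range(n)]
    return [v / n for v in out] if inverse else out


def transform_domain_filter(freq_response, cutoff):
    """Keep the first `cutoff` transform-domain coefficients and zero the rest."""
    coeffs = dft(freq_response, inverse=True)
    kept = [c if i < cutoff else 0 for i, c in enumerate(coeffs)]
    return dft(kept)


# Two-tap channel (shorter than the cut-off) plus a disturbance beyond the cut-off.
true_taps = [1.0, 0.5] + [0.0] * 14
disturbed_taps = list(true_taps)
disturbed_taps[8] = 0.3

true_response = dft(true_taps)
noisy_response = dft(disturbed_taps)
cleaned = transform_domain_filter(noisy_response, cutoff=4)

error_before = max(abs(n - t) for n, t in zip(noisy_response, true_response))
error_after = max(abs(c - t) for c, t in zip(cleaned, true_response))
```

As long as the cut-off exceeds the channel length, the channel taps pass through unchanged and only the disturbance is removed, which is consistent with the observation that a good cut-off value is usually larger than the channel length.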

Comparisons of the BER performance of the decision-directed virtual pilot and the iterative transform domain methods with different pilot tone spacing values under both the indoor-to-outdoor/pedestrian and vehicular channels are presented in figures 5.33 and 5.34. In these simulation results, the initial channel estimation results are obtained using the simplest channel estimation method, a first-order linear interpolation as suggested in [40], in order to compare the proposed channel estimation methods with the initial estimation results. We can see that the decision-directed virtual pilot curves show no significant improvements over the indoor-to-outdoor/pedestrian and vehicular channels. This is because the selected virtual pilots in our channel model are unreliable due to error propagation in the decision-directed channel estimation. Therefore, the decision-directed process does not aid much during the selection of virtual pilots. There is just a very small performance improvement of the decision-directed iterative transform domain method at high SNR values. This leads to a question: does the decision-directed process provide any improvement for either channel estimator in the vehicular channel? Consider the simulation results of the two channel estimators in the vehicular channel, illustrated in Figure 5.34. We can see that the decision-directed virtual pilot-based method again does not provide significant improvements, while the improvements can easily be seen for the decision-directed iterative transform domain method, i.e. the solid curves with square markers in figures 5.34(a) and 5.34(b). At a BER of 2 × 10⁻³, the improvements are about 1.2 dB and 2.7 dB for pilot spacings Nf = 64 and Nf = 94, respectively.

[Figure: (a) Pilot spacing Nf = 64. (b) Pilot spacing Nf = 94. Curves: perfect, linear interp., decision-directed iterative FFT, decision-directed virtual pilot. x-axis: Eb/N0 (dB).]

Figure 5.34: Decision-directed virtual pilot versus iterative transform domain method over the vehicular channel.

However, our objective is to improve the channel estimation quality of the receiver to be implemented on an FPGA platform, i.e. the receiver with the indoor-to-outdoor/pedestrian configuration. As a result, for the decision-directed overlap Linear Minimum Mean Square Error (LMMSE) channel estimator, simulation results are limited to this channel condition. This method is expected to provide better performance improvement than the two other methods. BER performance comparisons of the proposed decision-directed overlap LMMSE estimator are shown in figures 5.35 and 5.36 with the initial channel estimation obtained by linear and low-pass FIR interpolation, respectively. The conventional pilot-based LMMSE channel estimator in [29] is also simulated for reference purposes (solid curves with circle markers). We can see that there is a small difference (about 0.5 dB at a BER of 10⁻³) between the two initial interpolation methods for the overlap LMMSE method with a full-size 448 × 448 channel correlation matrix (solid curves with diamond markers), while there is no noticeable difference for the overlap LMMSE estimator with the 64 × 64 channel correlation sub-matrix (solid curves with square markers). Simulation results show that there are performance improvements compared with the above decision-directed channel estimators, as expected. The improvements are about 1.8 dB and 0.8 dB for the channel correlation matrix sizes 448 × 448 and 64 × 64, respectively. There is a 0.2 dB performance loss between the overlap LMMSE estimator with a 64 × 64 channel correlation matrix and the conventional pilot-based LMMSE estimator. Furthermore, the proposed decision-directed overlap LMMSE estimator not only provides performance that is close to that of the conventional LMMSE estimator, but also substantially reduces the computational complexity, by a factor of about 6 (see Appendix C).

Figure 5.35: BER performance comparison of the overlap LMMSE estimator (linear interp.).

Figure 5.36: BER performance comparison of the overlap LMMSE estimator (FIR interp.).

[Figure: curves for proposed, FIR interp. and conventional LMMSE. x-axis: sub-matrix size.]

Figure 5.37: Eb/N0 versus sub-matrix size (target BER = 10⁻³).

[Figure: curves for proposed, FIR interp. and conventional LMMSE. x-axis: sub-matrix size.]

Figure 5.38: Complexity versus sub-matrix size (target BER = 10⁻³).

Intensive simulations of the proposed estimator with various sub-matrix sizes were performed in order to find a trade-off between complexity and BER performance. Figure 5.37 shows the Eb/N0 gain in terms of sub-matrix size for the proposed decision-directed overlap LMMSE estimator at a target BER of 10⁻³. Figure 5.38 illustrates the number of complex multiplications (CM) versus sub-matrix size at the same target BER. In this figure, the complexities of the conventional FIR interpolation and the conventional LMMSE estimator are used as the lower (3584 CMs/symbol) and upper (204288 CMs/symbol) limits, respectively. We can see that the overlap LMMSE estimator with a sub-matrix size of 64 × 64 is a good trade-off between performance and complexity.
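The two limits quoted above satisfy 3584 = 448 × 8 and 204288 = 448² + 3584. The Python sketch below uses that observation to give a rough order of magnitude for block-wise LMMSE smoothing. It assumes non-overlapping sub-blocks and a fixed 3584-CM initial interpolation, which are simplifications made here for illustration only; the exact count for the overlap scheme is the one given in Appendix C and will differ.

```python
N_SUBCARRIERS = 448   # size of the full channel correlation matrix
INTERP_CMS = 3584     # CMs/symbol of the FIR interpolation (lower limit quoted in the text)


def block_lmmse_cms(block_size):
    """Rough CM count: one block_size x block_size matrix-vector product per block,
    plus the initial interpolation. Non-overlapping blocks are assumed (a simplification)."""
    n_blocks = -(-N_SUBCARRIERS // block_size)   # ceiling division
    return n_blocks * block_size ** 2 + INTERP_CMS


full_size_cms = block_lmmse_cms(448)
sub_matrix_cms = block_lmmse_cms(64)
reduction = full_size_cms / sub_matrix_cms
```

Under these assumptions the full-size count reproduces the upper limit of 204288 CMs/symbol, and the 64 × 64 case comes out between 6 and 7 times cheaper, which is of the same order as the reduction factor of about 6 reported for the proposed estimator.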

5.8 Conclusion

This chapter has presented computer simulation results for the OFDM and the downlink MC-CDMA systems of Section 1.6 over the indoor-to-outdoor/pedestrian and the vehicular channels. Lower bounds on the BER performance of downlink MC-CDMA were also verified with simulations. These lower bounds can be used to compute the performance of the downlink M-QAM MC-CDMA scheme quickly and accurately. The receiver with the indoor-to-outdoor/pedestrian configuration was selected for FPGA implementation because its complexity was estimated to fit into our current FPGA platform. A new preamble structure was proposed for this receiver and led to a joint coarse timing and fractional CFO synchronization scheme. Simulation results of timing and frequency synchronization showed that the proposed preamble provided better synchronization performance than existing methods. Furthermore, simulation results showed that the frequency-domain fine timing and integer CFO synchronization methods for the OFDM system worked well in our MC-CDMA system. Complete receiver simulation results showed that the true receiver performance was close to that of the ideal receiver, i.e. the receiver with perfect timing and frequency synchronization.

Further improvements to channel estimation for our MC-CDMA system were also studied in this chapter. Three approaches to decision-directed channel estimation for downlink MC-CDMA systems were proposed: decision-directed virtual pilot selection, decision-directed iterative transform domain, and decision-directed overlap LMMSE estimators. The simulation results showed that the first two methods did not provide much improvement for the indoor-to-outdoor/pedestrian channel. In the vehicular channel, a significant improvement can be seen for the decision-directed iterative transform domain estimator. However, our goal is to improve the channel estimation quality for the receiver with the indoor-to-outdoor/pedestrian configuration. Therefore, the third decision-directed method was used in order to improve channel estimation quality for the indoor-to-outdoor/pedestrian channel. Simulation results show performance improvements over the first two decision-directed channel estimators, as expected. The decision-directed overlap LMMSE estimator with a channel correlation sub-matrix of size 64 × 64 was chosen as a good trade-off between performance and complexity for further hardware implementation.


Chapter 6

MC-CDMA downlink receiver testing

6.1 Introduction

In this chapter, the implemented baseband MC-CDMA downlink receiver is tested in a static laboratory wireless channel environment. The measurement setup is illustrated in Figure 6.1. The transmitter is implemented on the same FPGA platform as the one used to implement the receiver (the implementation of the transmitter is detailed in Appendix B). The transmit computer sends a periodic data frame consisting of 5 MC-CDMA symbols in the 2.4 GHz band with an FFT size of 512, a pilot spacing of Nf = 64 subcarriers, and a spreading code of length L = 8. Test patterns were generated with MATLAB and transmitted over the wireless channel. A test pattern consisted of a training symbol and 5 data symbols and could be downloaded to the transmitter via custom software developed in Visual C++ 2005. The transmitter control software allows the user to download the data files to be transmitted to the transmitter and controls the transmitter RF front-end functions such as channel, antenna, transmit power, etc., as shown in Figure 6.3. The measurements are conducted with the aid of an Agilent digital oscilloscope and an HP spectrum analyzer. The custom receiver control software on the receiver computer is also written in Visual C++ 2005, as shown in Figure 6.4. A software-based BER counter is also implemented in the receiver control software in order to evaluate the performance of the implemented receiver. Measured waveforms at the output of the integrated debugger interface of the receiver are obtained using a digital oscilloscope and reproduced using MATLAB. Recall that the internal data paths of the modules inside the receiver were multiplexed and fed to a pair of spare DAC chips on the receiver's FPGA platform, as mentioned in Section 3.2.15. A photo of the receiver testbed is shown in Figure 6.2. The following section presents the measurement results of the implemented receiver.

Figure 6.1: MC-CDMA system measurement setup.

Figure 6.2: Photo of the receiver testbed.

Figure 6.3: Transmitter control software.

Figure 6.4: Receiver control software.

Figure 6.5: Fixed indoor-to-outdoor office environment test scenario.

6.2 Functional testing

During functional testing, the transmitter was randomly located at different positions while the receiver was fixed at one position, as shown in Figure 6.5. The RF front-ends of the transmitter and the receiver were initially calibrated to the correct frequency band prior to performing the measurements. Figure 6.6 shows the in-phase and quadrature parts of a data frame captured at the output of the digital front-end unit (block 1 in Figure 3.41). Figures 6.7 and 6.8 show the measurement results at the output of the convolution unit (block 3) and the auto-correlator unit (block 4), respectively. Note that the auto-correlation based fractional CFO synchronizer in Section 2.2.3 was implemented in the receiver because it was a part of a contract project with Defence Research and Development Canada - Ottawa (DRDC-O). The peak value of the convolved samples is detected by the peak detector unit, as shown in Figure 6.9 (block 3). The CFO-corrected samples at the output of the phase derotator are shown in Figure 6.10 (block 2). After the CFO correction, the cyclic prefix samples and the preamble part of a received frame are removed, as shown in Figure 6.11 (block 5), prior to FFT processing. Figures 6.12(a) and 6.12(b) show the FFT output (block 6) results and their enlarged version at the 5th symbol, with frequency bins mapped into positive and negative frequencies. We can see that there is no longer a DC component at the DC subcarrier, but the distortion of the neighboring subcarriers induced by the RF front-end circuit is still present. This distortion may degrade the performance of the receiver.

[Figure: in-phase and quadrature panels. x-axis: Sample.]

Figure 6.6: Results at the output of the digital front-end unit (block 1).
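The CFO correction performed by the phase derotator can be sketched in Python as a sample-by-sample complex rotation. The sketch assumes that the CFO estimate is expressed in units of the subcarrier spacing and uses an FFT size of 512; the function names and the test signal are hypothetical and do not describe the FPGA implementation.

```python
import cmath
import math

N_FFT = 512


def apply_cfo(samples, cfo):
    """Rotate sample n by 2*pi*cfo*n/N_FFT radians (cfo in subcarrier spacings)."""
    return [s * cmath.exp(2j * math.pi * cfo * n / N_FFT) for n, s in enumerate(samples)]


def derotate(samples, cfo_estimate):
    """Undo the rotation using the estimated CFO."""
    return apply_cfo(samples, -cfo_estimate)


# Arbitrary test signal, impaired by a CFO of 2.5 subcarrier spacings and then corrected.
clean = [complex(math.cos(0.1 * n), math.sin(0.3 * n)) for n in range(64)]
received = apply_cfo(clean, 2.5)
corrected = derotate(received, 2.5)
residual = max(abs(c - x) for c, x in zip(corrected, clean))
```

With a perfect estimate the residual is at the level of floating-point rounding; any error in the estimate leaves a slowly growing phase rotation, which is what the tracking loops have to remove.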

The channel estimator results based on a conventional low-pass FIR interpolation of the channel responses at the pilot tones are presented in Figure 6.13 (block 8). The resulting equalized samples and their enlarged version at the 5th symbol are shown in figures 6.14(a) and 6.14(b) (block 9). We can see that the equalized symbols look better than before and that the effects of the channel are almost eliminated after this stage. After equalization, the useful information at the data subcarriers (block 10) is extracted prior to frequency-domain despreading, as illustrated in Figure 6.15(a). In Figure 3.41, the data extractor collects the useful information from the spread symbol on several subcarriers (8 subcarriers in our receiver), and then arranges them in the right order before despreading. This is sometimes referred to as frequency-domain de-interleaving. The enlarged version at the 5th extracted symbol is shown in Figure 6.15(b). The extracted data subcarriers are then despread in the frequency domain (block 11), as shown in figures 6.15(c) and 6.15(d). The QPSK symbols are demapped and converted to a 4-level analog signal using the on-board DAC chips for verification purposes. Figure 6.16 (block 12) shows the output of the demapper unit for QPSK. Note that only QPSK demodulation is presented here for demonstration purposes. It is evident that the resulting signal amplitude has 4 levels corresponding to the QPSK symbol set.
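Frequency-domain despreading over 8 subcarriers can be sketched in Python as follows. Rows of a Sylvester-type Walsh-Hadamard matrix are used here as a stand-in for the length-8 OVSF codes, and an ideally equalized channel is assumed; the user indices, symbols and function names are hypothetical.

```python
def hadamard(order):
    """Sylvester construction of a Hadamard matrix (order must be a power of two)."""
    rows = [[1]]
    while len(rows) < order:
        rows = [r + r for r in rows] + [r + [-c for c in r] for r in rows]
    return rows


CODES = hadamard(8)   # 8 mutually orthogonal codes of length 8


def spread(symbol, code):
    """Copy one data symbol onto 8 subcarriers, weighted by the code chips."""
    return [symbol * chip for chip in code]


def despread(chips, code):
    """Correlate the 8 equalized subcarriers with the user's code."""
    return sum(x * chip for x, chip in zip(chips, code)) / len(code)


# Three active users share the same 8 subcarriers.
user_symbols = {0: 1 + 1j, 3: -1 + 1j, 5: 1 - 1j}
combined = [sum(chips) for chips in zip(*(spread(s, CODES[u]) for u, s in user_symbols.items()))]
recovered = {u: despread(combined, CODES[u]) for u in user_symbols}
```

Because the codes are orthogonal, each user's symbol is recovered without interference from the others as long as equalization restores the relative amplitudes and phases of the 8 subcarriers; residual channel distortion is what produces the MUI.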


Finally, the QPSK symbols are converted from parallel to serial bit streams and descrambled before being sent to the host computer in order to evaluate the BER.

Figure 6.7: Results at the output of the convolution unit (block 3).

[Figure: x-axis: Sample.]

Figure 6.8: Results at the output of the auto-correlator unit (block 4).

Figure 6.9: Results at the output of the peak detector unit (block 3).

[Figure: in-phase and quadrature panels. x-axis: Sample.]

Figure 6.10: Results at the output of the derotator unit (block 2).

6.3 Receiver BER performance results

In order to control the signal's dynamic range at the receiver, a custom interface card between the receiver's AGC circuit and the RF front-end had to be designed. However, since the fabrication of that interface card was not completed at the time the measurements were taken, the gain of the RF front-end had to be manually adjusted so that the received signal level was within the operating range of the receiver during the tests. Also, because of this limitation and of the general laboratory setup, tests could only be performed at fixed locations (as indicated in Section 6.1). Therefore, the BER performance results provided here are valid for a static wireless indoor channel only.

When the interferer signals are added to the signal of the desired user at the transmitter, the received signal power at the receiver increases proportionally with the number of interferers. The resulting signal can overload the ADC circuit at the receiver because the amplitude of the input signal is outside its operational range. Therefore, the power of the transmit signal must be scaled so that no saturation or overloading problem occurs at full load. The received SNR was measured at the input of the receiver's RF front-end with the aid of the spectrum analyzer. Averaging on the spectrum analyzer was used in order to obtain adequate received SNR values. In the measurement process, the desired SNR range for all measurement scenarios was obtained by controlling the transmit power in conjunction with an external attenuator.

Figure 6.11: Results at the output of the cyclic prefix removal unit (block 5).

[Figure: (a) Complete frame. (b) Enlarged version at the 5th symbol.]

Figure 6.12: Results at the output of the FFT processor unit (block 6).

[Figure: in-phase and quadrature panels. x-axis: Sample.]

Figure 6.13: Results at the output of the channel estimator unit (block 8).

[Figure: (a) Complete frame. (b) Enlarged version at the 5th symbol. x-axis: Sample.]

Figure 6.14: Results at the output of the channel equalizer unit (block 9).

[Figure: (a) Complete frame. (b) Enlarged version at the 5th symbol. Results at the output of the data extractor unit (block 10). (c) Complete frame. (d) Enlarged version at the 5th symbol. x-axis: Sample.]

Figure 6.15: Results at the output of the despreader unit (block 11).

Figure 6.16: Result at the output of the demapper unit (block 12).

Table 6.1: BER performance of the receiver in a static wireless laboratory channel, 1 user.

SNR (dB)   QPSK           16QAM          64QAM
-4         4.9 × 10^-2    2.56 × 10^-1   3.52 × 10^-1
-2         1.99 × 10^-2   2.21 × 10^-1   3.18 × 10^-1
0          4.4 × 10^-3    1.29 × 10^-1   2.09 × 10^-1
2          5.42 × 10^-4   8.43 × 10^-2   1.48 × 10^-1
4          3.09 × 10^-5   2.64 × 10^-2   1.19 × 10^-1
6          1.63 × 10^-6   8.64 × 10^-3   8.6 × 10^-2
8          N/A (1)        2.7 × 10^-3    4.72 × 10^-2
10         N/A (1)        1.95 × 10^-4   2.76 × 10^-2
12         N/A (1)        2.78 × 10^-5   1.09 × 10^-2
14         N/A (1)        2.82 × 10^-6   4.51 × 10^-3

(1) BER is low.

The measured BER values for various average received SNR values are summarized in tables 6.1 to 6.3. The BER performance of the receiver in the static indoor laboratory tests is plotted in figures 6.17 to 6.20. In these figures, the SNR values for Nu = 4 and Nu = 8 active users were normalized so that the transmit power of the desired user and interferers is equal to the transmit power of the desired user in the single-user case, i.e. only the desired user. Because a static indoor channel was used for the measurements, multipath fading effects were very low, so computer simulations of the system over an AWGN channel model (dashed curves in these figures) were performed for comparison with the measurement results. Figure 6.17 shows the performance of the receiver under different modulation schemes for a single-user setup. As expected, higher-order modulation schemes (i.e. 16QAM or 64QAM) require more transmit power than QPSK modulation. Figures 6.18 to 6.20 show the performance of the receiver for QPSK, 16QAM and 64QAM MC-CDMA modulation under different numbers of active users. The MUI components were not completely eliminated, in contrast to the results of Section 5.3. This could be explained by the non-ideal nature of the electronic components in the RF transceivers, which causes a loss of orthogonality, and also by measurement errors.
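As a rough point of reference for the AWGN comparison curves mentioned above, the following sketch evaluates the textbook BER approximation for Gray-coded square M-QAM over an AWGN channel (the expression is exact for QPSK). It is not the simulation used in this work: the function names are illustrative, the formula is indexed by Eb/N0, and the mapping between the measured received SNR of the tables and Eb/N0 is not specified here, so the numbers are not expected to reproduce the tables.

```python
import math

def q_function(x):
    """Gaussian tail probability Q(x)."""
    return 0.5 * math.erfc(x / math.sqrt(2.0))

def awgn_ber_square_qam(m, ebn0_db):
    """Approximate BER of Gray-coded square M-QAM over AWGN (exact for M = 4)."""
    k = math.log2(m)                       # bits per symbol
    ebn0 = 10.0 ** (ebn0_db / 10.0)        # dB to linear scale
    arg = math.sqrt(3.0 * k * ebn0 / (m - 1))
    return (4.0 / k) * (1.0 - 1.0 / math.sqrt(m)) * q_function(arg)

if __name__ == "__main__":
    # One row per Eb/N0 value; columns are QPSK, 16QAM and 64QAM.
    for ebn0_db in range(0, 21, 4):
        row = [awgn_ber_square_qam(m, ebn0_db) for m in (4, 16, 64)]
        print(ebn0_db, " ".join(f"{ber:.3e}" for ber in row))
```

At a fixed Eb/N0 the approximation orders the three schemes as the measurements do: QPSK gives the lowest BER and 64QAM the highest.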



Table 6.2: BER performance of the receiver in a static wireless laboratory channel, 4 users.

SNR (dB)   QPSK           16-QAM         64-QAM
-4         1.82 × 10^-1   3.1 × 10^-1    3.7 × 10^-1
-2         1.52 × 10^-1   2.96 × 10^-1   3.61 × 10^-1
0          9.87 × 10^-2   2.57 × 10^-1   3.18 × 10^-1
2          5.19 × 10^-2   1.98 × 10^-1   2.69 × 10^-1
4          2.12 × 10^-2   1.52 × 10^-1   2.24 × 10^-1
6          3.97 × 10^-3   1.09 × 10^-1   1.75 × 10^-1
8          3.8 × 10^-4    6.07 × 10^-2   1.31 × 10^-1
10         1.41 × 10^-5   2.28 × 10^-2   9.39 × 10^-2
12         N/A (1)        5.54 × 10^-3   6.84 × 10^-2
14         N/A (1)        7.73 × 10^-4   3.26 × 10^-2
16         N/A (1)        N/A (2)        1.64 × 10^-2

(1) BER is low.
(2) The transmit power amplifier did not reach the required power during measurement.

Table 6.3: BER performance of the receiver in a static wireless laboratory channel, 8 users.

SNR (dB)   QPSK           16-QAM         64-QAM
-4         2.72 × 10^-1   3.49 × 10^-1   3.89 × 10^-1
-2         2.08 × 10^-1   3.39 × 10^-1   3.64 × 10^-1
0          1.56 × 10^-1   3.17 × 10^-1   3.36 × 10^-1
2          1.18 × 10^-1   2.65 × 10^-1   3.03 × 10^-1
4          6.37 × 10^-2   2.12 × 10^-1   2.79 × 10^-1
6          2.29 × 10^-2   1.66 × 10^-1   2.45 × 10^-1
8          6.7 × 10^-3    1.55 × 10^-1   2.08 × 10^-1
10         5.86 × 10^-4   5.95 × 10^-2   1.45 × 10^-1
12         7.44 × 10^-5   2.51 × 10^-2   1.14 × 10^-1
14         N/A (1)        8.42 × 10^-3   7.82 × 10^-2
16         N/A (1)        1.96 × 10^-3   5.14 × 10^-2

(1) BER is low.

The measured BER results presented here are in close agreement with the simulation results over an AWGN channel. Remaining discrepancies stem from synchronization errors and roundoff errors due to the fixed-point arithmetic implementation (MATLAB simulations relied on floating-point arithmetic).

Figure 6.17: Measured BER performance under different modulation schemes.

Figure 6.18: QPSK-MC-CDMA performance under different numbers of active users.

Figure 6.19: 16QAM-MC-CDMA performance under different numbers of active users.

Figure 6.20: 64QAM-MC-CDMA performance under different numbers of active users.

6.4 Conclusion

In this chapter, functional tests of the receiver were performed in a static laboratory channel environment. The results of these tests in a static wireless indoor channel were compared to simulations to demonstrate that the receiver performs as expected. Further tests in a moving laboratory channel were not performed due to lack of time for integrating the interface card between the AGC circuit in the receiver and the RF front-end box, which is needed to maintain the signal dynamic range within the operating range for the test. Future work should concentrate on testing the receiver in a time-varying channel so that its performance can be evaluated and compared to the results of the MATLAB simulations for the indoor-to-outdoor/pedestrian channel.

Chapter 7

Conclusions and Future Work

The contributions of this thesis are the following:

• design and simulation of a complete downlink MC-CDMA system;

• a timing and frequency synchronization method for a downlink MC-CDMA system based on a new training sequence design method;

• a new decision-directed-based channel estimator for an MC-CDMA receiver;

• a lower bound on BER for a downlink M-QAM MC-CDMA system in a multipath fading channel;

• implementation of a complete baseband downlink MC-CDMA receiver in an FPGA platform;

• testing of the receiver in a laboratory static wireless environment.

In this research, we have studied the synchronization and channel estimation issues for a downlink MC-CDMA receiver. The downlink receiver has been designed under 3GPP's channel model specifications, i.e. 5 MHz channel bandwidth over the indoor-to-outdoor/pedestrian and vehicular channels. As a result, two design parameter sets for the receiver were proposed: one for the indoor-to-outdoor/pedestrian channel and another for the vehicular channel. Since an OFDM system is a special case of the proposed MC-CDMA architecture with a spreading factor L = 1, it was reasonable to study the OFDM system in order to obtain basic parameters for the MC-CDMA system. A spreading factor L = 8 was assumed in order to support up to 8 users.
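The relationship between OFDM (L = 1) and MC-CDMA with L = 8 can be illustrated with a short sketch. It assumes Walsh-Hadamard spreading codes and an ideal (already equalized) channel; neither assumption is stated in this chapter, and the function names are illustrative only.

```python
def hadamard(n):
    """Sylvester construction of an n x n Walsh-Hadamard matrix (n a power of 2)."""
    h = [[1]]
    while len(h) < n:
        h = [row + row for row in h] + [row + [-c for c in row] for row in h]
    return h

def spread(symbols, codes):
    """Superimpose all users: chip i is the sum over users of symbol * code[i]."""
    length = len(codes[0])
    return [sum(s * c[i] for s, c in zip(symbols, codes)) for i in range(length)]

def despread(chips, code):
    """Correlate the received chips with one user's code and normalize by L."""
    return sum(x * c for x, c in zip(chips, code)) / len(code)

L = 8                                   # spreading factor, up to 8 users
codes = hadamard(L)
user_symbols = [1+1j, -1+1j, 1-1j, -1-1j, 1+1j, 1-1j, -1+1j, -1-1j]
chips = spread(user_symbols, codes)     # 8 chips, one per subcarrier of a group
recovered = [despread(chips, codes[u]) for u in range(L)]
```

With `hadamard(1)` the single code is `[1]`, so each symbol maps directly onto one subcarrier, which is the OFDM special case mentioned above.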


The downlink receiver was simulated in MATLAB under various parameters such as channel conditions, channel estimation methods, pilot tone spacing and modulation schemes. Pilot-symbol-assisted modulation was used in order to help the receiver estimate the channel frequency response. The channel information at pilot tones uniformly distributed in the frequency domain, sometimes referred to as comb-type pilots, was interpolated with several techniques (linear, spline and low-pass FIR filter interpolation) in order to obtain the channel frequency response at the other subcarriers. Computer simulation results of the downlink MC-CDMA systems in both channel models showed that the systems worked well in the mobile environment.
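The comb-type pilot approach described above can be sketched as follows for the simplest of the listed interpolators (linear). The pilot spacing, the test channel and all function names are hypothetical choices made for illustration; the spline and low-pass FIR variants are not shown.

```python
def ls_pilot_estimates(rx_pilots, tx_pilots):
    """Least-squares channel estimate at each pilot: received / transmitted."""
    return [r / t for r, t in zip(rx_pilots, tx_pilots)]

def interpolate_channel(pilot_idx, pilot_h, n_subcarriers):
    """Linearly interpolate complex channel estimates between pilot subcarriers."""
    h = [0j] * n_subcarriers
    for k in range(n_subcarriers):
        # Select the pilot pair bracketing k; the edge pair is reused to
        # extrapolate for subcarriers outside the pilot range.
        j = 0
        while j < len(pilot_idx) - 2 and k >= pilot_idx[j + 1]:
            j += 1
        k0, k1 = pilot_idx[j], pilot_idx[j + 1]
        w = (k - k0) / (k1 - k0)
        h[k] = (1 - w) * pilot_h[j] + w * pilot_h[j + 1]
    return h

# Hypothetical test channel that varies linearly across 32 subcarriers.
h_true = [(1 + 0.5j) + k * (0.01 - 0.02j) for k in range(32)]
pilot_idx = list(range(0, 32, 8))                  # assumed pilot spacing of 8
tx_pilots = [1 + 0j] * len(pilot_idx)
rx_pilots = [h_true[k] * t for k, t in zip(pilot_idx, tx_pilots)]
h_est = interpolate_channel(pilot_idx, ls_pilot_estimates(rx_pilots, tx_pilots), 32)
```

Because the test channel is linear in frequency, the linear interpolator recovers it exactly; a frequency-selective channel would leave an interpolation error that grows with the pilot spacing.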

An analytical study of the BER performance of an MC-CDMA system in a Rayleigh multipath fading channel was also presented. A one-dimensional integration approach similar to the method in [63] was used to compute the lower bound expressions for M-QAM modulation. The lower bounds were verified by computer simulation, and it was shown that they can be used to compute the performance of downlink M-QAM MC-CDMA quickly and accurately.

In addition, three techniques for decision-directed pilot-based channel estimation for the downlink MC-CDMA system were also studied in order to improve the performance of the system. The first technique was the decision-directed virtual pilots for channel estimation, the second technique was the decision-directed iterative transform domain, while the third was based on a decision-directed process with a low-complexity overlap LMMSE estimator. Simulation results showed that the decision-directed virtual pilot technique did not provide significant improvements over the indoor-to-outdoor/pedestrian and vehicular channels. In this case, the decision-directed process did not help much in the selection of the candidate subcarriers to be virtual pilots. The second method provided better improvements in both channel models, especially in the vehicular channel model with a pilot tone spacing of 94 subcarriers. The third method was an effort to further improve the performance of the system under the indoor-to-outdoor/pedestrian channel since the receiver with this channel configuration was intended for implementation in an FPGA chip. Simulation results showed that the robust overlap LMMSE channel estimator was able to achieve performance comparable to the conventional pilot-based LMMSE channel estimator. Furthermore, the proposed decision-directed overlap LMMSE estimator not only provides a performance that is close to the conventional LMMSE estimator, but also substantially reduces computation complexity. The overlap LMMSE estimator with a 64 x 64 channel autocorrelation sub-matrix proved to be a good trade-off between performance and complexity for hardware implementation.


The indoor-to-outdoor/pedestrian MC-CDMA receiver was chosen for implementation in an FPGA development platform in order to form a basis for the receiver with the vehicular configuration. Therefore, all signal processing issues related to the real receiver, such as training sequences, timing and frequency synchronization, and tracking, were also studied in this thesis. The proposed preamble for downlink MC-CDMA has a duration of only one MC-CDMA symbol. It is partitioned into ten short symbols, similar to the short training symbol structure of typical 802.11 WLANs. A low-PAPR short training sequence design algorithm based on the well-known Zadoff-Chu (ZC) sequences was also proposed for our receiver, i.e. suitable ZC sequences were found, thus yielding a training sequence structure. A joint time-domain coarse timing and fractional CFO estimation method based on this preamble structure was also presented. It is a pre-FFT synchronization method, since it operates prior to FFT processing (OFDM demodulation). The coarse timing technique was adapted from the timing synchronization method proposed for WLANs [12]. The proposed fractional carrier frequency offset estimator was developed based on the estimator proposed in [11]. The advantage of this technique is that it entails no additional hardware complexity compared to the traditional estimators [12-24]. Simulation results showed that the joint coarse timing and fractional CFO synchronization techniques worked well in our proposed MC-CDMA receiver using the proposed training sequence. The coarse timing estimator allows accurate detection of the frame boundary, while the fractional CFO estimator can estimate a wide range of frequency offsets, up to ±4 subcarrier spacings, and requires no additional complexity. The RMS estimation error of the proposed coarse timing estimator was much smaller than that of the traditional coarse timing estimator in [14].
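The ±4 subcarrier-spacing range quoted above follows from the preamble structure: a 640-sample preamble split into ten short symbols gives a repetition period of 64 samples, and with a 512-point FFT the phase of a lag-64 autocorrelation is unambiguous for offsets below 512 / (2 × 64) = 4 subcarrier spacings. The sketch below illustrates this with a conventional autocorrelation-based estimator; it is not the low-complexity estimator proposed in this work, and the ZC root and the direct use of one ZC sequence as the short symbol are assumptions made only for illustration.

```python
import cmath
import math

N_FFT = 512          # FFT size
SHORT_LEN = 64       # 640-sample preamble split into ten short symbols
N_SHORT = 10

def zadoff_chu(root, length):
    """Even-length Zadoff-Chu sequence; root should be coprime with length."""
    return [cmath.exp(-1j * math.pi * root * n * n / length) for n in range(length)]

def apply_cfo(samples, cfo):
    """Rotate the samples by a CFO expressed in subcarrier spacings."""
    return [s * cmath.exp(2j * math.pi * cfo * n / N_FFT) for n, s in enumerate(samples)]

def estimate_fractional_cfo(rx):
    """Phase of the lag-SHORT_LEN autocorrelation, scaled to subcarrier spacings."""
    p = sum(rx[n + SHORT_LEN] * rx[n].conjugate() for n in range(len(rx) - SHORT_LEN))
    return cmath.phase(p) * N_FFT / (2.0 * math.pi * SHORT_LEN)

preamble = zadoff_chu(1, SHORT_LEN) * N_SHORT      # ten identical short symbols
cfo_hat = estimate_fractional_cfo(apply_cfo(preamble, 2.5))
```

In this noise-free case the estimate equals the applied offset of 2.5 subcarrier spacings, and every preamble sample has unit magnitude, which is the constant-envelope property that makes ZC sequences attractive for low-PAPR training.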

In our receiver design, post-FFT synchronization was studied in order to develop a complete MC-CDMA receiver. The post-FFT processing exploited the useful information at the pilot subcarriers after the FFT in order to estimate the fine timing offset and the integer CFO in the frequency domain. The integer CFO estimator performed multiple frequency-domain correlations between the received pilot symbols and shifted versions of the transmitted pilot tones, similar to the method in [50]. The integer CFO estimator required that the time-domain coarse timing synchronization run first so that some of the data frames disturbed by the channel and noise were detected during the acquisition process of the receiver. For the fine timing synchronizer, the phase difference between two adjacent pilot subcarriers in the frequency domain was used to estimate the fine timing offset, similar to the method in [49]. The fine timing estimator worked effectively once the coarse timing synchronization had been performed, so that the orthogonality between the subcarriers was not completely lost. The RMS timing estimation error of the fine timing estimator as a function of the number of pilot tones was also studied in order to find a good trade-off.
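The fine timing principle described above can be sketched in a few lines: a timing offset of d samples appears after the FFT as a phase ramp across subcarriers, so the average phase step between adjacent pilots is proportional to d. The sign convention of the ramp, the pilot spacing of 8, the pilot values and a flat channel are all assumptions of this sketch rather than parameters taken from the receiver.

```python
import cmath
import math

def estimate_fine_timing(rx_pilots, tx_pilots, spacing, n_fft):
    """Estimate a timing offset (in samples) from the phase step between adjacent pilots."""
    # Remove the known pilot modulation, leaving only the phase ramp.
    z = [r * t.conjugate() for r, t in zip(rx_pilots, tx_pilots)]
    # Accumulate products of neighboring pilots; the phase of the sum is the average step.
    acc = sum(z[i + 1] * z[i].conjugate() for i in range(len(z) - 1))
    return -cmath.phase(acc) * n_fft / (2.0 * math.pi * spacing)

N_FFT, SPACING, OFFSET = 512, 8, 4
pilot_bins = list(range(0, 448, SPACING))
tx = [complex(1 if i % 3 else -1) for i in range(len(pilot_bins))]  # arbitrary BPSK pilots
# Assumed model: an offset of OFFSET samples rotates subcarrier k by exp(-j*2*pi*k*OFFSET/N_FFT).
rx = [t * cmath.exp(-2j * math.pi * k * OFFSET / N_FFT) for t, k in zip(tx, pilot_bins)]
offset_hat = estimate_fine_timing(rx, tx, SPACING, N_FFT)
```

With these values the phase step is unambiguous for offsets smaller than N_FFT / (2 × SPACING) = 32 samples, which is why the coarse timing stage has to run first.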


A complete receiver design was simulated using MATLAB under the indoor-to-outdoor/pedestrian channel environment. The simulation results showed that the BER performance of the complete receiver was very close to that of the same receiver under perfect timing/frequency synchronization, i.e. no CFO in the input samples and an exactly known FFT window sampling point. A small BER difference was found at a high Eb/N0 of 30 dB because the deep fades of the channel degrade the timing synchronization of the receiver.

Finally, the design of a baseband downlink MC-CDMA receiver for the indoor-to-outdoor/pedestrian channel environment was implemented in a Nallatech development platform [6] powered by a Xilinx Virtex-4 SX35 FPGA chip. The implementation of the receiver presented in this thesis was part of a contract project for Defence Research and Development Canada - Ottawa (DRDC-O). The low-complexity fractional CFO estimator and the decision-directed overlap LMMSE channel estimator were not integrated into the complete receiver design at the time of testing the receiver. The traditional fractional CFO synchronizer and the low-pass FIR channel estimator were implemented in the receiver instead. The overlap LMMSE channel estimator was independently tested using Hardware-in-the-Loop (HIL) in an FPGA development platform different from the Nallatech platform. The receiver was implemented using a modified Xilinx FPGA design flow in order to compare with MATLAB fixed-point models. The complete receiver consumed less than 40% of the available resources of the Virtex-4 SX35 device. Xilinx's synthesis tool reported that the maximum clock frequency of the design was 149 MHz and the total power consumption was about 889 mW. A modular approach was used throughout the implementation of the receiver to ease future expansion and maintenance. A simple transmitter was implemented in an FPGA platform identical to the one used for the receiver in order to perform a real functional test of the receiver. We tested the receiver within a static wireless laboratory (indoor) environment due to the limitations of our test equipment. The channel environment used to test the receiver had very good conditions: low Doppler spread, small RMS delay spread and Line-of-Sight (LOS) communication. Therefore, it was reasonable to compare the measured BER performance of the receiver with the simulation of the receiver over an AWGN channel. The measured BER results were in close agreement with simulations over an AWGN channel. Remaining discrepancies came from synchronization errors and roundoff errors due to the fixed-point arithmetic implementation.

Future work for this receiver could include the replacement of the traditional autocorrelation-based fractional CFO estimator by the proposed low-complexity CFO estimator. This is expected to provide the same CFO tracking performance while reducing the FPGA resource consumption. Integration of the decision-directed overlap LMMSE estimator in the receiver is also worthwhile to improve the BER performance of the system. The complete receiver could be tested under a moving wireless indoor-to-outdoor/pedestrian channel. Furthermore, it is feasible to take advantage of the modular implementation approach in order to expand the current receiver to another configuration using this framework without much modification. It would be interesting to apply channel coding techniques to improve BER performance for the indoor-to-outdoor/pedestrian and vehicular channels. Modern channel codes such as turbo codes or low-density parity-check (LDPC) codes would be good candidates for our MC-CDMA system. Further study of the multiple-input multiple-output (MIMO) technique, which uses multiple transmitters and receivers to transfer more data at the same time, could also be interesting for our MC-CDMA system.

Appendix A

RTL simulation results

This appendix presents the Register Transfer Level (RTL) simulations of the complete receiver using VHDL. All simulation results were obtained using ModelSim 6.4 with the standard IEEE and Xilinx simulation libraries. The following figures present the simulation results for each sub-block detailed in Figure 3.5.

Figure A.1: Impulse and step response simulation of the first half-band decimation filter.

Figure A.2: Random input data simulation of the first half-band decimation filter.

Figure A.3: Impulse and step response simulation of the second half-band decimation filter.

Figure A.4: Random input data simulation of the second half-band decimation filter.


Figure A.5: Impulse response simulation of the polyphase decimation filter: (a) impulse response of the upper arm; (b) impulse response of the lower arm.

Figure A.6: Step response simulation of the polyphase decimation filter: (a) step response of the upper arm; (b) step response of the lower arm.

Figure A.7: Random input data simulation of the polyphase decimation filter.


Figure A.8: Simulation results for the convolution block.

Figure A.9: Simulation results for the moving sum and the peak detector.

Figure A.10: Simulation results for the auto-correlator.


Figure A.11: Vector mode simulation results for the serial CORDIC.

Figure A.12: Rotation mode simulation results for the serial CORDIC.

Figure A.13: Simulation results for the pilot tone generator.


Figure A.14: Simulation results for the 512-point Radix-2 Lite FFT core: (a) load input samples; (b) unload results.


Figure A.15: Simulation results for the complete FFT processor: (a) load input samples to FIFO; (b) unload and reorder frequency bins.


Figure A.16: Simulation of the fine timing estimator unit with an input timing offset of 4 samples.


Figure A.17: Simulation of the integer CFO estimator with an input CFO of 3 subcarrier spacings.


Figure A.18: Simulation of the channel estimator and ZF equalizer.


Figure A.19: Simulation of the despreader.

Appendix B

MC-CDMA transmitter implementation

Figure B.l illustrates the block diagram of a simple MC-CDMA transmitter. The block diagram of the transmitter consists of the following blocks: data scrambler, mapper, inverse FFT, pilot generator, windowing, preamble, and 8x interpolation filter. The input data file is scrambled with a length-127 scrambler by the data scrambler block that uses the same generator polynomial as in typical WLAN systems. The generator polynomial is given by

$$P_{SIC}(x) = x^7 + x^4 + 1 \qquad \text{(B.1)}$$

and is illustrated in Figure B.2. The same generator polynomial is used to scramble the transmit data and to descramble the receive data.
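The scrambler can be sketched in software as a 7-bit linear feedback shift register whose output is XORed with the data, so that the same routine both scrambles and descrambles. The all-ones initial state and the function names below are assumptions for illustration; the initial state used in the transmitter is not given here.

```python
def scrambler_sequence(n_bits, state=(1, 1, 1, 1, 1, 1, 1)):
    """LFSR output for the generator x^7 + x^4 + 1; state holds (x1, ..., x7)."""
    reg = list(state)
    out = []
    for _ in range(n_bits):
        fb = reg[3] ^ reg[6]          # x4 XOR x7
        out.append(fb)
        reg = [fb] + reg[:-1]         # shift, feeding the new bit into x1
    return out

def scramble(bits, state=(1, 1, 1, 1, 1, 1, 1)):
    """XOR the data with the LFSR sequence; applying it twice restores the data."""
    return [b ^ s for b, s in zip(bits, scrambler_sequence(len(bits), state))]

data = [1, 0, 1, 1, 0, 0, 1, 0] * 20
scrambled = scramble(data)
restored = scramble(scrambled)        # descrambling is the same operation
```

The generated sequence repeats every 127 bits, which is the length-127 property mentioned above, and the receiver must start from the same register state as the transmitter for the descrambling to succeed.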

The windowing block is also based on the windowing specifications of a typical WLAN system [51]. In a typical implementation, the windowing function is represented in discrete time. The windowing function for an MC-CDMA symbol with symbol duration $T_s = 128\ \mu\text{s}$ and a 512-point FFT is defined as

$$g(n) = \begin{cases} 1, & 2 \le n \le 639 \\ 0.5, & n = 1, 640 \\ 0, & \text{otherwise} \end{cases} \qquad \text{(B.2)}$$

The preamble structure for the MC-CDMA system was described in Section 2.2.1. The preamble has 640 samples, which is exactly the length of an MC-CDMA symbol. The 8x interpolation filter is implemented using a multistage polyphase interpolation technique. The filter's characteristics are the same as the receiver's decimation filter.

Figure B.1: Simple MC-CDMA transmitter block diagram (blocks shown: data file, data scrambler, mapper, pilot generator, spreader, cyclic prefix, preamble, windowing, multiplexers and 8x interpolation filter; the chain is marked as precomputed).

Figure B.2: Data scrambler.

We assumed that a transmit frame consisted of a preamble and 5 data symbols. The resulting interpolated complex samples, preprocessed in MATLAB, were stored in a binary file so that they are downloadable to the transmit buffer (block RAM) in the user FPGA via the user control software.


Appendix C

Channel estimator complexity

The following table compares the complexity of the conventional LMMSE channel estimator with that of the decision-directed overlapped LMMSE channel estimator. The following parameters are used in the computation of the number of multiplications per symbol:

• Number of subcarriers: N = 448

• Initial low-pass FIR estimator filter length: D = 8

• Sub-matrix size: [S x S] = [64 x 64]

Table C.1: Channel estimator complexity.

Estimator             Multiplications per symbol
Low-pass FIR          ND = 3584
Conventional LMMSE    N^2 + ND = 204288
Overlapped LMMSE      ⌈N/S⌉S^2 + ND = 32256
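The entries of the table can be checked with a few lines of arithmetic. The function names are illustrative; the formulas and parameter values are those listed above.

```python
def fir_mults(n, d):
    """Low-pass FIR estimator: N * D multiplications per symbol."""
    return n * d

def lmmse_mults(n, d):
    """Conventional LMMSE: N^2 for the matrix product plus the initial FIR stage."""
    return n * n + n * d

def overlapped_lmmse_mults(n, d, s):
    """Overlapped LMMSE: ceil(N / S) sub-matrices of size S x S plus the FIR stage."""
    n_blocks = -(-n // s)             # integer ceiling of N / S
    return n_blocks * s * s + n * d

N, D, S = 448, 8, 64
counts = (fir_mults(N, D), lmmse_mults(N, D), overlapped_lmmse_mults(N, D, S))
```

For these parameters the overlapped estimator needs 32256 multiplications per symbol against 204288 for the conventional one, a reduction by a factor of about 6.3.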

Appendix D

Virtex-4 SX35 overview

D.1 Configurable logic block

The Configurable Logic Block (CLB) is the main logic resource for implementing sequential as well as combinatorial circuits. Each CLB element is connected to a switch matrix to access the general routing matrix (shown in Figure D.1). A CLB element contains four interconnected slices [7]. They are grouped in pairs and organized as a column. SLICEM indicates the pair of slices in the left column, and SLICEL designates the pair of slices in the right column. Each pair in a column has an independent carry chain; however, only the slices in SLICEM have a common shift chain [7].

Figure D.1: Arrangement of slices within the CLB [7].

D.2 Block RAM

Each block RAM stores 18 Kbits of data. Write and Read are synchronous operations; the two ports are symmetrical and totally independent, sharing only the stored data. Each port can be configured in any "aspect ratio" from 16K x 1, 8K x 2, to 512 x 36, and the two ports are independent even in this regard [7]. Figures D.2 and D.3 illustrate the dual-port and single-port I/O ports, respectively.

Figure D.2: Dual-port I/O ports [7].

Figure D.3: Single-port I/O ports [7].


D.3 First In First Out (FIFO)

The block RAM can be configured as FIFO memory with common or independent read and write clocks as illustrated in Figure D.4. Port A of the block RAM is used as a FIFO read port, and Port B is a FIFO write port. Data is read from the FIFO on the rising edge of read clock and written to the FIFO on the rising edge of write clock [7].

[Diagram: block RAM configured as a FIFO. Signals shown: DI, RD_EN, RD_CLK, SSR, DO, EMPTY, ALMOST_EMPTY, FULL, ALMOST_FULL, RDCOUNT, WRCOUNT. Internal Port A and Port B pins: DIA/DIB, AA[13:0]/AB[13:0], WEA[3:0]/WEB[3:0], ENA/ENB, SSRA/SSRB, CLKA/CLKB, DOA/DOB. A "FIFO Logic" block generates the flags, and a note marks the I/Os not used in FIFO mode.]

Figure D.4: FIFO I/O ports [7].
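The FIFO behavior described above can be sketched with a small behavioral model in Python. This is an illustration only, not the Virtex-4 FIFO logic: it ignores clock domains and latency, each call stands in for one rising clock edge, and the class name `FifoModel`, the flag thresholds, and the counter wrap-around are assumptions chosen for the example. The flag names mirror the signals in Figure D.4.

```python
from collections import deque


class FifoModel:
    """Behavioral FIFO sketch; one method call models one rising clock edge."""

    def __init__(self, depth, almost_empty_offset=1, almost_full_offset=1):
        self.depth = depth
        self.almost_empty_offset = almost_empty_offset  # hypothetical threshold
        self.almost_full_offset = almost_full_offset    # hypothetical threshold
        self._mem = deque()
        self.wrcount = 0  # write pointer, wraps at depth (assumed)
        self.rdcount = 0  # read pointer, wraps at depth (assumed)

    @property
    def empty(self):
        return len(self._mem) == 0

    @property
    def full(self):
        return len(self._mem) == self.depth

    @property
    def almost_empty(self):
        return len(self._mem) <= self.almost_empty_offset

    @property
    def almost_full(self):
        return len(self._mem) >= self.depth - self.almost_full_offset

    def write(self, word):
        """Write-clock edge: store a word unless the FIFO is full."""
        if self.full:
            return False
        self._mem.append(word)
        self.wrcount = (self.wrcount + 1) % self.depth
        return True

    def read(self):
        """Read-clock edge: return the oldest word, or None if empty."""
        if self.empty:
            return None
        self.rdcount = (self.rdcount + 1) % self.depth
        return self._mem.popleft()
```

A typical use is to call `write` until `full` is asserted and then drain the model with `read`, checking that words come out in the order they went in.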

D.4 DSP48

The DSP48 slice efficiently performs a wide range of basic math functions, including adders, subtracters, accumulators, multiply-accumulators, multiply multiplexers, counters, dividers, square-root functions, and shifters [8]. Each DSP48 slice has a two-input multiplier followed by multiplexers and a three-input adder/subtracter. The multiplier accepts two 18-bit, two's complement operands producing a 36-bit, two's complement result. The result is sign extended to 48 bits and can optionally be fed to the adder/subtracter. The adder/subtracter accepts three 48-bit, two's complement operands, and produces a 48-bit two's complement result as illustrated in Figure D.5 [8].



Figure D.5: DSP48 slice architecture [8].
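The bit widths quoted above can be checked with a short Python sketch of the arithmetic: an 18 x 18 two's complement multiply whose product fits in 36 bits, followed by a three-operand addition that wraps at 48 bits. This models only the number formats described in the text; the function names are hypothetical, and the multiplexers, the subtract mode, pipeline registers, and cascade paths of the real slice are not modeled.

```python
def to_signed(value, bits):
    """Interpret the low `bits` bits of value as a two's complement number."""
    value &= (1 << bits) - 1
    if value & (1 << (bits - 1)):
        value -= 1 << bits
    return value


def multiply18(a, b):
    """18 x 18 two's complement multiply; the product always fits in 36 bits."""
    product = to_signed(a, 18) * to_signed(b, 18)
    # A Python int keeps its sign, so sign extension to 48 bits is implicit.
    assert -(1 << 35) <= product < (1 << 35)
    return product


def add3(x, y, z):
    """Three-operand 48-bit addition with two's complement wrap-around."""
    total = to_signed(x, 48) + to_signed(y, 48) + to_signed(z, 48)
    return to_signed(total, 48)


def multiply_accumulate(acc, a, b):
    """One multiply-accumulate step: acc + a * b, wrapped to 48 bits."""
    return add3(acc, multiply18(a, b), 0)
```

For example, repeated calls to `multiply_accumulate` over pairs of 18-bit samples and coefficients mimic the accumulation stage of a correlator or FIR tap, with the 48-bit result leaving headroom above the 36-bit products.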

Appendix E

RF front-end overview

E.1 Description

The front-end unit is based on the zero-IF MAX2829 single-chip RF transceiver IC, which was designed specifically for WLAN applications. The transceiver covers the 2.4 GHz - 2.5 GHz and 4.9 GHz - 5.875 GHz frequency bands and supports up to 40 MHz channel bandwidth [69]. The front-end unit consists of four independent transceiver chips that can be controlled via the external I/O1 or I/O2 interfaces located on the front panel, as illustrated in Figure E.1.

Figure E.1: RF front-end front panel [69].

E.2 Specifications

The following table shows the specifications of the MAX2829 transceiver chip.


| Parameter | Conditions | Value (min / typ / max where specified) | Units |
|---|---|---|---|
| Supply voltage | (30 Watts) | 10 / 12 / 15 | V |
| Frequency bands | | 2.412 - 2.5; 4.9 - 5.875 | GHz |
| Phase noise | 802.11g / 802.11a | -107 / -97 | dBc/Hz |
| Reference clock | | 40 MHz ± 15 ppm | |
| Frequency resolution | 802.11a / 802.11g | 382 / 230 | Hz |
| TX mode | | | |
| Output power | 1 dB compression over frequency band, 802.11a | 20 / 25 / 26 | dBm |
| Output power | 1 dB compression over frequency band, 802.11g | 23 / 25 / 26 | dBm |
| Output IP3 | Pout = 10 dBm (802.11a) / Pout = 10 dBm (802.11g) | 25 / 37.5 | dBm |
| Output power control range | B6:B1 = 111111 to B6:B1 = 000000 | 30 | dB |
| TX maximum gain | 802.11a/g | 26 | dB |
| Carrier leakage | Without DC offset cancellation | -27 | dBc |
| TX sideband suppression (uncalibrated) | 802.11g / 802.11a | -46 / -51 | dB |
| Baseband -3 dB corner frequency | Nominal mode / Turbo mode 1 / Turbo mode 2 | 12 / 18 / 24 | MHz |
| TX baseband I/Q input impedance | | 50 | Ohms |
| TX between-channel isolation | 802.11a/g | 50 | dB |
| Baseband inputs maximum voltage | | 2 | Vpp |
| RX mode | | | |
| Baseband -3 dB corner frequency | Narrowband mode / Nominal mode / Turbo mode 1 / Turbo mode 2 | 7.5 / 9.5 / 14 / 18 | MHz |
| Total receiver gain | 802.11a/g | 95 | dB |
| Receiver noise figure | 802.11a/g | 9 | dB |
| Receiver sensitivity | S/N = 15 dB, BW = 300 kHz | -95 | dBm |
| Baseband filter rejection (nominal mode) | fBASEBAND = 15 MHz / fBASEBAND = 20 MHz / fBASEBAND > 40 MHz | 20 / 39 / 80 | dB |
| In-band input IP3 | Tones at 4.6 MHz and 5.1 MHz, IM3 at 4.1 MHz and 5.6 MHz, RF gain = MAX, (B7:B6) = (11) (35 dB); 802.11a / 802.11g | -17 / -15 | dBm |
| Output P-1dB | At maximum RX gain (802.11a/b/g) | 16 | dBm |
| RX gain control range | | 93 | dB |
| IMD dynamic range | Input S/N of 10 dB, receiver bandwidth = 20 MHz, EVM at 5% | 77 | dB |
| RX baseband I/Q input impedance | | 50 | Ohms |
| Maximum RF input power | 802.11a/g | 10 | dBm |
| RX between-channel isolation | 802.11a/g | 50 | dB |

Figure E.2: MAX2829 specifications [69].

Bibliography

[1] M. Ergen, Mobile Broadband: Including WiMAX and LTE. Springer, 2009.

[2] Xilinx, Fast Fourier Transform v3.2, 2005.

[3] IEEE, Part 11: Wireless LAN Medium Access Control (MAC) and Physical Layer (PHY) specifications - High-speed Physical Layer in the 5 GHz Band, 2003.

[4] K. Wang, M. Faulkner, J. Singh, and I. Tolochko, "Timing synchronization for 802.11a under multipath channels," in Australian Telecommunications, Networks and Applications Conference, Melbourne, 2003.

[5] H. Schulze and C. Lüders, Theory and applications of OFDM and CDMA. Wiley, 2005.

[6] Nallatech, XtremeDSP Development Kit-IV User Guide, 2005.

[7] Xilinx, Virtex-4 FPGA User Guide, 2008.

[8] Xilinx, XtremeDSP for Virtex-4 FPGAs, 2008.

[9] J. Lee, J. Han, and J. Zhang, "MIMO technologies in 3GPP LTE and LTE-Advanced," EURASIP Journal on Wireless Communications and Networking, vol. 2009, 2009.

[10] L. Hanzo, M. Munster, B. Choi, and T. Keller, OFDM and MC-CDMA for Broadband Multi-User Communications, WLANs and Broadcasting. Wiley, 2003.

[11] S. Roy, J.-F. Boudreault, and L. Dupont, "An end-to-end prototyping framework for compliant wireless LAN transceivers with smart antennas," Comput. Commun., vol. 31, no. 8, pp. 1551-1563, 2008.

[12] Y. Wang, G. Jian-Hua, B. Ai, L. Zong-Qiang, and N. Yuen-Fei, "A novel scheme for symbols timing in OFDM WLAN systems," ECTI Transactions on Electrical Eng., Electronics, and Communications, vol. 3, pp. 86-91, 2005.


[13] H. Minn, V. Bhargava, and K. Letaief, "A combined timing and frequency synchronization and channel estimation for OFDM," in IEEE International Conference on Communications, 2004.

[14] K. Wang, J. Singh, and M. Faulkner, "FPGA implementation of an OFDM-WLAN synchronizer," in 2nd IEEE International Workshop on Electronic Design, Test and Applications, 2004.

[15] Y.-T. Hwang, K.-W. Liao, and C.-H. Wu, "FPGA realization of an OFDM frame synchronization design for dispersive channels," in International Symposium on Circuits and Systems, 2003.

[16] Y.-J. Ryu and D.-S. Han, "Timing phase estimator overcoming Rayleigh fading for OFDM systems," IEEE Trans. Consum. Electron., vol. 47, pp. 370-377, 2001.

[17] D. Landstrom, N. Petersson, P. Odling, and P. Borjesson, "OFDM frame synchronization for dispersive channels," in 6th International Symposium on Signal Processing and its Applications, 2001.

[18] E. G. Larsson, G. Liu, and G. B. Giannakis, "Joint symbol timing and channel estimation for OFDM based WLANs," IEEE Commun. Lett., vol. 5, pp. 325-327, 2001.

[19] S. Johansson, M. Nilsson, and P. Nilsson, "An OFDM timing synchronization ASIC," in IEEE 7th International Conference on Electronics, Circuits and Systems, 2000.

[20] Y.-C. Liao and K.-C. Chen, "A new digital signal processing implementation of OFDM timing recovery," in IEEE 51st Vehic. Tech. Conf., 2000, pp. 1517-1521.

[21] J. J. van de Beek, P. O. Borjesson, M.-L. Boucheret, D. Landstrom, J. M. Arenas, P. Odling, and S. K. Wilson, "Three non-pilot based time and frequency estimators for OFDM," Elsevier Signal Processing, vol. 80, pp. 1321-1334, 2000.

[22] S. Johansson, P. Nilsson, and M. Torkelson, "Implementation of an OFDM synchronization algorithm," in 42nd Midwest Symposium on Circuits and Systems, Las Cruces, 1999.

[23] T. Schmidl and D. Cox, "Robust frequency and timing synchronization for OFDM," IEEE Trans. Commun., vol. 45, pp. 1613-1621, 1997.

[24] J. J. van de Beek, M. Sandell, and P. O. Borjesson, "ML estimation of time and frequency offset in OFDM systems," IEEE Trans. Signal Process., vol. 45, pp. 1800-1805, 1997.


[25] M.-H. Hsieh and C.-H. Wei, "Channel estimation for OFDM systems based on comb-type pilot arrangement in frequency selective fading channels," IEEE Trans. Consum. Electron., vol. 44, pp. 217-225, 1998.

[26] S. Coleri, M. Ergen, A. Puri, and A. Bahai, "A study of channel estimation in OFDM systems," in IEEE 56th Vehic. Tech. Conf., 2002, pp. 894-898.

[27] E. Golovins and N. Ventura, "Low-complexity channel estimation for the wireless OFDM systems," in 13th European Wireless Conference, 2007.

[28] M. Noh and H. Park, "Low complexity LMMSE channel estimation for OFDM," in IEE Proceedings on Communications, vol. 153, 2006, pp. 645-649.

[29] O. Edfors, M. Sandell, J. J. van de Beek, S. K. Wilson, and P. O. Borjesson, "OFDM channel estimation by singular value decomposition," IEEE Trans. Commun., vol. 46, pp. 931-939, 1998.

[30] C. Mehlfuhrer, S. Caban, and M. Rupp, "An accurate and low complex channel estimator for OFDM WiMAX," in 3rd International Symposium on Communications, Control and Signal Processing, 2008, pp. 922-926.

[31] P.-Y. Tsai and T.-D. Chiueh, "A low-power multicarrier-CDMA downlink baseband receiver for future cellular communication systems," IEEE Trans. Circuits Syst. I, vol. 54, pp. 2229-2239, 2007.

[32] P.-Y. Tsai and T.-D. Chiueh, "A 1.1-V 9.9-mW MC-CDMA downlink baseband receiver IC for next-generation of cellular communication systems," in Asian Solid-State Circuits Conference, vol. 54, Nov. 2005, pp. 489-492.

[33] S. Nours, F. Nouvel, and J.-F. Helard, "Design and implementation of MC-CDMA systems for future wireless networks," EURASIP Journal on Applied Signal Processing, vol. 10, pp. 1604 - 1615, 2004.

[34] 3GPP, TR 25.943 v1.2.0, Technical Specification Group (TSG) RAN WG4, Deployment Aspects, 2001.

[35] ETSI, Project Broadband Radio Access Networks (BRAN), HIPERLAN Type 2; Physical Layer, Technical specification, 1999.

[36] J. G. Proakis, Digital Communications, 4th Edition. McGraw Hill, 2001.

[37] T. S. Rappaport, Wireless Communications Principles and Practice. Prentice Hall, 2002.


[38] Y. Zhao and A. Huang, "A novel channel estimation method for OFDM mobile communication systems based on pilot signals and transform-domain processing," in IEEE 47th Vehic. Tech. Conf., 1997.

[39] P. Hoeher, S. Kaiser, and P. Robertson, "Two-dimensional pilot-symbol-aided channel estimation by Wiener filtering," in IEEE International Conference on Acoustics, Speech, and Signal Processing, Munich, 1997.

[40] J. Zhu and W. Lee, "Channel estimation with power-controlled pilot symbols and decision-directed reference symbols," in IEEE 58th Vehic. Tech. Conf., vol. 2, 2003, pp. 1268 - 1272.

[41] S. Wei and P. Zhiwen, "Iterative LS channel estimation for OFDM systems based on transform-domain processing," in WiCom 2007 International Conference on Wireless Communications, Networking and Mobile Computing, 2007, pp. 416-419.

[42] S. Beyme and C. Leung, "Efficient computation of DFT of Zadoff-Chu sequences," IET Electronics Letters, pp. 461-463, 2009.

[43] S. Hara and R. Prasad, "Overview of multicarrier CDMA," IEEE Commun. Mag., vol. 35, pp. 126-133, 1997.

[44] S. Le-Nous, F. Nouvel, and J.-F. Hélard, "Design and implementation of MC-CDMA systems for future wireless networks," EURASIP Journal on Applied Signal Processing, pp. 1604-1615, 2004.

[45] H. Lui, Signal processing application in CDMA communication. Artech House Publisher, 2000.

[46] 3GPP, TS 25.101v2.1.0, 3rd Generation Partnership Project (3GPP), Technical Specification Group (TSG), RAN WG4 UE Radio transmission and Reception (FDD), 1999.

[47] H. Liu and G. Li, OFDM-Based Broadband Wireless Networks. Wiley, 2005.

[48] M. Aldinger, A Multicarrier Scheme for HIPERLAN. Kluwer Academic Publishers, 2005.

[49] B. Mcnair, L. J. Cimini, and N. R. Sollenberger, "A robust timing and frequency offset estimation scheme for orthogonal frequency division multiplexing (OFDM) systems," IEEE 49th Vehic. Tech. Conf., vol. 1, pp. 690 - 694, 1999.

[50] H. Zou, B. McNair, and B. Daneshrad, "An integrated OFDM receiver for highspeed mobile data communications," IEEE GLOBECOM '01 Global Telecommunications Conference, vol. 5, pp. 3090 - 3094, 2001.


[51] IEEE, Part 11: Wireless LAN Medium Access Control (MAC) and Physical Layer (PHY) specifications High-speed Physical Layer in the 5 GHz Band, 1999.

[52] Y. Shayan and T. Le-Ngoc, "All digital phase-locked loop: concepts, design and applications," IEE Proceeding of Radar and Signal Processing, vol. 136, pp. 53-56, 1989.

[53] J. G. Proakis and D. G. Manolakis, Digital Signal Processing: Principles, Algorithms, and Applications, 3rd ed. Prentice Hall, 1996.

[54] X. Tang, M. S. Alouini, and A. J. Goldsmith, "Effect of channel estimation error on M-QAM BER performance in Rayleigh fading," IEEE Trans. Commun., vol. 47, pp. 1856-1864, 1999.

[55] Y. Qiao, S. Yu, P. Su, and L. Zhang, "Research on an iterative algorithm of LS channel estimation in MIMO OFDM systems," IEEE Trans. Broadcast., vol. 51, pp. 149-153, 2004.

[56] Xilinx, Xilinx ISE 10.1 Design Suite Software Manuals and Help, 2008.

[57] S. K. Mitra, Digital Signal Processing: A Computer-Based Approach. McGraw Hill, 2006.

[58] F. Churchill, G. Ogar, and B. Thompson, "The correction of I and Q errors in a coherent processor," IEEE Trans. Aerosp. Electron. Syst., vol. AES-1, pp. 131-137, 1981.

[59] J. Volder, "The CORDIC trigonometric computing technique," IRE Trans. Electron. Comput., vol. EC-8, pp. 330-334, 1959.

[60] Xilinx, Divider v1.0, 2006.

[61] IEEE, Part 11: Wireless LAN Medium Access Control (MAC) and Physical Layer (PHY) specifications - High-speed Physical Layer in the 5 GHz Band, 2003.

[62] Xilinx, Xilinx Virtex-II Multimedia Development Board User Guide, 2002.

[63] K. Zhang and Y. L. Guan, "Simplified BER formulations for MC-CDMA systems with orthogonality restoring combining," in The Fourth Pacific Rim Conference on Multimedia, 2003, pp. 668-672.

[64] T. S. Rappaport, Wireless Communications: Principles and Practice. Prentice Hall, 1996.

[65] A. Papoulis and S. U. Pillai, Probability, Random Variables and Stochastic Processes. McGraw Hill, 2002.


[66] M. R. Spiegel and J. Liu, Mathematical Handbook of Formulas and Tables. McGraw Hill, 1999.

[67] D. J. Young and N. C. Beaulieu, "The generation of correlated Rayleigh random variates by inverse discrete Fourier transform," IEEE Trans. Commun., vol. 48, pp. 1114-1127, 2000.

[68] W. Cheney and D. Kincaid, Numerical Mathematics and Computing, Sixth Edition. Bob Pirtle, 2008.

[69] Comblab, Quad Dual Band RF Transceiver, 2007.
