Performance analysis of the DCT-LMS adaptive filtering algorithm
*Corresponding author. Tel.: 44-171-594-6220; fax: 44-171-823-8125. E-mail address: [email protected] (D.I. Kim).
Signal Processing 80 (2000) 1629-1654
Dai I. Kim*, P. De Wilde
Department of Electrical and Electronic Engineering, Imperial College of Science, Technology and Medicine, The University of London, Exhibition Road, London SW7 2BT, UK
Received 19 July 1999; received in revised form 10 December 1999
Abstract
This paper presents the convergence analysis of the discrete cosine transform least-mean-square (DCT-LMS) adaptive filtering algorithm, which is based on a well-known interpretation of the variable stepsize algorithm. The time-varying stepsize of the DCT-LMS algorithm is implemented by a modified power estimator in order to redistribute the spread power after the DCT. The performance analysis is considerably simplified by this modification of the power estimator. First of all, the proposed DCT-LMS algorithm has a fast convergence rate when compared to the LMS, the normalised LMS (NLMS) and the variable stepsize LMS (VSLMS) algorithms for a highly correlated input signal, whilst constraining the level of misadjustment required by a specification. The main contribution of this paper is the statistical performance analysis in terms of the mean and mean-squared error of the weight error vector. In addition, the decorrelation property of the DCT-LMS is derived from the lower and upper bounds of the eigenvalue spread ratio, λ_max/λ_min. It is also shown that the shape of the sidelobes affecting the decorrelation of the input signal is governed by the location of two zeros. The theoretical analysis results are validated by Monte Carlo simulation. The proposed algorithm is also applied to system identification and to inverse modelling for channel equalisation in order to verify its applicability. © 2000 Elsevier Science B.V. All rights reserved.
Keywords: DCT-LMS adaptive filter; Eigenvalue spread ratio
1. Introduction
Adaptive filtering algorithms based on the stochastic gradient method are widely used in many applications such as system identification, noise cancellation, active noise control and communication channel equalisation. The least-mean-square (LMS) algorithm, which belongs to the stochastic gradient family, has been the focus of much study due to its simplicity and robustness. However, it is well known that its convergence rate is seriously affected by the correlation of the input signal. To circumvent this inherent limitation, many algorithms have been developed.
As one popular approach, the transform-domain least-mean-square (TDLMS) adaptive filtering algorithms [3,11,18,19,21,8, pp. 208-238] have been developed to improve the slow convergence rate caused by an ill-conditioned input signal.
In 1983, Narayan [19] first introduced the TDLMS algorithm, which uses the orthogonal transform matrices of the discrete Fourier transform (DFT) and the discrete cosine transform (DCT). The enhanced convergence rate when compared with the conventional LMS algorithm was verified empirically; however, the focus was not placed on theoretical analysis, and the performance was judged purely by computer simulation. In 1988, Florian [11] analysed the performance of the weighted normalised LMS algorithm via exponentially weighted parameters. Only the mean behaviour of the weights was analysed, and a general derivation was not obtained. In 1989, Marshall [18] investigated the convergence property through computer simulation for several unitary transform matrices. In his work, transform-domain processing was characterised by the effect of the transform on the shape of the error performance surface. In 1995, Beaufay [3] studied analytically the behaviour of the eigenvalue spread for a first-order Markov process in the discrete Fourier transform least-mean-square (DFT-LMS) and the discrete cosine transform least-mean-square (DCT-LMS) algorithms. In the most recent work (1997) [21], Parikh proposed a modified escalator structure to improve the performance of the LMS adaptive filter. The algorithm utilises the sparse structure of the correlation matrix, which is extracted from the unitary transform matrix of the DCT and applied in the escalator structure of the lattice model. This filter is not an efficient filtering structure in that it employs two transform layers: a unitary transform matrix and an escalator structure of the lattice model. The first transform layer, the unitary transform matrix, does not affect the convergence speed, because the correlated input signal is decorrelated by the escalator structure of the second layer.
As another technique to overcome a slow convergence rate, the variable stepsize LMS (VSLMS) algorithms [1,7,13,17] have been developed to enhance the convergence rate and to reduce the misadjustment error in the state space. However, they might also not be effective for a highly correlated input signal, because the dynamic range of the variable stepsize is restricted by a directional convergence nature. In addition, the normalised LMS (NLMS) might be an efficient and robust algorithm for a nonstationary input process; however, it also suffers from a slow convergence speed if driven by a highly correlated input signal. To resolve this problem, Ozeki [20] and Rupp [22] proposed the so-called affine projection algorithms to decorrelate the input signal.
In the first part of this paper, we analyse the decorrelation properties by means of the eigenvalue spread ratio, the complementary spectrum principle and the pole-zero locations. It is known that the DCT effectively decorrelates input signals whose power spectrum lies in the low-frequency band. Boroujeny [9] explained intuitively the decorrelation feature of the DCT for a lowpass input process from the filtering viewpoint. However, that work did not show analytically how the DCT can decorrelate input signals with a low-frequency spectrum, similarly to the Karhunen-Loeve transform (KLT).
In the second part of this paper, we analyse the convergence behaviour of the DCT-LMS adaptive filtering algorithm, which is based on a well-known interpretation of the variable stepsize algorithm. A time-varying stepsize is implemented by the modified power estimator to redistribute the spread power after the transformation. This modification makes the performance analysis simple.
As we have investigated in the previous work relevant to the transform-domain adaptive filtering structure [3,11,18,19,21], so far only a limited analysis of the TDLMS algorithm has been performed, due to the difficulty of the analytical derivation for the normalisation term. The exponentially weighted method is generally used for obtaining the convergence parameter μ_i(n), which is

μ_i(n) = μ_o / P̂_i(n) = μ_o / [ (1 − β) Σ_{k=0}^{∞} β^k |x_i(n−k)|² ]   (1)

at the ith bin of the transform domain, where β ∈ [0,1], P̂_i is the power estimator, μ_i denotes an element of the diagonal matrix defined as diag[μ_i(n)], i = 0, …, N−1, and x_i(n) is the transformed input signal at the ith bin. The exponentially weighted power estimator also has the recursive form

P̂_i(n) = β P̂_i(n−1) + (1 − β) |x_i(n)|².   (2)
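For concreteness, the recursion (2) can be sketched in a few lines; the white Gaussian test signal, its power of 4 and the value β = 0.99 are illustrative assumptions, not settings taken from the paper's experiments.

```python
import numpy as np

def exp_power_estimate(x, beta):
    """Exponentially weighted power estimate of eq. (2):
    P(n) = beta * P(n-1) + (1 - beta) * |x(n)|^2."""
    p = np.empty(len(x))
    p_prev = 0.0
    for n, xn in enumerate(x):
        p_prev = beta * p_prev + (1.0 - beta) * abs(xn) ** 2
        p[n] = p_prev
    return p

rng = np.random.default_rng(0)
x = rng.normal(0.0, 2.0, 20_000)   # white signal with true power E|x|^2 = 4
p = exp_power_estimate(x, beta=0.99)
print(p[-1])                       # fluctuates around the true power 4
```

The closer β is to 1, the smoother but slower the estimate, which is the same trade-off that governs the stepsize recursions below.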
In this paper, we propose the modified power estimator based upon (1):

μ_i(n) = γ (1 − β) Σ_{k=0}^{∞} β^k · 1 / [ ε + (1/M) x_i^T(n−k) x_i(n−k) ],   (3)

where β ∈ [0,1], γ ∈ [0,1], 0 < ε ≪ 1, i = 0, …, N−1, and M denotes the number of samples used to estimate the power at the ith bin after the transformation.
The main contribution of this paper is the statistical performance analysis of the DCT-LMS adaptive filtering algorithm based on the modified power estimator. In addition, the decorrelation properties of the DCT are derived from the lower and upper bounds of the eigenvalue spread ratio. In particular, it is shown that the shape of the sidelobes affecting the decorrelation of the input signal is governed by the location of two zeros. The theoretical analysis results are validated by Monte Carlo simulation.
The rest of this paper is organised as follows. In Section 2, the decorrelation properties are investigated. In Section 3, the DCT-LMS adaptive filter via the modified power estimator is described. In Section 4, the convergence behaviour of the proposed algorithm is analysed. In Section 5, computer simulations are undertaken on system identification and channel equalisation examples to verify the performance of the proposed DCT-LMS algorithm. The simulation results are compared to the standard LMS, the NLMS and the VSLMS algorithms [17]. Conclusions are then given in Section 6.
2. Decorrelation properties of DCT
The convergence speed of the TDLMS depends on the condition number, or eigenvalue spread ratio, of the transformed autocorrelation matrix, which is typically much smaller than that of the input autocorrelation matrix. To support this fact, in the following theorem the decorrelation property of the TDLMS based upon an orthogonal transform matrix is derived from the eigenvalue spread ratio.
Theorem 1. Let R_uu ∈ R^{N×N} be the correlation matrix of a wide-sense stationary discrete-time stochastic process. The eigenvalue spread ratio of R_TT ∈ R^{N×N}, obtained by the optimal transform matrix, is always less than or equal to that of R_uu.
Proof. The eigenvalue spread ratios of R_uu and of R_TT = Λ_p^{-1} T R_uu T^T Λ_p^{-1}, transformed by an optimal transform matrix, are defined as follows:

χ(R_uu) = ‖R_uu‖_2 ‖R_uu^{-1}‖_2,   (4)

χ(R_TT) = ‖Λ_p^{-1} T R_uu T^T Λ_p^{-1}‖_2 ‖Λ_p T R_uu^{-1} T^T Λ_p‖_2,   (5)

where T is an optimal transform matrix, i.e., the Karhunen-Loeve transform (KLT), ‖·‖_2 indicates the Euclidean or l_2 matrix norm, (·)^T denotes the transpose of a matrix/vector, and Λ_p results from the square root of the power of the input signal, with diagonal elements Λ_p² = diag[T R_uu T^T]. When the autocorrelation matrix R_uu is diagonalised by the KLT and R_TT is then normalised by the power of the transformed matrix T R_uu T^T, the transformed matrix R_TT is always an identity matrix. The lower bound on the eigenvalue spread ratio is χ(A) = ‖A‖_2 ‖A^{-1}‖_2 ≥ ‖A A^{-1}‖_2 = ‖I‖_2 = 1 for an arbitrary square matrix A. Thereby, the eigenvalue spread ratio χ(R_uu) of an input signal is greater than or equal to unity. Hence, χ(R_TT) is always less than or equal to χ(R_uu). Note that χ(R_TT) is equal to χ(R_uu) if the input signal is white.
The asymptotic equivalence between the DCT and the KLT was proved for the first-order Markov process in [2,6]. The Toeplitz matrix R_uu is asymptotically equivalent to the circulant matrix C_uu as the order grows, and C_uu is diagonalised by the DFT, i.e., Λ_p² ≈ T C_uu T^T [12, pp. 18-41]. Accordingly, if the transform matrix T is a suboptimal transform matrix such as the DFT or the DCT, this theorem is satisfied approximately.
Regarding Theorem 1, let us derive the lower and upper bounds of χ(R_uu) for a suboptimal transform matrix. We can rewrite (5) by applying the mutual consistency property of the matrix norm:

χ(R_TT) = ‖Λ_p^{-1} T R_uu T^T Λ_p^{-1}‖_2 ‖Λ_p T R_uu^{-1} T^T Λ_p‖_2
        ≤ ‖Λ_p^{-1}‖_2 ‖T R_uu T^T‖_2 ‖Λ_p^{-1}‖_2 ‖Λ_p‖_2 ‖T R_uu^{-1} T^T‖_2 ‖Λ_p‖_2.   (6)

A sequence of unitary transformations does not change norms, by the unitary invariance property ‖J B G‖_2 = ‖B‖_2 for an M×N matrix B if, and only if, J and G are orthonormal, i.e., J^T J = I_M and G^T G = I_N. Therefore, in (6), ‖T R_uu T^T‖_2 and ‖T R_uu^{-1} T^T‖_2 are equal to ‖R_uu‖_2 and ‖R_uu^{-1}‖_2, respectively, and we can rewrite (6) as

χ(R_TT) ≤ ‖Λ_p^{-1}‖_2 ‖R_uu‖_2 ‖Λ_p^{-1}‖_2 ‖Λ_p‖_2 ‖R_uu^{-1}‖_2 ‖Λ_p‖_2 = χ²(Λ_p) χ(R_uu) ≈ χ²(R_uu),   (7)

where χ²(Λ_p) is approximately equal to χ(R_uu). We note that χ(R_uu) is bounded as

χ(R_uu) = λ_max / λ_min ≤ S_max / S_min,   (8)

where S_max and S_min denote the maximum and minimum of the power spectral density, respectively. This implies that as the dimension N of the correlation matrix R_uu approaches infinity, λ_max approaches S_max and λ_min approaches S_min [14, pp. 170-171]. Consequently, the upper and lower bounds of χ(R_uu) from (7) and (8) are given by

√χ(R_TT) ≤ χ(R_uu) ≤ S_max / S_min.   (9)

The minimum value of the eigenvalue spread ratio is χ(A) ≥ 1 for an arbitrary square matrix A. Therefore, the lower bound of (9) implies that √χ(R_TT) is greater than or equal to unity. □
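Theorem 1 can be checked numerically for a first-order Markov input, the case in which the DCT is asymptotically optimal [2,6]. The sketch below builds the orthonormal DCT-II matrix from the rows of eq. (10) and power-normalises the transformed correlation matrix as in (5); the order N = 16 and the correlation coefficient 0.95 are arbitrary illustration values.

```python
import numpy as np

def dct_matrix(N):
    # Orthonormal DCT-II matrix, rows h_D(i, l) of eq. (10)
    i = np.arange(N)[:, None]
    l = np.arange(N)[None, :]
    T = np.sqrt(2.0 / N) * np.cos(i * np.pi * (l + 0.5) / N)
    T[0, :] /= np.sqrt(2.0)
    return T

def eig_spread(R):
    w = np.linalg.eigvalsh(R)
    return float(w.max() / w.min())

N, rho = 16, 0.95                          # first-order Markov input
R = rho ** np.abs(np.subtract.outer(np.arange(N), np.arange(N)))
T = dct_matrix(N)
RT = T @ R @ T.T                           # transform-domain correlation
D = np.diag(1.0 / np.sqrt(np.diag(RT)))    # Lambda_p^{-1}
R_TT = D @ RT @ D                          # power-normalised, as in eq. (5)
print(eig_spread(R), eig_spread(R_TT))     # the spread shrinks sharply
```

The printed pair shows the eigenvalue spread dropping after the DCT and power normalisation, consistent with the bound (9).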
To compare it with the DCT, let us consider the spectrum of the eigenvectors of the KLT, which is the optimal data-dependent transform and is capable of decorrelating any input signal perfectly. The KLT generates eigenvectors from the covariance matrix of the input process such that the power spectrum of the eigenvectors complements the input power spectrum. The complementary spectra make the biased power distribution of the input signal exactly flat. We call this the complementary spectrum principle.
Theorem 2. Let λ_i, i = 1, 2, …, N, denote the eigenvalues of the correlation matrix R_uu ∈ R^{N×N} of a wide-sense stationary discrete-time stochastic process, and let q_i denote the eigenvectors. The eigenvector q_i can be viewed as the ith eigenfilter, so that the set of FIR filters equivalent to q_i, i = 1, 2, …, N, forms a filter bank. A filter bank constructed from the eigenvectors has a spectrum complementary to the input spectrum S_uu(e^{jω}).
The proof of this theorem is given in Appendix A. To verify this theorem, we present the simulation results in Figs. 1 and 2. Fig. 1 shows the complementary spectrum between the lowpass spectrum and the averaged power spectrum |Q̄_i(e^{jω})|² of the eigenfilter bank. Fig. 2 depicts the complementary spectrum of a bandpass spectrum. According to the complementary spectrum principle of Theorem 2, we show the asymptotic equivalence between the DCT and the KLT. It results from the unbalanced shape of the sidelobes for an input signal with a lowpass power spectral density.
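As Figs. 1 and 2 cannot be reproduced here, the sketch below illustrates one facet of the eigenfilter view in Theorem 2 for a lowpass first-order Markov input: the eigenfilter paired with λ_max concentrates its energy where the input spectrum is strong (low frequencies), while the one paired with λ_min concentrates it where the input spectrum is weak. The order N = 32 and correlation 0.9 are assumed illustration values.

```python
import numpy as np

N, rho = 32, 0.9                       # lowpass first-order Markov input
R = rho ** np.abs(np.subtract.outer(np.arange(N), np.arange(N)))
lam, Q = np.linalg.eigh(R)             # eigenvalues in ascending order

def centroid(h, nfft=512):
    """Spectral centroid (in rad) of the FIR eigenfilter h over [0, pi]."""
    H = np.abs(np.fft.rfft(h, nfft)) ** 2
    w = np.linspace(0.0, np.pi, len(H))
    return float(np.sum(w * H) / np.sum(H))

f_top = centroid(Q[:, -1])             # eigenfilter of lambda_max: lowpass
f_bot = centroid(Q[:, 0])              # eigenfilter of lambda_min: highpass
print(f_top, f_bot)
```

The low centroid of the λ_max eigenfilter versus the high centroid of the λ_min eigenfilter is the filter-bank behaviour that the complementary spectrum principle formalises.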
Fig. 3 depicts the frequency spectrum of the DCT filter bank. Here, the dotted lines present the bandpass filter bank of order 16, and the bold lines represent the spectra of |H(ω)|_9 and |H(ω)|_12, which show the asymmetrical sidelobes that directly affect the decorrelation of the input signal; |H(ω)|_9 and |H(ω)|_12 denote the magnitudes of the 9th and 12th bandpass filters, respectively. The averaged frequency spectrum of the DCT shown in Fig. 4 (lower) suggests that the low-frequency magnitude is smaller than that of the high-frequency band, as in a highpass filter spectrum. The bold line of Fig. 4 (upper) also clearly shows the sidelobe feature of |H(ω)|_12. Thus a signal with a low-frequency spectrum is decorrelated more effectively by the DCT than one whose spectrum lies in the high-frequency band. Clearly, this is because the low-frequency sidelobes are smaller than the sidelobes lying in the high-frequency range for bank numbers i ≥ N/2. Fig. 5 shows that the spectrum of the DCT is approximately equivalent to that of the KLT for a signal spectrum with a low-frequency spectral density.
In Fig. 5, the averaged frequency spectra of the KLT are obtained from bandpass signals (e.g. Figs. 1 and 2) with centre frequencies f_0 = 0.05 and 0.25. The eigenvalue spread ratio of the bandpass signal whose centre frequency is located at 0.05 is equal to 746.96. After the DCT, it dropped to 4.069, whereas the eigenvalue spread ratio of the bandpass signal with f_0 = 0.25 dropped to 25.4 from 293.21. This shows that input signals located in the low-frequency band can be effectively decorrelated by the DCT; this includes any low-frequency components of a bandpass signal.

Fig. 1. Complementary spectrum ('o') of the lowpass signal (solid line).

Fig. 2. Complementary spectrum ('o') of the bandpass signal (solid line, f_0 = 0.25).
Secondly, we investigate the decorrelation property of the DCT based upon the pole-zero diagram. The DCT performs a linear transformation from the input vector u(n) ∈ R^N to the output vector x(n) ∈ R^N by an N×N unitary transform matrix. Each transformation is characterised by an impulse response

h_D(i, l) = √(2/N) K_i cos( iπ(l + 1/2) / N ),   (10)

where K_i = 1/√2 for i = 0, K_i = 1 otherwise, and i, l = 0, …, N−1. The ith transfer function in the Z-transform domain is equal to

H_i(z) = √(2/N) K_i cos( iπ/2N ) · (1 − z^{-1})(1 − (−1)^i z^{-N}) / (1 − 2 cos(iπ/N) z^{-1} + z^{-2}).   (11)

The transfer function (11) represents a bank of bandpass filters with two poles and N+1 zeros. Its poles and zeros are given by

P_{i,1,2} = exp(± j iπ/N),   i = 0, 1, …, N−1,   (12)

Z_k = exp(j 2kπ/N),       i even, k = 0, 1, …, N−1,
Z_k = exp(j (2k+1)π/N),   i odd,  k = 0, 1, …, N−1.

The centre frequencies of the ith DCT filter, w_0 = exp[± j(iπ/N)], lie in the positive and negative frequency half-planes as shown in Fig. 6, and each conjugate pole and zero pair is located symmetrically on the unit circle about the centre frequency. The DCT filter bank is equal to an all-zero finite impulse response (FIR) filter, because the conjugate poles P_{i,1,2} and the zeros Z_k|_{i=k} cancel each other. Hence, the asymmetrical shape of the sidelobes is a direct consequence of the cancellation of these two zeros and poles.

Fig. 3. The frequency spectrum of the DCT for N = 16 (bold lines: the magnitudes |H(ω)|_9 and |H(ω)|_12).

Fig. 4. The averaged frequency spectrum of the DCT (upper: the filter bank spectrum for N = 16; lower: the averaged frequency spectrum).

Fig. 5. Averaged frequency spectrum of the DCT (dotted line) and the eigenvectors of the KLT ['o': centre frequency f_0 = 0.05; '+': centre frequency f_0 = 0.25] for the input process with bandpass power spectral density [χ(R_xx) = 746.96, χ(R_TT) = 4.063 with f_0 = 0.05, and χ(R_xx) = 293.21, χ(R_TT) = 25.14 with f_0 = 0.25].
Lemma 1. Let the magnitude responses of the left and right sidelobes of the DCT be |H(ω)|_L and |H(ω)|_R. Then the magnitude |H(ω)|_L of the left sidelobe is always smaller than the magnitude |H(ω)|_R of the right sidelobe in the spectrum of the DCT.
Fig. 6. Pole and zero diagram of the DCT H_9(z).
Proof. Let the centre frequency be located in the high-frequency region, where i ≥ N/2. The magnitude response is the product of the lengths of the zero vectors, |V_L|_{Z_i} or |V_R|_{Z_i}, divided by the product of the lengths of the pole vectors, |V_L|_{P_i} or |V_R|_{P_i}. The magnitude responses of the left and right sidelobes can be expressed as

|H(ω)|_L = Π_{i=0}^{N−1} |V_L|_{Z_i} / Π_{i=0}^{N−1} |V_L|_{P_i},   |H(ω)|_R = Π_{i=0}^{N−1} |V_R|_{Z_i} / Π_{i=0}^{N−1} |V_R|_{P_i},   (13)

where the subscripts L and R represent the left and right sidelobes. The symmetrical zero vectors in (13) can therefore be expressed by |K_L|_Z and |K_R|_Z, except for the one zero vector placed at ω = 0 and the unbalanced zero vector created by the pole-zero cancellation. Thereby, (13) can be rewritten as

|H(ω)|_L = |V_L|_{Z1} · |V_L|_{Z2} · |K_L|_Z,   |H(ω)|_R = |V_R|_{Z1} · |V_R|_{Z2} · |K_R|_Z,   (14)

where |V_L|_{Z1}, |V_L|_{Z2}, |V_R|_{Z1} and |V_R|_{Z2} are the zero vectors made by the zero located at ω = 0 and by the asymmetrical zero, respectively. The length of each left-sidelobe zero vector, |V_L|_{Z1} or |V_L|_{Z2}, is smaller than that of the corresponding right-sidelobe vector, |V_R|_{Z1} or |V_R|_{Z2}, because the two zero vectors lie in the low-frequency region. Hence, |H(ω)|_L is always smaller than |H(ω)|_R, because |K_L|_Z = |K_R|_Z in (14). The pole and zero diagram explained by the lemma above is shown in Fig. 6. □
3. DCT-LMS adaptive filter via the modified power estimator
The block diagram of adaptive plant modelling using the transform-domain/layered adaptive filter is given in Fig. 7. The system output error and the desired signal shown in Fig. 7 can be written as

e(n) = d(n) − w^T(n) x(n),   (15)

d(n) = w_opt^T(n) x(n) + ξ_o(n),   (16)
Fig. 7. Block diagram of adaptive plant modelling using transform-domain adaptive filtering.
where w(n) ∈ R^N and w_opt(n) ∈ R^N denote the filter weight vector and the unknown system parameter, respectively, x(n) ∈ R^N is the transformed input data vector, and ξ_o(n) is the plant measurement/disturbance noise. The DCT-LMS filter used to model an adaptive system is implemented by transforming the input signal u(n) into x(n) = T u(n), where u(n) = [u(n) u(n−1) ⋯ u(n−N+1)]^T is the tap-delayed input data vector, x(n) = [x_0(n) x_1(n) ⋯ x_{N−1}(n)]^T is the transformed data vector and T is the DCT unitary transform matrix. The weight update equation of the DCT-LMS is given by

w(n+1) = w(n) + μ(n) e(n) x(n),   (17)

where μ(n) is the variable stepsize in the form of the diagonal matrix μ(n) = diag[μ_i(n)], i = 0, …, N−1. Here, a new type of variable stepsize based upon the modified power estimator is defined by

μ_i(n+1) = β μ_i(n) + γ (1 − β) [ 1 / ( ε + (1/M) x_i^T(n) x_i(n) ) ],   (18)

whose entries are estimates of the reciprocal of the input power at the ith bin of the DCT, used to redistribute the spread power of the transformed input signal.
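A compact sketch of the resulting algorithm, eqs. (15), (17) and (18), on a toy system identification problem is given below. The unknown plant, the AR(1) input and the values of β, γ, ε and the power-estimation window M are illustrative assumptions, not the paper's experimental settings.

```python
import numpy as np

rng = np.random.default_rng(1)
N, M, n_iter = 8, 8, 4000
idx = np.arange(N)
T = np.sqrt(2.0 / N) * np.cos(np.outer(idx, idx + 0.5) * np.pi / N)
T[0, :] /= np.sqrt(2.0)                 # orthonormal DCT-II matrix

w_true = rng.normal(size=N)             # unknown plant (time domain)
beta, gamma, eps = 0.95, 0.05, 1e-3
w = np.zeros(N)                         # transform-domain weights
mu = np.full(N, gamma)                  # per-bin stepsizes mu_i(n)
Xbuf = np.zeros((N, M))                 # last M samples per bin, for (18)

u_full = np.zeros(n_iter + N)           # correlated AR(1) input
for n in range(1, len(u_full)):
    u_full[n] = 0.9 * u_full[n - 1] + rng.normal()

err = []
for n in range(N, len(u_full)):
    u = u_full[n - N:n][::-1]           # tap-delay vector u(n)
    x = T @ u                           # transformed input, x(n) = T u(n)
    d = w_true @ u + 1e-3 * rng.normal()
    e = d - w @ x                       # eq. (15)
    w = w + mu * e * x                  # eq. (17)
    Xbuf[:, n % M] = x                  # per-bin power estimate over M samples
    mu = beta * mu + gamma * (1 - beta) / (eps + np.mean(Xbuf * Xbuf, axis=1))  # eq. (18)
    err.append(e * e)
print(np.mean(err[:200]), np.mean(err[-200:]))   # squared error falls
```

By Lemma 2 below, the converged weights satisfy w ≈ T w'_opt, so T^T w recovers the time-domain plant.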
The property of a weight vector in the transform domain is investigated in the following lemma.
Lemma 2. The weight vector of the DCT-LMS algorithm converges to the transformed optimal weight vector:

lim_{n→∞} w(n) = T w'_opt.   (19)
Proof. Let w'_opt be the unknown system parameter in a stationary environment; then the desired response d(n) can be written as

d(n) = w'_opt^T u(n) + ξ_o(n).   (20)

In (20), u(n) can be replaced with u(n) = T^{-1} x(n), since the unitary transform matrix T is invertible. Therefore, Eq. (20) can be rewritten as

d(n) = w'_opt^T T^{-1} x(n) + ξ_o(n) = (T w'_opt)^T x(n) + ξ_o(n),   (21)

where T^T T = T T^T = I and T^{-1} = T^T. We can write the filter output error as

e(n) = (T w'_opt)^T x(n) + ξ_o(n) − w^T(n) x(n) = ξ_o(n) − [w^T(n) − (T w'_opt)^T] x(n).   (22)

In (22), lim_{n→∞} w(n) = T w'_opt, since e(n) ≈ ξ_o(n) in a stationary environment. □
Hence, the optimal weight vector w_opt(n) in (16) is regarded as a transformed vector in the rest of this paper.
The computational complexity of the proposed DCT-LMS algorithm is 4N + MN multiplications and 3N + M additions. Notice that conventional transform-domain adaptive filtering requires 5N multiplications and 3N additions. Therefore, the additional computation of the modified algorithm depends on the number of data samples, M. It should be noted, however, that the computational complexity of the DCT-LMS algorithm remains O(N) whilst maintaining a fast convergence speed. The complexity of the proposed algorithm and of the conventional DCT-LMS algorithm is the same for M = 1. Computer simulations are undertaken for M = 1 and 10. Notice that the analysis for M = 1 must be treated by a different approach (the condition on the size of M is given in Appendix B).
To make the convergence analysis of stochastic gradient-based algorithms mathematically more tractable, we introduce the following fundamental assumptions.
Assumption 1. Gaussian data assumption:
- The input vector x(n) and the desired signal d(n) are zero-mean, wide-sense stationary and jointly Gaussian random processes with finite variance.

Assumption 2. Independence assumption:
- The input vectors x(1), x(2), …, x(n) constitute a sequence of statistically independent vectors.
- At time n, the input vector x(n) is statistically independent of all previous samples of the desired signal, namely d(1), d(2), …, d(n−1).
- At time n, the desired signal d(n) is dependent on the corresponding input vector x(n), but statistically independent of all previous samples of the desired signal.
These assumptions have been used in the adaptive signal processing literature [1,4,10,14,15,17,23,25,26].
Assumption 3. Averaging principle: the averaging principle is one of the independence assumptions described in Assumption 2. Let x(n) and y(n) be two jointly stationary processes such that y(n) is slowly varying with respect to x(n). Then the random variable x(n) is almost independent of the random variable y(n). By this we mean that

E[ f(x(·)) g(y(·)) ] ≈ E[ f(x(·)) ] E[ g(y(·)) ].   (23)

In practice, this assumption is applied as

E[ μ(n) e(n) x(n) ] ≈ E[ μ(n) ] E[ e(n) x(n) ],   (24)

where μ(n) is the time-varying stepsize, and x(n) and e(n) are the input data vector and the output error, respectively.
This assumption is exact if μ(n) is a constant; for this algorithm it cannot strictly hold, but it is approximately true. This is because μ(n) varies slowly around its mean value when compared to e(n) and x(n) if γ in (18) is set to be very small. This justifies the independence assumption of μ(n) and μ²(n) with respect to e(n), x(n) and w(n), and allows us to derive the theoretical results. Making this assumption is not an uncommon practice in the adaptive signal processing literature [1,17,23]. The feasibility of these assumptions is verified through the computer simulations.
We also assume that the nonstationarity is due to the variation of the optimal coefficients according to a random walk model given by

w_opt(n+1) = w_opt(n) + N_0(n),   (25)

where N_0(n) ∈ R^N is a zero-mean Gaussian random process vector with finite covariance matrix K ∈ R^{N×N} defined by K = E[N_0(n) N_0^T(n)] = σ²_{N_0} I. For a stationary environment, σ²_{N_0} = 0 and w_opt(n) = w_opt.
To evaluate the first- and second-moment behaviours of the filter coefficients, two additional assumptions are employed: first, w_opt(n) in (25) is independent of x(n) and e_min(n); second, d(n) is Gaussian, which is valid when the fluctuations of w_opt(n) are far smaller than those of x(n) and e_min(n), where e_min(n) = d(n) − w_opt^T(n) x(n). The analysis under these assumptions produces some useful results for the analysis and design of adaptive filtering systems.
4. Performance analysis
4.1. Behaviour of the mean weight vector
By inserting (15), (16) and (25) into (17), the weight error/misalignment vector can be obtained as

Δ(n+1) = [I − μ(n) x(n) x^T(n)] Δ(n) + μ(n) ξ_o(n) x(n) − N_0(n),   (26)

where Δ(n) = w(n) − w_opt(n) is the weight error vector. It is derived using Assumption 2 and the uncorrelatedness of μ(n) with x(n) and e(n), i.e., Assumption 3 [1,10,17]. Taking expectations of (26) and applying the independence assumption of μ(n) with e(n), w(n) and x(n) gives

E[Δ(n+1)] = (I − E[μ(n)] E[x(n) x^T(n)]) E[Δ(n)] = (I − E[μ(n)] R_xx) E[Δ(n)].   (27)
Eq. (27) is stable if and only if Π_{k=0}^{n} [I − E[μ_i(k)] R_xx] → 0 as n → ∞. Accordingly, the following sufficient condition can be obtained:

0 < E[μ_i(n)] < 2 / λ_max   ∀ i,   (28)

where λ_max is the maximum eigenvalue of the transformed autocorrelation matrix T R_uu T^T. However, the convergence of the mean weight vector cannot guarantee the convergence of the mean-squared error. Note that the upper bound of (28) is equivalent to that of the standard LMS algorithm, because T R_uu T^T is transformed without normalisation.

In the case of the conventional algorithm given by (1) and (2), Eq. (26) can be expressed as

Δ(n+1) = [I − (1/μ(n)) x(n) x^T(n)] Δ(n) + (1/μ(n)) ξ_o(n) x(n) − N_0(n).   (29)

In order to analyse the performance in terms of the mean and mean-squared error, unusual assumptions then become necessary in (29), i.e., E[B/A] = E[B]/E[A] or E[1/A] E[B] = E[B]/E[A] for two random variables A and B. Even when an independence assumption is applied in (29), the analysis is still complicated by E[1/μ(n)]. In this spirit, the modified time-varying stepsize based on the reciprocal power estimator provides a simple route to the analysis, based upon a variable stepsize algorithm.
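The role of a bound of the form (28) can be illustrated with the plain LMS recursion on a white input, for which R_xx = I and λ_max = 1, so the mean-stability bound is 2: a stepsize well below the bound converges, while one above it diverges. All values below are illustrative.

```python
import numpy as np

rng = np.random.default_rng(2)
N, n_iter = 8, 2000
w_true = rng.normal(size=N)

def lms_final_error(mu):
    """Run fixed-stepsize LMS and return the final weight error norm."""
    w = np.zeros(N)
    for _ in range(n_iter):
        x = rng.normal(size=N)           # white input: E[x x^T] = I
        e = w_true @ x - w @ x
        w = w + mu * e * x
        if not np.all(np.isfinite(w)):   # stop once clearly divergent
            return np.inf
    return float(np.linalg.norm(w - w_true))

print(lms_final_error(0.05))             # inside the bound: converges
print(lms_final_error(2.5))              # above 2/lambda_max = 2: diverges
```

The same experiment with a correlated input would show the slow modes that motivate the DCT preprocessing in the first place.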
4.2. Behaviour of the mean-squared error
We will now derive the mean-squared error when operating in stationary and nonstationary environments [10,15,26]. The behaviour of the mean-squared error of the weight error vector is derived in the nonstationary environment via the random walk model. We first derive the mean-squared error using (15) and (16) as

E[e²(n)] = E[ (ξ_o(n) − Δ^T(n) x(n)) (ξ_o(n) − x^T(n) Δ(n)) ]
         = ξ_min + E[x^T(n) Δ(n) Δ^T(n) x(n)] − 2 E[Δ^T(n) x(n) e_min(n)]
         = ξ_min + tr[R_xx C(n)],   (30)

where e_min(n) = d(n) − w_opt^T(n) x(n), ξ_min is the minimum mean-squared error (MMSE) defined by E[ξ_o²(n)] or E[e_min²(n)], C(n) is the covariance matrix of the weight error vector defined by C(n) = E[Δ(n) Δ^T(n)] ∈ R^{N×N}, R_xx ∈ R^{N×N} is defined by E[x(n) x^T(n)], and tr[·] denotes the trace operator of a matrix.
We now investigate the convergence behaviour of the covariance matrix of the weight error vector. For this, we calculate the outer product of (26) with itself and take expectations on both sides. We obtain

C(n+1) = C(n) − E[μ(n)][C(n) R_xx + R_xx C(n)] + E[μ²(n)] E[e_min²(n) x(n) x^T(n)]
       − 2 E[μ²(n)] E[e_min(n) x(n) x^T(n) Δ(n) x^T(n)]
       + E[μ²(n)] E[x(n) x^T(n) Δ(n) Δ^T(n) x(n) x^T(n)] + E[N_0(n) N_0^T(n)].   (31)
The fifth term of (31) can be simplified by the moment factorisation for Gaussian random variables [10,17]. The input autocorrelation matrix can be expressed as R_{xx} = Q\Lambda_{xx}Q^{T} because R_{xx} is symmetric, where \Lambda_{xx} = \mathrm{diag}[\lambda_{0}, \lambda_{1}, \ldots, \lambda_{N-1}] contains the eigenvalues of R_{xx}, Q is the modal matrix of R_{xx}, QQ^{T} = I and Q^{-1} = Q^{T}. Furthermore, let C'(n) = QC(n)Q^{T}. Therefore, (30) and (31) can be rewritten under this similarity transform as
E[e^{2}(n)] = \xi_{\min} + \mathrm{tr}[\Lambda_{xx}C'(n)],  (32)

C'(n+1) = C'(n) - E[\mu(n)][C'(n)\Lambda_{xx} + \Lambda_{xx}C'(n)] + E[\mu^{2}(n)][\xi_{\min}\Lambda_{xx} + \mathrm{tr}[C'(n)\Lambda_{xx}]\Lambda_{xx} + 2\Lambda_{xx}C'(n)\Lambda_{xx}] + \sigma_{N}^{2}I.  (33)
The (i,j)th element c'_{ij}(n) of the matrix C'(n) can then be identified from (33) as

c'_{ij}(n+1) = [1 - E[\mu_{i}(n)](\lambda_{i} + \lambda_{j}) + 2E[\mu_{i}^{2}(n)]\lambda_{i}\lambda_{j}]\,c'_{ij}(n) + E[\mu_{i}^{2}(n)]\lambda_{i}\Big[\xi_{\min} + \sum_{m=0}^{N-1}\lambda_{m}c'_{mm}(n)\Big]\delta(i-j) + \sigma_{N}^{2},  (34)

\delta(i-j) = \begin{cases} 1 & \text{if } i = j, \\ 0 & \text{otherwise.} \end{cases}
Therefore, the mean-squared error of (32) can be obtained from the diagonal terms of (34). Following the same result as in [10,17], a sufficient condition that ensures convergence of the mean-squared error is

0 < \frac{E[\mu^{2}(\infty)]}{E[\mu(\infty)]} < \frac{2}{3\,\mathrm{tr}[R_{xx}]},  (35)

where E[\mu(\infty)] and E[\mu^{2}(\infty)] are the steady-state values of E[\mu(n)] and E[\mu^{2}(n)].
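As a quick numerical sanity check on (35), the sketch below evaluates the steady-state stepsize moments via the closed forms (41) and (42) derived later in this section and confirms that the ratio stays below 2/(3 tr[R_xx]). The parameter set (\gamma = 0.0095, \beta = 0.9985, \varepsilon = 0.0008, M = 10, N = 16, unit per-bin power) is an illustrative choice of ours, loosely based on the simulation values used in Section 5:

```python
import numpy as np

# Illustrative parameters, loosely based on the paper's simulations
gamma, beta, eps = 0.0095, 0.9985, 0.0008
M, N, sig2 = 10, 16, 1.0      # window length, filter length, per-bin power (sigma^2)

# Two-term expansions of E[1/(eps + (1/M)x^T x)] and its square, as in (38) and (39)
e1 = M / (sig2 * (M - 2)) - eps * M**2 / (sig2**2 * (M - 4) * (M - 2))
e2 = (M**2 / (sig2**2 * (M - 4) * (M - 2))
      - 2 * eps * M**3 / (sig2**3 * (M - 6) * (M - 4) * (M - 2))
      + eps**2 * M**4 / (sig2**4 * (M - 8) * (M - 6) * (M - 4) * (M - 2)))

mu1 = gamma * e1                                                        # Eq. (41)
mu2 = 2 * beta * gamma / (1 + beta) * e1 * mu1 \
      + gamma**2 * (1 - beta) / (1 + beta) * e2                         # Eq. (42)

bound = 2.0 / (3.0 * N * sig2)   # 2/(3 tr[Rxx]) for N unit-power bins
print(mu2 / mu1, bound)
assert 0 < mu2 / mu1 < bound
```

For these values the ratio comes out roughly 0.012, comfortably below the bound of about 0.042, which is consistent with the stable behaviour reported in the simulations.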
The mean and mean-squared values of the variable stepsize from (18) are given as follows:

E[\mu_{i}(n+1)] = \beta E[\mu_{i}(n)] + \gamma(1-\beta)E\Big[\frac{1}{\varepsilon + (1/M)x_{i}^{T}(n)x_{i}(n)}\Big],  (36)

E[\mu_{i}^{2}(n+1)] = \beta^{2}E[\mu_{i}^{2}(n)] + 2\beta\gamma(1-\beta)E[\mu_{i}(n)]E\Big[\frac{1}{\varepsilon + (1/M)x_{i}^{T}(n)x_{i}(n)}\Big] + \gamma^{2}(1-\beta)^{2}E\Big[\Big(\frac{1}{\varepsilon + (1/M)x_{i}^{T}(n)x_{i}(n)}\Big)^{2}\Big].  (37)
In (36) and (37), E[1/(\varepsilon + (1/M)x_{i}^{T}(n)x_{i}(n))] and E[1/(\varepsilon + (1/M)x_{i}^{T}(n)x_{i}(n))^{2}] can be derived in explicit form assuming Gaussian random processes:

E\Big[\frac{1}{\varepsilon + (1/M)x_{i}^{T}(n)x_{i}(n)}\Big] = E\Big[\frac{(1/M)x_{i}^{T}(n)x_{i}(n) - \varepsilon}{((1/M)x_{i}^{T}(n)x_{i}(n))^{2} - \varepsilon^{2}}\Big]
\approx E\Big[\frac{1}{(1/M)x_{i}^{T}(n)x_{i}(n)}\Big] - \varepsilon E\Big[\Big(\frac{1}{(1/M)x_{i}^{T}(n)x_{i}(n)}\Big)^{2}\Big] = \frac{M}{\sigma_{x_i}^{2}(M-2)} - \varepsilon\frac{M^{2}}{\sigma_{x_i}^{4}(M-4)(M-2)},  (38)
E\Big[\Big(\frac{1}{\varepsilon + (1/M)x_{i}^{T}(n)x_{i}(n)}\Big)^{2}\Big] \approx E\Big[\Big\{\frac{1}{(1/M)x_{i}^{T}(n)x_{i}(n)} - \varepsilon\Big(\frac{1}{(1/M)x_{i}^{T}(n)x_{i}(n)}\Big)^{2}\Big\}^{2}\Big]
= E\Big[\Big(\frac{1}{(1/M)x_{i}^{T}(n)x_{i}(n)}\Big)^{2}\Big] - 2\varepsilon E\Big[\Big(\frac{1}{(1/M)x_{i}^{T}(n)x_{i}(n)}\Big)^{3}\Big] + \varepsilon^{2}E\Big[\Big(\frac{1}{(1/M)x_{i}^{T}(n)x_{i}(n)}\Big)^{4}\Big]
= \frac{M^{2}}{\sigma_{x_i}^{4}(M-4)(M-2)} - 2\varepsilon\frac{M^{3}}{\sigma_{x_i}^{6}(M-6)(M-4)(M-2)} + \varepsilon^{2}\frac{M^{4}}{\sigma_{x_i}^{8}(M-8)(M-6)(M-4)(M-2)},  (39)
where \varepsilon^{2} \ll 1/[(1/M)x_{i}^{T}(n)x_{i}(n)]^{2} is assumed. In (38) and (39), the expectation terms can be obtained from

E\Big[\Big(\frac{1}{(1/M)x_{i}^{T}(n)x_{i}(n)}\Big)^{l}\Big] = \frac{M^{l}}{\sigma_{x_i}^{2l}}\prod_{k=1}^{l}\frac{1}{M-2k},  (40)

where M should be larger than 2l. In (40), we assumed that the data samples obtained by the DCT form an independent and identically distributed (i.i.d.) sequence. Result (40) is equivalent to those of [24,25] in the case l = 1. This equation is derived in Appendix B. Therefore, the mean-squared error is analysed completely from (30) to (39).
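Eq. (40) is easy to check by Monte Carlo under the i.i.d. Gaussian assumption just stated. The sketch below (illustrative values M = 12 and \sigma_{x_i}^{2} = 1, chosen by us) compares sample averages of (1/((1/M)x_{i}^{T}x_{i}))^{l} for l = 1, 2 against the closed form:

```python
import numpy as np

rng = np.random.default_rng(0)
M, sigma2 = 12, 1.0                        # illustrative window length and per-bin power
x = rng.normal(0.0, np.sqrt(sigma2), size=(500_000, M))
inv = 1.0 / (np.sum(x**2, axis=1) / M)     # 1 / ((1/M) x^T x) for each draw

for l in (1, 2):
    # Closed form of Eq. (40): M^l / sigma^(2l) * prod_{k=1..l} 1/(M - 2k)
    theory = M**l / sigma2**l * np.prod([1.0 / (M - 2 * k) for k in range(1, l + 1)])
    mc = np.mean(inv**l)
    assert abs(mc - theory) < 0.05 * theory
```

For M = 12 the closed form gives 12/10 = 1.2 for l = 1 and 144/80 = 1.8 for l = 2, and the sample averages agree to within a few percent.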
4.3. The steady-state misadjustment
We present the result of a derivation of the misadjustment, defined by \mathcal{M} = \xi_{\mathrm{ex}}(\infty)/\xi_{\min}, where \xi_{\mathrm{ex}}(\infty) is the steady-state value of the excess mean-squared error \xi_{\mathrm{ex}}(n). Recall that E[e^{2}(\infty)] = \xi_{\min} + \xi_{\mathrm{ex}}(\infty). The mean and mean-squared values of the stepsize in the steady state can be obtained from (36) to (39):
E[\mu_{i}(\infty)] \approx \gamma\Big[\frac{M}{\sigma_{x_i}^{2}(M-2)} - \varepsilon\frac{M^{2}}{\sigma_{x_i}^{4}(M-4)(M-2)}\Big],  (41)
E[\mu_{i}^{2}(\infty)] \approx \frac{2\beta\gamma}{1+\beta}\Big[\frac{M}{\sigma_{x_i}^{2}(M-2)} - \varepsilon\frac{M^{2}}{\sigma_{x_i}^{4}(M-4)(M-2)}\Big]\gamma\Big[\frac{M}{\sigma_{x_i}^{2}(M-2)} - \varepsilon\frac{M^{2}}{\sigma_{x_i}^{4}(M-4)(M-2)}\Big]
+ \frac{\gamma^{2}(1-\beta)}{1+\beta}\Big[\frac{M^{2}}{\sigma_{x_i}^{4}(M-4)(M-2)} - 2\varepsilon\frac{M^{3}}{\sigma_{x_i}^{6}(M-6)(M-4)(M-2)} + \varepsilon^{2}\frac{M^{4}}{\sigma_{x_i}^{8}(M-8)(M-6)(M-4)(M-2)}\Big],  (42)
\frac{E[\mu_{i}^{2}(\infty)]}{E[\mu_{i}(\infty)]} = \frac{2\beta\gamma}{1+\beta}\Big[\frac{M}{\sigma_{x_i}^{2}(M-2)}\Big]\Big[1 - \varepsilon\frac{M}{\sigma_{x_i}^{2}(M-4)}\Big]
+ \frac{\gamma(1-\beta)}{1+\beta}\Big[\frac{M}{\sigma_{x_i}^{2}(M-4)} - \varepsilon M\Big]\Big[1 - 2\varepsilon\frac{M}{\sigma_{x_i}^{2}(M-6)} + \varepsilon^{2}\frac{M^{2}}{\sigma_{x_i}^{4}(M-8)(M-6)}\Big].  (43)
Finally, the excess mean-squared error, defined by E[e^{2}(\infty)] - \xi_{\min}, can be obtained from (32) and (34) as follows:

\xi_{\mathrm{ex}}(\infty) = \xi_{\min}\,\frac{\frac{1}{2}\sum_{i=0}^{N-1}E[\mu_{i}^{2}(\infty)]\lambda_{i}/(E[\mu_{i}(\infty)] - E[\mu_{i}^{2}(\infty)]\lambda_{i})}{1 - \frac{1}{2}\sum_{i=0}^{N-1}E[\mu_{i}^{2}(\infty)]\lambda_{i}/(E[\mu_{i}(\infty)] - E[\mu_{i}^{2}(\infty)]\lambda_{i})}
+ \frac{\frac{1}{2}\sum_{i=0}^{N-1}\sigma_{N}^{2}/(E[\mu_{i}(\infty)] - E[\mu_{i}^{2}(\infty)]\lambda_{i})}{1 - \frac{1}{2}\sum_{i=0}^{N-1}E[\mu_{i}^{2}(\infty)]\lambda_{i}/(E[\mu_{i}(\infty)] - E[\mu_{i}^{2}(\infty)]\lambda_{i})}.  (44)
The misadjustment can be obtained from (44). \mathcal{M} can be approximated, when y_{i}\lambda_{i} \ll 1 for i = 0,1,\ldots,N-1, as follows:

\mathcal{M} = \frac{\frac{1}{2}\sum_{i=0}^{N-1}y_{i}\lambda_{i}/(1-y_{i}\lambda_{i})}{1 - \frac{1}{2}\sum_{i=0}^{N-1}y_{i}\lambda_{i}/(1-y_{i}\lambda_{i})} + \frac{\frac{1}{2}\sum_{i=0}^{N-1}\sigma_{N}^{2}/(E[\mu_{i}(\infty)][1-y_{i}\lambda_{i}]\xi_{\min})}{1 - \frac{1}{2}\sum_{i=0}^{N-1}y_{i}\lambda_{i}/(1-y_{i}\lambda_{i})}
\approx \frac{1}{2}\sum_{i=0}^{N-1}y_{i}\lambda_{i} + \frac{1}{2}\sum_{i=0}^{N-1}\frac{\sigma_{N}^{2}}{E[\mu_{i}(\infty)]\xi_{\min}},  (45)

where y_{i} denotes E[\mu_{i}^{2}(\infty)]/E[\mu_{i}(\infty)] from (43).
4.4. Optimal convergence factor \gamma^{*}
A nonstationary environment may arise in practice in one of two basic ways: the frame of reference provided by the desired response may be time-varying, or the stochastic process supplying the tap inputs of the adaptive filter may be nonstationary. The random walk model of (25) corresponds to the first case. Here the correlation matrix of the tap inputs remains fixed, as in a stationary environment, whereas the cross-correlation vector between the tap inputs and the desired response assumes a time-varying form. Therefore the result of (40) remains acceptable. This claim is supported by the computer simulation.
Table 1
Mean-squared error and misadjustment in the system identification

Algorithm          E[e^{2}(\infty)]         \mathcal{M} = \xi_{\mathrm{ex}}(\infty)/\xi_{\min} (%)
LMS                0.1096 x 10^{-4}         9.6
VSLMS              0.1087 x 10^{-4}         8.7
NLMS               0.1091 x 10^{-4}         9.1
DCT-LMS III        0.1079 x 10^{-4}         7.9
Theory (DCT-LMS)   0.1073 x 10^{-4}         7.3

In (45), the first term is the misadjustment due to gradient noise, and the second is the misadjustment resulting from the lag in tracking the nonstationarity. The first term of (45) is proportional to the convergence factor \gamma while the second is inversely proportional to \gamma; this shows the same trend as the LMS [26]. Since there is a trade-off between the two errors in (45), we can derive the optimal value \gamma^{*} minimising the total misadjustment, so that

\frac{\partial \mathcal{M}}{\partial \gamma}\Big|_{\gamma=\gamma^{*}} = \frac{2\beta(M-4) + (1-\beta)(M-2)}{(1+\beta)(M-4)(M-2)} - \frac{1}{\gamma^{*2}}\sum_{i=0}^{N-1}\frac{\sigma_{N}^{2}\sigma_{x_i}^{2}(M-2)}{M\xi_{\min}} = 0.  (46)
From (46), the solution is obtained as

\gamma^{*} = \sqrt{\frac{\sigma_{N}^{2}\,\mu_{i}'(\infty)}{y_{i}'\,\xi_{\min}}},  (47)

where y_{i}' and \mu_{i}'(\infty) are defined by

y_{i}' = \frac{\partial}{\partial\gamma}\Big(\frac{1}{2}\sum_{i=0}^{N-1}y_{i}\lambda_{i}\Big), \qquad \mu_{i}'(\infty) = \sum_{i=0}^{N-1}\Big[\frac{M}{\sigma_{x_i}^{2}(M-2)} - \varepsilon\frac{M^{2}}{\sigma_{x_i}^{4}(M-4)(M-2)}\Big]^{-1}.
5. Computer simulation
5.1. Application I: system identification
A system identification application has been implemented for stationary and nonstationary environments to verify the performance of the DCT-LMS algorithm. All simulations were undertaken to meet the misadjustment specification of less than 10% (e.g. Table 1). In all simulations presented here, the desired signal d(n) is corrupted by zero-mean uncorrelated Gaussian noise of variance \xi_{\min}. This variance artificially sets the signal-to-noise ratio of the experimental model. The reference input signal for the adaptive filter, which provides the correlation of the input signal, is represented using a fourth-order autoregressive (AR) model as

u(n) = 1.79u(n-1) - 1.85u(n-2) + 1.27u(n-3) - 0.41u(n-4) + f(n),  (48)

where f(n) is a zero-mean white Gaussian random process with variance chosen such that the variance of u(n) is unity. Some algebraic calculation gives the variance of f(n) as \sigma_{f}^{2} = E[f^{2}(n)] = 0.14817. The eigenvalue spread ratio of the reference input signal generated from (48) is 944.67. Its spectrum is shown in Fig. 8.
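The correlation structure induced by (48) can be reproduced directly. The sketch below (numpy only; the 16-tap dimension matches the identification experiment, and the run length and seed are choices of ours) checks that the AR model is stable and that the estimated 16 x 16 input autocorrelation matrix is severely ill-conditioned, with an eigenvalue spread in the hundreds:

```python
import numpy as np

rng = np.random.default_rng(2)

# Stability: zeros of 1 - 1.79 z^-1 + 1.85 z^-2 - 1.27 z^-3 + 0.41 z^-4
coeffs = [1.0, -1.79, 1.85, -1.27, 0.41]
assert np.max(np.abs(np.roots(coeffs))) < 1.0

# Generate the AR(4) process of (48) with the stated driving-noise variance
n_samp = 200_000
f = rng.normal(0.0, np.sqrt(0.14817), n_samp)
u = np.zeros(n_samp)
for n in range(4, n_samp):
    u[n] = 1.79*u[n-1] - 1.85*u[n-2] + 1.27*u[n-3] - 0.41*u[n-4] + f[n]

# Sample autocorrelation and the 16 x 16 Toeplitz input correlation matrix
r = [np.mean(u[k:] * u[:n_samp - k]) for k in range(16)]
R = np.array([[r[abs(i - j)] for j in range(16)] for i in range(16)])
lam = np.linalg.eigvalsh(R)
print(lam.max() / lam.min())   # of the order of the quoted spread of 944.67
assert lam.max() / lam.min() > 100.0
```

The sample estimate of the spread fluctuates around the theoretical value of 944.67, which is why the plain LMS converges so slowly on this input.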
In each simulation, DCT-LMS I denotes the conventional DCT-LMS algorithm, DCT-LMS II is the time-varying variable stepsize algorithm in the case M = 1, and DCT-LMS III is the algorithm proposed in this paper. For DCT-LMS III, M is chosen as 5. Hereafter, "DCT-LMS" on its own refers to DCT-LMS III.
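For concreteness, a minimal sketch of a DCT-LMS III-style identification run is given below. The per-bin stepsize recursion \mu_{i}(n+1) = \beta\mu_{i}(n) + \gamma(1-\beta)/(\varepsilon + (1/M)x_{i}^{T}(n)x_{i}(n)) is the one whose mean is Eq. (36); the white-input test plant, the parameter values and the all-ones initialisation of the power window are illustrative choices of ours, not the paper's experiment:

```python
import numpy as np

rng = np.random.default_rng(1)
N, M = 8, 5                             # filter length and power window (illustrative)
beta, gamma, eps = 0.99, 0.01, 1e-4     # illustrative design parameters

# Orthonormal DCT-II matrix; rows are the transform basis vectors
k = np.arange(N)
T = np.sqrt(2.0 / N) * np.cos(np.pi * np.outer(k, 2 * np.arange(N) + 1) / (2 * N))
T[0] *= np.sqrt(0.5)

h = rng.normal(size=N)                  # unknown FIR plant to identify
w = np.zeros(N)                         # transform-domain adaptive weights
mu = np.full(N, gamma)                  # per-bin time-varying stepsizes
zwin = np.ones((M, N))                  # window of past DCT outputs (ones: start-up guard)
u = np.zeros(N)
err = []
for n in range(5000):
    u = np.roll(u, 1)
    u[0] = rng.normal()                 # white input; the AR input of (48) would also work
    z = T @ u                           # DCT of the tap-input vector
    zwin = np.roll(zwin, 1, axis=0)
    zwin[0] = z
    d = h @ u + 1e-2 * rng.normal()     # plant output plus measurement noise
    e = d - w @ z
    w += mu * e * z                     # per-bin stochastic-gradient update
    # stepsize recursion whose mean recursion is Eq. (36)
    mu = beta * mu + gamma * (1 - beta) / (eps + np.sum(zwin**2, axis=0) / M)
    err.append(e * e)

assert np.mean(err[-500:]) < 1e-2              # converged near the noise floor
assert np.linalg.norm(T.T @ w - h) < 0.1       # recovered plant in the time domain
```

Because the DCT matrix is orthonormal, w^{T}z = (T^{T}w)^{T}u, so the converged transform-domain weights map back to the time-domain plant estimate via T^{T}w.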
Fig. 9 shows the learning curves of the conventional LMS, VSLMS, normalised LMS (NLMS) and the proposed DCT-LMS algorithms for system identification. The unknown plant to be identified was modelled
Fig. 8. Power spectral density of the input process used for the simulation.
Fig. 9. Comparison of convergence curves for the system identification (400 runs). LMS (a): \mu = 0.0102; VSLMS (b): \beta = 0.9985, \gamma = 0.0045, \mu_{\max} = 0.023, \mu_{\min} = 0.009; NLMS (c): \varepsilon = 0.001, \mu = 0.15; DCT-LMS III (d): \gamma = 0.0095, \beta = 0.9985, \varepsilon = 0.0008.
as a finite impulse response (FIR) filter of order 16, and the adaptive filter length was chosen to be identical. The signal-to-noise ratio at the output of the unknown system was chosen to be 50 dB. The learning curves were obtained as an average of 400 Monte Carlo simulations. The figure shows the superior performance of the proposed algorithm over the existing algorithms; as expected, the plain LMS, VSLMS and NLMS algorithms do not perform well when the input signal is highly correlated.
Fig. 11. Comparison of the theoretical prediction (solid line) with the Monte Carlo simulation result (200 runs) of the mean-squared error; design parameters are equal to (b) and (c) of Fig. 9.

Fig. 10. Comparison of the convergence curves during the transient period, for the system identification (400 runs). LMS (a): \mu = 0.0102; DCT-LMS I (c): \mu_{0} = 0.0095, \beta = 0.985, \varepsilon = 0.0008; DCT-LMS II (b): \gamma = 0.002, \beta = 0.9985, \varepsilon = 0.0008; DCT-LMS III (d): \gamma = 0.008, \beta = 0.9985, \varepsilon = 0.0008; RLS (e): \delta = 200, \lambda = 1.0.
Fig. 10 presents the convergence curves during the transient period in order to compare the proposed algorithm with the RLS and with the power estimator of (1). All parameters have been chosen to give a comparable misadjustment level in the steady state. The proposed algorithm has a faster convergence rate than the conventional algorithm because its time-varying stepsize uses the vector norm.
Fig. 11 compares theoretical predictions with simulation results for the mean-squared error. Here, M is chosen as 10 so that the condition M > 8 required by the theoretical result (39) is met.
Fig. 12. Comparison of the theoretical prediction (solid line) with the Monte Carlo simulation result of the mean-squared error in a nonstationary environment; \xi_{\min} = 0.01, \sigma_{N}^{2} = 10^{-6}, \beta = 0.45 and \gamma = 0.0053.
Fig. 12 compares the theoretical prediction with the simulation result for the proposed DCT-LMS algorithm in the nonstationary environment. Our analysis agrees very well with the simulation results. The misadjustment of the other algorithms compared to the DCT-LMS algorithm is shown in Table 1. These values were obtained by averaging 2000 samples between iterations 148,000 and 150,000.
Finally, we verify the optimal value \gamma^{*} derived for the nonstationary environment. The theoretical relationship between the convergence factor \gamma^{*} and the misadjustment is compared with simulation results in Fig. 13. Through this simulation, we make the following observations.

Observation 1. This result confirms numerically that the convergence factor \gamma^{*} of (47) is optimal.

Observation 2. A large value of \gamma exhibits a larger deviation between the theoretical prediction and the simulation result. The deviation due to a large value of \gamma exists in the stationary environment as well. This results from Assumptions 2 and 3.

Observation 3. The approximation y_{i}\lambda_{i} \ll 1 from (45) is acceptable when the stepsize is chosen to be small. However, \varepsilon seriously affects the convergence behaviour.
5.2. Application II: inverse modelling for channel equalisation
In the second computer simulation, we verify the performance of the developed algorithms by applying them to inverse modelling, i.e. communication channel equalisation. Fig. 14 shows the block diagram of the channel equalisation. The test signal s(n) is used for probing the channel, and the channel output is corrupted by the noise signal v(n). We assume that these signals are independent of each other. The adaptive equaliser has the task of correcting the distortion produced by the channel in the presence of the additive white noise. The test signal s(n), after a suitable delay, also supplies the desired response applied to the adaptive equaliser. The test signal s(n) applied to the channel input is in polar form with s(n) = \pm 1, so the sequence s(n) has zero mean.
Fig. 13. Relationship between the excess mean-squared error and \gamma for the identification example in the nonstationary environment: optimal \gamma^{*} = 0.00515, \beta = 0.45; (o) measurement; solid line: theoretical prediction from Eq. (47); dash-dotted line: y_{i}\lambda_{i} \ll 1; dotted line: \varepsilon = 0; dashed line: \varepsilon = 0, y_{i}\lambda_{i} \ll 1; \xi_{\min} = 0.01, \sigma_{N}^{2} = 10^{-6}.
Fig. 14. Block diagram of the adaptive channel equalisation.
Fig. 15. Comparison of convergence curves of the adaptive channel equaliser (SNR = 30 dB, 400 runs, W = 3.75): (a) LMS (\mu = 0.0034); (b) NLMS (\mu = 0.045, \varepsilon = 0.001); (c) VSLMS (\beta = 0.9985, \gamma = 0.0044, \mu_{\max} = 0.02, \mu_{\min} = 0.007); (d) DCT-LMS I (\mu_{0} = 0.0095, \beta = 0.998, \varepsilon = 0.01); (e) DCT-LMS III (\beta = 0.998, \gamma = 0.005, \varepsilon = 0.0008).
The impulse response of the channel is described by

h_{n} = \begin{cases} \frac{1}{2}\big[1 + \cos\big(\frac{2\pi}{W}(n-2)\big)\big], & n = 1,2,3, \\ 0, & \text{otherwise}, \end{cases}

where the parameter W controls the amount of amplitude distortion produced by the channel, the distortion increasing with W. Equivalently, W controls the eigenvalue spread ratio of the correlation matrix of the tap inputs in the equaliser, the eigenvalue spread ratio increasing with W. This model originates from [8, pp. 342-347]. The noise signal v(n) is zero-mean and white, with variance \sigma_{v}^{2} = 0.001.
Fig. 15 shows the convergence curves obtained with the LMS, NLMS, VSLMS, and DCT-LMS algorithms at SNR = 30 dB. The DCT-LMS algorithm shows much faster convergence than the LMS, NLMS and VSLMS algorithms in the inverse modelling of the channel equalisation.
Fig. 16 presents the result of the inverse modelling using adaptive channel equalisation by the various algorithms. The performance of the inverse modelling of each algorithm is compared with the optimal Wiener solution. The Wiener solution of the channel equalisation is given by

w_{o}(z) = \frac{H^{*}(z)\Phi_{ss}(z)}{\Phi_{ss}(z)|H(z)|^{2} + \Phi_{vv}(z)},  (49)

where \Phi_{ss}(z)|H(z)|^{2} and \Phi_{vv}(z) are the signal power spectral density and the noise power spectral density, respectively, at the channel output. The parameter \hat{w}_{0}(n) of the inverse model is obtained at the 5000th iteration with 100 Monte Carlo simulations. As shown in Fig. 16, the performance of the DCT-LMS algorithm is comparable to the optimal solution. In this simulation, the order of the adaptive channel equaliser is 11, chosen by considering the delay element and the symmetry of the equaliser. In the highly distorted channel, the DCT-LMS algorithm retains its superior modelling capability over the LMS, NLMS, and VSLMS algorithms (see Table 2).
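The optimal solution plotted in Fig. 16 can equivalently be computed in the time domain by solving the finite-length Wiener normal equations R w_{o} = p, where R is the equaliser input correlation matrix for this channel and p is the cross-correlation with the delayed test symbol, p(k) = h_{\Delta-k}. The sketch below is our own finite-length counterpart of (49); W = 2.9 and delay \Delta = 7 are illustrative choices:

```python
import numpy as np

ntaps, delay, W, sigma_v2 = 11, 7, 2.9, 0.001
n = np.arange(1, 4)
h = 0.5 * (1.0 + np.cos(2.0 * np.pi * (n - 2) / W))       # channel taps h_1..h_3

# Equaliser input correlation matrix for i.i.d. +-1 symbols plus white noise
r = np.correlate(h, h, mode="full")[len(h) - 1:]
r = np.concatenate([r, np.zeros(ntaps - len(r))])
R = np.array([[r[abs(i - j)] for j in range(ntaps)] for i in range(ntaps)])
R[np.diag_indices(ntaps)] += sigma_v2

# Cross-correlation p(k) = E[s(n - delay) x(n - k)] = h_(delay - k), zero otherwise
p = np.zeros(ntaps)
for k in range(ntaps):
    if 1 <= delay - k <= 3:
        p[k] = h[delay - k - 1]

w_o = np.linalg.solve(R, p)        # finite-length Wiener solution
mmse = 1.0 - p @ w_o               # E[s^2] = 1 for the +-1 test signal
assert 0.0 < mmse < 0.1
```

For this mildly distorted channel the residual MMSE is small but strictly positive, reflecting the noise floor \Phi_{vv} in the denominator of (49).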
Fig. 16. Comparison of inverse modelling by the adaptive channel equaliser (SNR = 30 dB, W = 3.5, 100 runs); dotted line: NLMS; dashed line: VSLMS; dash-dotted line: LMS; solid line: DCT-LMS III; 'o': optimal solution.

Table 2
Mean-squared error of the LMS, NLMS, VSLMS, and DCT-LMS algorithms in the channel equaliser

Algorithm      E[e^{2}(\infty)]
LMS            0.01498
NLMS           0.01437
VSLMS          0.01486
DCT-LMS I      0.01438
DCT-LMS III    0.01445
6. Conclusions
The DCT-LMS algorithm employing a new type of power estimator has been introduced. It was shown that the modified power estimator for the DCT-LMS algorithm works properly as a time-varying stepsize to redistribute the spread power after the DCT. In particular, the decorrelation properties of the unitary transform matrix have been investigated theoretically. It was found that the decorrelation property of the DCT is governed by the location of two zeros, and this property was derived from the lower and upper bounds of the eigenvalue spread ratio. The performance of the DCT-LMS algorithm with a variable stepsize has been analysed in terms of the mean and mean-squared error. Monte Carlo simulation was found to give a good fit to our theoretical predictions. However, it was shown that a large stepsize induced by the control parameter \gamma violates Assumption 3, as expected. The proposed filtering algorithm has a convergence rate greater than that of the plain LMS, NLMS and VSLMS algorithms, at the expense of NM - N more multiplications and M more additions than the conventional transform domain algorithm.
Appendix A. Proof of Theorem 2
Let \lambda_{i} and q_{i}, i = 1,2,\ldots,N, denote the eigenvalues of the N \times N correlation matrix R_{uu} of a discrete-time stochastic process u(n) and the associated eigenvectors, respectively. An eigenvalue of the correlation matrix equals the Rayleigh quotient [14] of the corresponding eigenvector:

\lambda_{i} = \frac{q_{i}^{H}R_{uu}q_{i}}{q_{i}^{H}q_{i}}.  (A.1)
The Hermitian form in the numerator of (A.1) may be expressed in expanded form as follows:

q_{i}^{H}R_{uu}q_{i} = \sum_{k=1}^{N}\sum_{l=1}^{N}q_{ik}^{*}\,r(l-k)\,q_{il},  (A.2)

where q_{ik}^{*} is the kth element of the row vector q_{i}^{H}, r(l-k) is the (k,l)th element of the matrix R_{uu}, and q_{il} is the lth element of the column vector q_{i}.
element of the column vector qi. Using the Einstein}Wiener}Khintchine relation of (A.2), we may write
r(l!k)"1
2pPp
~p
Suu (w) exp[jw(l!k)] dw, (A.3)
where S_{uu}(\omega) is the power spectral density of the process u(n). Hence, we may rewrite (A.2) as

q_{i}^{H}R_{uu}q_{i} = \sum_{k=1}^{N}\sum_{l=1}^{N}q_{ik}^{*}q_{il}\frac{1}{2\pi}\int_{-\pi}^{\pi}S_{uu}(\omega)\exp[j\omega(l-k)]\,d\omega
= \frac{1}{2\pi}\int_{-\pi}^{\pi}d\omega\,S_{uu}(\omega)\sum_{k=1}^{N}q_{ik}^{*}\exp[-j\omega k]\sum_{l=1}^{N}q_{il}\exp[j\omega l].  (A.4)
Accordingly, the eigenvalue \lambda_{i} of (A.1) is redefined in terms of the power spectral density as

\lambda_{i} = \frac{(1/2\pi)\int_{-\pi}^{\pi}|Q_{i}(e^{j\omega})|^{2}S_{uu}(\omega)\,d\omega}{(1/2\pi)\int_{-\pi}^{\pi}|Q_{i}(e^{j\omega})|^{2}\,d\omega},  (A.5)

where Q_{i}(e^{j\omega}) is defined by \sum_{k=1}^{N}q_{ik}^{*}\exp[-j\omega k], the discrete-time Fourier transform of the sequence q_{i1}^{*}, q_{i2}^{*}, \ldots, q_{iN}^{*}. Using Parseval's theorem, we obtain
\|q_{i}\|_{2}^{2} = q_{i}^{H}q_{i} = \frac{1}{2\pi}\int_{-\pi}^{\pi}|Q_{i}(e^{j\omega})|^{2}\,d\omega.  (A.6)
With the constraint \|q_{i}\|_{2}^{2} = 1, this gives

\frac{1}{2\pi}\int_{-\pi}^{\pi}|Q_{i}(e^{j\omega})|^{2}\,d\omega = 1.  (A.7)
Therefore, (A.5) can be written as

\lambda_{i} = \frac{1}{2\pi}\int_{-\pi}^{\pi}|Q_{i}(e^{j\omega})|^{2}S_{uu}(\omega)\,d\omega.  (A.8)
If we define E[|x_{i}|^{2}] = \lambda_{i} as the output power of the ith eigenfilter, Eq. (A.8) can be rewritten in terms of the power spectral density as

\Phi_{x_{i}x_{i}}(e^{j\omega}) = |Q_{i}(e^{j\omega})|^{2}S_{uu}(e^{j\omega}),  (A.9)
where \Phi_{x_{i}x_{i}}(e^{j\omega}) is the spectrum spread over the whole frequency band, since the input signal is transformed by the KLT.

Hence, the complementary spectrum principle can be explained from (A.9), since there is an inverse relationship between the eigenfilter spectral density and the input spectral density.
In (A.9), the expectation of the eigenfilter response at each frequency bin is given by

|\bar{Q}_{i}(e^{j\omega})|^{2} = E[|Q_{i}(e^{j\omega})|^{2}].  (A.10)
Appendix B. Derivation of Eq. (40)
E\Big[\Big(\frac{1}{(1/M)\|x_{i}(n)\|_{2}^{2}}\Big)^{l}\Big] = M^{l}\int_{-\infty}^{\infty}\cdots\int_{-\infty}^{\infty}\Big(\frac{1}{\|x_{i}(n)\|_{2}^{2}}\Big)^{l}\frac{1}{(2\pi\sigma_{x_i}^{2})^{M/2}}\exp\Big\{-\frac{1}{2\sigma_{x_i}^{2}}\|x_{i}(n)\|_{2}^{2}\Big\}\,dx_{1i}\cdots dx_{Mi},  (B.1)

where l \geq 1 and M > 2l. To solve the integral in (B.1), the variables are transformed into generalised spherical coordinates as follows [5, pp. 98-103; 16, pp. 246-248]:
x_{i} = r\Big(\prod_{k=0}^{i-1}\sin\phi_{k}\Big)\cos\phi_{i},  (B.2)

x_{M-1} = r\Big(\prod_{k=1}^{M-2}\sin\phi_{k}\Big)\cos\theta,  (B.3)

x_{M} = r\Big(\prod_{k=1}^{M-2}\sin\phi_{k}\Big)\sin\theta,  (B.4)

where 1 \leq i \leq M-2 and \sin\phi_{0} = 1.
Eq. (B.1) is transformed according to the formula

\int\cdots\int f(x_{1i},\ldots,x_{Mi})\,dx_{1i}\cdots dx_{Mi} = \int\cdots\int f(r,\theta,\phi_{1},\ldots,\phi_{M-2})\,|J(r,\theta,\phi_{1},\ldots,\phi_{M-2})|\,dr\,d\theta\,d\phi_{1}\cdots d\phi_{M-2},  (B.5)

where |J(r,\theta,\phi_{1},\ldots,\phi_{M-2})| is the Jacobian of the transformation.
The Jacobian of this transformation is given by

J(r,\theta,\phi_{1},\ldots,\phi_{M-2}) = \Big|\frac{\partial(x_{1i},x_{2i},\ldots,x_{Mi})}{\partial(r,\theta,\phi_{1},\phi_{2},\ldots,\phi_{M-2})}\Big|
= \begin{vmatrix}
\partial x_{1i}/\partial r & \partial x_{2i}/\partial r & \cdots & \partial x_{Mi}/\partial r \\
\partial x_{1i}/\partial\theta & \partial x_{2i}/\partial\theta & \cdots & \partial x_{Mi}/\partial\theta \\
\partial x_{1i}/\partial\phi_{1} & \partial x_{2i}/\partial\phi_{1} & \cdots & \partial x_{Mi}/\partial\phi_{1} \\
\vdots & \vdots & \ddots & \vdots \\
\partial x_{1i}/\partial\phi_{M-2} & \partial x_{2i}/\partial\phi_{M-2} & \cdots & \partial x_{Mi}/\partial\phi_{M-2}
\end{vmatrix}.  (B.6)
1652 D.I. Kim, P. De Wilde / Signal Processing 80 (2000) 1629}1654
Taking out common factors in the columns of (B.6), we find that this determinant equals

|J(r,\theta,\phi_{1},\ldots,\phi_{M-2})| = r^{M-1}\prod_{k=1}^{M-2}\sin^{M-k-1}\phi_{k}.  (B.7)
We may rewrite (B.1) as

E\Big[\Big(\frac{1}{(1/M)\|x_{i}(n)\|_{2}^{2}}\Big)^{l}\Big] = \frac{M^{l}}{(2\pi\sigma_{x_i}^{2})^{M/2}}\int_{0}^{\infty}r^{M-2l-1}\exp\Big(-\frac{r^{2}}{2\sigma_{x_i}^{2}}\Big)dr\int_{0}^{2\pi}d\theta \times \int_{0}^{\pi}\cdots\int_{0}^{\pi}\prod_{k=1}^{M-2}\sin^{M-k-1}\phi_{k}\,d\phi_{1}\cdots d\phi_{M-2}.  (B.8)
From (B.8), we can integrate each factor separately as follows:

\int_{0}^{\pi}\sin^{k}\phi\,d\phi = \frac{\Gamma((k+1)/2)}{\Gamma((k+2)/2)}\,\pi^{1/2},  (B.9)

\frac{2\pi}{(2\pi\sigma_{x_i}^{2})^{M/2}}\int_{0}^{\infty}r^{M-2l-1}\exp\Big(-\frac{r^{2}}{2\sigma_{x_i}^{2}}\Big)dr = \frac{2\pi}{(2\pi\sigma_{x_i}^{2})^{M/2}}\int_{0}^{\infty}(2v\sigma_{x_i}^{2})^{(M-2l-1)/2}\exp(-v)\frac{\sigma_{x_i}}{(2v)^{1/2}}\,dv = \frac{2\pi}{(2\pi)^{M/2}\sigma_{x_i}^{2l}}\,2^{(M-2l-2)/2}\,\Gamma\Big(\frac{M-2l}{2}\Big).  (B.10)

In (B.10), r^{2}/2\sigma_{x_i}^{2} and dr are transformed to v and \sigma_{x_i}/\sqrt{2v}\,dv, respectively. Substituting (B.9) and (B.10) into (B.8), we may rewrite (B.8) as
E\Big[\Big(\frac{1}{(1/M)\|x_{i}(n)\|_{2}^{2}}\Big)^{l}\Big] = M^{l}\,\frac{2\pi}{(2\pi)^{M/2}\sigma_{x_i}^{2l}}\,2^{(M-2l-2)/2}\,\Gamma\Big(\frac{M-2l}{2}\Big)\,\frac{\Gamma(\frac{M-1}{2})\Gamma(\frac{M-2}{2})\cdots\Gamma(1)}{\Gamma(\frac{M}{2})\Gamma(\frac{M-1}{2})\cdots\Gamma(\frac{3}{2})}\,\pi^{(M-2)/2}
= \frac{M^{l}}{\sigma_{x_i}^{2l}}\,\frac{2\pi\,2^{(M-2l-2)/2}\,\pi^{(M-2)/2}}{(2\pi)^{M/2}}\,\frac{\Gamma((M-2l)/2)}{\Gamma(M/2)}
= \frac{M^{l}}{\sigma_{x_i}^{2l}}\prod_{k=1}^{l}\frac{1}{M-2k}.  (B.11)
References
[1] T. Aboulnasr, K. Mayyas, A robust variable step-size LMS-type algorithm: analysis and simulations, IEEE Trans. Signal Process. 45 (3) (March 1997) 631-639.
[2] N. Ahmed, T. Natarajan, K.R. Rao, Discrete cosine transform, IEEE Trans. Comput. 23 (1974) 90-93.
[3] F. Beaufays, Transform-domain adaptive filters: an analytical approach, IEEE Trans. Signal Process. 43 (2) (February 1995) 422-431.
[4] N.J. Bershad, Y.H. Chang, Time correlation statistics of the LMS adaptive algorithm weights, IEEE Trans. Acoust. Speech Signal Process. 33 (1) (February 1985) 309-312.
[5] B.M. Budak, S.V. Fomin, Multiple Integrals, Field Theory and Series, Mir Publishers, Moscow, 1973, pp. 98-103.
[6] R.J. Clarke, Relation between the Karhunen-Loeve and cosine transforms, IEE Proc., Part F 128 (6) (November 1981) 359-360.
[7] J.B. Evans, P. Xue, B. Liu, Analysis and implementation of variable stepsize adaptive algorithms, IEEE Trans. Acoust. Speech Signal Process. 41 (8) (August 1993) 2517-2535.
[8] B. Farhang-Boroujeny, Adaptive Filters: Theory and Application, Wiley, New York, 1999.
[9] B. Farhang-Boroujeny, S. Gazor, Selection of orthonormal transforms for improving the performance of the transform domain normalised LMS algorithm, IEE Proceedings-F 139 (5) (October 1992) 327-335.
[10] A. Feuer, E. Weinstein, Convergence analysis of LMS filters with uncorrelated Gaussian data, IEEE Trans. Acoust. Speech Signal Process. 33 (1) (February 1985) 222-230.
[11] S. Florian, N.J. Bershad, A weighted normalized frequency domain LMS adaptive algorithm, IEEE Trans. Acoust. Speech Signal Process. 36 (7) (July 1988) 1002-1007.
[12] R.M. Gray, Toeplitz and circulant matrices: a review, http://www-isl.stanford.edu/gray/toeplitz.pdf, Stanford University, January 1997.
[13] R.W. Harris, D.M. Chabries, A variable step (VS) adaptive filter algorithm, IEEE Trans. Acoust. Speech Signal Process. 34 (2) (April 1986) 309-316.
[14] S. Haykin, Adaptive Filter Theory, 2nd Edition, Prentice-Hall, Englewood Cliffs, NJ, 1991.
[15] L.L. Horowitz, K.D. Senne, Performance advantage of complex LMS for controlling narrow-band adaptive arrays, IEEE Trans. Acoust. Speech Signal Process. 29 (3) (June 1981) 722-736.
[16] M.G. Kendall, A. Stuart, The Advanced Theory of Statistics, 3rd Edition, Charles Griffin and Co. Ltd., 1969, pp. 246-248.
[17] C.P. Kwong, E.W. Johnston, A variable stepsize LMS algorithm, IEEE Trans. Signal Process. 40 (7) (July 1992) 1633-1642.
[18] D.F. Marshall, W.K. Jenkins, J.J. Murphy, The use of orthogonal transforms for improving performance of adaptive filters, IEEE Trans. Circuits Systems 36 (4) (April 1989) 474-484.
[19] S.S. Narayan, A.M. Peterson, M.J. Narasimha, Transform domain LMS algorithm, IEEE Trans. Acoust. Speech Signal Process. 31 (3) (June 1983) 609-615.
[20] K. Ozeki, T. Umeda, An adaptive filtering algorithm using orthogonal projection to an affine subspace and its properties, Electron. Commun. Japan 67-A (5) (1984) 19-27.
[21] V.N. Parikh, A.Z. Naraniecki, The use of the modified escalator algorithm to improve the performance of transform LMS adaptive filter, IEEE Trans. Signal Process. 46 (3) (1998) 625-635.
[22] M. Rupp, A family of adaptive filter algorithms with decorrelation properties, IEEE Trans. Signal Process. 46 (3) (March 1998) 771-775.
[23] C.G. Samson, V.U. Reddy, Fixed point error analysis of the normalised ladder algorithm, IEEE Trans. Acoust. Speech Signal Process. 5 (October 1983) 1177-1191.
[24] D.T. Slock, On the convergence behavior of the LMS and the normalized LMS algorithms, IEEE Trans. Signal Process. 41 (9) (September 1993) 2811-2825.
[25] M. Tarrab, A. Feuer, Convergence and performance analysis of the normalized LMS algorithm with uncorrelated Gaussian data, IEEE Trans. Acoust. Speech Signal Process. 34 (4) (July 1988) 680-691.
[26] B. Widrow, J.M. McCool et al., Stationary and nonstationary learning characteristics of the LMS adaptive filter, Proc. IEEE 64 (8) (August 1976) 1151-1162.