Transcript of Seminar Bermudez

Adaptive Filtering - Theory and Applications

Jose C. M. Bermudez

Department of Electrical Engineering, Federal University of Santa Catarina
Florianópolis, SC, Brazil

IRIT - INP-ENSEEIHT, Toulouse

May 2011

    1 Introduction

    2 Adaptive Filtering Applications

    3 Adaptive Filtering Principles

    4 Iterative Solutions for the Optimum Filtering Problem

    5 Stochastic Gradient Algorithms

    6 Deterministic Algorithms

    7 Analysis


    Introduction


    Estimation Techniques

Several techniques are available to solve estimation problems.

Classical Estimation
Maximum Likelihood (ML), Least Squares (LS), Moments, etc.

Bayesian Estimation
Minimum MSE (MMSE), Maximum A Posteriori (MAP), etc.

Linear Estimation
Frequently used in practice when computational complexity is limited or real-time operation is required

    Linear Estimators

Simpler to determine: depend only on the first two moments of the data

Statistical Approach: Optimal Linear Filters
Minimum mean-square error
Require second-order statistics of the signals

Deterministic Approach: Least Squares Estimators
Minimum least-squares error
Require handling of a data observation matrix

    Limitations of Optimal Filters and LS Estimators

Statistics of signals may not be available or cannot be accurately estimated

There may be no time available for statistical estimation (real-time)

Signals and systems may be nonstationary

Memory required may be prohibitive

Computational load may be prohibitive

    Iterative Solutions

Search for the optimal solution starting from an initial guess

Iterative algorithms are based on classical optimization algorithms

Require reduced computational effort per iteration

Need several iterations to converge to the optimal solution

These methods form the basis for the development of adaptive algorithms

Still require knowledge of the signal statistics

    Adaptive Filters

    Usually approximate iterative algorithms and:

    Do not require previous knowledge of the signal statistics

    Have a small computational complexity per iteration

    Converge to a neighborhood of the optimal solution

    Adaptive filters are good for:

    Real-time applications, when there is no time for statistical estimation

    Applications with nonstationary signals and/or systems


    Properties of Adaptive Filters

They can operate satisfactorily in unknown and possibly time-varying environments without user intervention

They improve their performance during operation by learning statistical characteristics from current signal observations

They can track variations in the signal operating environment (SOE)

    Adaptive Filtering Applications


    Basic Classes of Adaptive Filtering Applications

    System Identification

    Inverse System Modeling

    Signal Prediction

    Interference Cancelation


    System Identification

[Block diagram: the input x(n) drives both the unknown system and the adaptive filter. The unknown-system output plus other signals (eo(n)) forms d(n); the adaptive filter output y(n) is subtracted from d(n) to produce the error e(n), which drives the adaptive algorithm.]

Applications - System Identification

Channel Estimation
Communications systems
Objective: model the channel to design distortion compensation
x(n): training sequence

Plant Identification
Control systems
Objective: model the plant to design a compensator
x(n): training sequence

    Echo Cancellation

Telephone systems and VoIP

Echo caused by network impedance mismatches or by the acoustic environment

Objective: model the echo path impulse response

x(n): transmitted signal

d(n): echo + noise

[Figure: Network Echo Cancellation - hybrids (H) at each end connect the Tx and Rx paths; an echo canceller (EC) driven by x(n) has its output subtracted from d(n) to form e(n).]

    Inverse System Modeling

Adaptive filter attempts to estimate the unknown system's inverse

Adaptive filter input usually corrupted by noise

Desired response d(n) may not be available

[Block diagram: s(n) passes through the unknown system, producing x(n) (plus noise z(n) and other signals), which feeds the adaptive filter; a delayed version of s(n) serves as d(n), and e(n) = d(n) - y(n) drives the adaptive algorithm.]

Applications - Inverse System Modeling: Channel Equalization

[Block diagram: s(n) passes through the channel, producing x(n) (plus noise z(n)); the adaptive filter output y(n) is compared with d(n), taken from a locally generated training sequence, to form e(n).]

Objective: reduce intersymbol interference

Initially: training sequence in d(n)

After training: d(n) generated from previous decisions

    Signal Prediction

[Block diagram: x(n) is delayed by no samples before entering the adaptive filter; the filter output y(n) is compared with d(n) = x(n) to form e(n), which drives the adaptive algorithm.]

Most widely used case: forward prediction

The signal x(n) is predicted from the samples {x(n - no), x(n - no - 1), ..., x(n - no - L)}

Application - Signal Prediction

DPCM Speech Quantizer - Linear Predictive Coding

Objective: reduce speech transmission bandwidth

Signal transmitted all the time: the quantization error

Predictor coefficients are transmitted at a low rate

[Block diagram: the speech signal d(n) minus the prediction y(n) gives the prediction error e(n), which is quantized to Q[e(n)] (the DPCM signal); the predictor is driven by the reconstructed signal y(n) + Q[e(n)].]

    Interference Cancelation

One or more sensor signals are used to remove interference and noise

Reference signals correlated with the interference should also be available

Applications: array processing for radar and communications; biomedical sensing systems; active noise control systems

Application - Interference Cancelation: Active Noise Control

Ref: D. G. Manolakis, V. K. Ingle and S. M. Kogon, Statistical and Adaptive Signal Processing, 2000.

Cancelation of acoustic noise using destructive interference

A secondary system between the adaptive filter and the cancelation point is unavoidable

Cancelation is performed in the acoustic environment

    Active Noise Control Block Diagram

[Block diagram: the adaptive filter w(n) output y(n) passes through a secondary path S (with a nonlinearity g(ys)) before the cancelation point, where it combines with d(n) and z(n) to form e(n); the adaptive algorithm is driven by e(n) and by a filtered version xf(n) of the input; wo denotes the optimum filter.]

    Adaptive Filtering Principles


    Adaptive Filter Features

Adaptive filters are composed of three basic modules:

Filtering structure
Determines the output of the filter given its input samples
Its weights are periodically adjusted by the adaptive algorithm
Can be linear or nonlinear, depending on the application
Linear filters can be FIR or IIR

Performance criterion
Defined according to the application and to mathematical tractability
Is used to derive the adaptive algorithm
Its value at each iteration affects the adaptive weight updates

Adaptive algorithm
Uses the performance criterion value and the current signals
Modifies the adaptive weights to improve performance
Its form and complexity are functions of the structure and of the performance criterion

    Signal Operating Environment (SOE)

Comprises all information regarding the properties of the signals and systems:
Input signals
Desired signal
Unknown systems

If the SOE is nonstationary:
Acquisition or convergence mode: from start-up until close to the best performance
Tracking mode: readjustment following the SOE's time variations

Adaptation can be:
Supervised: the desired signal is available, so e(n) can be evaluated
Unsupervised: the desired signal is unavailable

    Performance Evaluation

Convergence rate

Misadjustment

Tracking

Robustness (disturbances and numerical)

Computational requirements (operations and memory)

Structure: facility of implementation, performance surface, stability

    Optimum versus Adaptive Filters in Linear Estimation

Conditions for this study:
Stationary SOE
Filter structure is transversal FIR
All signals are real valued
Performance criterion: mean-square error E[e^2(n)]

The Linear Estimation Problem

[Block diagram: x(n) enters a linear FIR filter w; the output y(n) is subtracted from d(n) to form e(n).]

Jms = E[e^2(n)]

    The Linear Estimation Problem

[Block diagram: x(n) enters a linear FIR filter w; the output y(n) is subtracted from d(n) to form e(n).]

x(n) = [x(n), x(n-1), ..., x(n-N+1)]^T

y(n) = x^T(n) w

e(n) = d(n) - y(n) = d(n) - x^T(n) w

Jms = E[e^2(n)] = σ_d^2 - 2 p^T w + w^T Rxx w

where p = E[x(n) d(n)];  Rxx = E[x(n) x^T(n)]

Normal Equations

Rxx wo = p  =>  wo = Rxx^(-1) p  for Rxx > 0

Jms,min = σ_d^2 - p^T Rxx^(-1) p
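To make the normal equations concrete, here is a minimal NumPy sketch (added for this transcript, not from the original slides): it estimates Rxx and p by sample averages over an assumed AR(1) input and solves Rxx wo = p. The filter length, input model and noise level are illustrative assumptions.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

rng = np.random.default_rng(0)
N, M = 8, 50_000
wo_true = rng.standard_normal(N)          # "unknown" optimal weights (assumed)

# AR(1) input signal (illustrative)
x = np.zeros(M)
for n in range(1, M):
    x[n] = 0.7 * x[n - 1] + rng.standard_normal()

# Rows are the regressors x(n) = [x(n), x(n-1), ..., x(n-N+1)]^T
X = sliding_window_view(x, N)[:, ::-1]
d = X @ wo_true + 0.01 * rng.standard_normal(len(X))   # desired signal

# Sample estimates of Rxx = E[x(n) x^T(n)] and p = E[x(n) d(n)]
Rxx = X.T @ X / len(X)
p = X.T @ d / len(X)

# Normal equations: Rxx wo = p (solve rather than invert)
wo = np.linalg.solve(Rxx, p)
print(np.max(np.abs(wo - wo_true)))        # close to zero
```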

What if d(n) is nonstationary?

[Same block diagram as above, now with a time-varying weight vector w(n).]

x(n) = [x(n), x(n-1), ..., x(n-N+1)]^T

y(n) = x^T(n) w(n)

e(n) = d(n) - y(n) = d(n) - x^T(n) w(n)

Jms(n) = E[e^2(n)] = σ_d^2(n) - 2 p^T(n) w(n) + w^T(n) Rxx w(n)

where p(n) = E[x(n) d(n)];  Rxx = E[x(n) x^T(n)]

Normal Equations

Rxx wo(n) = p(n)  =>  wo(n) = Rxx^(-1) p(n)  for Rxx > 0

Jms,min(n) = σ_d^2(n) - p^T(n) Rxx^(-1) p(n)

    Optimum Filters versus Adaptive Filters

Optimum Filters
Compute p(n) = E[x(n) d(n)]
Solve Rxx wo(n) = p(n)
Filter with wo(n): y(n) = x^T(n) wo(n)
Nonstationary SOE: the optimum filter must be determined for each value of n

Adaptive Filters
Filtering: y(n) = x^T(n) w(n)
Evaluate the error: e(n) = d(n) - y(n)
Adaptive algorithm: w(n+1) = w(n) + Δw[x(n), e(n)]
Δw(n) is chosen so that w(n) is close to wo(n) for n large

    Characteristics of Adaptive Filters

Search for the optimum solution on the performance surface

Follow principles of optimization techniques

Implement a recursive optimization solution

Convergence speed may depend on initialization

Have stability regions

Steady-state solution fluctuates about the optimum

Can track time-varying SOEs better than optimum filters

Performance depends on the performance surface

Iterative Solutions for the Optimum Filtering Problem

    Performance (Cost) Functions

Mean-square error: E[e^2(n)] (most popular)
Adaptive algorithms: Least-Mean Square (LMS), Normalized LMS (NLMS), Affine Projection (AP), Recursive Least Squares (RLS), etc.

Regularized MSE: Jrms = E[e^2(n)] + γ ||w(n)||^2
Adaptive algorithm: leaky least-mean square (leaky LMS)

ℓ1-norm criterion: Jℓ1 = E[|e(n)|]
Adaptive algorithm: Sign-Error

Performance (Cost) Functions - continued

Least-mean fourth (LMF) criterion: JLMF = E[e^4(n)]
Adaptive algorithm: Least-Mean Fourth (LMF)

Least-mean-mixed-norm (LMMN) criterion: JLMMN = E[δ e^2(n) + (1/2)(1 - δ) e^4(n)]
Adaptive algorithm: Least-Mean-Mixed-Norm (LMMN)

Constant-modulus criterion: JCM = E[(|x^T(n) w(n)|^2 - γ)^2]
Adaptive algorithm: Constant-Modulus (CM)

MSE Performance Surface - Small Input Correlation

[3-D plot of the MSE surface Jms over the weight plane (w1, w2): a nearly circular paraboloid.]

MSE Performance Surface - Large Input Correlation

[3-D plot of the MSE surface Jms over (w1, w2): an elongated, strongly eccentric paraboloid.]

The Steepest Descent Algorithm - Stationary SOE

Cost Function

Jms(n) = E[e^2(n)] = σ_d^2 - 2 p^T w(n) + w^T(n) Rxx w(n)

Weight Update Equation

w(n+1) = w(n) + μ c(n)

μ: step size
c(n): correction term (determines the direction of Δw(n))

Steepest descent adjustment: c(n) = -∇Jms(n), so that Jms(n+1) ≤ Jms(n)

w(n+1) = w(n) + μ [p - Rxx w(n)]
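A minimal sketch of this recursion (an addition to the transcript); the 2-tap Rxx, wo and step size are illustrative assumptions:

```python
import numpy as np

Rxx = np.array([[1.0, 0.6], [0.6, 1.0]])   # assumed input correlation matrix
wo = np.array([2.0, -1.0])                 # assumed optimum weights
p = Rxx @ wo                               # cross-correlation vector, p = Rxx wo

mu = 0.3                                   # satisfies 0 < mu < 2/lambda_max (= 1.25 here)
w = np.zeros(2)
for n in range(100):
    w = w + mu * (p - Rxx @ w)             # w(n+1) = w(n) + mu [p - Rxx w(n)]
print(w)                                   # converges to wo = [2, -1]
```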

    Weight Error Update Equation

w(n+1) = w(n) + μ [p - Rxx w(n)]

Using p = Rxx wo:

w(n+1) = (I - μ Rxx) w(n) + μ Rxx wo

Weight error vector: v(n) = w(n) - wo

v(n+1) = (I - μ Rxx) v(n)

The matrix I - μ Rxx must be stable for convergence (|1 - μ λ_i| < 1)

Assuming convergence, lim_{n→∞} v(n) = 0

Convergence Conditions

v(n+1) = (I - μ Rxx) v(n);  Rxx positive definite

Eigen-decomposition of Rxx: Rxx = Q Λ Q^T

v(n+1) = (I - μ Q Λ Q^T) v(n)

Q^T v(n+1) = Q^T v(n) - μ Λ Q^T v(n)

Defining v'(n+1) = Q^T v(n+1):

v'(n+1) = (I - μ Λ) v'(n)

Convergence Properties

v'(n+1) = (I - μ Λ) v'(n)

v'_k(n+1) = (1 - μ λ_k) v'_k(n),  k = 1, ..., N

v'_k(n) = (1 - μ λ_k)^n v'_k(0)

Convergence modes:
monotonic if 0 < 1 - μ λ_k < 1
oscillatory if -1 < 1 - μ λ_k < 0

Convergence requires |1 - μ λ_k| < 1 for all k; 0 < μ < 1/λ_max gives monotonic convergence

The stability limit is again 0 < μ < 2/λ_max

Jms(n) converges faster than w(n)

The algorithm converges faster as χ = λ_max/λ_min → 1
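The mode-by-mode decay can be checked numerically; the eigenvalues and step sizes in this sketch are arbitrary illustrative values, not from the slides:

```python
import numpy as np

# Decay of the rotated weight-error modes: v'_k(n) = (1 - mu*lambda_k)^n v'_k(0)
lam = np.array([0.2, 1.0, 1.8])        # assumed eigenvalues of Rxx (chi = 9)
for mu in (0.4, 1.0):                  # both inside 0 < mu < 2/lambda_max = 1.11
    modes = 1 - mu * lam
    # monotonic where 0 < 1-mu*lam < 1, oscillatory where -1 < 1-mu*lam < 0
    print(mu, modes, modes**50)
```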

Simulation Results

x(n) = a x(n-1) + v(n)

[Plots: steepest-descent mean-square error (dB) versus iteration. Left: white-noise input (χ = 1). Right: AR(1) input with a = 0.7 (χ = 5.7).]

Linear system identification - FIR with 20 coefficients

Step size μ = 0.3

Noise power σ_v^2 = 10^-6

The Newton Algorithm

Steepest descent: linear approximation of Jms about the operating point

Newton's method: quadratic approximation of Jms

Expanding Jms(w) in a Taylor series about w(n),

Jms(w) ≈ Jms[w(n)] + ∇^T Jms [w - w(n)] + (1/2) [w - w(n)]^T H(n) [w - w(n)]

Differentiating w.r.t. w and equating to zero at w = w(n+1),

∇Jms[w(n+1)] = ∇Jms[w(n)] + H[w(n)] [w(n+1) - w(n)] = 0

w(n+1) = w(n) - H^(-1)[w(n)] ∇Jms[w(n)]

The Newton Algorithm - continued

∇Jms[w(n)] = -2p + 2 Rxx w(n)

H(w(n)) = 2 Rxx

Thus, adding a step-size control,

w(n+1) = w(n) + μ Rxx^(-1) [p - Rxx w(n)]

On a quadratic surface: convergence in one iteration for μ = 1

Requires the determination of Rxx^(-1)

Can be used to derive simpler adaptive algorithms

When H(n) is close to singular => regularization: H(n) = 2 Rxx + 2 ε I
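A one-step check of the Newton recursion (added here) on the same illustrative quadratic surface used in the steepest-descent sketch; with μ = 1 it lands on wo in a single iteration:

```python
import numpy as np

Rxx = np.array([[1.0, 0.6], [0.6, 1.0]])   # assumed values, as before
wo = np.array([2.0, -1.0])
p = Rxx @ wo

mu, eps = 1.0, 0.0                         # eps > 0 would regularize a near-singular Hessian
w = np.zeros(2)
w = w + mu * np.linalg.solve(eps * np.eye(2) + Rxx, p - Rxx @ w)
print(w)                                   # [2, -1] after a single step
```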

    Basic Adaptive Algorithms


    Least Mean Squares (LMS) Algorithm


Can be interpreted in different ways

Each interpretation helps in understanding the algorithm's behavior

Some of these interpretations are related to the steepest descent algorithm

    LMS as a Stochastic Gradient Algorithm


Suppose we use the instantaneous estimate e^2(n) in place of Jms(n) = E[e^2(n)]

The estimated gradient vector becomes

∇Jms(n) ≈ ∂e^2(n)/∂w(n) = 2 e(n) ∂e(n)/∂w(n)

Since e(n) = d(n) - x^T(n) w(n),

∇Jms(n) ≈ -2 e(n) x(n)  (stochastic gradient)

and, using the steepest descent weight update equation,

w(n+1) = w(n) + μ e(n) x(n)  (LMS weight update)
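A minimal LMS system-identification sketch (added for this transcript; the lengths, step size and noise level are illustrative assumptions, not the slides' settings):

```python
import numpy as np

rng = np.random.default_rng(0)
N = 16
wo = rng.standard_normal(N)                  # assumed "unknown" system
w = np.zeros(N)
mu = 0.01

xbuf = np.zeros(N)                           # regressor x(n) = [x(n), ..., x(n-N+1)]^T
for n in range(20_000):
    xbuf = np.roll(xbuf, 1)
    xbuf[0] = rng.standard_normal()          # new input sample
    d = xbuf @ wo + 1e-3 * rng.standard_normal()
    e = d - xbuf @ w                         # a priori error e(n)
    w = w + mu * e * xbuf                    # w(n+1) = w(n) + mu e(n) x(n)

print(np.linalg.norm(w - wo))                # small residual misalignment
```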

LMS as a Stochastic Estimation Algorithm

∇Jms(n) = -2p + 2 Rxx w(n)

Stochastic estimators: p̂ = d(n) x(n);  R̂xx = x(n) x^T(n)

Then, ∇Jms(n) ≈ -2 d(n) x(n) + 2 x(n) x^T(n) w(n)

Using this estimate in the steepest descent weight update,

w(n+1) = w(n) + μ e(n) x(n)

LMS - A Solution to a Local Optimization Problem

Error expressions

e(n) = d(n) - x^T(n) w(n)  (a priori error)
ε(n) = d(n) - x^T(n) w(n+1)  (a posteriori error)

We want to maximize |ε(n) - e(n)| with |ε(n)| < |e(n)|

ε(n) - e(n) = -x^T(n) Δw(n)

Δw(n) = w(n+1) - w(n)

Expressing Δw(n) as Δw(n) = μ c(n) e(n),

ε(n) - e(n) = -μ x^T(n) c(n) e(n)

For maximum |ε(n) - e(n)|  =>  c(n) in the direction of x(n)

Choosing c(n) = x(n):

Δw(n) = μ x(n) e(n)

and

w(n+1) = w(n) + μ e(n) x(n)

As ε(n) - e(n) = -x^T(n) Δw(n) = -μ x^T(n) x(n) e(n),

|ε(n)| < |e(n)| requires |1 - μ x^T(n) x(n)| < 1, or

0 < μ < 2 / ||x(n)||^2  (stability region)

Observations - LMS Algorithm

LMS is a noisy approximation of the steepest descent algorithm

The gradient estimate is unbiased

The errors in the gradient estimate lead to Jms,ex(∞) ≠ 0

The vector w(n) is now random
Steepest descent properties are no longer guaranteed => LMS analysis is required

The instantaneous estimates allow tracking without redesign

Some Research Results

J. C. M. Bermudez and N. J. Bershad, "A nonlinear analytical model for the quantized LMS algorithm - the arbitrary step size case," IEEE Transactions on Signal Processing, vol. 44, no. 5, pp. 1175-1183, May 1996.

J. C. M. Bermudez and N. J. Bershad, "Transient and tracking performance analysis of the quantized LMS algorithm for time-varying system identification," IEEE Transactions on Signal Processing, vol. 44, no. 8, pp. 1990-1997, August 1996.

N. J. Bershad and J. C. M. Bermudez, "A nonlinear analytical model for the quantized LMS algorithm - the power-of-two step size case," IEEE Transactions on Signal Processing, vol. 44, no. 11, pp. 2895-2900, November 1996.

N. J. Bershad and J. C. M. Bermudez, "Sinusoidal interference rejection analysis of an LMS adaptive feedforward controller with a noisy periodic reference," IEEE Transactions on Signal Processing, vol. 46, no. 5, pp. 1298-1313, May 1998.

J. C. M. Bermudez and N. J. Bershad, "Non-Wiener behavior of the Filtered-X LMS algorithm," IEEE Trans. on Circuits and Systems II - Analog and Digital Signal Processing, vol. 46, no. 8, pp. 1110-1114, August 1999.

Some Research Results - continued

O. J. Tobias, J. C. M. Bermudez and N. J. Bershad, "Mean weight behavior of the Filtered-X LMS algorithm," IEEE Transactions on Signal Processing, vol. 48, no. 4, pp. 1061-1075, April 2000.

M. H. Costa, J. C. M. Bermudez and N. J. Bershad, "Stochastic analysis of the LMS algorithm with a saturation nonlinearity following the adaptive filter output," IEEE Transactions on Signal Processing, vol. 49, no. 7, pp. 1370-1387, July 2001.

M. H. Costa, J. C. M. Bermudez and N. J. Bershad, "Stochastic analysis of the Filtered-X LMS algorithm in systems with nonlinear secondary paths," IEEE Transactions on Signal Processing, vol. 50, no. 6, pp. 1327-1342, June 2002.

M. H. Costa, J. C. M. Bermudez and N. J. Bershad, "The performance surface in filtered nonlinear mean-square estimation," IEEE Transactions on Circuits and Systems I, vol. 50, no. 3, pp. 445-447, March 2003.

G. Barrault, J. C. M. Bermudez and A. Lenzi, "New analytical model for the Filtered-x Least Mean Squares algorithm verified through active noise control experiment," Mechanical Systems and Signal Processing, vol. 21, pp. 1839-1852, 2007.

Some Research Results - continued

N. J. Bershad, J. C. M. Bermudez and J.-Y. Tourneret, "Stochastic analysis of the LMS algorithm for system identification with subspace inputs," IEEE Trans. on Signal Process., vol. 56, pp. 1018-1027, 2008.

J. C. M. Bermudez, N. J. Bershad and J.-Y. Tourneret, "An affine combination of two LMS adaptive filters - transient mean-square analysis," IEEE Trans. on Signal Process., vol. 56, pp. 1853-1864, 2008.

M. H. Costa, L. R. Ximenes and J. C. M. Bermudez, "Statistical analysis of the LMS adaptive algorithm subjected to a symmetric dead-zone nonlinearity at the adaptive filter output," Signal Processing, vol. 88, pp. 1485-1495, 2008.

M. H. Costa and J. C. M. Bermudez, "A noise resilient variable step-size LMS algorithm," Signal Processing, vol. 88, no. 3, pp. 733-748, March 2008.

P. Honeine, C. Richard, J. C. M. Bermudez, J. Chen and H. Snoussi, "A decentralized approach for nonlinear prediction of time series data in sensor networks," EURASIP Journal on Wireless Communications and Networking, vol. 2010, pp. 1-13, 2010. (KNLMS)

The Normalized LMS Algorithm - NLMS

The most employed adaptive algorithm in real-time applications

Like LMS, it has different interpretations (even more)

Alleviates a drawback of the LMS algorithm

w(n+1) = w(n) + μ e(n) x(n)

If the amplitude of x(n) is large => gradient noise amplification
Sub-optimal performance when σ_x^2 varies with time (for instance, speech)

NLMS - A Solution to a Local Optimization Problem

Error expressions

e(n) = d(n) - x^T(n) w(n)  (a priori error)
ε(n) = d(n) - x^T(n) w(n+1)  (a posteriori error)

We want to maximize |ε(n) - e(n)| with |ε(n)| < |e(n)|

ε(n) - e(n) = -x^T(n) Δw(n)  (A)

Δw(n) = w(n+1) - w(n)

For |ε(n)| < |e(n)| we impose the restriction

ε(n) = (1 - μ) e(n),  |1 - μ| < 1

ε(n) - e(n) = -μ e(n)  (B)

For maximum |ε(n) - e(n)|  =>  Δw(n) in the direction of x(n):

Δw(n) = k x(n)  (C)

Using (A), (B) and (C),

k = μ e(n) / (x^T(n) x(n))

and

w(n+1) = w(n) + μ e(n) x(n) / (x^T(n) x(n))  (NLMS weight update)

NLMS - Solution to a Constrained Optimization Problem

Error sequence

y(n) = x^T(n) w(n)  (estimate of d(n))
e(n) = d(n) - y(n) = d(n) - x^T(n) w(n)  (estimation error)

Optimization (principle of minimal disturbance)

Minimize ||Δw(n)||^2 = ||w(n+1) - w(n)||^2
subject to: x^T(n) w(n+1) = d(n)

This problem can be solved using the method of Lagrange multipliers

Using Lagrange multipliers, we minimize

f[w(n+1)] = ||w(n+1) - w(n)||^2 + λ [d(n) - x^T(n) w(n+1)]

Differentiating w.r.t. w(n+1) and equating the result to zero,

w(n+1) = w(n) + (1/2) λ x(n)  (*)

Using this result in x^T(n) w(n+1) = d(n) yields

λ = 2 e(n) / (x^T(n) x(n))

Using this result in (*) yields

w(n+1) = w(n) + e(n) x(n) / (x^T(n) x(n))

NLMS as an Orthogonalization Process

Conditions for the analysis:
e(n) = d(n) - x^T(n) w(n)
wo is the optimal solution (Wiener solution)
d(n) = x^T(n) wo  (no noise)
v(n) = w(n) - wo  (weight error vector)

Error signal

e(n) = d(n) - x^T(n) [v(n) + wo]
     = x^T(n) wo - x^T(n) [v(n) + wo]
     = -x^T(n) v(n)

e(n) = -x^T(n) v(n)

Interpretation: to minimize the error, v(n) should be orthogonal to all input vectors
Restriction: we have only one vector, x(n)

Iterative solution:

We can subtract from v(n) its component in the direction of x(n) at each iteration

If there are N adaptive coefficients, v(n) could be reduced to zero after N orthogonal input vectors

Iterative projection extraction => Gram-Schmidt orthogonalization

Recursive orthogonalization

n = 0:  v(1) = v(0) - proj. of v(0) onto x(0)
n = 1:  v(2) = v(1) - proj. of v(1) onto x(1)
...
n + 1:  v(n+1) = v(n) - proj. of v(n) onto x(n)

Projection of v(n) onto x(n):

P_x(n)[v(n)] = x(n) [x^T(n) x(n)]^(-1) x^T(n) v(n)

Weight update equation (noting that [x^T(n) x(n)]^(-1) is a scalar and x^T(n) v(n) = -e(n)):

v(n+1) = v(n) - x(n) [x^T(n) x(n)]^(-1) x^T(n) v(n)
       = v(n) + e(n) x(n) / (x^T(n) x(n))

NLMS has its own problems!

NLMS solves the LMS gradient noise amplification problem, but ...

What happens if ||x(n)||^2 gets too small?

One needs to add some regularization:

w(n+1) = w(n) + μ e(n) x(n) / (x^T(n) x(n) + ε)  (ε-NLMS)
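A sketch of ε-NLMS (added here) under an input whose power switches over time, the situation where normalization pays off; all numeric settings are illustrative assumptions. Compared with the LMS sketch earlier, only the update line changes.

```python
import numpy as np

rng = np.random.default_rng(1)
N = 16
wo = rng.standard_normal(N)                  # assumed "unknown" system
w = np.zeros(N)
mu, eps = 0.5, 1e-6                          # 0 < mu < 2; eps guards against ||x(n)||^2 ~ 0

xbuf = np.zeros(N)
for n in range(10_000):
    xbuf = np.roll(xbuf, 1)
    # input power switches by a factor of 36, mimicking a time-varying sigma_x^2
    xbuf[0] = rng.standard_normal() * (6.0 if (n // 1000) % 2 else 1.0)
    d = xbuf @ wo + 1e-3 * rng.standard_normal()
    e = d - xbuf @ w
    w = w + mu * e * xbuf / (xbuf @ xbuf + eps)   # epsilon-NLMS weight update

print(np.linalg.norm(w - wo))
```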

ε-NLMS - Stochastic Approximation of Regularized Newton

Regularized Newton Algorithm

w(n+1) = w(n) + μ [ε I + Rxx]^(-1) [p - Rxx w(n)]

Instantaneous estimates

p̂ = x(n) d(n);  R̂xx = x(n) x^T(n)

Using these estimates, with e(n) = d(n) - x^T(n) w(n),

w(n+1) = w(n) + μ [ε I + x(n) x^T(n)]^(-1) x(n) e(n)

Inversion of ε I + x(n) x^T(n)?

Matrix Inversion Formula

[A + BCD]^(-1) = A^(-1) - A^(-1) B [C^(-1) + D A^(-1) B]^(-1) D A^(-1)

Thus,

[ε I + x(n) x^T(n)]^(-1) = (1/ε) I - (1/ε^2) x(n) x^T(n) / (1 + (1/ε) x^T(n) x(n))

Post-multiplying both sides by x(n) and rearranging,

[ε I + x(n) x^T(n)]^(-1) x(n) = x(n) / (ε + x^T(n) x(n))

and

w(n+1) = w(n) + μ e(n) x(n) / (x^T(n) x(n) + ε)
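The rank-one inversion identity above is easy to verify numerically; the test values in this added sketch are arbitrary:

```python
import numpy as np

# Check: [eps*I + x x^T]^(-1) x == x / (eps + x^T x)
rng = np.random.default_rng(2)
x, eps = rng.standard_normal(5), 0.1
lhs = np.linalg.solve(eps * np.eye(5) + np.outer(x, x), x)
rhs = x / (eps + x @ x)
print(np.allclose(lhs, rhs))             # True
```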

Some Research Results

M. H. Costa and J. C. M. Bermudez, "An improved model for the normalized LMS algorithm with Gaussian inputs and large number of coefficients," in Proc. 2002 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP-2002), Orlando, Florida, pp. II-1385-1388, May 13-17, 2002.

J. C. M. Bermudez and M. H. Costa, "A statistical analysis of the ε-NLMS and NLMS algorithms for correlated Gaussian signals," Journal of the Brazilian Telecommunications Society, vol. 20, no. 2, pp. 7-13, 2005.

G. Barrault, M. H. Costa, J. C. M. Bermudez and A. Lenzi, "A new analytical model for the NLMS algorithm," in Proc. 2005 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP 2005), Pennsylvania, USA, vol. IV, pp. 41-44, 2005.

J. C. M. Bermudez, N. J. Bershad and J.-Y. Tourneret, "An affine combination of two NLMS adaptive filters - transient mean-square analysis," in Proc. Forty-Second Asilomar Conference on Signals, Systems & Computers, Pacific Grove, CA, USA, 2008.

P. Honeine, C. Richard, J. C. M. Bermudez and H. Snoussi, "Distributed prediction of time series data with kernels and adaptive filtering techniques in sensor networks," in Proc. Forty-Second Asilomar Conference on Signals, Systems & Computers, Pacific Grove, CA, USA, 2008.

The Affine Projection Algorithm

Consider the NLMS criterion, but now using {x(n), x(n-1), ..., x(n-P)}:

Minimize ||Δw(n)||^2 = ||w(n+1) - w(n)||^2

subject to:

x^T(n) w(n+1) = d(n)
x^T(n-1) w(n+1) = d(n-1)
...
x^T(n-P) w(n+1) = d(n-P)

Observation matrix

X(n) = [x(n), x(n-1), ..., x(n-P)]

Desired signal vector

d(n) = [d(n), d(n-1), ..., d(n-P)]^T

Error vector

e(n) = [e(n), e(n-1), ..., e(n-P)]^T = d(n) - X^T(n) w(n)

Vector of the constraint errors

ec(n) = d(n) - X^T(n) w(n+1)

Using Lagrange multipliers, we minimize

f[w(n+1)] = ||w(n+1) - w(n)||^2 + λ^T [d(n) - X^T(n) w(n+1)]

Differentiating w.r.t. w(n+1) and equating the result to zero,

w(n+1) = w(n) + (1/2) X(n) λ  (*)

Using this result in X^T(n) w(n+1) = d(n) yields

λ = 2 [X^T(n) X(n)]^(-1) e(n)

Using this result in (*) yields

w(n+1) = w(n) + X(n) [X^T(n) X(n)]^(-1) e(n)

Affine Projection - Solution to the Underdetermined Least-Squares Problem

We want to minimize

||ec(n)||^2 = ||d(n) - X^T(n) w(n+1)||^2

where X^T(n) is (P+1) x N with (P+1) < N

Thus, we look for the least-squares solution of the underdetermined system X^T(n) w(n+1) = d(n)

The solution is

w(n+1) = [X^T(n)]^+ d(n) = X(n) [X^T(n) X(n)]^(-1) d(n)

Using d(n) = e(n) + X^T(n) w(n) yields

w(n+1) = w(n) + X(n) [X^T(n) X(n)]^(-1) e(n)

Observations:

Order of the AP algorithm: P + 1 (P = 0 => NLMS)

Convergence speed increases with P (but not linearly)

Computational complexity increases with P (not linearly)

If X^T(n) X(n) is close to singular, use X^T(n) X(n) + ε I

The scalar error of NLMS becomes a vector error in AP (except for μ = 1)
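A minimal affine-projection sketch with regularization (added for this transcript), using an assumed AR(1) input, the kind of correlated signal for which AP converges faster than NLMS; all numeric settings are illustrative:

```python
import numpy as np

rng = np.random.default_rng(3)
N, P, mu, eps = 16, 2, 0.5, 1e-6
wo = rng.standard_normal(N)                       # assumed "unknown" system

# Correlated AR(1) input (illustrative)
M = 6000
x = np.zeros(M)
for n in range(1, M):
    x[n] = 0.9 * x[n - 1] + rng.standard_normal()
d = np.convolve(x, wo)[:M] + 1e-3 * rng.standard_normal(M)

w = np.zeros(N)
for n in range(N + P, M):
    # X(n) = [x(n), x(n-1), ..., x(n-P)], one regressor per column
    X = np.column_stack([x[n - k - N + 1: n - k + 1][::-1] for k in range(P + 1)])
    dn = d[n - np.arange(P + 1)]                  # [d(n), d(n-1), ..., d(n-P)]
    e = dn - X.T @ w                              # error vector e(n)
    # w(n+1) = w(n) + mu X(n) [X^T(n) X(n) + eps I]^(-1) e(n)
    w = w + mu * X @ np.linalg.solve(X.T @ X + eps * np.eye(P + 1), e)

print(np.linalg.norm(w - wo))
```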

Affine Projection - Stochastic Approximation of Regularized Newton

Regularized Newton Algorithm

w(n+1) = w(n) + μ [ε I + Rxx]^(-1) [p - Rxx w(n)]

Estimates using time-window statistics:

p̂ = (1/(P+1)) Σ_{k=n-P}^{n} x(k) d(k) = (1/(P+1)) X(n) d(n)

R̂xx = (1/(P+1)) Σ_{k=n-P}^{n} x(k) x^T(k) = (1/(P+1)) X(n) X^T(n)

Using these estimates, with the regularization parameter scaled to ε/(P+1) and e(n) = d(n) - X^T(n) w(n),

w(n+1) = w(n) + μ [ε I + X(n) X^T(n)]^(-1) X(n) e(n)

w(n+1) = w(n) + μ [ε I + X(n) X^T(n)]^(-1) X(n) e(n)

Using the Matrix Inversion Formula,

[ε I + X(n) X^T(n)]^(-1) X(n) = X(n) [ε I + X^T(n) X(n)]^(-1)

and

w(n+1) = w(n) + μ X(n) [ε I + X^T(n) X(n)]^(-1) e(n)

Affine Projection Algorithm as a Projection onto an Affine Subspace

Conditions for the analysis:
e(n) = d(n) - X^T(n) w(n)
d(n) = X^T(n) wo defines the optimal solution in the least-squares sense
v(n) = w(n) - wo  (weight error vector)

Error vector

e(n) = d(n) - X^T(n) [v(n) + wo]
     = X^T(n) wo - X^T(n) [v(n) + wo]
     = -X^T(n) v(n)

e(n) = -X^T(n) v(n)

Interpretation: to minimize the error, v(n) should be orthogonal to all input vectors
Restriction: we are going to use only {x(n), x(n-1), ..., x(n-P)}

Iterative solution:

We can subtract from v(n) its projection onto the range of X(n) at each iteration:

v(n+1) = v(n) - P_X(n) v(n)  (projection onto an affine subspace)

Using X^T(n) v(n) = -e(n),

v(n+1) = v(n) - X(n) [X^T(n) X(n)]^(-1) X^T(n) v(n)
       = v(n) + X(n) [X^T(n) X(n)]^(-1) e(n)  (AP algorithm)

Pseudo Affine Projection Algorithm

Major problem with the AP algorithm: for μ ≠ 1, e(n) is a full error vector => large computational complexity

The Pseudo-AP algorithm replaces the input x(n) with its P-th order autoregressive prediction

x̂(n) = Σ_{k=1}^{P} a_k x(n-k)

Using the least-squares solution for a = [a1, a2, ..., aP]^T,

a = [Xp^T(n) Xp(n)]^(-1) Xp^T(n) x(n),  Xp(n) = [x(n-1), ..., x(n-P)]

Now, we subtract from x(n) its projections onto the last P input vectors:

φ(n) = x(n) - Xp(n) a(n)
     = x(n) - Xp(n) [Xp^T(n) Xp(n)]^(-1) Xp^T(n) x(n)
     = (I - P_P) x(n)   (P_P: projection matrix)
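A small numerical sketch (added here, with assumed dimensions) of the pseudo-AP input construction: φ(n) is the part of x(n) orthogonal to the last P regressors, i.e. to the columns of Xp(n).

```python
import numpy as np

rng = np.random.default_rng(4)
N, P = 16, 2
Xp = rng.standard_normal((N, P))         # stands in for Xp(n) = [x(n-1), ..., x(n-P)]
x = rng.standard_normal(N)               # stands in for x(n)

a = np.linalg.solve(Xp.T @ Xp, Xp.T @ x) # least-squares AR coefficient fit
phi = x - Xp @ a                         # phi(n) = x(n) - Xp(n) a(n) = (I - P_P) x(n)
print(np.abs(Xp.T @ phi).max())          # ~0: phi is orthogonal to the columns of Xp
```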

Example - Pseudo-AP versus AP

[Plot: mean-square error (dB) versus iteration for AP and Pseudo-AP. AR(8) input with coefficients [0.9 0.8 0.7 0.6 0.5 0.4 0.3 0.2], N = 64, P = 8, SNR = 80 dB, μ = 0.7. Steady-state MSE: approximately -72.82 dB (AP) and -75.82 dB (PAP).]

Using φ(n) as the new input vector for the NLMS algorithm,

w(n+1) = w(n) + μ φ(n) e(n) / (φ^T(n) φ(n))

It can be shown that Pseudo-AP is identical to AP for an AR input and μ = 1

Otherwise, Pseudo-AP is different from AP

Simpler to implement than AP for μ ≠ 1

Can lead to even better steady-state results than AP for AR inputs

For AR inputs: NLMS with input orthogonalization

Some Research Results

S. J. M. de Almeida, J. C. M. Bermudez, N. J. Bershad and M. H. Costa, "A statistical analysis of the affine projection algorithm for unity step size and autoregressive inputs," IEEE Transactions on Circuits and Systems - I, vol. 52, pp. 1394-1405, July 2005.

S. J. M. Almeida, J. C. M. Bermudez and N. J. Bershad, "A stochastic model for a pseudo affine projection algorithm," IEEE Transactions on Signal Processing, vol. 57, pp. 107-118, 2009.

S. J. M. Almeida, M. H. Costa and J. C. M. Bermudez, "A stochastic model for the deficient length pseudo affine projection adaptive algorithm," in Proc. 17th European Signal Processing Conference (EUSIPCO), Aug. 24-28, Glasgow, Scotland, 2009.

S. J. M. Almeida, M. H. Costa and J. C. M. Bermudez, "A stochastic model for the deficient order affine projection algorithm," in Proc. ISSPA 2010, Kuala Lumpur, Malaysia, May 2010.

C. Richard, J. C. M. Bermudez and P. Honeine, "Online prediction of time series data with kernels," IEEE Transactions on Signal Processing, vol. 57, pp. 1058-1067, 2009.

P. Honeine, C. Richard, J. C. M. Bermudez, H. Snoussi, M. Essoloh and F. Vincent, "Functional estimation in Hilbert space for distributed learning in wireless sensor networks," in Proc. IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Taipei, Taiwan, pp. 2861-2864, 2009.

    Deterministic Algorithms


    Recursive Least Squares Algorithm (RLS)


Based on a deterministic philosophy

Designed for the present realization of the input signals

Least squares method adapted for real-time processing of time series

Convergence speed is not strongly dependent on the input statistics

RLS - Problem Definition

Define

xo(n) = [x(n), x(n-1), ..., x(0)]^T

x(n) = [x(n), x(n-1), ..., x(n-N+1)]^T  (input to the N-tap filter)

Desired signal: d(n)

Estimator of d(n): y(n) = x^T(n) w(n)

Cost function - squared error:

Jls(n) = Σ_{k=0}^{n} e^2(k) = Σ_{k=0}^{n} [d(k) - x^T(k) w(n)]^2
       = e^T(n) e(n),  e(n) = [e(n), e(n-1), ..., e(0)]^T

In vector form:

d(n) = [d(n), d(n-1), ..., d(0)]^T

y(n) = [y(n), y(n-1), ..., y(0)]^T,  y(k) = x^T(k) w(n)

y(n) = Σ_{k=0}^{N-1} wk(n) xo(n-k) = Φ(n) w(n)

Φ(n) = [ x(n)    x(n-1)   ...  x(n-N+1)
         x(n-1)  x(n-2)   ...  x(n-N)
         ...     ...      ...  ...
         x(0)    x(-1)    ...  x(-N+1) ]
     = [xo(n), xo(n-1), ..., xo(n-N+1)]

e(n) = d(n) - Φ(n) w(n)

and we minimize

Jls(n) = ||e(n)||^2

Normal Equations

Φ^T(n) Φ(n) w(n) = Φ^T(n) d(n)

R(n) w(n) = p(n)

Alternative representation for R and p:

R(n) = Φ^T(n) Φ(n) = Σ_{k=0}^{n} x(k) x^T(k)

p(n) = Φ^T(n) d(n) = Σ_{k=0}^{n} x(k) d(k)

RLS - Observations

w(n) remains fixed for 0 ≤ k ≤ n to determine Jls(n)

The optimum vector w(n) is wo(n) = R^(-1)(n) p(n)

The algorithm has growing memory

In an adaptive implementation, wo(n) = R^(-1)(n) p(n) must be solved for each iteration n

Problems for an adaptive implementation:
Difficulties with nonstationary signals (infinite data window)
Computational complexity

RLS - Forgetting the Past

Modified cost function

Jls(n) = Σ_{k=0}^{n} λ^(n-k) e^2(k) = Σ_{k=0}^{n} λ^(n-k) [d(k) - x^T(k) w(n)]^2
       = e^T(n) Λ e(n),  Λ = diag[1, λ, λ^2, ..., λ^n]

Modified Normal Equations

Φ^T(n) Λ Φ(n) w(n) = Φ^T(n) Λ d(n)

R(n) w(n) = p(n)

where

R(n) = Φ^T(n) Λ Φ(n) = Σ_{k=0}^{n} λ^(n-k) x(k) x^T(k)

p(n) = Φ^T(n) Λ d(n) = Σ_{k=0}^{n} λ^(n-k) x(k) d(k)

RLS - Recursive Updating

Correlation matrix

R(n) = Σ_{k=0}^{n} λ^(n-k) x(k) x^T(k)
     = λ Σ_{k=0}^{n-1} λ^(n-1-k) x(k) x^T(k) + x(n) x^T(n)
     = λ R(n-1) + x(n) x^T(n)

Cross-correlation vector

p(n) = Σ_{k=0}^{n} λ^(n-k) x(k) d(k)
     = λ Σ_{k=0}^{n-1} λ^(n-1-k) x(k) d(k) + x(n) d(n)
     = λ p(n-1) + x(n) d(n)

We have recursive updating expressions for R(n) and p(n)

However, we need a recursive updating for R^(-1)(n), as

wo(n) = R^(-1)(n) p(n)

Applying the matrix inversion lemma to

R(n) = λ R(n-1) + x(n) x^T(n)

and defining P(n) = R^(-1)(n) yields

P(n) = λ^(-1) P(n-1) - [λ^(-2) P(n-1) x(n) x^T(n) P(n-1)] / [1 + λ^(-1) x^T(n) P(n-1) x(n)]

The Gain Vector

Definition:

k(n) = [λ^(-1) P(n-1) x(n)] / [1 + λ^(-1) x^T(n) P(n-1) x(n)]

Using this definition,

P(n) = λ^(-1) P(n-1) - λ^(-1) k(n) x^T(n) P(n-1)

k(n) can be written as

k(n) = [λ^(-1) P(n-1) - λ^(-1) k(n) x^T(n) P(n-1)] x(n)
     = P(n) x(n)
     = R^(-1)(n) x(n)

k(n) is the representation of x(n) on the column space of R(n)

RLS - Recursive Weight Update

w(n+1) = wo(n)

Using

w(n+1) = R^(-1)(n) p(n) = P(n) p(n)

p(n) = λ p(n-1) + x(n) d(n)

P(n) = λ^(-1) P(n-1) - λ^(-1) k(n) x^T(n) P(n-1)

k(n) = P(n) x(n)

e(n) = d(n) - x^T(n) w(n)

yields

w(n+1) = w(n) + k(n) e(n)  (RLS weight update)
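A minimal exponentially weighted RLS sketch (added here) implementing exactly these recursions; λ, δ and the sizes are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(5)
N, lam, delta = 16, 0.995, 1e-2
wo = rng.standard_normal(N)              # assumed "unknown" system

w = np.zeros(N)
Pm = np.eye(N) / delta                   # P(0) = R^(-1)(0) = (delta I)^(-1)
xbuf = np.zeros(N)
for n in range(2000):
    xbuf = np.roll(xbuf, 1)
    xbuf[0] = rng.standard_normal()
    d = xbuf @ xbuf * 0 + xbuf @ wo + 1e-3 * rng.standard_normal()
    Px = Pm @ xbuf                       # P(n-1) x(n)
    k = Px / (lam + xbuf @ Px)           # gain vector k(n)
    e = d - xbuf @ w                     # a priori error e(n)
    w = w + k * e                        # w(n+1) = w(n) + k(n) e(n)
    Pm = (Pm - np.outer(k, Px)) / lam    # P(n) = (1/lam)[P(n-1) - k(n) x^T(n) P(n-1)]

print(np.linalg.norm(w - wo))
```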

RLS - Observations

The RLS convergence speed is not affected by the eigenvalues of R(n)

Initialization: p(0) = 0, R(0) = δI, with δ of the order of (1 - λ) σ_x^2
This changes R(n) to λ^n δI + R(n) => biased estimate of wo(n) for small n
(no problem for λ < 1 and large n)

Numerical problems in finite precision:
Unstable for λ = 1
Loss of symmetry in P(n):
Evaluate only the upper (or lower) triangular part and the diagonal of P(n), or
Replace P(n) with [P(n) + P^T(n)]/2 after updating from P(n-1)
Numerical problems when x(n) ≈ 0 and λ < 1

Some Research Results

C. Ludovico and J. C. M. Bermudez, "A recursive least squares algorithm robust to low-power excitation," in Proc. 2004 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP-2004), Montreal, Canada, vol. II, pp. 673-677, 2004.

C. Ludovico and J. C. M. Bermudez, "An improved recursive least squares algorithm robust to input power variation," in Proc. IEEE Statistical Signal Processing Workshop, Bordeaux, France, 2005.

Performance Comparison

System identification, N = 100

Input AR(1): x(n) = 0.9 x(n-1) + v(n)

AP algorithm with P = 2

Step sizes designed for equal LMS and NLMS performances

[Plot: impulse response to be estimated, 100 samples with values between about -0.1 and 0.4.]

Excess Mean Square Error - LMS, NLMS and AP

[Plot: EMSE (dB) versus iteration for LMS, NLMS and AP(2), decaying from 0 toward about -80 dB over 6000 iterations.]

Excess Mean Square Error - LMS, NLMS, AP and RLS

[Plot: EMSE (dB) versus iteration for LMS, NLMS, AP(2) and RLS over 6000 iterations.]

Performance Comparisons - Computational Complexity

Adaptive filter with N real coefficients and real signals

For the AP algorithm, K = P + 1

Algorithm | x                     | +                       | /
LMS       | 2N + 1                | 2N                      | -
NLMS      | 3N + 1                | 3N                      | 1
AP        | (K^2 + 2K)N + K^3 + K | (K^2 + 2K)N + K^3 + K^2 | -
RLS       | N^2 + 5N + 1          | N^2 + 3N                | 1

For N = 100, P = 2:

Algorithm | x      | +      | / | factor
LMS       | 201    | 200    | - | 1
NLMS      | 301    | 300    | 1 | 1.5
AP        | 1,530  | 1,536  | - | 7.5
RLS       | 10,501 | 10,300 | 1 | 52.5

Typical values for acoustic echo cancellation (N = 1024, P = 2):

Algorithm | x         | +         | / | factor
LMS       | 2,049     | 2,048     | - | 1
NLMS      | 3,073     | 3,072     | 1 | 1.5
AP        | 15,390    | 15,396    | - | 7.5
RLS       | 1,053,697 | 1,051,648 | 1 | 514
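The table entries follow directly from the per-iteration formulas; this small added script recomputes the multiplication and addition columns (using K = P + 1 for AP, as stated above):

```python
def ops(N, P):
    """Per-iteration (multiplications, additions) from the formulas above."""
    K = P + 1
    return {
        "LMS":  (2*N + 1, 2*N),
        "NLMS": (3*N + 1, 3*N),
        "AP":   ((K*K + 2*K)*N + K**3 + K, (K*K + 2*K)*N + K**3 + K*K),
        "RLS":  (N*N + 5*N + 1, N*N + 3*N),
    }

for N in (100, 1024):
    print(N, ops(N, 2))   # matches the tables for N = 100 and N = 1024
```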

How to Deal with Computational Complexity?

Not an easy task!!!

There are fast versions of some algorithms (especially RLS)

What is usually not said is that speed can bring:
Instability
Increased need for memory

Most applications rely on simple solutions

    Understanding the Adaptive Filter Behavior

What are Adaptive Filters?

Adaptive filters are, by design, systems that are:

Time-variant (w(n) is time-variant)

Nonlinear (y(n) is a nonlinear function of x(n))

Stochastic (w(n) is random)

LMS:

w(n+1) = w(n) + μ e(n) x(n)
       = [I - μ x(n) x^T(n)] w(n) + μ d(n) x(n)

y(n) = x^T(n) w(n)

They are difficult to understand
Different applications and signals require different analyses
Simplifying assumptions are necessary

Good design requires good analytical models.

LMS Analysis

Basic equations:

d(n) = x^T(n) wo + z(n)

e(n) = d(n) - x^T(n) w(n)

w(n+1) = w(n) + μ e(n) x(n)

v(n) = w(n) - wo  (study about the optimum weight vector)

Mean Weight Behavior:

Neglecting the dependence between v(n) and x(n) x^T(n),

E[v(n+1)] = (I - μ Rxx) E[v(n)]

Follows the steepest descent trajectory

Convergence condition: 0 < μ < 2/λ_max

lim_{n→∞} E[w(n)] = wo

However: w(n) is random => we need to study its fluctuations about E[w(n)] => MSE

The Mean-Square Estimation Error

e(n) = eo(n) - x^T(n) v(n),  eo(n) = d(n) - x^T(n) wo

Square the error equation

Take the expected value

Neglect the statistical dependence between v(n) and x(n) x^T(n):

Jms(n) = E[eo^2(n)] + Tr{Rxx K(n)},  K(n) = E[v(n) v^T(n)]

This expression is independent of the adaptive algorithm

The effect of the algorithm on Jms(n) is determined by K(n)

Jms,min = E[eo^2(n)]

Jms,ex = Tr{Rxx K(n)}

The Behavior of K(n) for LMS

Using the basic equations,

v(n+1) = [I - μ x(n) x^T(n)] v(n) + μ eo(n) x(n)

Post-multiply by v^T(n+1)

Take the expected value assuming v(n) and x(n) x^T(n) independent

Evaluate the expectations: higher-order moments of x(n) require the input pdf

Assuming Gaussian inputs,

K(n+1) = K(n) - μ [Rxx K(n) + K(n) Rxx]
         + μ^2 {Rxx Tr[Rxx K(n)] + 2 Rxx K(n) Rxx}
         + μ^2 Rxx Jms,min
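The moment recursion can be iterated directly to predict the LMS learning curve Jms(n) = Jms,min + Tr{Rxx K(n)}. In this added sketch the covariance, step size and Jms,min are illustrative assumptions:

```python
import numpy as np

N, mu, Jmsmin = 4, 0.05, 1e-3
Rxx = 0.5 ** np.abs(np.subtract.outer(np.arange(N), np.arange(N)))  # toy covariance
K = np.eye(N)                            # K(0) = E[v(0) v^T(0)], assumed identity

for n in range(200):
    Jms = Jmsmin + np.trace(Rxx @ K)     # Jms(n) = Jms,min + Tr{Rxx K(n)}
    K = (K - mu * (Rxx @ K + K @ Rxx)
           + mu**2 * (Rxx * np.trace(Rxx @ K) + 2 * Rxx @ K @ Rxx)
           + mu**2 * Rxx * Jmsmin)       # Gaussian-input recursion from above

print(Jms)                               # approaches Jms,min plus the excess MSE
```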

Design Guidelines Extracted from the Model

Stability

0