A Family of Robust Algorithms Exploiting Sparsity in Adaptive Filters

572 IEEE TRANSACTIONS ON AUDIO, SPEECH, AND LANGUAGE PROCESSING, VOL. 17, NO. 4, MAY 2009

A Family of Robust Algorithms Exploiting Sparsity in Adaptive Filters

Leonardo Rey Vega, Student Member, IEEE, Hernán Rey, Jacob Benesty, Senior Member, IEEE, and Sara Tressens

Abstract—We introduce a new family of algorithms to exploit sparsity in adaptive filters. It is based on a recently introduced new framework for designing robust adaptive filters. It results from minimizing a certain cost function subject to a time-dependent constraint on the norm of the filter update. Although in general this problem does not have a closed-form solution, we propose an approximate one which is very close to the optimal solution. We take a particular algorithm from this family and provide some theoretical results regarding the asymptotic behavior of the algorithm. Finally, we test it in different environments for system identification and acoustic echo cancellation applications.

Index Terms—Acoustic echo cancellation, adaptive filtering, impulsive noise, robust filtering, sparse systems.

I. INTRODUCTION

In the last several years, many authors have paid attention to the problem of identification of sparse systems [1]–[6]. These systems have the property of concentrating most of their energy in a small fraction of their coefficients. In principle, using different dynamics for updating each coefficient might improve the initial speed of convergence without compromising the steady-state behavior.

In [1], a detection-guided normalized least-mean-square (NLMS) algorithm is proposed. However, if the signal-to-background-noise ratio (SBNR) is high, the steady-state behavior might be severely compromised. In [3], a different approach allows the inclusion of a priori information on the system. Nevertheless, the performance of the resulting algorithms might be compromised in the presence of large perturbations. Only a few attempts have been made [4] to exploit sparsity while performing robustly (in the sense of being only slightly sensitive to large perturbations).

In this paper, we use a recently introduced framework for the design of robust adaptive filters [7]. As a result, a family

Manuscript received July 11, 2008; revised October 24, 2008. Current version published March 10, 2009. This work was supported in part by the Universidad de Buenos Aires under project UBACYT I005. The associate editor coordinating the review of this manuscript and approving it for publication was Dr. Jingdong Chen.

L. Rey Vega is with the Department of Electronics and CONICET, Universidad de Buenos Aires, 1063 Buenos Aires, Argentina (e-mail: [email protected]).

H. Rey is with the Instituto de Ingeniería Biomédica (FIUBA) and CONICET, Universidad de Buenos Aires, 1063 Buenos Aires, Argentina (e-mail: [email protected]).

J. Benesty is with the INRS-EMT, Université du Québec, Montréal, Québec, H5A 1K6, Canada (e-mail: [email protected]).

S. Tressens is with the Department of Electronics, Universidad de Buenos Aires, 1063 Buenos Aires, Argentina (e-mail: [email protected]).

Color versions of one or more of the figures in this paper are available online at http://ieeexplore.ieee.org.

Digital Object Identifier 10.1109/TASL.2008.2010156

of robust adaptive filters suitable for sparse system identification arises. In particular, we use the ideas of the improved proportionate NLMS (IPNLMS) algorithm [5]. The proposed algorithm can also be interpreted as a variable step-size IPNLMS. We present some theoretical results regarding the asymptotic behavior of the algorithm. Its performance is then tested under several scenarios in system identification and acoustic echo cancellation problems.

Finally, we present the notation used throughout the paper. Let $\mathbf{h}$ be an unknown linear finite-impulse-response system of length $L$. The input vector at time $n$, $\mathbf{x}(n) = [x(n), x(n-1), \ldots, x(n-L+1)]^T$, passes through the system, leading to $y(n) = \mathbf{h}^T\mathbf{x}(n)$. This output is observed, but it is usually corrupted by a noise $v(n)$, which will be considered additive. In many practical situations, $v(n) = b(n) + \eta(n)$, where $b(n)$ stands for the background measurement noise and $\eta(n)$ is an impulsive noise. Thus, each input gives an output $d(n) = \mathbf{h}^T\mathbf{x}(n) + v(n)$. We want to find $\hat{\mathbf{h}}(n)$, an estimate of $\mathbf{h}$. This filter receives the same input, leading to an output error $e(n) = d(n) - \hat{\mathbf{h}}^T(n-1)\mathbf{x}(n)$. The misalignment vector is $\tilde{\mathbf{h}}(n) = \mathbf{h} - \hat{\mathbf{h}}(n)$. We also define the a posteriori error $e_p(n) = \tilde{\mathbf{h}}^T(n)\mathbf{x}(n)$ and the a priori error $e_a(n) = \tilde{\mathbf{h}}^T(n-1)\mathbf{x}(n)$.
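As a quick illustration, the signal model above can be sketched as follows. This is a minimal sketch: the filter length, system coefficients, and noise values are hypothetical, and the variable names (`observe`, `h_hat`) are ours, not the paper's.

```python
import numpy as np

rng = np.random.default_rng(0)

L = 8                        # filter length (small, for illustration)
h = rng.standard_normal(L)   # unknown FIR system h (hypothetical values)

def observe(x_vec, b, eta=0.0):
    """System output d(n) = h^T x(n) + v(n), with v(n) = b(n) + eta(n):
    background measurement noise plus an (optional) impulsive component."""
    return h @ x_vec + b + eta

# One time step: output error and a priori error for an estimate h_hat.
x = rng.standard_normal(L)           # input regressor x(n)
d = observe(x, b=0.01)               # noisy observation d(n)
h_hat = np.zeros(L)                  # current filter estimate (at startup)
e = d - h_hat @ x                    # output error e(n)
misalignment = h - h_hat             # misalignment vector
e_a = misalignment @ x               # a priori error
```

With a zero initial estimate, the output error equals the a priori error plus the noise sample, which is the decomposition the robustness analysis relies on.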

II. NEW FAMILY OF ROBUST ADAPTIVE FILTERS

We propose to design a new family of adaptive filters using the framework introduced in [7]. In order to avoid a degradation of the system performance after a large noise sample perturbs it, the energy of the filter update is constrained at each iteration. This can be formally stated as

$\|\hat{\mathbf{h}}(n) - \hat{\mathbf{h}}(n-1)\|^2 \leq \delta_n^2$ (1)

where $\{\delta_n^2\}$ is a positive sequence. Its choice will influence the dynamics of the algorithm, but in any case, (1) guarantees that any noise sample can perturb the squared norm of the filter update by at most the amount $\delta_n^2$. Next, a cost function is required. Following suggestions in [6], we minimize

(2)

subject to the constraint (1). Here, the matrices involved are any positive definite matrices. Different choices of these matrices will lead to different algorithms. These matrices could depend on the current estimate, which would make the parameter space a Riemannian space, i.e., a curved differentiable manifold where the distance properties are not uniform along the space. The use of Riemannian manifolds could be exploited when some a priori information about the true system is known [3], [8], [9]. Such a

1558-7916/$25.00 © 2009 IEEE

Authorized licensed use limited to: Inst Natl de la Recherche Scientific EMT. Downloaded on March 18, 2009 at 08:10 from IEEE Xplore. Restrictions apply.


case arises when we know that the underlying true system is sparse.

It is easy to show that the unconstrained problem given only by the cost function (2) has a unique minimum given by

(3)

With this in mind, and in order to minimize (2) subject to (1), we must consider two cases.

Case 1) The minimum of the unconstrained problem is contained in the hypersphere given by (1).

In this case, the solution to the constrained problem is given by (3). This case takes place if and only if

(4)

Case 2) The minimum of the unconstrained problem is not contained in the hypersphere given by (1).

This is the situation when (4) is not satisfied. Defining as

(5)

the cost function can be rewritten as

(6)

where . Defining , the problem to be solved can be stated as

subject to (7)

This is a quadratically constrained quadratic program. These kinds of problems are well studied [10]. However, in general they do not admit a closed-form solution. Let us define

(8)

where the matrix is diagonal and positive definite. The rationale for using this structure is the same as in [2]. Although its choice will be made more explicit in Section III, we can say that its diagonal elements will be proportional to the amplitude of the corresponding entries of the filter estimate. This allows different adaptation dynamics for the coefficients according to their influence on the adaptive filter. This could increase the convergence speed of the adaptive filter when the underlying true system is sparse, because the strongest and most significant coefficients will converge faster.

With (8) in (7), we can write the Lagrange function

(9)

where $\lambda$ is a Lagrange multiplier. Denoting the $l$th components of the vectors involved and the $l$th diagonal element of the matrix defined in (8), and setting to zero the gradient of the Lagrangian with respect to each component, we can write

(10)

This is the optimal value of the update, which obviously depends on $\lambda$. The optimal value of the Lagrange multiplier is one of the roots of the following polynomial:

(11)

This is a high-order polynomial equation, which is impossible to solve in closed form except for very short filters. In the shortest nontrivial case, the closed-form solution corresponds to the roots of a quartic equation, which are very difficult to compute; moreover, that case is of no practical interest. Typically, the filter length could be very large, and the use of numerical root-finding algorithms in real time may not be an option for computing the Lagrange multiplier, because of the high computational load and the precision required in the computation of the roots of (11). For that reason, problem (7) would have to be solved using other numerical techniques, such as gradient search [11]. However, the implementation of these numerical algorithms in real-time applications could be precluded, especially if the filter is long.

Therefore, we propose the approximate solution:

(12)

Clearly, this update satisfies the constraint. If $\delta_n^2$ is small enough, (12) will be close to the optimal solution of problem (7) (since the optimal and suboptimal solutions must lie inside the hypersphere of radius $\delta_n$). It is true that other suboptimal solutions could be proposed, each of them leading to a different algorithm. For example, a vector with all of its entries equal to a common value chosen to meet the constraint is also close to the optimal one. However, this solution does not perform as well as (12). The proposed solution in (12) has the same direction as the update in (3) (with the choices given in (8)), which is the solution of the unconstrained problem. This update is therefore a good one, preserving the direction that minimizes (2) while constraining the norm of the update, which provides robustness. It should also be mentioned that (12) is easily implementable, requiring only an appropriate normalization of the update given in (3). As we will see later, the results of Fig. 2 also validate the use of (12).
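The normalize-if-needed logic behind (12) can be sketched as follows. This is our reading of the update, with a generic unconstrained update vector `delta_h` standing in for the cost-function-specific direction of (3); the function name is hypothetical.

```python
import numpy as np

def constrained_update(h_prev, delta_h, delta_sq):
    """Approximate solution in the spirit of Eq. (12): keep the direction
    of the unconstrained update delta_h, but cap its squared norm at
    delta_sq, the constraint radius of Eq. (1)."""
    energy = float(delta_h @ delta_h)
    if energy <= delta_sq:
        # Case 1 (Eq. (4)): constraint inactive, use the update as-is.
        return h_prev + delta_h
    # Case 2: rescale the update onto the hypersphere of radius sqrt(delta_sq).
    return h_prev + np.sqrt(delta_sq / energy) * delta_h
```

For example, an update of norm 5 with a constraint radius of 2 is scaled down to norm 2, while an update already inside the hypersphere is left untouched.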

Substituting (8) into (3), (4), and (11) leads to the family of robust adaptive filters

(13)



The only thing that remains is the choice of the delta sequence. Following the ideas in [7], we choose

(14)

The memory factor can be chosen as

(15)

where the parameter is integer-valued and depends on the correlation of the input. The initial condition of the delta sequence can also be parameterized in terms of the powers of the input and observed output signals.

From (13), the proposed algorithm can be interpreted as a robust variable step-size version of the normalized natural gradient algorithm [12]. When condition (4) is satisfied, i.e., when there is a low chance of having a large noise sample contaminating the error signal (as long as the constraint radius is not too large), the step size is equal to one. Otherwise, the step size is smaller than one. Using the same ideas as in [7], it can be proved that the sequence (14) is strictly decreasing towards zero regardless of the values involved. Therefore, the adaptive step size does not merely provide the algorithm with a switching mechanism that makes it robust against the noise. As the error decreases during the adaptation, the variable step size also lets the algorithm further decrease the error when the update (3) is no longer capable of doing so. As a consequence, under certain hypotheses, we will show in Section IV that the algorithm converges in the mean-square sense to the true system.

Different choices of the proportionate matrix will lead to different algorithms with the potential of exploiting the sparsity of the system (PNLMS [2], IPNLMS [5], and other variants [6]). It is important to note that with a trivial choice of this matrix, the update (13) reduces to the RVSS-NLMS presented in [7]. In Section III, we will make a particular choice and simulate how close the proposed solution (12) is to the optimal one in (7).

III. PARTICULAR SELECTION OF

From all the possibilities, we choose a scaled version of the matrix associated with the IPNLMS [5]. This matrix is diagonal, with the $l$th element of its diagonal computed as

(16)

where a small positive constant is required at the beginning of the adaptation, when the filter estimate is typically zero. The reason to use the scaled version of the IPNLMS matrix is that it can be regularized with the same constant as its NLMS counterpart. In fact, when $\alpha = -1$, the IPNLMS is equivalent to the NLMS. In the sequel, we use the same value of $\alpha$ in all the cases [5]. The advantage of the IPNLMS is that it works well even if the system is not sparse [5]. The resulting algorithm using (13),

Fig. 1. Impulse response of a sparse system.

(14), and (16) is referred to as the robust variable step-size IPNLMS (RVSS-IPNLMS).
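The standard IPNLMS gain rule of Benesty and Gay, which (16) builds on, can be sketched as follows. The function name, the default `alpha`, and the regularization constant are assumptions (the paper's exact scaled version and its parameter values did not survive transcription).

```python
import numpy as np

def ipnlms_gains(h_hat, alpha=-0.5, eps=1e-8):
    """Diagonal of the IPNLMS proportionate matrix: a uniform term plus a
    term proportional to each coefficient's magnitude. alpha = -1 recovers
    uniform (NLMS-like) gains; alpha -> 1 approaches PNLMS-like behavior."""
    L = len(h_hat)
    l1 = np.sum(np.abs(h_hat))                 # ||h_hat||_1
    return (1.0 - alpha) / (2.0 * L) + (1.0 + alpha) * np.abs(h_hat) / (2.0 * l1 + eps)
```

Note that the gains are strictly positive and sum to (approximately) one, so large coefficients adapt faster without starving the small ones, which is why the IPNLMS behaves well even for non-sparse systems.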

Now we compare the proposed suboptimal solution, in terms of (2), with the optimal one. We define the SBNR as

$\mathrm{SBNR} = 10\log_{10}\left(\sigma_y^2 / \sigma_b^2\right)$ (17)

where $\sigma_y^2$ and $\sigma_b^2$ stand for the power of the system output and of the background noise, respectively. We use an AR1(0.8) input signal. The true system is chosen as the first 158 taps of the measured impulse response shown in Fig. 1.

At each time step where condition (4) is satisfied, the solution (3) is optimal. However, when it is not, we use the filter estimate after the update (12) as an initial condition in a numerical gradient optimization algorithm to compute the optimal estimate of (7). The resulting values of the cost function were very close (not shown). We also compare how well the algorithms estimate the unknown system. In this case, the RVSS-IPNLMS is run in parallel with the optimal algorithm. As a measure of performance, we use the normalized mismatch expressed in dB

$20\log_{10}\left(\|\mathbf{h} - \hat{\mathbf{h}}(n)\| / \|\mathbf{h}\|\right)$ (18)

In Fig. 2, we show that both algorithms present an almost identical mismatch. Although they seem to be close to convergence, the curves actually continue to decrease very slowly until they reach machine precision, in accordance with the results of Section IV. This supports the use of the approximate update, which can be satisfactorily implemented in practice, unlike the optimal solution.
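The normalized-mismatch metric of (18) is a one-liner; a minimal sketch (the function name is ours):

```python
import numpy as np

def mismatch_db(h_true, h_hat):
    """Normalized mismatch in dB: 20*log10(||h - h_hat|| / ||h||)."""
    return 20.0 * np.log10(np.linalg.norm(h_true - h_hat) / np.linalg.norm(h_true))
```

A 10% coefficient error on a single-tap system, for instance, corresponds to a mismatch of -20 dB.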

IV. MEAN-SQUARE STEADY-STATE BEHAVIOR

We are interested in the mean-square steady-state behavior of the misalignment $\tilde{\mathbf{h}}(n) = \mathbf{h} - \hat{\mathbf{h}}(n)$, where $\mathbf{h}$ is the true time-invariant system. We will assume that the noise sequence $v(n)$ is i.i.d., zero-mean, and independent of the input regressors $\mathbf{x}(n)$, which belong to a zero-mean stationary process with a strictly positive definite correlation matrix. These are reasonable and standard assumptions. From (13), it is easy to show that

(19)



Fig. 2. Mismatch (in dB). AR1(0.8) input. A single realization is shown without any averaging.

Multiplying this equation by and using (13), we obtain

(20)

Taking the expectation of the squared norm on both sides

(21)

The parameter is typically close to one. This means that (13) is the result of low-pass filtering. Then the variance of the random variable would be small enough to assume that

(22)

(23)

Let us analyze the following quantity:

(24)

With the choice of given in (16), it is not difficult to show

(25)

For a large number of inputs of interest, if the filter length is large, this quantity will be small. In fact, in the important case of spherically invariant random processes (SIRPs) [13], which are relevant in speech processing applications [14], it can be shown that

(26)

This implies that the variance of

(27)

will be very small when the filter length is sufficiently large. In this case, we can make the following approximation:

(28)

where

(29)

In this way, assuming that the filter length is large, we can write (21) as

(30)

Observing that this constitutes a telescoping series, setting , and assuming the existence of

(31)

Let us analyze the asymptotic behavior of . First of all, it is not difficult to show that

(32)



where . We define

(33)

(34)

Clearly, if , we have

(35)

Using the ideas developed in [7] it can be shown that

(36)

i.e., and behave asymptotically as . This implies that behaves in the same manner when is large. Hence

(37)

At the same time we have

(38)

Therefore

(39)

This means that we can break (31) into two series, and using the fact that [see (51)]

(40)

Assuming that , it is clear that

(41)

This is because

(42)

where . We define

(43)

If the filter length is large and under a certain mixing condition on the input, it can be assumed that the variables involved are Gaussian. This assumption was used and validated through simulations in [15]. Writing the quantity of interest in terms of these variables,

(44)

Thanks to the Gaussianity of these variables, we can use Price's theorem [16] in order to obtain

(45)

where . It is not difficult to show that

(46)

where the expectation is taken with respect to the noise distribution, which could be arbitrary. We can also write the following:

(47)

where is the trace operator and is a diagonal matrix defined as

(48)

Using the standard assumption that the input regressors are independent, we can write

(49)

where , and

(50)

It is clear that . The same holds for the expression in (49). Then it is clear that

(51)

Using these facts and (38) we should have

(52)

It is clear that when and

(53)

for every value of . In fact, if , (53) is equal to , the probability density function of the noise in



. This means that one of the two following situations must hold:

(54)

(55)

It is not difficult to show that either (54) or (55) implies the following:

(56)

This is a very interesting result, which states that under the stated hypotheses, after a sufficiently long time, the adaptive filter converges to the true system in a mean-square sense. It should be noted that this result, which is valid under the considered assumptions, does not depend on the existence of moments of any order of the noise distribution. Therefore, this result would be valid for the important case where the noise comes from an α-stable distribution.

V. PRACTICAL CONSIDERATIONS

We discuss certain algorithm implementation issues. A regularization constant is added to the quantity . We chose

. A major issue should be considered carefully. As the

proposed delta sequence goes asymptotically towards zero, although the algorithm becomes more robust against perturbations, it also loses its tracking ability. For this reason, the nonstationary control methods introduced in [7] are included, although other schemes might be used. The objective is to detect changes in the true system. See [7] for a detailed description of the parameters and their role. It is clear that the less these nonstationary control methods are used, the closer the actual asymptotic performance of the algorithm will be to that predicted in Section IV.

VI. SIMULATION RESULTS

The system is taken from a measured impulse response (Fig. 1). Ninety percent of its energy is concentrated in 38 taps, and 99.9% in 525 taps. If the adaptive filter sets the other small coefficients to zero, it will produce a systematic error of −30 dB, which sets a lower bound on the mismatch performance even if the SBNR is higher than 30 dB. This is the kind of problem from which the scheme in [1] might suffer.
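Energy-concentration figures like those quoted above can be checked with a short diagnostic; this is a sketch with a hypothetical function name and toy data, not the measured response of Fig. 1.

```python
import numpy as np

def taps_for_energy(h, fraction):
    """Smallest number of taps (taken in decreasing magnitude order) whose
    cumulative energy reaches `fraction` of the total tap energy."""
    e = np.sort(h ** 2)[::-1]               # tap energies, largest first
    cum = np.cumsum(e) / np.sum(e)          # normalized cumulative energy
    return int(np.searchsorted(cum, fraction) + 1)
```

For a sparse response, this count stays tiny even for high energy fractions, which is exactly the property proportionate updates exploit.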

The adaptive filter length is set to the same value in each case. We use the mismatch, defined in (18), as a measure of performance. The plots are the result of a single realization for all the algorithms, without any additional smoothing (unless otherwise stated). Zero-mean white Gaussian noise is added to the system output so that a certain SBNR, defined in (17), is achieved. All the algorithms are regularized with the constant discussed in the previous section.

The behavior of the RVSS-IPNLMS is compared with other strategies. We simulate an NLMS with a step size chosen to give the same steady state as the RVSS-IPNLMS when no impulsive noise/double talk is present. We also include the

RVSS-NLMS [7] with the same parameters as the proposed algorithm. Finally, we simulate a robust version of the IPNLMS (RIPNLMS) based on [4]

(57)

(58)

where is a parameter and is computed as

The scaling factor has an initial value and is never allowed to become smaller than a given minimum. We actually simulate: RIPNLMS2, with parameters leading to the same steady state as the NLMS when no impulsive noise (or double talk) is present; RIPNLMS3, with the same settings as RIPNLMS2 except for the value chosen to give the same steady state as the RVSS-IPNLMS when no impulsive noise (or double talk) is present; and RIPNLMS1, with the value that leads to the same steady state as the RVSS-IPNLMS when no impulsive noise (or double talk) is present.

We also want to test the nonstationary control. As a measure of its performance, we compute for each simulation

(59)

where is the second largest value of ( and are defined in [7]). In all the simulations (except in Figs. 7 and 8), a sudden change is introduced at a certain time step by multiplying the system coefficients by . In these cases, is accomplished near the time of the sudden change, while is accomplished at any other time. The value of is related to the threshold (defined in [7]), while that of gives an idea of the reliability of the detection.

A. System Identification Under Impulsive Noise

The input process is a correlated AR1 with a pole at 0.9. In addition to the background noise, the impulsive noise could also be added to the output signal. The impulsive noise is generated as the product of a Bernoulli process (with a given probability of success) and a zero-mean Gaussian process of given power. The RVSS-IPNLMS and RVSS-NLMS use nonstationary control method 1 described in [7].
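The Bernoulli-Gaussian construction described above can be sketched as follows; the probability of success (0.01) and the impulse power (1000) are illustrative placeholders, since the paper's values did not survive transcription.

```python
import numpy as np

def impulsive_noise(n, p, power, rng):
    """Bernoulli-Gaussian impulsive noise: eta(n) = q(n) * g(n), where q is
    a Bernoulli(p) switch and g is zero-mean Gaussian with the given power."""
    q = rng.random(n) < p                       # impulse occurrence flags
    g = np.sqrt(power) * rng.standard_normal(n)  # impulse amplitudes
    return q * g

rng = np.random.default_rng(2)
eta = impulsive_noise(100_000, 0.01, 1000.0, rng)
```

The resulting sequence is zero most of the time but occasionally produces samples far larger than the background noise, which is what the constraint (1) is designed to withstand.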

In Fig. 3, it can be seen that the RVSS-IPNLMS has the same initial speed of convergence as the RIPNLMS2 but with a 10-dB lower steady state. The NLMS has the worst speed of convergence. To compare the gain with respect to the RVSS-NLMS, we look at the number of iterations needed to reach the steady-state level of the NLMS algorithm (in this case, −41 dB). The RVSS-IPNLMS needs 35% fewer iterations than the RVSS-NLMS. The RIPNLMS1 and RIPNLMS3 have a lower speed of convergence and very poor tracking ability. In fact, the tracking performance of the proposed algorithm is



Fig. 3. Mismatch (in dB). AR1(0.9) input. SBNR = 40 dB. No impulsive noise. Steady state: −51 dB (except for the RIPNLMS2).

Fig. 4. Mismatch (in dB). The other parameters are the same as in Fig. 3.

almost the same as that of the RIPNLMS2 (the best of the tested algorithms).

Now we include impulsive noise. Fig. 4 shows that the NLMS performs very poorly. The RVSS-IPNLMS and the RVSS-NLMS show almost the same behavior as in Fig. 3, losing only 1 dB in steady state. The RIPNLMS1 performs similarly to the RVSS-NLMS before the sudden change. The RIPNLMS2 and RIPNLMS3 worsen their speed of convergence and lose 18 and 3 dB, respectively, compared to the steady state of the RVSS-IPNLMS. The three RIPNLMS algorithms recover very poorly from the sudden change. This shows the advantage of the proposed algorithm, since it does not apply the full IPNLMS update when an impulse of noise is present or when the mismatch is low enough (and further gain can be accomplished).

Fig. 5. Mismatch (in dB). SBNR = 10 dB. The other parameters are the same as in Fig. 3. Steady state: −21 dB (except for the RIPNLMS2).

Fig. 6. Mismatch (in dB). SBNR = 10 dB. The other parameters are the same as in Fig. 3.

In Figs. 5 and 6, we repeat the simulations, but now with a low SBNR of 10 dB. All the parameters are the same as in the high-SBNR scenarios. When no impulsive noise is present, the RVSS-IPNLMS reaches a steady state 8 dB lower than the RIPNLMS2, while showing the same initial speed of convergence and tracking performance. Compared with the RVSS-NLMS, the RVSS-IPNLMS requires 55% fewer iterations to reach the −11 dB level. When the impulsive noise is added, the same conclusions as in the high-SBNR scenario can be drawn.

In the four cases studied in Figs. 3–6, the nonstationary control performs well. The RVSS-IPNLMS can recover well from a sudden change, with a speed close to that of the RIPNLMS2. The corresponding values appear in the caption of each figure. The sudden change was reliably identified, as revealed by the large values obtained.



Fig. 7. Mismatch (in dB). Speech input. SBNR = 25 dB. No double talk.

B. Acoustic Echo Cancellation With Double-Talk Situations

In echo cancellation applications, a double-talk detector (DTD) is used to suppress adaptation during periods of simultaneous far- and near-end signals. We use the simple Geigel DTD [17]. The Geigel DTD declares double talk if

(60)
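The Geigel rule in (60) compares the current microphone sample against the recent far-end samples. A minimal sketch follows; the function name, the threshold of 0.5 (a common choice, corresponding to an assumed 6 dB of echo-path attenuation), and the window handling are assumptions, since the paper's exact settings did not survive transcription.

```python
import numpy as np

def geigel_dtd(d_n, x_recent, threshold=0.5):
    """Geigel double-talk detector: declare double talk when |d(n)| is
    large relative to the maximum magnitude of the recent far-end samples,
    i.e., |d(n)| >= threshold * max_k |x(n - k)|."""
    return abs(d_n) >= threshold * np.max(np.abs(x_recent))
```

When the far-end signal alone drives the microphone, the echo is attenuated and the test fails; a near-end talker pushes the microphone sample above the threshold and adaptation can be suppressed.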

where $x(n)$ are the samples of the far-end signal and $d(n)$ are the samples of the far-end signal filtered by the acoustic impulse response and possibly contaminated by a background noise and a near-end signal. An important detail is that in the following simulations the filter update is not stopped even when double talk is declared. This is done to test the robust performance of the algorithms. When double talk is declared, the RIPNLMS algorithm does not adapt the scale factor. The RVSS-IPNLMS and RVSS-NLMS use nonstationary control method 2 described in [7]. The far-end and near-end signals are speech sampled at 8 kHz. The parameters of the DTD are

and .

In Fig. 7, neither double talk nor a sudden change is

present. It seems strange that in this case the NLMS andRVSS-NLMS show a better steady state than the RIPNLMS3and RVSS-IPNLMS, respectively. We may think that theparameters were chosen wrongly, but it is not the case. Thisproblem appears due to the nature of the speech signal. Al-though not shown here, the input signal presents almost zeroenergy in the low-frequency region of the spectrum. On theother hand, during the initial convergence the IPNLMS-likeupdate is highly nonlinear. Since in this case the filter is notbeing updated in the direction of the input vector, the matrixmight generate an update direction that falls in the nonexcitedregion of the input spectrum. This might in turn generate anerror in the filter estimate from which the algorithm can neverrecovered from. The final effect will be an increase in steady

Fig. 8. ERLE (in dB). Same scenario as in Fig. 7.

state. Of course, this is not in contradiction with the results ofSection IV. This is because the results in that section are validwhen the input is stationary and its correlation matrix is strictlypositive definite which is not the case considered here.
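As an illustration, the Geigel decision rule of (60) can be sketched in Python as follows; the threshold T and window length L below are illustrative assumptions, not the values used in the simulations.

```python
import numpy as np

def geigel_dtd(x_recent, d_n, T=0.5, L=128):
    """Sketch of the Geigel double-talk detector.

    Declares double talk when the current microphone sample d_n is
    large relative to the recent far-end samples, i.e., when
    |d(n)| >= T * max(|x(n)|, ..., |x(n-L+1)|).

    x_recent : array holding at least the last L far-end samples
    d_n      : current microphone sample (echo + noise + near end)
    T, L     : illustrative threshold and window length (assumptions)
    """
    return bool(abs(d_n) >= T * np.max(np.abs(x_recent[-L:])))

# A microphone sample that is strong relative to the far end
# triggers the detector; a weak one does not.
x = 0.8 * np.ones(256)                     # recent far-end samples
assert geigel_dtd(x, d_n=0.6)              # 0.6 >= 0.5 * 0.8 -> True
assert not geigel_dtd(x, d_n=0.3)          # 0.3 <  0.5 * 0.8 -> False
```

In practice the detection is usually combined with a hangover period, so that adaptation stays frozen for some samples after the last detection.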

To overcome this issue, we simply change the way of analyzing the algorithms' performance. We define the echo return loss enhancement (ERLE)

ERLE(n) = 10 log10( Ê[d²(n)] / Ê[e²(n)] )    (61)

where Ê[·] denotes a moving average with a span of 8000 samples. This quantity relates the power of the system output to that of the a priori error. Since both are related to the input signal, any nonexcited region of the input spectrum has no impact on the ERLE.
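As a sketch of how (61) can be computed in practice, the following assumes the system output d(n) and the a priori error e(n) are available as sample arrays, with the 8000-sample span of the moving average described above.

```python
import numpy as np

def erle_db(d, e, span=8000):
    """Sketch of the ERLE computation in (61).

    d    : samples of the system (echo path) output
    e    : samples of the a priori error
    span : length of the moving average used to estimate powers

    Returns the ERLE trajectory in dB.
    """
    kernel = np.ones(span) / span                          # moving-average window
    p_d = np.convolve(np.square(d), kernel, mode="valid")  # smoothed output power
    p_e = np.convolve(np.square(e), kernel, mode="valid")  # smoothed error power
    return 10.0 * np.log10(p_d / p_e)
```

If the filter cancels the echo well, e(n) is much smaller than d(n) and the ERLE is large; a residual error one tenth of the output amplitude, for instance, corresponds to 20 dB.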

Now we can see in Fig. 8 that the RVSS-IPNLMS and RIPNLMS2 have the fastest speed of convergence. The RIPNLMS2 has a steady-state ERLE 8 dB worse than the other algorithms, and among these the RVSS-IPNLMS converges 30 s earlier.

In Fig. 9, double talk and a sudden change are included. After 8 s of adaptation, the near-end signal appears for a period of 3 s. The proportion of detections during the double-talk situation was 34%, while the proportion of false alarms when no double talk was present was 9.7%. After passing through the DTD, the power of the near-end signal was reduced about 5.3 times. The SBNR is 25 dB, while the signal-to-total-noise ratio (STNR), which also accounts for the power of the near-end signal before it passes through the DTD, is set to 0 dB. Clearly, the proposed algorithm shows the best initial convergence, robust behavior during the double-talk situation, and good recovery from the sudden change.



Fig. 9. ERLE (in dB). STNR = 0 dB. The other parameters are the same as in Fig. 7.

VII. CONCLUSION

In this paper, we introduced a family of algorithms to exploit sparsity in adaptive filters, based on a recently introduced framework for designing robust adaptive filters. The result is a variable step-size version of the natural gradient algorithm. A particular choice of the matrix in this update leads to the proposed RVSS-IPNLMS algorithm. We provided a theoretical analysis showing that, under certain reasonable assumptions, the adaptive filter converges in the mean square to the true system. When tested in different scenarios in a system identification setup, it outperformed the other strategies whether the SBNR was low or high, even in the presence of impulsive noise. In an acoustic echo cancellation scenario, it behaves robustly during double-talk situations without compromising either the initial speed of convergence or the tracking ability.

REFERENCES

[1] J. Homer, I. Mareels, and C. Hoang, "Enhanced detection-guided NLMS estimation of sparse FIR-modeled signal channels," IEEE Trans. Circuits Syst. I, vol. 53, no. 8, pp. 1783–1791, Aug. 2006.

[2] D. L. Duttweiler, "Proportionate normalized least mean square adaptation in echo cancelers," IEEE Trans. Speech Audio Process., vol. 8, no. 5, pp. 508–518, Sep. 2000.

[3] R. K. Martin, W. A. Sethares, R. C. Williamson, and C. R. Johnson, Jr., "Exploiting sparsity in adaptive filters," IEEE Trans. Signal Process., vol. 50, no. 8, pp. 1883–1894, Aug. 2002.

[4] T. Gänsler, S. L. Gay, M. M. Sondhi, and J. Benesty, "Double-talk robust fast converging algorithms for network echo cancellation," IEEE Trans. Speech Audio Process., vol. 8, no. 6, pp. 656–663, Nov. 2000.

[5] J. Benesty and S. L. Gay, "An improved PNLMS algorithm," in Proc. IEEE ICASSP, 2002, pp. 1881–1884.

[6] Y. Huang, J. Benesty, and J. Chen, Acoustic MIMO Signal Processing. Berlin, Germany: Springer-Verlag, 2006.

[7] L. R. Vega, H. Rey, J. Benesty, and S. Tressens, "A new robust variable step-size NLMS algorithm," IEEE Trans. Signal Process., vol. 56, no. 5, pp. 1878–1893, May 2008.

[8] R. E. Mahony and R. C. Williamson, "Prior knowledge and preferential structures in gradient descent learning algorithms," J. Mach. Learn. Res., vol. 1, pp. 311–355, Sep. 2001.

[9] T. Abrudan, J. Eriksson, and V. Koivunen, "Efficient Riemannian algorithms for optimization under unitary matrix constraint," in Proc. IEEE ICASSP, 2008, pp. 2353–2356.

[10] S. Lucidi, L. Palagi, and M. Roma, "On some properties of quadratic programs with a convex quadratic constraint," SIAM J. Optim., vol. 8, pp. 105–122, 1998.

[11] S. Boyd and L. Vandenberghe, Convex Optimization. Cambridge, U.K.: Cambridge Univ. Press, 2004.

[12] S. Amari, "Natural gradient works efficiently in learning," Neural Comput., vol. 10, pp. 251–276, Feb. 1998.

[13] K. Yao, "A representation theorem and its applications to spherically-invariant random processes," IEEE Trans. Inf. Theory, vol. IT-19, no. 5, pp. 600–608, Sep. 1973.

[14] H. Brehm and W. Stammler, "Description and generation of spherically invariant speech-model signals," Signal Process., vol. 12, pp. 119–141, Mar. 1987.

[15] T. Al-Naffouri and A. Sayed, "Transient analysis of adaptive filters with error nonlinearities," IEEE Trans. Signal Process., vol. 51, no. 3, pp. 653–663, Mar. 2003.

[16] R. Price, "A useful theorem for nonlinear devices having Gaussian inputs," IRE Trans. Inf. Theory, vol. IT-4, no. 2, pp. 69–72, Jun. 1958.

[17] D. L. Duttweiler, "A twelve channel digital echo canceler," IEEE Trans. Commun., vol. COM-26, no. 5, pp. 647–653, May 1978.

Leonardo Rey Vega (S'08) was born in Buenos Aires, Argentina, in 1979. He received the B.Eng. degree in electronic engineering from the University of Buenos Aires in 2004, where he is currently pursuing the Ph.D. degree.

Since 2004, he has been a Research Assistant with the Department of Electronics, University of Buenos Aires. His research interests include adaptive filtering theory and statistical signal processing.

Hernán Rey was born in Buenos Aires, Argentina, in 1978. He received the B.Eng. degree in electronic engineering from the University of Buenos Aires in 2002 and the Ph.D. degree from the University of Buenos Aires in 2009.

Since 2002, he has been a Research Assistant with the Department of Electronics, University of Buenos Aires. He is currently a Postdoctoral Researcher at the Institute of Biomedical Engineering, University of Buenos Aires. His research interests include adaptive filtering theory, neural networks, and computational neuroscience.



Jacob Benesty (M'98–SM'04) was born in 1963. He received the M.S. degree in microwaves from Pierre & Marie Curie University, France, in 1987, and the Ph.D. degree in control and signal processing from Orsay University, Orsay, France, in April 1991.

During the Ph.D. degree (from November 1989 to April 1991), he worked on adaptive filters and fast algorithms at the Centre National d'Etudes des Telecommunications (CNET), Paris, France. From January 1994 to July 1995, he worked at Telecom Paris University on multichannel adaptive filters and acoustic echo cancellation. From October 1995 to May 2003, he was first a Consultant and then a Member of the Technical Staff at Bell Laboratories, Murray Hill, NJ. In May 2003, he joined the University of Quebec, INRS-EMT, Montreal, QC, Canada, as an Associate Professor. His research interests are in signal processing, acoustic signal processing, and multimedia communications. He was a member of the editorial board of the EURASIP Journal on Applied Signal Processing and was the cochair of the 1999 International Workshop on Acoustic Echo and Noise Control. He coauthored the books Acoustic MIMO Signal Processing (Springer-Verlag, 2006) and Advances in Network and Acoustic Echo Cancellation (Springer-Verlag, 2001). He is also a coeditor/coauthor of the books Speech Enhancement (Springer-Verlag, 2005), Audio Signal Processing for Next Generation Multimedia Communication Systems (Kluwer, 2004), Adaptive Signal Processing: Applications to Real-World Problems (Springer-Verlag, 2003), and Acoustic Signal Processing for Telecommunication (Kluwer, 2000).

Dr. Benesty received the 2001 Best Paper Award from the IEEE Signal Processing Society.

Sara Tressens was born in Buenos Aires, Argentina. She received the degree in electrical engineering from the University of Buenos Aires in 1967.

From 1967 to 1982, she was with the Institute of Biomedical Engineering, University of Buenos Aires, where she became an Assistant Professor in 1977 and worked in the areas of speech recognition, digital communication, and adaptive filtering. From 1980 to 1981, she worked in the Laboratoire des Signaux et Systèmes, Gif-sur-Yvette, France. From 1982 to 1993, she was with the National Research Center in Telecommunications (CNET), Issy-les-Moulineaux, France. Her research interests were in the areas of spectral estimation and spatial array processing. Since 1994, she has been an Associate Professor at the University of Buenos Aires. Her primary research interests include adaptive filtering, communications, and spectral analysis.

Ms. Tressens received the 1990 Best Paper Award from the IEEE Signal Processing Society.
