Stereo acoustical echo cancellation based on common poles

6
STEREO ACOUSTICAL ECHO CANCELLATION BASED ON COMMON POLES Gabriele Bunkheila, Michele Scarpiniti, Raffaele Parisi and Aurelio Uncini Infocom Dpt. - Dipartimento di Scienza e Tecnica dell’Informazione e della Comunicazione Universit` a di Roma “La Sapienza” - via Eudossiana, 18 - 00184 - Rome, Italy. ABSTRACT Stereo acoustical echo cancellation is a highly challenging application in the field of acoustical signal processing. Un- like the single-channel case, conflicting requirements on the adaptive filters make this problem ill-posed in its original for- mulation. In this contribution, it is shown how introducing an estimate of the Common Acoustical Poles of the receiving room can lead the adaptive system to remarkable performance improvements with respect to the classical implementation. Finally, a detailed comparison between the two schemes is presented, based on a simulated acoustical environment and the use of the classical affine projection algorithm. Index TermsAcoustic signal processing, Echo sup- pression, Common poles, ARMA models, Adaptive filtering. 1. INTRODUCTION Stereo Acoustical Echo Cancellation (SAEC) is a core issue for hands-free teleconferencing system using a pair of audio channels [1] and a representative application in the more gen- eral area of supervised multichannel adaptive estimation of Room Transfer Functions (RTFs). In the single-channel case, the length of the adaptive filter and the peakiness of the spec- trum of the input signal are the only important critical factors undermining the performance of the system [2]. When SAEC is considered, more critical effects have negative impact on the adaptive behavior of the filter, by worsening the condi- tioning of the stereo autocorrelation matrix; these include the mutual statistical dependence of the two input channels, hav- ing a direct impact on the autocorrelation matrix and also im- pacting the so-called “misalignment effect” [1]. As will be reviewed, unlike the single-channel case these issue are due to the fact that SAEC is intrinsically ill-posed, and are hardly overcome by traditional methods. A novel SAEC architecture based on Common Acoustical Poles (CAP) [3] is proposed, which implicitly models the RTFs with Auto-Regressive and Moving-Average (ARMA) filters sharing the same z-domain denominator; the latter is estimated priorly and its roots (i.e. the CAP) are supposed to account for the general resonant This work has been partially supported by the Italian National Project: Wireless multiplatfOrm mimo active access netwoRks for QoS-demanding muLtimedia Delivery (WORLD), under grant number 2007R989S. properties of the room itself. While potentially employing the same adaptive algorithms, this approach can introduce impor- tant simplifications with respect to the traditional methods, since less MA coefficients need to be estimated in real time to attain a given degree of accuracy. Order selection is believed to be a topic problem for CAP estimation, hence a recently proposed method [4] was used to identify the correct order of the priorly-estimated common denominator. The proposed architecture is compared to the classical SAEC formulation, with the use of the Affine Projection Algorithm (APA) [5]. This yields an important reduction of the length of the adap- tive filters, which in turns allows lowering the condition num- ber of the stereo autocorrelation matrix, so improving the con- vergence rate. The paper is organized as follows. Section 2.1 briefly re- calls the main issues related to the classical formulation of SAEC; section 2.2 gives an overview of RTF modeling based on CAP and their estimation. In section 3, the proposed archi- tecture is presented, while section 4 describes the simulation set-up. The results are dealt with in section 5, followed by a short conclusion. 2. BACKGROUND 2.1. The classical Stereo Acoustical Echo Canceler A typical stereo echo canceler is depicted in Figure 1. With- out lack of generality, just one microphone is considered in the following for the receiving room. Let g i (L G × 1) be the vector of the significant coefficients of the impulse response linking the speaker to the i-th microphone in the transmission room, while h i (L H × 1) similarly relates the i-th loudspeaker to the considered microphone in the receiving room. Finally, call ˆ h i (L FIR × 1) the vector relative to the corresponding i-th adaptive FIR filter 1 . The goal is that ˆ h 1 and ˆ h 2 tend adaptively to h 1 and h 2 , respectively, based on some statis- tical minimization of the norm of the error signal, e.g. kek 2 2 , so that the speaker in the transmission room is not reached by the echo of his voice through the backwards channel. 1 Here and in the following, the hat symbol denotes an estimate of the physical counterpart. 978-1-4244-3298-1/09/$25.00 ©2009 IEEE DSP 2009

Transcript of Stereo acoustical echo cancellation based on common poles

STEREO ACOUSTICAL ECHO CANCELLATION BASED ON COMMON POLES

Gabriele Bunkheila, Michele Scarpiniti, Raffaele Parisi and Aurelio Uncini

Infocom Dpt. - Dipartimento di Scienza e Tecnica dell’Informazione e della ComunicazioneUniversita di Roma “La Sapienza” - via Eudossiana, 18 - 00184 - Rome, Italy.

ABSTRACT

Stereo acoustical echo cancellation is a highly challengingapplication in the field of acoustical signal processing. Un-like the single-channel case, conflicting requirements on theadaptive filters make this problem ill-posed in its original for-mulation. In this contribution, it is shown how introducingan estimate of the Common Acoustical Poles of the receivingroom can lead the adaptive system to remarkable performanceimprovements with respect to the classical implementation.Finally, a detailed comparison between the two schemes ispresented, based on a simulated acoustical environment andthe use of the classical affine projection algorithm.

Index Terms— Acoustic signal processing, Echo sup-pression, Common poles, ARMA models, Adaptive filtering.

1. INTRODUCTION

Stereo Acoustical Echo Cancellation (SAEC) is a core issuefor hands-free teleconferencing system using a pair of audiochannels [1] and a representative application in the more gen-eral area of supervised multichannel adaptive estimation ofRoom Transfer Functions (RTFs). In the single-channel case,the length of the adaptive filter and the peakiness of the spec-trum of the input signal are the only important critical factorsundermining the performance of the system [2]. When SAECis considered, more critical effects have negative impact onthe adaptive behavior of the filter, by worsening the condi-tioning of the stereo autocorrelation matrix; these include themutual statistical dependence of the two input channels, hav-ing a direct impact on the autocorrelation matrix and also im-pacting the so-called “misalignment effect” [1]. As will bereviewed, unlike the single-channel case these issue are dueto the fact that SAEC is intrinsically ill-posed, and are hardlyovercome by traditional methods. A novel SAEC architecturebased on Common Acoustical Poles (CAP) [3] is proposed,which implicitly models the RTFs with Auto-Regressive andMoving-Average (ARMA) filters sharing the same z-domaindenominator; the latter is estimated priorly and its roots (i.e.the CAP) are supposed to account for the general resonant

This work has been partially supported by the Italian National Project:Wireless multiplatfOrm mimo active access netwoRks for QoS-demandingmuLtimedia Delivery (WORLD), under grant number 2007R989S.

properties of the room itself. While potentially employing thesame adaptive algorithms, this approach can introduce impor-tant simplifications with respect to the traditional methods,since less MA coefficients need to be estimated in real time toattain a given degree of accuracy. Order selection is believedto be a topic problem for CAP estimation, hence a recentlyproposed method [4] was used to identify the correct orderof the priorly-estimated common denominator. The proposedarchitecture is compared to the classical SAEC formulation,with the use of the Affine Projection Algorithm (APA) [5].This yields an important reduction of the length of the adap-tive filters, which in turns allows lowering the condition num-ber of the stereo autocorrelation matrix, so improving the con-vergence rate.

The paper is organized as follows. Section 2.1 briefly re-calls the main issues related to the classical formulation ofSAEC; section 2.2 gives an overview of RTF modeling basedon CAP and their estimation. In section 3, the proposed archi-tecture is presented, while section 4 describes the simulationset-up. The results are dealt with in section 5, followed by ashort conclusion.

2. BACKGROUND

2.1. The classical Stereo Acoustical Echo Canceler

A typical stereo echo canceler is depicted in Figure 1. With-out lack of generality, just one microphone is considered inthe following for the receiving room. Let gi(LG × 1) be thevector of the significant coefficients of the impulse responselinking the speaker to the i-th microphone in the transmissionroom, while hi(LH×1) similarly relates the i-th loudspeakerto the considered microphone in the receiving room. Finally,call hi(LFIR × 1) the vector relative to the correspondingi-th adaptive FIR filter1. The goal is that h1 and h2 tendadaptively to h1 and h2, respectively, based on some statis-tical minimization of the norm of the error signal, e.g. ‖e‖22,so that the speaker in the transmission room is not reached bythe echo of his voice through the backwards channel.

1Here and in the following, the hat symbol denotes an estimate of thephysical counterpart.

978-1-4244-3298-1/09/$25.00 ©2009 IEEE � � � � � � DSP 2009

It has been shown [1] that the convergence of h1 and h2

to the optimal solutions is complicated by several factors, allmainly due to the mutual statistical dependence of the incom-ing signals x1 and x2. Even though a detailed explanationof the problem is beyond the scope of this paper (the readermay refer to [1] for a thorough discussion), an overview ofthe main issues is believed to be important to understand theadvantages introduced by the proposed architecture. Define

Fig. 1. The classical Stereo Acoustical Echo Canceler(SAEC).

xi[n] = [xi[n], xi[n − 1], . . . , xi[n − LFIR + 1]]T , withxi[n] = gi[n]⊗s[n], the symbol⊗ denoting the linear convo-lution and n the time sample index. xi[n] is the signal com-ing from the i-th transmitting microphone, and s[n] the sig-nal produced by the speaker in the transmitting room. Nowcall x[n] = [x1[n]T ,x2[n]T ]T the stereo input vector, andR = E{x ·xT } (2LFIR×2LFIR) the stereo autocorrelationmatrix, with E denoting the statistical expectation operator.The stereo adaptive filter h = [hT1 , h

T2 ]T admits a unique so-

lution, optimal in the Wiener sense, only if R is non-singular.When this requirement is satisfied, the condition number ofR [6] controls the dynamic behavior of the adaptive filter h(e.g. the convergence rate). In can be shown that the conditionnumber of R is strongly dependent on the cross-coherence be-tween x1 and x2. As the latter are both linear functions of thesource signal s through g1 and g2, respectively, this results inR being ill-conditioned unless LFIR � LG [7]. On the otherhand, calling LH the effective length of the two RTFs h1 andh2 respectively, it is requested that the effective length of theimpulse responses of the adaptive filters, LFIR matches rea-sonably well the length LH . If not, the Wiener solution for hwill only be approximated and non-unique, i.e. dependent onthe transmitting RTFs g1 and g2. This means that the systemwould have to re-adapt each time the speaker moves, whichis highly undesirable. Now define the so-called stereo mis-alignment between the target stereo acoustical channel andthe stereo adaptive filter:

µS [n] = ‖h[n]− h‖2/‖h‖2 (1)

where the shortest between h and h is intended to be firstzero-padded in order make the difference computable, and

the time index n plays the role of the adaptation epoch. Ithas been shown that, even with well behaving other conven-tional AEC quality parameters, such as the Mean Square Er-ror (MSE) or the Echo Return Loss Enhancement (ERLE), µSdoes not converge to its minimum unless LFIR ≈ LH [1]. Inaddition, it has been shown that a bad conditioning of R canrender this misalignment problem more pronounced [1].

As no prior knowledge is generally available about thetransmitting and the receiving rooms, the two may also hap-pen to be similar; in this case LG and LH have the same orderof magnitude, causing the two constraints on LFIR to be im-possible to satisfy. To overcome this issue, the incoming sig-nals should be pre-processed, to artificially reduce their crosscoherence, so allowing the use of a large LFIR, regardless ofLG. Of course this should not introduce important alterationsin the signals, nor in their embedded spatial information. Apossible choice is the following non-linear pre-processing ofboth x1 and x2

x′i = xi + α · xi − (−1)i|xi|2

(2)

which was used in this work with i = 1, 2 and α = 0.5; theseexpressions were proposed in [8], and the choice for α fallsin the interval that was there described as not significantly af-fecting the spatial characterization of the stereo signal. Withthis or other acceptable types of pre-processing, the stereoadaptive filter h does converge towards a unique solution, butR still suffers from bad conditioning due to the large valuesneeded for LFIR to approach LH .

In summary, the core technical challenges of SAEC add afurther layer of complexity to the mono AEC problem: R canbe bad- or ill-conditioned, even with reasonably flat-spectruminput signals, due to the mutual statistical dependence of x1

and x2. Generally speaking, this can lead to very poor per-formance, even with robust and computationally expansiveadaptive algorithms, such as RLS or high-order APA.

2.2. CAP-based Modeling of Room Transfer Functions

CAP-based modeling of all the RTFs relative to the sameroom (e.g. the receiving room in the SAEC scheme of fig-ure 1) assumes that any hi[n] may be expressed, in the z-domain, under the form Hi(z) = Bi(z)/A(z). The rootsof the common denominator A(z) = 1 + a1z

−1 + . . . +aLA

z−LA are assumed to account for the resonant proper-ties of the room itself, and can be estimated a priori. Todo this, NH impulse responses are measured in the targetroom and, provided a choice for the orders LA and LB , avector a = [1, a1, . . . , aLA

] is estimated, along with bi =[b(i)0 , . . . , b

(i)LB−1] for i = 1, . . . , NH . A possible way is the

least-squares method introduced in [3], based on the mini-

mization of

µ =NH∑i=0

‖hi(a, bi)− hi‖22 (3)

where hi(a, bi) = [ hi[0; a, bi], . . . , hi[LH − 1; a, bi] ]T ,hi = [ hi[0], . . . , hi[LH − 1] ]T , hi[n; a, bi] is the inversez-transform of Bi(z)/A(z) and hi[n] is considered negligiblefor n ≥ LH and ∀i = 1, . . . , NH .

In the past it has been reasonably pointed out [3] thatan initial guess on the common AR order LA should be de-rived from the number of resonating modes in the room up tothe maximum considered frequency fB , subject to a subse-quent (and substantial) reduction, in order to relax the com-putational requirements; normally, it may be assumed fB =FS/2, FS being the sampling frequency. The authors of thiscontribution believe that a correct selection of the order LA isa crucial step for the effectiveness of CAP-based modeling.

As recalled in [4] human-compatible acoustical systems,such as regular rooms, need to be accounted of as distributedsystems, and they are ruled by the Helmholtz equation [9].In this context, ARMA models of RTF with a common de-nominator are theoretically justified, but the actual number ofpoles may be very large, with a density along the frequencyaxis being proportional to f2 [9]. The actually available al-gorithms for CAP estimation (e.g. the least-squares method)suffer from high orders of the denominator and high pole den-sity, leading to mis- and over-estimation of CAP for large LA.On the other hand, it is highly desirable to use the highest pos-sible value of LA not leading to estimation problems, to takethe maximum advantage out of the CAP-based modeling ar-chitecture. For these reasons, here the method introduced in[4] was used to select a proper value for LA (and LB), basedon the request of a worst-case modeling error of−40 dB. Thereader may refer to [4] for all the necessary details about thisnovel method, which will be briefly recalled in Section 4.

3. THE CAP-BASED STEREO ECHO CANCELER

The use of CAP in echo cancellation assumes that any RTF inthe receiving room is well described by means of an ARMAmodel, and an estimate A(z) of the denominator is knownand available. If the input xi[n] to each adaptive filter is pre-processed with a filtering block C = 1/A(z), the adaptive fil-ter is only required to estimate the remaining MA part Bi(z),i.e. the vector of coefficients bi(z)(LB × 1). With a fixedtarget accuracy, and no relevant CAP mis- or over-estimation,LB is expected to be significantly smaller than LFIR, so lead-ing to a reduced computational load. More interestingly, thecondition LB < LG may be satisfied more easily: this leadsto a lower cross-coherence of the input signals, which in turnresults in a better conditioning of R. On the other hand, theexposure to misalignment problems (see Section 2.1) is nofurther to be related to the difference between LH and LB ,

sinceBi(z)/A(z) has an infinite-length impulse response. In-stead, if LA and LB are selected properly, the estimates ofthe impulse responses of the filtering systems can be a goodmatch to h1[n] and h2[n], regardless of the reduced length ofthe adaptive MA filters.

In Section 2.2, it was pointed out that actually availablemethods for CAP estimation only allow using values of LAthat are much lower than the theoretical orders; this is espe-cially true for fB well above the first multiples of the firstresonant frequency of the room. The use of CAP could thenbe regarded as less attractive, since the relative reduction ofadaptive coefficients may decrease with growing fB . Theseconsiderations can be much less limiting in the SAEC con-text: based on perceptual considerations, hybrid mono/stereosubband echo cancelers have been proposed [10] where justthe low frequency part of the signal spectrum is involved inthe stereo adaptive algorithm, with higher frequencies essen-tially being taken care of by a monaural AEC.

Fig. 2. The alternative CAP pre-filtering.

Given this idea, attention should be paid about pre-filtering x1[n] and x2[n] with 1/A(z), as this may increasethe ratio between the extreme values of the spectra of theinput signals, so contributing to raising the condition num-ber of R. A possible workaround was proposed in [3] fora single-channel AEC. If a new filtering block C(z) wereintroduced as described in Figure 2 andHi(z) = Bi(z)/A(z)is assumed, imposing E(z) = 0 for any input signals yieldsthe following result after a trivial algebra:

C(z) = 1− A(z) = −a1z−1 − . . .− aLA

z−LA (4)

As a consequence, CAP-based pre-filtering can be performedby means of a fixed FIR structure processing the signal y[n]coming from the microphone, which is shared by the pair oftarget channels and does not introduce any modification onthe inputs of the adaptive filters.

4. SIMULATION SET-UP

A sketch of the used set-up can be found in Figure 3. Thisincludes a 2× 2× 3 m3 room with five microphones at (y =0.5, z = 1) m and four speakers at (y = 1.5, z = 2) m. The

same room model was chosen for the transmission section,using only two microphones and one single source in a dif-ferent configuration. The impulse responses were simulated

Fig. 3. The used set-up: four evenly spaced omni-directional loudspeakers (squares) and five omni-directionalmicrophones (circles) at x = (0, 0.13, 0.25, 0.67, 1.125) mrespectively. Microphone number 5 is not involved in CAPestimation.

with a freely available software based on the image-sourcemethod [11], with a sampling frequency of 8 kHz. These im-pulse responses, typically showing T60 ' 100 ms, were thendecimated by the five different factors 4, 6, 8, 12, 16, allowingto carry out the tests at five different bandwidths, respectivelyfB =1000, 667, 500, 333 and 250 Hz. The coefficients ofthe common z-domain denominator a were estimated on thebasis of the NH = 16 impulse responses resulting from cou-pling only the first four microphones (full black in Figure 3)with the four loudspeakers in the receiving room, and usingthe least-squares method in [3].

To select a correct number of poles to look for during CAPestimation, the following method has been used. Different es-timation procedures have been carried out for varying LA,i.e. the order of CAP denominator. At each stepNH resultingindividual normalized modeling accuracies have been calcu-lated as follows:

µk = ‖hk − hk(a, bk)‖22/‖hk‖22 k = 1, . . . , NH (5)

Now call µmin and µmax, respectively the minimum and max-imum values of (5) [4]. For growing LA and for any valueof fB , two clear trends have been spotted: µmin tends to zeromore or less monotonically, while µmax tends to a limit value,say µmax. This fact has been used to avoid CAP overmod-eling: it can be assumed that the last increment of LA stillintroducing a significant lowering of µmax leads to estimatethe last set of poles shared at least by all the NH consideredRTFs. Call the reached value LA; all the poles estimated forLA > LA do not introduce any modeling gain for at leastone of the considered RTF, suggesting that they are not com-mon and that overmodeling is occurring, just as mentioned in

section 2.2. For the simulations of this work, LA has beenconventionally chosen such that

µmax(LA) = µmax + 1 dB

On the other hand, and not surprisingly, µmax is a decreas-ing monotonic function of LB ; the chosen value LB was thesmallest leading to µmax < −40 dB.

The proposed method is employed to select the two ordersLA and LB for each considered bandwidth. These values arethen used to test the proposed SAEC (Figure 2) against theclassical formulation (Figure 1). The RTFs coupling respec-tively the second and the third loudspeakers (rounded in Fig-ure 3) with the fifth microphone (white in Figure 3) have beenused for h1 and h2, respectively; during simulation, theseimpulse responses were truncated at T60. The pre-estimatedvector a was loaded in the filtering block C(z), as describedby (4), while the two adaptive filters b1 and b2 were givena length LB . The actual length LFIR used for the two FIRfilters h1 and h2 of the classical SAEC scheme has been se-lected in order to render the asymptotic value of the stereomisalignment, µS [∞] (corresponding to (1) for the adapta-tion epoch n→∞) approximately the same in the two cases.As the main issues of SAEC arise from factors other than thepeakiness of the spectrum of the source signal, white gaus-sian noise was used here for the latter, to provide an easilyreproducible test harness for the comparison of the two archi-tectures. Finally, to make comparison easier and independentfrom other factors, no further uncorrelated noise was addedat the microphones in either the transmitting or the receivingroom.

5. SIMULATION RESULTS

The basic APA with order P = 8 [5], has been chosen tocarry out the performance comparisons outlined in the previ-ous sections. For each decimation factor the stereo misalign-ment µS [n] in (1) has been evaluated for both the proposedand the existing architectures. The length LFIR of the adap-tive filters in the classical formulation was selected in order toattain the same value of µS [∞] in the two cases, where µS [∞]denotes the asymptotic value of the stereo misalignment de-fined in (1), i.e. considered for an adaptation epoch well be-yond the typical time of convergence. The attention has fo-cused on two main aspects: the filter length reduction, whichyields algorithm-dependent computational savings, and thereduction of the time of convergence nτ , defined indirectlyand empirically as

nτ : µS [nτ ] = µS [∞] + 3 dB

As examples, Figures 4 - 7 depict the misalignment behav-ior over adaptation time of both SEAC schemes, for thedecimation factors 4, 6, 8 an 16 respectively (i.e. fB =

{1000, 667, 500, 250} Hz). A summary of the numerical re-sults can be found in Table 1. As can be noted by comparingthe columns LB and LFIR, with the same adaptive algorithmthe reduction of adaptive coefficients introduced by the useCAP is apparent. Interestingly and quite clearly, this tendsto decrease with increasing fB . After the considerations ofSection 2.2, it is believed that this is due to the progressivepotential loss of accuracy of CAP-based modeling of resonantmodes with growing fB . The decrease of µS [∞] for larger fBis consistent with what has just been stated: LA and LB havebeen obtained by requiring µmax < −40 dB. Summarizing,in the considered case remarkable results were obtained byusing CAPs in SAEC: using the proposed architecture, theconvergence is reached in about 1/5 of the original time,with something more than only 1/3 of the original number ofcoefficients, on average.

Fig. 4. Misalignment comparison for fB = 1000 Hz: Clas-sical (thin) and CAP-based (thick) SAEC. The dashed linesindicate the two values µS [nτ ] and µS [∞], respectively.

Fig. 5. Misalignment comparison for fB = 667 Hz: Clas-sical (thin) and CAP-based (thick) SAEC. The dashed linesindicate the two values µS [nτ ] and µS [∞], respectively.

Fig. 6. Misalignment comparison for fB = 500 Hz: Clas-sical (thin) and CAP-based (thick) SAEC. The dashed linesindicate the two values µS [nτ ] and µS [∞], respectively.

Fig. 7. Misalignment comparison for fB = 250 Hz: Clas-sical (thin) and CAP-based (thick) SAEC. The dashed linesindicate the two values µS [nτ ] and µS [∞], respectively.

Table 1. Stereo AEC, APA, 8th order, with and withoutCAP, with equal asymptotic misalignment.

fB (Hz) LA LB LFIR µS [∞] nτ ratio

250 28 18 49 -23 dB 4%333 34 26 64 -23 dB 18%500 49 40 91 -21 dB 19%667 65 57 151 -33 dB 21%

1000 85 89 212 -31 dB 34%

6. CONCLUSION

A new architecture has been proposed for SAEC to addressthe specific issues undermining its traditional formulation,

which uses CAP-based modeling of RTFs. Based on a pre-vious work, special attention has been paid to the order ofthe common z-domain denominator. This allowed reducingthe length of the adaptive filters without affecting the optimalmodeling accuracy, so lowering both the size of the autocor-relation matrix of the filters inputs, and, more interestingly,its condition number. Even with relatively low orders result-ing from a reduced-band approach, it was noted that the useof CAP may be extremely relevant in the context of hybridmono/stereo subband AEC architectures. The presented re-sults showed the effectiveness of the proposed solution, inboth the reduction of the number of adaptive coefficients andthe increase of the speed of convergence.

7. REFERENCES

[1] J. Benesty, D. Morgan, and M. Sondhi, “A better under-standing and an improved solution to the specific prob-lems of stereophonic acoustic echo cancellation,” IEEETransactions on Speech and Audio Processing, vol. 6,no. 2, pp. 156–165, March 1998.

[2] S. Haykin, Adaptive Filter Theory, 4-th edition, PrenticeHall, 2001.

[3] Y. Haneda, S. Makino, and Y. Kaneda, “Common acous-tical pole and zero modeling of room transfer functions,”IEEE Transactions on Speech and Audio Processing,vol. 2, no. 2, pp. 320 – 328, April 1994.

[4] G. Bunkheila, R. Parisi, and A. Uncini, “Model order se-lection for estimation of common acoustical poles,” inIEEE International Symposium on Circuits and Systems(ISCAS), pp. 1180 – 1183, May 2008.

[5] S. Gay, and S. Tavathia, “The fast affine projection algo-rithm,” International Conference on Acoustics, Speech,and Signal Processing (ICASSP-95), vol. 5, pp. 3023–3026, May 1995.

[6] G. Golub, and C. F. Van Loan, Matrix Computation, 3-rdedition, The Johns Hopkins University Press, 1996.

[7] T. Gansler and J. Benesty, “New insights into the stereo-phonic acoustic echo cancellation problem and an adap-tive nonlinearity solution,” IEEE Transactions on Speechand Audio Processing, vol. 10, no. 5, pp. 257–267, July2002.

[8] D. Morgan, J. Hall, and J. Benesty, “Investigation of sev-eral types of nonlinearities for use in stereo acoustic echocancellation,” IEEE Transactions on Speech and AudioProcessing, vol. 9, no. 6, pp. 686–696, Sept. 2001.

[9] H. Kuttruff, Room Acoustics, 4-th edition, Spon Press,1997.

[10] J. Benesty, T. Gansler, and P. Eneroth, “Multi-ChannelSound, Acoustic Echo Cancellation, and Multi-ChannelTime-Domain Adaptive Filtering,” in S. L. Gay andJ. Benesty, Eds., Acoustic Signal Processing for Telecom-munication, pp. 101–120, Kluwer Academic Publishers,2000.

[11] D. Campbell, K. Palomaki, and G. Brown, “A mat-lab simulation of shoebox room acoustics for use in re-search and teaching,” Computing and Information Sys-tems, vol. 9, no. 3, p. 48, 2005. [Online]. Available:http://www.dcs.shef.ac.uk/∼guy/pdf/campbell.pdf