A Pre-Filtering and Post-Filtering Approach to Blind Source Separation

10
A Geometrically Constrained ICA Algorithm for Blind Separation in Convolutive Environments Michele SCARPINITI 1 , Francesco DI PALMA, Raffaele PARISI and Aurelio UNCINI Infocom Department, “Sapienza” University of Rome Abstract. In this paper a blind source separation algorithm in convolutive environ- ment is presented. In order to avoid the classical permutation ambiguity in the fre- quency domain solution, a geometrical constraint is considered. Moreover a beam- former algorithm is integrated with the proposed solution: in this way the directiv- ity pattern of the proposed architecture can take into account the residual permuta- tion at low frequencies and the scaling inconsistency. Several experimental results are shown to demonstrate the effectiveness of the proposed method. Keywords. Blind Source Separation, Convolutive environment, Frequency domain algorithms, Geometrical constraints, FastICA. Introduction An increasingly interest on Blind Source Separation (BSS) has arisen from signal pro- cessing researchers in the last fifteen years and a huge number of works were published [4,6]. A particular and emerging field of applications for signal processing is hands-free communication particularly useful in a lot of practical situations, such as a teleconfer- ence or a vehicle environment. In these applications a speech enhancement is necessary, because usually several sources are captured. In this way BSS is a promising technique for speech enhancement in adverse environments, like highly reverberant rooms. While there are a lot of studies in the linear and instantaneous case, much poor is the range of works in convolutive environment. To achieve BSS of convolutive mixtures, several methods have been proposed [2,6]. In order to solve the BSS problem in a reason- able amount of time the problem is transformed into the frequency domain: the algorithm solves an instantaneous BSS problem for every frequency simultaneously [11,21]. Unfortunately in frequency domain two trivial ambiguities occur that could be par- ticular troublesome [9,10]. The permutation ambiguity is particularly tiresome: when converting signal to time domain, contributions from different sources will appear into a single channel, thus destroying the separation achieved in the frequency domain; in addition the scaling indeterminacy at each frequency bin will result in an overall filtering of the sources. Different solutions to these problems can be found in literature [12,16]. 1 Corresponding Author: Infocom Department, “Sapienza” University of Rome, via Eudossiana 18, 00184 Roma; Tel. +39 0644585869; e-mail: [email protected].

Transcript of A Pre-Filtering and Post-Filtering Approach to Blind Source Separation

A Geometrically Constrained ICAAlgorithm for Blind Separation in

Convolutive Environments

Michele SCARPINITI1, Francesco DI PALMA, Raffaele PARISI and Aurelio UNCINI

Infocom Department, “Sapienza” University of Rome

Abstract. In this paper a blind source separation algorithm in convolutive environ-ment is presented. In order to avoid the classical permutationambiguity in the fre-quency domain solution, a geometrical constraint is considered. Moreover a beam-former algorithm is integrated with the proposed solution: in this way the directiv-ity pattern of the proposed architecture can take into account the residual permuta-tion at low frequencies and the scaling inconsistency. Several experimental resultsare shown to demonstrate the effectiveness of the proposed method.

Keywords. Blind Source Separation, Convolutive environment, Frequency domainalgorithms, Geometrical constraints, FastICA.

Introduction

An increasingly interest on Blind Source Separation (BSS) has arisen from signal pro-cessing researchers in the last fifteen years and a huge number of works were published[4,6]. A particular and emerging field of applications for signal processing is hands-freecommunication particularly useful in a lot of practical situations, such as a teleconfer-ence or a vehicle environment. In these applications a speech enhancement is necessary,because usually several sources are captured. In this way BSS is a promising techniquefor speech enhancement in adverse environments, like highly reverberant rooms.

While there are a lot of studies in the linear and instantaneous case, much poor isthe range of works in convolutive environment. To achieve BSS of convolutive mixtures,several methods have been proposed [2,6]. In order to solve the BSS problem in a reason-able amount of time the problem is transformed into the frequency domain: the algorithmsolves an instantaneous BSS problem for every frequency simultaneously [11,21].

Unfortunately in frequency domain two trivial ambiguitiesoccur that could be par-ticular troublesome [9,10]. The permutation ambiguity is particularly tiresome: whenconverting signal to time domain, contributions from different sources will appear intoa single channel, thus destroying the separation achieved in the frequency domain; inaddition the scaling indeterminacy at each frequency bin will result in an overall filteringof the sources. Different solutions to these problems can befound in literature [12,16].

1Corresponding Author: Infocom Department, “Sapienza” University of Rome, via Eudossiana 18, 00184Roma; Tel. +39 0644585869; e-mail: [email protected].

In order to solve these indeterminacies a geometrical constraint is introduced in thispaper. The proposed solution is based on two steps: first of all a geometrically con-strained algorithm, introduced and fully explained in [13], is adopted to lead the solu-tion given by the standard FastICA algorithm towards the true solution; subsequently thescaling ambiguity an the residual permutation indeterminacy are solved by combiningthe algorithm with a beamforming solution, like proposed in[18]. This paper utilizes theproposed algorithm to enhance the speech signals captured from a microphone array.

The paper is organized as follows: section 1 introduces the BSS problem in theconvolutive environment. Section 2 describes the proposedgeometrical constraint thatcan solve the permutation indeterminacy; in addition an integration with a beamformer,that can solve the residual permutation at low frequencies and the scaling ambiguity,is described. Section 3 shows some experimental results while section 4 concludes thework.

1. Blind Source Separation for Convolutive Mixtures

Let us consider a set ofN unknown and independent sourcess(n) = [s1(n), . . . , sN (n)]T ,such that the componentssi(n) are zero-mean and mutually independent. Signals re-ceived by an array ofM sensors are denoted byx(n) = [x1(n), . . . , xM (n)]T and arecalled mixtures. For simplicity we consider the case ofN = M .

The convolutive model introduces the following relation between thei-th mixedsignal and the original source signals

xi (n) =

N∑

j=1

K−1∑

k=0

aij (k)sj (n− k) , i = 1, . . . ,M (1)

The mixed signal is a linear mixture of filtered versions of the source signals,aij(k)represents thek-th mixing filter coefficient andK is the number of filter taps. The taskis to estimate the independent components from the observations without resort to apriori knowledge about the mixing system and obtaining an estimateu(n) of the originalsource vectors(n):

ui (n) =

M∑

j=1

L−1∑

l=0

wij (l)xj (n− l) , i = 1, . . . , N (2)

wherewij(l) denotes thel-th mixing filter coefficient andL is the number of filter taps.When a mixing environment is quite complex, filters of the ICA network may re-

quire thousands of taps to appropriately invert the mixing.In such cases, the time do-main methods have a large computational load to compute convolution of long filtersand are expensive for updating filter coefficients. The methods can be implemented inthe frequency domain using the Fast Fourier Transform (FFT)[15] in order to decreasethe computational load because the convolution operation in the time domain can be per-formed by element-wise multiplication in the frequency domain. Note that the convolu-tive mixtures can be expressed as

x (f, k) = A (f) s (f, k) , ∀f (3)

wherex(f, k) ands(f, k) are the frequency components of mixtures and the indepen-dent sources at frequencyf , respectively.A(f) denotes a matrix containing elementsof the frequency transforms of mixing filters at frequencyf . From (3), it is clear thatconvolutive mixtures can be represented by a set of instantaneous mixtures in the fre-quency domain. Thus, the independent components can be recovered by applying ICAfor instantaneous mixtures at each frequency bin and then transforming the results in thetime domain:

u (f, k) = W (f)x (f, k) , ∀f (4)

whereW(f) denotes the demixing matrix in the frequency domain. Note that s(f, k),x(f, k) andu(f, k) are vectors of complex elements.

In order to solve the BSS in the convolutive environment Bingham & Hyvarinenin [3] have proposed a complex-valued version of the well-know and best performingFastICA algorithm introduced in [7,8]. After a canonical centering and whitening pre-processing, the columnw of theW(f) matrix are obtained by maximizing the negen-tropy function [10], using the following equations

w+ = E{

x(wHx)g(∣

∣wHx∣

2)}

−E{

g(∣

∣wHx∣

2) +

∣wHx∣

2g′(

∣wHx∣

2)}

w,(5)

w =w+

‖w+‖, (6)

whereg(·) is a suitable nonlinear function andg′(·) its first derivative. We adopt thefollowing functiong(y) = 1/(y + a), with a a constant value, usually set toa = 0.1.Note that it is necessary to choose no learning rates in this algorithm.

2. The Geometrically Constrained Algorithm

The fundamental limitations of the convolutive BSS problemsolved by a frequency do-main approach are the frequency and scaling ambiguities: ifthe permutation is not con-sistent across frequency then converting the signal back tothe time-domain will combinecontributions from different sources into a single channel, and thus destroying the sepa-ration achieved in the frequency-domain (FD). In this paragraph we attempt to propose asolution exploiting some a priori information given by the knowledge of the geometricalposition of the sources.

2.1. Geometrical Constraints

The aim of recovering permutations is a severe problem, especially when the numberNof sources is large. Several solution exist to the previous problem. Some authors proposedto consider the consistency of the filter coefficients, that can be achieved by requiringcontinuity of filter values in frequency domain [12]. An alternative way is to considersome geometrical informations on source positions.

Recently several works, like [1], show that frequency-domain blind source separa-tion is equivalent to a set of frequency-domain adaptive beamformers (ABF) under cer-tain conditions. This equivalence suggests a geometrical constrained approach: the so-

Figure 1. Assumptions on microphones geometry:Mj is thej-th microphone whileSk is thek-th source.

lution of the ICA algorithm is projected on the ABF solution.In order to compute thebeamformer solution, it is necessary to estimate theDirection of Arrivals(DOA). DOAestimation can be performed by Generalized Cross-Correlation (GCC) [14], MUSIC [19]or other techniques described in literature [5].

The introduction of the geometrical constraint gives a robustness to the algorithmthat can not be achieved by the only ABF, that is very sensitive to small variations ofparameters. On the other side ICA algorithm can suffer of thelocal minima problem thatgives a solution far from the true one. The synergy between ICA and ABF can overcomethese problems.

After a canonical centering and a whitening pre-processingwith an orthogonal ma-trix Q

z = Qx , (7)

the demixing matrix can be written asW = TH · Q, whereT = [t1, . . . , tN ]H is arotation matrix that must be estimated by the FastICA algorithm in Eqs. (5) and (6).

Since the equivalence of the FD-ICA and FD-ABF described in [1], the demixingvector must be a scaled and whited version of the ABF delay vector:

topt = tABF = δVhtarget. (8)

This follows by the constraint

wHtargethtarget (f) = tHtargetQ · htarget (f) = c. (9)

The estimated delay vector, for each bin, has the form

htarget (f) =

1...

ej2π(N−1)d f

ccos θtarget

, (10)

whered is the distance between microphones,c the sound speed andθtarget is the DOAestimated by a beamformer. Figure 1 describes the geometry of the sources and the mi-crophone array.

Because we are interested only in the direction of the demixing vectorttarget andnot to its norm, we project this solution to the constraint inEq. (9), thus Eq. (6) becomes:

t =t+

∥t+HQ · h∥

. (11)

The performance of the proposed geometrically constrainedalgorithm is addressedevaluating the angle between the estimatedhk and truehk delay vector, by

µk(f) = cos−1

hHk hk

|hk|∣

∣hk

, (12)

and the angle between two delay vectors relative to different sources:

γkj(f) = cos−1

(

hHk hj

|hk| |hj |

)

. (13)

The proposed algorithm converges, if

µk(f) <γkj(f)

2= µc(f) ∀k, j. (14)

Unfortunately at low frequencies theµc(f) parameter is very small and a certain numberof permutations occur.

2.2. Combining ICA and Beamforming

In order to overcome the previous problem at low frequenciesa DOA estimation is per-formed [17,18]. Such estimation is done analyzing the directional patterns which allowus to associate a single source to a local minimum, as described in [18].

The l-th directional pattern can be expressed, for the non-restrictive case ofN = 2sources andM = 2 microphones, as follows

Fl(f, θ) =

2∑

k=1

Wlk(f) exp

[

j2πfd sin θ

c

]

, (15)

wherec is the sound speed,d is the distance between microphones andWlk(f) is thek-th entry of thewl column vector. These directional patterns have a minimum value incorrespondence of an estimated disturbing source [17,18].In particular the two sourcedirections are evaluated as

θ1 (f) = min

[

argminθ

|F1 (f, θ)| , argminθ

|F2 (f, θ)|

]

,

θ2 (f) = max

[

argminθ

|F1 (f, θ)| , argminθ

|F2 (f, θ)|

]

.(16)

Assuming that the minimum value of the patternF1(fk, θ) is θ2 for thek-th frequencybin fk, if for another frequency binfj the minimum value isθ1, then a permutationsoccurred (as shown in Figure 2) and the filter coefficients must be swapped. In additionthe value of the directional pattern in correspondence of the source, the red points inFigure 2, can be used as scaling factor in order to solve the scaling ambiguity. An optionalbeamformer can be used for the evaluation of the directionsθ1 andθ2.

Figure 2. The use of directivity patterns to solve the permutation indeterminacy.

Figure 3. The proposed architecture.

The use of the directional patterns allows us to solve both the permutation and scal-ing ambiguity: the patterns decide to choose the ICA solution or a swapped one, if apermutation occurs. Then a generalization forN > 2 sources can be easily derived.

The proposed algorithm can be summarized as proposed in Table 1 and a blockdiagram is shown in Figure 3.

Remove mean value from mixtures;

Whitening;

Initialization: t = Q · htarget;

repeat

FastICA;

Projection: t =t+

‖t+HQ·h‖

until convergence

Table 1. Summary of the proposed algorithm.

3. Results

We tested our architecture in a number of real room environments characterized by a re-verberation timeT60 in a range of0−300 ms. The impulse responses of the environmentwere simulated with the Matlab tool RoomSim2.

In order to provide a mathematical evaluation of the output separation, differentindexes of performance are available in literature. In thispaper the signal to interferenceratio (SIR)SIRj of thej-th source was adopted [20]:

SIRj = 10 log

E

{

(

|y|σ(j),j

)2}

/ E

k 6=j

(

|y|σ(j),k

)2

. (17)

2Roomsim is a MATLAB simulation of shoe-box room acoustics for use in teaching and research. Roomsimis available fromhttp://media.paisley.ac.uk/~campbell/Roomsim/

Figure 4. Placement of microphones in the test room in the caseN = 2.

In (17)yi,j is thei-th output signal when only thej-th input signal is present, whileσ(j)is the output channel corresponding to thej-th input. In addition we use the PerformanceIndex, defined as:

r =1

N

Nf∑

f=1

N∑

i=1

N∑

j=1

|Hij (f)|

maxj |Hij (f)|

+

N∑

i=1

N∑

j=1

|Hji (f)|

maxi |Hji (f)|

− 2N

,(18)

whereH(f) = W(f)A(f) is the matrix describing the overall mixing-demixing system,which ideally should be a permutation matrix, andNf is the total number of frequencybins. A perfect separation is achieved if the performance index in (18) approaches zero.In this paper we propose some experimental tests in a standard 5.80 × 3.20 m2 room.The sources are far1.15 m from the microphone array and microphones are spaced by2cm.

A first test with two male speakers and two microphones is proposed. The two speak-ers and microphones are located as shown in Figure 4 and theirdirections areθ1 = 45andθ2 = 135 degrees. The directivity patterns for this experiment is proposed in theFigure 5. Table 2 summarizes the SIR of the two recovered sources and the PerformanceIndexr. As we can see from Table 2 the proposed architecture is able to recover speech

T60 [ms] SIR1 [dB] SIR2 [dB] r

0 35.99 32.54 0.17

100 24.35 26.01 0.22

150 19.89 17.62 0.30

300 9.10 8.56 0.54

Table 2. Summary of the first proposed experimental test withθ1 = 45◦ andθ2 = 135

◦.

signals very well, but the performance falls down dramatically when the reverberationtime increases toward300 ms.

A second test is conducted with three sources. The third source is an interferencewhite noise located in front of the microphone array and thusat θ3 = 90 degrees, asshown in Figure 6. The algorithm is tested for a reverberation time into the range0÷300ms and results are summarized in Table 3. Also in this second test, as we can see from

Figure 5. Directivity pattern, forθ1 = 45◦ (first row) andθ2 = 135

◦ (second row).

Figure 6. Placement of microphones in the test room in the caseN = 3.

T60 [ms] SIR1 [dB] SIR2 [dB] SIR3 [dB] r

0 21.68 25.73 22.14 0.21

100 18.51 17.23 17.89 0.32

150 15.64 13.17 14.73 0.41

300 6.54 4.21 5.03 0.94

Table 3. Summary of the second proposed experimental test withθ1 = 45◦, θ2 = 135

◦ andθ3 = 90◦.

T60 [ms] SIR1 [dB] SIR2 [dB] r

0 32.46 30.17 0.22

100 20.13 21.72 0.38

150 15.66 13.89 0.51

300 6.18 6.32 1.03

Table 4. Summary of the third proposed experimental test withθ1 = 25◦ andθ2 = 155

◦.

Table 3, the separation is achieved. The introduction of a third source results in a smallworsening of the overall performances.

A third test is proposed for two male speech located atθ1 = 25◦ andθ2 = 155◦

respectively. Results are summarized in Table 4. The experimental test for a differentcouple of directions shows similar results with respect thefirst one, as can be seen from acomparison of Tables 4 and 2. Hence the proposed approach is able to perform separationof sources in this case as well.

Finally we want to observe what happens if the estimation of the DOA is not soaccurate. The following fourth experiment proposes a separation test for two male speechwhen the estimated DOAsθ1 andθ2 have an inaccuracy of±5◦. Table 5 summarizes theobtained results. Comparing Table 5 with Tables 2 and 4 we canargue that the proposed

T60 [ms] SIR1 [dB] SIR2 [dB] r

0 36.00 32.37 0.19

100 23.98 25.59 0.23

150 19.17 17.44 0.32

300 9.38 8.26 0.56

0 32.31 29.85 0.23

100 19.83 21.64 0.39

150 15.78 13.27 0.51

300 6.38 6.04 1.02Table 5. Summary of the third proposed experimental test withθ1 = 45

◦, θ2 = 135◦ (top rows), and

θ1 = 25◦, θ2 = 155

◦ (bottom rows) and an inaccuracy of±5◦.

solution is robust to a not so accurate estimate of the DOA directions.

4. Conclusions

This paper introduced a geometrically constrained algorithm for the frequency-domainblind source separation in convolutive environments. The proposed algorithm is able toreduce the permutation inconsistency, typical of the frequency-domain approaches.

Moreover an integration with a beamformer algorithm is madein order to solvecompletely the residual permutation ambiguity at the low frequencies, where the geo-metrical constraint is not enough. In addition this last addition can also solve the scalingambiguity.

Several tests have been performed to verify the effectiveness of the proposed ap-proach and demonstrate that it can solve the permutation andscaling indeterminaciesvery successfully. The quality of the separation has been evaluated in terms of the SIRIndex and the Performance Index, which are widely diffused in literature.

Acknowledgements

This work has been partially supported by the Italian National Project: Wireless mul-tiplatfOrm mimo active access netwoRks for QoS-demanding muLtimedia Delivery(WORLD), under grant number 2007R989S.

References

[1] S. Araki, S. Makino, Y. Hinamoto, R. Mukai, T. Nishikawa, and H. Saruwatari. Equivalence betweenfrequency-domain blind source separation and freqquency-domain adaptive beamforming for convolu-tive mixtures.Journal on Applied Signal Processing, 11:1157–1166, 2003.

[2] A. J. Bell and T. J. Sejnowski. An information-maximisationapproach to blind separation and blinddeconvolution.Neural Computation, 7:1129–1159, 1995.

[3] E. Bingham and A. Hyvarinen. A fast fixed-point algorithmfor independent component analysis ofcomplex-valued signals.International Journal of Neural Systems, 10(1):1–8, 2000.

[4] S. Choi, A. Cichocki, H.-M. Park, and S.-Y. Lee. Blind source separation and independent componentanalysis: A review.Neural Information Processing - Letters and Reviews, 6(1):1–57, January 2005.

[5] T. Gustafsson, B. Rao, and M. Trivedi. Source localization in reverberant environments: modeling andstatistical analysis.IEEE Transactions on Speech and Audio Processing, 11(6):791–803, Nov. 2003.

[6] S. Haykin, editor.Unsupervised Adaptive Filtering, Volume 1: Blind Signal Separation. Wiley, 2000.[7] A. Hyvarinen. Fast and robust fixed-point algorithms for independent component analysis.IEEE Trans-

actions on Neural Networks, 10(3):626–634, May 1999.[8] A. Hyvarinen and E. Oja. A fast fixed-point algorithm for independent component analysis.Neural

Computation, 9:1483–1492, 1997.[9] A. Hyvarinen and E. Oja. Independent component analysis:algorithms and applications.Neural Net-

works, 13:411–430, 2000.[10] A. Hyvärinen, J. Karhunen, and E. Oja.Independent Component Analysis. John WIley & Sons, Inc.,

2001.[11] S. Ikeda and N. Murata. A method of ica in time-frequency domain. InProc. Workshop Indep. Compon.

Anal. Signal. Sep., pages 365–370, 1999.[12] M. Z. Ikram and D. R. Morgan. Permutation inconsistency in blind speech separation: Investigation and

solutions.IEEE Transaction on Speech and Audio Processing, 13(1):1–13, January 2005.[13] M. Knaak, S. Araki, and S. Makino. Geometrically constrained independent component analysis.IEEE

Transactions on Audio, Speech and Language Processing, 15(2):715–726, Feb. 2007.[14] C. Knapp and G. Carter. The generalized correlation method for estimation of time delay.IEEE Trans-

actions on Acoustics, Speech and Signal Processing, 24(4):320–327, Aug. 1976.[15] A. V. Oppenheim, R. W. Schafer, and J. R. Buck.Discrete-Time Signal Processing. Prentice-Hall, 2nd

edition, 1999.[16] L. Parra and C. Spence. Convolutive blind separation ofnonstationary sources.IEEE Transaction on

Speech and Audio Processing, 8:320–327, 2000.[17] H. Saruwatari, T. Kawamura, T. Nishikawa, A. Lee, and K. Shikano. Blind source separation based on

a fast-convergence algorithm combining ica and beamforming.IEEE Transaction on Audio, Speech andLanguage Processing, 14(2):666–678, March 2006.

[18] H. Saruwatari, S. Kurita, K. Takeda, F. Itakura, T. Nishikawa, and K. Shikano. Blind source separationcombining independent component analysis and beamforming.EURASIP Journal on Advances in SignalProcessing, 2003(11):1135–1146, November 2003.

[19] R. O. Schmidt. Multiple emitter location and signal parameter estimation.IEEE Transaction on Anten-nas and Propagation, 34:276–280, March 1986.

[20] D. Shobben, K. Torkkola, and P. Smaragdis. Evaluation ofblind signal separation methods. InIn Proc.of ICA and BSS, pages 239–244, Aussois, France, Jenuary, 11-15 1999.

[21] P. Smaragdis. Blind separation of convolved mixtures in the frequency domain.Neurocomputing,22:21–34, 1998.