On a chirplet transform-based method applied to separating and counting wolf howls

10
Signal Processing 88 (2008) 1817–1826 On a chirplet transform-based method applied to separating and counting wolf howls $ B. Dugnol, C. Ferna´ndez, G. Galiano , J. Velasco Dpto. de Matema´ticas, Universidad de Oviedo, c/Calvo Sotelo, 33007-Oviedo, Spain Received 2 August 2007; received in revised form 29 November 2007; accepted 21 January 2008 Available online 31 January 2008 Abstract We propose a parametric model based on chirp decomposition to modelize wolf chorus howl production. Our aim is counting the number of individuals present in a given recording, task accomplished by phase and amplitude estimation of their corresponding howls. The Chirplet transform is used to obtain a first order approximation of the phase, improving the zero order approximation given by other methods, such as the short time Fourier transform (STFT). This gain in accuracy allows us to use criteria for a more accurate chirp tracking, specially at crossing points in the time–frequency plane, and in the determination of harmonics. We explore the efficiency of the method by applying it to synthetic signals as well as to wolves chorus recordings. Results show good performance for chirp tracking even under strong noise corruption. r 2008 Elsevier B.V. All rights reserved. Keywords: Chirplet transform; Instantaneous frequency estimation; Parametric method; Voice separation; Multiple instantaneous frequency tracking; Population counting 1. Introduction It is well known that the social controversy caused by wolf in most of the countries they are present. On the one hand, in very populated regions, such as Europe, they compete with human both for space and natural resources, causing, in many occasions, damages and costs to human interests, mainly through attacks to livestock. On the other hand, due to extinction risk, wolf is a protected specie around the world, being the costs they produce usually charged to public budgets [1]. Hence, it is a common interest to scientists and authorities to estimate their populations. However, estimating wolf population is a difficult task which, traditionally, relied either on the recuperation of field traces, such as tracks on the snow, or in the direct observation of wolf packs. The efficiency of these methods is quite variable, depending on the specific habitat of interest (existence of regular and long-lasting snowfalls, orographic characteristics of the region, etc). As a complement to traditional techniques, we present in this article a method based on Signal Theory to extract information about wolf populations of a given area from field recordings of ARTICLE IN PRESS www.elsevier.com/locate/sigpro 0165-1684/$ - see front matter r 2008 Elsevier B.V. All rights reserved. doi:10.1016/j.sigpro.2008.01.018 $ All the authors are partially supported by Project PC07-12, Gobierno del Principado de Asturias, Spain. Third and fourth authors are partially supported by the Spanish DGI Project MTM2007-65088. Corresponding author. Tel.: +34 985182296; fax: +34 985103354. E-mail addresses: [email protected] (B. Dugnol), [email protected] (C. Ferna´ ndez), [email protected] (G. Galiano), [email protected] (J. Velasco).

Transcript of On a chirplet transform-based method applied to separating and counting wolf howls

ARTICLE IN PRESS

0165-1684/$ - se

doi:10.1016/j.si

$All the aut

Gobierno del P

authors are pa

MTM2007-650�Correspond

fax: +3498510

E-mail addr

[email protected]

[email protected]

Signal Processing 88 (2008) 1817–1826

www.elsevier.com/locate/sigpro

On a chirplet transform-based method applied to separating andcounting wolf howls$

B. Dugnol, C. Fernandez, G. Galiano�, J. Velasco

Dpto. de Matematicas, Universidad de Oviedo, c/Calvo Sotelo, 33007-Oviedo, Spain

Received 2 August 2007; received in revised form 29 November 2007; accepted 21 January 2008

Available online 31 January 2008

Abstract

We propose a parametric model based on chirp decomposition to modelize wolf chorus howl production. Our aim is

counting the number of individuals present in a given recording, task accomplished by phase and amplitude estimation of

their corresponding howls. The Chirplet transform is used to obtain a first order approximation of the phase, improving

the zero order approximation given by other methods, such as the short time Fourier transform (STFT). This gain in

accuracy allows us to use criteria for a more accurate chirp tracking, specially at crossing points in the time–frequency

plane, and in the determination of harmonics. We explore the efficiency of the method by applying it to synthetic signals as

well as to wolves chorus recordings. Results show good performance for chirp tracking even under strong noise corruption.

r 2008 Elsevier B.V. All rights reserved.

Keywords: Chirplet transform; Instantaneous frequency estimation; Parametric method; Voice separation; Multiple instantaneous

frequency tracking; Population counting

1. Introduction

It is well known that the social controversycaused by wolf in most of the countries they arepresent. On the one hand, in very populated regions,such as Europe, they compete with human both forspace and natural resources, causing, in manyoccasions, damages and costs to human interests,

e front matter r 2008 Elsevier B.V. All rights reserved

gpro.2008.01.018

hors are partially supported by Project PC07-12,

rincipado de Asturias, Spain. Third and fourth

rtially supported by the Spanish DGI Project

88.

ing author. Tel.: +34985182296;

3354.

esses: [email protected] (B. Dugnol),

s (C. Fernandez), [email protected] (G. Galiano),

s (J. Velasco).

mainly through attacks to livestock. On the otherhand, due to extinction risk, wolf is a protectedspecie around the world, being the costs theyproduce usually charged to public budgets [1].Hence, it is a common interest to scientists andauthorities to estimate their populations. However,estimating wolf population is a difficult task which,traditionally, relied either on the recuperation offield traces, such as tracks on the snow, or in thedirect observation of wolf packs. The efficiency ofthese methods is quite variable, depending on thespecific habitat of interest (existence of regular andlong-lasting snowfalls, orographic characteristics ofthe region, etc). As a complement to traditionaltechniques, we present in this article a method basedon Signal Theory to extract information about wolfpopulations of a given area from field recordings of

.

ARTICLE IN PRESSB. Dugnol et al. / Signal Processing 88 (2008) 1817–18261818

their choruses. In previous papers [2,3] we dealt withthe first step of our technique: denoising andenhancing the signals contained in recordings made,usually, in adverse situations. In the present work,we study the utilization of a parametric modeltogether with the Chirplet transform to extract therelevant information (howls and barks) from thedenoised recordings and give, after a suitableprocessing of the signal, an estimation of thenumber of individuals present in such recordings.

In our numerical experiments we use wolveschoruses signals recorded both in wilderness andin captivity [4,5]. The objective is to separate thedifferent howls composing the chorus and thereforemay be considered as a source separation problem.The literature on this subject is broad, ranging fromsimpler situations like the n-microphone n-sourcesseparation problem to more complex analysis suchas the one-microphone blind source separationproblem [6–8]. Our problem is placed somewherein between since it is a one-microphone multi-sourceseparation problem, but it is not of a blind naturesince a parametric representation based on a chirpdecomposition of the signal seems to be reasonablefor the wolf howls modelling.

2. Howl tracking and separation

A wolves chorus is composed, mainly, by howlsand barks which, from the analytical point of view,may be regarded as chirp functions, see Fig. 1. Theformer have long time support small frequencyrange variation, while the latter are almost punctu-ally localized in time but posses a large frequencyspectrum. It is convenient, therefore, adopting a

1 2 3 40

500

1000

1500

2000

2500

Time

Freq

uenc

y

Spectrogram

Fig. 1. A field recorded signal containing several multi-harmonic

howls and barks together with a strong background noise.

parametric model to represent the wolves chorus asan addition of chirps given by the functionf : ½0;T � ! C,

f ðtÞ ¼XN

n¼1

anðtÞ exp½ifnðtÞ�, (1)

with T the chorus duration, an and fn the chirpsamplitude and phase, respectively, and N thenumber of chirps composing the chorus. We noticethat N is not necessarily the number of wolves since,for instance, harmonics of a fundamental tone arecounted separately.

To identify the unknowns N, an and fn weproceed in two steps. Firstly, for a discretization ofthe time interval ½0;T �, say tk ¼ kt, for t40 andk ¼ 0; . . . ;K, we estimate the amplitude anðtkÞ andthe phase fnðtkÞ of chirps contained at such discretetimes. Secondly, we establish criteria which allow usdeciding if the computed estimates at adjacent timesdo belong to the same global chirp or do not. Weshall describe the first step in the present section andpostpone the treatment of the second step to thesection of numerical experiments.

There exist a variety of methods for amplitudeand phase estimation. Among parametric models,such as sinusoidal representations of McAulay andQuatieri type, amplitude and phase are estimatedfrom STFT simple peak-picking algorithms [9], orfrom energy considerations based on the Teager-Kaiser operator E ¼ ðx0ðtÞÞ2 � xðtÞx00ðtÞ, [10,11].Nonparametric models rely on IF estimation viatime–frequency distributions, such as the spectro-gram, the Wigner–Ville distribution and generalCohen’s class distributions, [12–16]. Other interest-ing techniques are those based on signal-dependentdecompositions of the given signal, like the empiri-cal mode decomposition (EMC) [17], or theGianfelici transform [18]. These methods work inthe time domain and process, therefore, faster thanthose based on the time–frequency domain. Theirnatural field of applications are nonlinear and non-stationary signals and, to some extent, are capableof separating multi-component signals. However,the EMD requires a broad separation of compo-nents IF bands [19], and it is not, therefore, usefulfor separating crossing chirps, while the Gianfelicidecomposition does not allow to deal with signalscontaining a time-varying number of components,which is a usual situation in our application.

Since in our problem we are facing difficultiessuch as tracking the chirps at crossing points or

ARTICLE IN PRESSB. Dugnol et al. / Signal Processing 88 (2008) 1817–1826 1819

identifying harmonics of a given fundamentalchirp, it is convenient approximating the phasewith more accuracy than just the first order IFestimation. The main tools available in the litera-ture to jump to a second order chirp rate esti-mation are those based either on the Chirplettransform [20,21], that we use in this article, or onthe Fourier fractional transform [22,23]. TheChirplet transform of a function f 2 L2ðRÞ isdefined as

Cf ðt0; x;m; lÞ ¼Z 1�1

f ðtÞct0;x;m;lðtÞdt, (2)

with the complex window ct0;x;m;l given by

ct0;x;m;lðtÞ ¼ vlðt� t0Þ exp i xtþm2ðt� t0Þ

2� �h i

. (3)

Here, v 2 L2ðRÞ denotes a non-negative, symmetricand normalized real window, vlð�Þ ¼ vð�=lÞ=

ffiffiffilp

, forl40, and the parameters t0; x;m 2 R stand for time,IF and chirp rate, respectively. The Chirplet trans-form may be interpreted twofold. On on hand, it isthe STFT of the signal with the complex windowwðtÞ ¼ vlðtÞ expðiðm=2Þt2Þ, i.e.

Cf ðt0; x;m; lÞ ¼Z 1�1

f ðtÞwðt� t0Þ exp½�ixt�dt, (4)

and therefore its numerical implementation at agiven instant relies on the usual fast Fouriertransform algorithm. On the other hand, we mayrewrite Cf as

Cf ðt0; x;m; lÞ ¼Z 1�1

gf l;t0 ðxÞ exp �i xxþm2

x2� �h i

dx,

implying that the Chirplet transform provides acorrelation measure of the linear chirp exp½�iðxxþ

ðm=2Þx2Þ� and the portion of the signal centered at t0,gf l;t0 ðxÞ ¼ f ðxþ t0ÞvlðxÞ exp½�ixt0�. It is, then, clearthat for a linear chirp

f ðtÞ ¼ aðtÞ exp ia2ðt� t0Þ

2þ bðt� t0Þ þ g

� �h i,

and for fixed t0 and l, the quadratic energydistribution PCf ðt0; x;m; lÞ ¼ jCf ðt0; x; m; lÞj2 has aglobal maximum at ðx;mÞ ¼ ða;bÞ, allowing us todetermine the IF and chirp rate of a given linearchirp at a fixed time by computing the maximumpoint of the energy distribution. It is not difficult toprove that for more general forms of mono-component chirps the following localization resultholds: Let f ðtÞ ¼ aðtÞ exp½ifðtÞ�, with a 2 L2ðRÞ non-negative and f 2 C3ðRÞ. For all e40 and x; m 2 R

there exists L40 such that if loL then

PCf ðt0; x; m; lÞpleþ PCf ðt0;f0ðt0Þ;f

00ðt0Þ; lÞ. (5)

In addition,

liml!0

1

lPCf ðt0;f

0ðt0Þ;f

00ðt0Þ; lÞ ¼ aðt0Þ

2. (6)

In other words, for a general mono-componentchirp the energy distribution maximum provides anarbitrarily close approximation to the IF and chirprate of the signal. Moreover, its amplitude may beestimated by shrinking the window time support atthe maximum point.

Finally, for a multi-component chirp f ðtÞ ¼PNn¼1 anðtÞ exp½ifnðtÞ� the situation is somehow more

involved since although the energy distribution stillhas maxima at ðf0nðt0Þ;f

00nðt0ÞÞ for all n such that

anðt0Þa0, these maxima are of local nature, and infact, spurious local maxima not corresponding toany chirp may appear due to energy interactionamong actual chirps.

3. Numerical experiments

In this section, we present some results obtainedwith the algorithm described in the previous sectionwhen applied to both synthetic and wolves chorusesfield recorded signals.

In the first part of Experiment 1, we process aclean synthetic signal composed by three nonlinearchirps with multiple crossings. The computation ofIF, chirp rate and amplitude approximations is veryaccurate and allows us to separate the three chirpsin their whole time support. In the second part ofthis experiment, we add to the previous signal aGaussian noise of the same amplitude, i.e., withsignal to noise ratio, SNR ¼ 0. We perform heretwo computations. Firstly, we process the roughnoisy signal and observe that the algorithm is stillcapable of separating the chirps in the highestamplitude time segments, but fails on identifyingthem for low amplitude time segments. Conse-quently, we restart the noisy signal analysis byfiltering it in advance with the noise reduction signalenhancement algorithm described in [3], and thenuse the Chirplet transform-based algorithm forseparating the resulting signal. In this case, thelow amplitude tails of the original signal arecaptured more accurately, although the computa-tional cost is significantly increased.

Experiments 2 and 3 involve actual wolf chorusesfield recorded signals [4,5]. In Experiment 2, we deal

ARTICLE IN PRESSB. Dugnol et al. / Signal Processing 88 (2008) 1817–18261820

with a signal of a unique wolf howling. As expected,the signal is composed by a formant and a sequenceof harmonics. Due to the background noise, onlyone of the harmonics has enough intensity to keepdetectable. We use the noise reduction algorithmfollowed by the chirp separation algorithm to trackand separate the formant and its harmonic. Wenotice that it is the fact that chirps are separatedwhat allow us to recognize the formant-harmonicrelationship among them. Finally, in Experiment 3,we apply the separation algorithm to a very complexsignal in which a chorus of an unknown number ofadults and sub-adults wolves are howling andbarking. The result of the algorithm provides alower estimation of the number of individuals in thechorus which seems reasonable to the biologist ofthe team.

3.1. Implementation of the algorithm

For the numerical simulations, we use theexpression of the Chirplet transform given by (4).The window, vl, is fixed in all the experiments as aGaussian window centered at the origin, withl ¼ 0:1 s. We consider a discrete mesh of chirprates, mm, for m ¼ 1; . . . ;M, and then, for each m,the DFT (4) is computed. Thus, we obtain thediscrete Chirplet transform Cf ðtk; x‘; mm; lÞ, withtk ¼ kt, for k ¼ 1; . . . ;K , and ‘ ¼ 1; . . . ;L, and withK and L depending on the sample frequency and theduration of the signal. The time step, t, is fixed inadvance and it is related to the window overlappingin the DFT computation. Observe that for a signalcomposed by N samples, the number of operations

0.5 1 1.5 2 2.50

1000

2000

3000

4000

Time

Freq

uenc

y

Fig. 2. Left: STFT of a field recorded signal. Right: quadratic energy di

to IF and chirp rate chirps locations. We observe the different behavio

is of order OðMN logNÞ (signal DFT operationsorder times discrete chirp rates number).

According to (5), when the signal is mono-component or the various components of the signalare far from each other relative to the windowwidth, the maximum of PCf ðtk; �; �; lÞ is located atsome ðx;mÞ arbitrarily closed to the IF and chirprate, ðf0nðtkÞ;f

00nðtkÞÞ, of a chirp modulated by fnðtÞ.

However, when multi-component signals are closeto each other or are crossing, some spurious localmaxima are produced which do not correspond toany actual chirp. Therefore, some criterion must beused to select the correct local maxima at each tk.Although we lack of an analytical proof, there areevidences suggesting that maxima produced bychirps, i.e., at points of the type ðf0nðtkÞ;f

00nðtkÞÞ,

decrease much faster in the x direction than in the mdirection, see Fig. 2, a phenomenon that does notoccur at spurious maxima. We use this fact tochoose the candidates first by selecting x‘, for‘ ¼ 1; . . . ;L, which are maxima for

supm

PCf ðtk; x;m; lÞ,

and, among them, selecting the maxima with respectto m of PCf ðtk; x‘;m; lÞ. Next, we establish athreshold parameter to filter out possible localmaxima located at points that do not correspondto any f0nðtkÞ but which are close to two of them. Weset this threshold such that the existence of twoconsecutive maxima is avoided. In this way weobtain, for each tk, a set of points ðxnk

;mnkÞ, for

nk ¼ 1; . . . ;Nk, which correspond to IF’s and chirprates estimations of chirps with time supportincluding tk.

0

Chirp-rate

10002000 3000

Instantaneous frequency

stribution of the Chirplet transform at t0 ¼ 2. Maxima correspond

r in the x and m directions at these maxima.

ARTICLE IN PRESSB. Dugnol et al. / Signal Processing 88 (2008) 1817–1826 1821

The next step is chirp separation. We shall assumethat two points ðxnk

;mnkÞ and ðxnkþ1

; mnkþ1Þ belong to

the same chirp if

1

no

2xnkþ1� tmnkþ1

2xnk� tmnk

on for small n

ðn � 1:05 in experimentsÞ. (7)

To justify this criterion, let us assume that f0n, withsupport containing the interval ½tk; tkþ1�, is regular

0 2 4 6 8 100

0.1

0.2

0.3

0.4

0.5Chirp 1

Time

Ana

lytic

am

plitu

de

0 2 40

250

500

1000

1250

1500

750

Ch

T

Inst

anta

neou

s fre

quen

cy

0 2 4 6 8 100

0.1

0.2

0.3

0.4

0.5Chirp 2

Time

Ana

lytic

am

plitu

de

0 2 40

250

500

750

1000

1250

1500Ch

T

Inst

anta

neou

s fre

quen

cy

0 2 4 6 8 100

0.1

0.2

0.3

0.4

0.5Chirp 3

Time

Ana

lytic

am

plitu

de

0 2 40

250

500

750

1000

1250

1500Ch

T

Inst

anta

neou

s fre

quen

cy

Fig. 3. Exact (solid line) and computed (dots) amplitudes, IF’s and

enough and consider its Taylor expansion both at tk

f0n tk þt2

� �¼ f0nðtkÞ þ

f00nðtkÞ

2tþ oðtÞ,

and at tkþ1

f0n tkþ1 �t2

� �¼ f0nðtkþ1Þ �

f00nðtkþ1Þ

2tþ oðtÞ,

with oðtÞ denoting a regular function such thatoðtÞt�1! 0 as t! 0. Since tk þ t=2 ¼ tkþ1 � t=2,

6 8 10

irp 1

ime0 2 4 6 8 10

−2000

−1000

0

1000

2000Chirp 1

TimeC

hirp

−rat

e

6 8 10

irp 2

ime0 2 4 6 8 10

−2000

−1000

0

1000

2000Chirp 2

Time

Chi

rp−r

ate

6 8 10

irp 3

ime0 2 4 6 8 10

−2000

−1000

0

1000

2000Chirp 3

Time

Chi

rp−r

ate

chirp rates for each voice of the clean signal of Experiment 1.

ARTICLE IN PRESSB. Dugnol et al. / Signal Processing 88 (2008) 1817–18261822

we have

f0nðtkÞ þf00nðtkÞ

2t ¼ f0nðtkþ1Þ �

f00nðtkþ1Þ

2tþ oðtÞ.

Therefore, writing ðxnk;mnkÞ ¼ ðf0nðtkÞ;f

00nðtkÞÞ, and

the analogous expression for tkþ1, we get

xnkþ

t2mnk� xnkþ1

�t2mnkþ1

for small t,

from where (7) is derived. In case test (7) is satisfiedby more than one point, typically happening atchirps crossings points, we impose a regularitycriterion and choose the point with a closer chirprate to that of ðxik

;mikÞ.

Summarizing, the chirp separation algorithm isimplemented as follows:

Fig

For the initial time, t0, compute the local maximaof PCf ðt0; x; m; lÞ and consider they are startingpoints of different chirps.

� For subsequent discrete times, tk, compute the

local maxima of PCf ðtk; x;m; lÞ, denoted byðxnk

;mnkÞ, for nk ¼ 1; . . . ;Nk (Nk local maxima

at tk) and apply criterion (7) to check if theymatch with an already detected chirp. When anyof them do not, it is established as the startingpoint of a new chirp.

� When the above iteration is finished and to avoid

artifacts due to numerical errors, we disregard chirpsof very short duration (one or two time steps).

Finally, once the chirps are separated, we use (6) toestimate the amplitude

anðtkÞ2�

1

lPCf ðt0; xnk

;mnk; lÞ,

for small l. Again, to avoid artifacts due tonumerical discretization, we neglect portions of

0 2 4 6 8 100

250

500

750

1000

1250

1500

Time

Inst

anta

neou

s fre

quen

cy

Howl−tracking result

. 4. Chirp separation for the noisy signal of Experiment 1 with and

signals with an amplitude lower than certain relativethreshold, � 2 ð0; 1Þ, of the maximum amplitude ofthe whole signal, considering that in this case nochirp is present.

Let us finally mention that all the experiments areperformed on 16 bites–44.1KHz sampled signals,being typical running times on a Centrino Duoprocessor in the range of minutes.

3.2. Experiment 1: synthetic signals

In this first experiment we test our algorithmwith: (i) a clean synthetic signal, f, composed by theaddition of three nonlinear chirps, and (ii) the samesignal corrupted with an additive noise of similaramplitude than that of f, i.e., SNR ¼ 0. The cleansignal is given by f ðtÞ ¼

P3i¼1 aiðtÞ cosfiðtÞ, for

t 2 ½0; 8�, with amplitudes

a1ðtÞ ¼1

3exp �

ðt� 3:5Þ2

5

� �X ½1;6�ðtÞ,

a2ðtÞ ¼1

3exp �

ðt� 5Þ2

4

� �X ½2;8�ðtÞ,

a3ðtÞ ¼1

3exp �

ðt� 6Þ2

4

� �X ½3:5;6:5�ðtÞ

with X ½a;b� denoting the characteristic function ofthe time sub-interval ½a; b�, and with phases

f1ðtÞ ¼ � 13:07t5 þ 226:2t4 � 1434t3

þ 4055t2 � 4408t,

f2ðtÞ ¼ 1:733t5 � 42:92t4 þ 381:1t3

� 1459t2 þ 2798t,

f3ðtÞ ¼ � 4:973t5 þ 166:3t4 � 2172t3

þ 0:138t2 þ 0:411t.

0 2 4 6 8 100

250

500

750

1000

1250

1500

Time

Inst

anta

neou

s fre

quen

cy

Howl−tracking result

without previous noise filtering (left and right plots, respectively).

ARTICLE IN PRESSB. Dugnol et al. / Signal Processing 88 (2008) 1817–1826 1823

We used the same time step, t ¼ 0:2 s, and windowwidth, l ¼ 0:1 s, to process both clean and noisysignals, while we set the relative threshold amplitudelevel to � ¼ 0:01 for the clean signal and to � ¼ 0:1for the noisy signal. The results of our algorithm ofseparation are shown in Fig. 3. We observe that forthe clean signal all chirps are captured with a highdegree of accuracy even at crossing points. How-ever, although the number of chirps is correctlycomputed for the noisy signal, we observe that themain effect of noise corruption is the loss ofinformation at chirps low amplitude range. A way

0 2 4 6 8 100

0.1

0.2

0.3

0.4

0.5Chirp 1

Time

Ana

lytic

am

plitu

de

0 2 40

250

500

750

1000

1250

1500Ch

T

Inst

anta

neou

s fre

quen

cy

0 2 4 6 8 100

0.1

0.2

0.3

0.4

0.5Chirp 2

Time

Ana

lytic

am

plitu

de

0 2 40

250

500

750

1000

1250

1500Ch

T

Inst

anta

neou

s fre

quen

cy

0 2 4 6 8 100

0.1

0.2

0.3

0.4

0.5Chirp 3

Time

Ana

lytic

am

plitu

de

0 2 40

250

500

750

1000

1250

1500Ch

T

Inst

anta

neou

s fre

quen

cy

Fig. 5. Exact (solid line) and computed (dots) amplitudes, IF’s and chir

to improve this result is applying the noise reductionsignal enhancement algorithm described in [3] to thenoisy signal and, afterwards, using the separationalgorithm. As can be seen in Fig. 4, the result issomehow better than for the rough noisy signal. InFig. 5 we show the amplitude, IF and chirp rateestimations of the chirp which is more affected bythe noise corruption, for both clean and noisysignals. The main effect of noise corruption isobserved in the amplitude computation and in theloss of information at the tails of the threemagnitudes.

6 8 10

irp 1

ime0 2 4 6 8 10

−2000

−1000

0

1000

2000Chirp 1

TimeC

hirp

−rat

e

6 8 10

irp 2

ime0 2 4 6 8 10

−2000

−1000

0

1000

2000Chirp 2

Time

Chi

rp−r

ate

6 8 10

irp 3

ime0 2 4 6 8 10

−2000

−1000

0

1000

2000Chirp 3

Time

Chi

rp−r

ate

p rates for each voice of the filtered noisy signal of Experiment 1.

ARTICLE IN PRESS

0 2 4 60

250

500

750

1000

1250

1500

Time

Freq

uenc

y

Spectrogram

0 2 4 60

250

500

750

1000

1250

1500

Time

Freq

uenc

y

Howl−tracking result

Fig. 6. Left: experiment 2 signal’s STFT. Right: result of the separation algorithm.

0 2 4 6750

1000

1250

1500

Time

Freq

uenc

y

Fig. 7. The separation algorithm allows us to check the

harmonicity of chirps. We plot the high frequency chirp and

the diadic translation of the low frequency chirp of Fig. 6.

B. Dugnol et al. / Signal Processing 88 (2008) 1817–18261824

3.3. Experiment 2. One wolf multi-harmonic emission

The signal analyzed in this experiment, withSTFT showed in Fig. 6, is formed by a howl of aunique individual which is composed by a formant-chirp and its corresponding harmonic chirps amongwhich only one is detected, due to the high intensitynoise level. Before using the separation algorithm,we filtered the signal to the relevant frequency band½250; 1500�Hz and then applied the noise reductionsignal enhancement algorithm of [3].

Parameters were fixed as follows: time step tot ¼ 0:2 s, amplitude threshold to � ¼ 0:05 andwindow width to l ¼ 0:1 s. The result is shown inFig. 6. We observe that a segment of the harmonic isnot detected, as expected due to the high SNR.However, the algorithm detects and separates bothchirps accurately enough to perform the computa-tion plotted in Fig. 7, showing that both chirps areharmonics and, therefore, allowing us to concludethat only one individual is emitting.

3.4. Experiment 3. A wolves chorus

In this experiment we analyze a rather complexsignal obtained from field recordings of wolveschoruses in wilderness, [4], see Fig. 2. Due to thenoise present in the recording, we first use the PDEalgorithm to enhance the signal and reduce thenoise, see [3] for details. For the separationalgorithm, we fixed the time step as t ¼ 0:03 s, therelative amplitude threshold as � ¼ 0:01 and thewindow width as l ¼ 0:0625 s.

The output is composed by 32 chirps which shouldcorrespond to the howls and barks (with all their

harmonics) emitted by the wolves along the durationof the recording. The result is shown in Fig. 8. Sinceour aim is giving an estimate of how manyindividuals are emitting in a recording, we plot azoom of the separating algorithm result for thesignals having time support at t ¼ 2. Here, thenumber of chirps reduces to six. However, it seemsthat one couple of them are harmonics, the coupleformed by the chirp around 1000Hz and the chirpwith highest IF. Therefore, we may conclude that atleast five wolves are emitting during the interval oftime plotted, t 2 ð1; 2:5Þ. A similar analysis may becarried out with other time slices until all therecorded signal is covered and a global lower estimateof the number of individuals is thus obtained.

4. Conclusions

A chirp tracking and separation algorithm basedon the Chirplet transform applied to a parametric

ARTICLE IN PRESS

0 1 2 30

500

1000

1500

2000

2500

Time

Freq

uenc

y

Howl−tracking result

1 1.5 2 2.50

500

1000

1500

2000

2500

Time

Freq

uenc

y

Howl−tracking result

Fig. 8. Left: result of the separation algorithm. Right: zoom of the sub-interval ð1; 2:5Þ, showing six chirps which correspond to five wolves

emissions.

B. Dugnol et al. / Signal Processing 88 (2008) 1817–1826 1825

AM-FM model is utilized for wolf populationcounting. Although field recorded wolf chorussignals posses a complex structure due to noisecorruption and nonlinear non-stationary multi-component character, the outcome of our algorithmprovides us with accurate estimates of the numberof individual sound sources present in a givenrecording. Analytical results for mono-componentor sufficiently separated multi-component signalsare provided ensuring that numerical approxima-tions are arbitrarily close to exact solutions.However, the case of multi-component signalscontaining crossing chirps, although with goodnumerical results, is not yet well understoodfrom the analytical point of view and deservesfuture research. Drawbacks of the numerical algo-rithm are related to the expert dependent electionof some parameters, such as the amplitude thresh-old, or to the execution time when separationof long duration signals must be accomplished.We are currently working in the improvement ofthese aspects as well as in the recognition ofcomponents corresponding to the same individualsound source, such as harmonics of a fundamentalchirp, pursuing the full automatization of thecounting algorithm.

Acknowledgments

The authors would like to thank to the anon-ymous reviewers, whose comments resulted insignificant improvements in the final draft.

References

[1] A. Skonhoft, The costs and benefits of animal predation: an

analysis of scandinavian wolf re-colonization, Ecol. Econ. 58

(4) (2006) 830–841.

[2] B. Dugnol, C. Fernandez, G. Galiano, Wolves counting by

spectrogram image processing, Appl. Math. Comput. 186

(2007) 820–830.

[3] B. Dugnol, C. Fernandez, G. Galiano, J. Velasco, Imple-

mentation of a diffusive differential reassignment method for

signal enhancement. An application to wolf population

counting, Appl. Math. Comput. 193 (2007) 374–384.

[4] L. LLaneza, V. Palacios, Field recordings obtained in

wilderness in Asturias, Spain, in the 2003 campaign,

Asesores en Recursos Naturales, S.L.

[5] L. LLaneza, V. Palacios, Field recordings obtained in

captivity in Spain and Portugal in the 2005 campaign.

Asesores en Recursos Naturales, S.L.

[6] M. Zibulevsky, B.A. Pearlmutter, Blind source separation by

sparse decomposition in a signal diccionary, Neural Com-

put. 13 (2001) 863–882.

[7] L. Benaroya, F. Bimbot, Wiener based source separation

with hmm/gmm using a single sensor, in: 4th International

Symposium on ICA Blind Signature Separation (ICA 2003),

Nara, Japan, 2003, pp. 957–961.

[8] T. Roweis, One microphone source separation, in: Advances

in Neuronal Information Processing Systems (NIPS 00),

vol. 13, 2001.

[9] R.J. McAulay, T.F. Quatieri, Speech analysis/synthesis

based on a sinusoidal representation, IEEE Trans. Acoustics

Speech Signal Process. 34 (4) (1986) 744–754.

[10] P. Maragos, J.F. Kaiser, T.F. Quatieri, Energy separation in

signal modulations with applications to speech analysis,

IEEE Trans. Speech Audio Process. 41 (10) (1993)

3024–3051.

[11] B. Santhanam, P. Maragos, Multicomponent am-fm demo-

dulation via periodicity-based algebraic separation and

energy-based demodulation, IEEE Trans. Commun. 48 (3)

(2000) 473–490.

ARTICLE IN PRESSB. Dugnol et al. / Signal Processing 88 (2008) 1817–18261826

[12] L. Stankovic, I. Djurovic, Instantaneous frequency estima-

tion by using the Wigner distribution and linear interpola-

tion, Signal Process. 83 (3) (2003) 483–491.

[13] H.K. Kwok, D.L. Jones, Improved instantaneous frequency

estimation using an adaptive short-time Fourier transform,

IEEE Trans. Signal Process. 48 (10) (2000) 2964–2972.

[14] Z. Zhao, M. Pan, Y. Chen, Instantaneous frequency

estimate for non-stationary signal, in: Fifth World Congress

on Intelligent Control and Automation, 2004, WCICA 2004,

vol. 4, 2004, pp. 3641–3643.

[15] B. Boashash, Estimating and interpreting the instantaneous

frequency of a signal. I. Fundamentals, Proc. IEEE 80 (4)

(1992) 520–538.

[16] B. Boashash, Estimating and interpreting the instantaneous

frequency of a signal. II. Algorithms and applications, Proc.

IEEE 80 (4) (1992) 540–568.

[17] N.E. Huang, et al., The empirical mode decomposition and

the Hilbert spectrum for nonlinear and non-stationary time

series analysis, Proc. R. Soc. London A 454 (1971) (1998)

903–995.

[18] F. Gianfelici, G. Biagetti, P. Crippa, C. Turchetti, Multi-

component am-fm representations: an asymptotically exact

approach, IEEE Trans. Audio Speech Lang. Process. 15 (3)

(2007) 823–837.

[19] G. Rilling, P. Flandrin, One or two frequencies? The

empirical mode decomposition answers, IEEE Trans. Signal

Process. 2008, doi:10.1109/TSP.2007.906771.

[20] S. Mann, S. Haykin, The chirplet transform: a generalization

of Gabor’s logon transform, in: Proceedings Vision Interface

1991, 1991, pp. 205–212.

[21] L. Angrisani, M. D’Arco, A measurement method based on

a modified version of the chirplet transform for instanta-

neous frequency estimation, IEEE Trans. Instrum. Meas. 51

(4) (2002) 704–711.

[22] H.M. Ozaktas, Z. Zalevsky, M.A. Kutay, The Fractional

Fourier Transform with Applications in Optics and Signal

Processing, Wiley, Chichester, 2001.

[23] D.H. Bailey, P.N. Swarztrauber, The fractional Fourier

transform and applications, SIAM Rev. 33 (3) (1991)

284–404.