
www.elsevier.com/locate/bspc

Biomedical Signal Processing and Control 1 (2006) 177–218

Novel wavelet domain Wiener filtering de-noising techniques: Application to bowel sounds captured by means of abdominal surface vibrations

C. Dimoulas a,*, G. Kalliris b, G. Papanikolaou a, A. Kalampakas c

a Department of Electrical and Computer Engineering, Aristotle University of Thessaloniki, Thessaloniki University Campus, Thessaloniki 54124, Greece

b School of Journalism and Mass Communication Media, Aristotle University of Thessaloniki, Thessaloniki University Campus, Thessaloniki 54124, Greece

c Gastroenterology Department, Papageorgiou General District Hospital, Perifereiaki Odos, 56403 Thessaloniki, Greece

Received 15 February 2006; received in revised form 22 June 2006; accepted 18 August 2006

Available online 17 October 2006

Abstract

This work focuses on the design and evaluation of efficient and accurate de-noising algorithms that combine robust signal enhancement with minimum signal distortion. The proposed method introduces novel, frequency-dependent, parametric Wiener filtering techniques that involve the Discrete Wavelet Transform and Wavelet Packets. Implementations of various decomposition schemes, different mother wavelets and various thresholding options were tested, and perceptual criteria were also taken into account. The introduced de-noising approach has been extensively tested on human bowel sounds, captured by means of abdominal surface vibration recordings, so that it can be further utilized as a diagnostic tool. Qualitative and quantitative analysis of the method's performance, when applied to various types of recorded and synthetic sounds, revealed that the new approach performs very well, with favourable results.

© 2006 Elsevier Ltd. All rights reserved.

Keywords: Wavelets; Wiener filter; De-noise; Signal enhancement; Bowel sounds; Abdominal vibrations; Gastrointestinal phonography

1. Introduction

Background noise removal is addressed in many scientific fields and through many implementation approaches, including non-audio applications. Noise reduction is widely reported across the communications domain, covering speech, music, video, vibrations, bio-signals and medical imaging. A general model for such problems comprises the following steps:

(i) transformation (in the broad sense of the term) of the original signal to the domain that best separates signal from noise;
(ii) processing of the transformed noisy-signal components, aiming at noise elimination;
(iii) inverse transformation of the processed components to obtain the noise-free signal (Fig. 1).

Many transformation and analysis–synthesis schemes have been built on this general model.
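The three steps of the general model can be sketched as a higher-order function. This is a minimal illustration, not the method of this paper. The one-level orthonormal Haar pair and the toy processing stage `zero_small` (threshold 0.5) are hypothetical choices, made only to give T, the processing stage and T^-1 a concrete form.

```python
import math
from typing import Callable, List, Sequence

Vector = List[float]


def denoise(x: Sequence[float],
            forward: Callable[[Sequence[float]], Vector],
            process: Callable[[Vector], Vector],
            inverse: Callable[[Vector], Vector]) -> Vector:
    """Fig. 1 model: s_hat = T^-1[ process( T[x] ) ]."""
    return inverse(process(forward(x)))


def haar_forward(x: Sequence[float]) -> Vector:
    """One-level orthonormal Haar transform (even length): approximations, then details."""
    r = 1.0 / math.sqrt(2.0)
    half = len(x) // 2
    approx = [(x[2 * i] + x[2 * i + 1]) * r for i in range(half)]
    detail = [(x[2 * i] - x[2 * i + 1]) * r for i in range(half)]
    return approx + detail


def haar_inverse(c: Vector) -> Vector:
    """Exact inverse of haar_forward."""
    r = 1.0 / math.sqrt(2.0)
    half = len(c) // 2
    out: Vector = []
    for a, d in zip(c[:half], c[half:]):
        out.extend([(a + d) * r, (a - d) * r])
    return out


def zero_small(c: Vector, t: float = 0.5) -> Vector:
    """Hypothetical toy processing stage: zero every coefficient with |c| <= t."""
    return [v if abs(v) > t else 0.0 for v in c]


x = [1.0, 1.1, 4.0, 3.9]
s_hat = denoise(x, haar_forward, zero_small, haar_inverse)  # ≈ [1.05, 1.05, 3.95, 3.95]
```

Passing an identity function as `process` returns x unchanged, up to rounding. All of the de-noising therefore happens in the processing stage, and T only has to separate signal from noise well.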

The main aim of audio restoration techniques is to improve speech intelligibility and music quality in speech and audio recording enhancement, respectively. The analysis of the human auditory system has allowed the introduction of perceptual criteria, which have further extended the potential of audio restoration. Critical-band analysis, audible noise suppression and the elimination of audible artefacts are the key concepts of these “perceptual” approaches [1–8]. Such approaches are rarely used in instrumentation and measurement, including biomedical and bioacoustic applications. The most common methodologies there combine adaptive filtering techniques with decomposition–reconstruction schemes, including classical spectral subtraction [9–11]. Wavelets fall under the second sub-category. Most wavelet work has concentrated on seeking the “best” signal decomposition–reconstruction topologies and the optimum thresholding strategy [12–19].

* Corresponding author. Tel.: +30 2310933868; fax: +30 2310996309.

E-mail address: [email protected] (C. Dimoulas).

1746-8094/$ – see front matter © 2006 Elsevier Ltd. All rights reserved.

doi:10.1016/j.bspc.2006.08.004

The current work was motivated by the de-noising demands of human bowel sounds, captured as abdominal surface vibrations, so that they can be further exploited for diagnostic purposes. From the outset, the key targets of this research were robust noise cancellation, minimal signal distortion and an implementation feasible for long-term analysis. The method proposed in this paper strikes a balance between the approaches described above. It aims to exploit their advantages and to combine efficiency, reduced complexity and moderate computational cost. Its basis is therefore the generalized Wiener filter [2,20–22] applied in the wavelet domain, with decomposition schemes and perceptual criteria taken into account.

Fig. 1. The general de-noise processing model, where s(i) is the noise-free signal, n(i) the additive noise, x(i) the noise-contaminated signal, T[ ] the employed signal transformation, T^{-1}[ ] the inverse transformation and ŝ(i) the estimated noise-free signal (the noise-reduction outcome).

1.1. Current status of bowel sounds processing

Over the past decades, many researchers have pointed out the lack of a reliable, easy-to-apply method for long-term monitoring of intestinal contractile activity, which would allow the physiology of gastrointestinal motility to be studied [23,24]. As a result, certain functional gastrointestinal disorders are still diagnosed indirectly, by excluding other diseases. Irritable bowel syndrome (IBS) is the most representative case. Other functional disorders, such as abdominal bloating, functional dyspepsia, diarrhoea, constipation and abdominal pain, fall into the same category [24–27]. Most of these symptoms are strongly affected by factors such as nutrition, medication and stress, so the influence of these factors on intestinal motility is another issue that needs further research [25–28].

Manometry and electromyography are the most common investigative methods whose potential for medical knowledge extraction and for diagnosis of the related abnormalities has been tested [29–34]. One of the early foundations of this work is the Migrating Motor Complex (MMC) theory. It showed that “fasting state” intestinal contractile activity (i.e., when at least 2 h have passed since the subject's last meal) repeats periodically in cycles of silent periods, “regular” events and irregular contractions [29–34]. Apart from this finding, medical experience in interpreting intestinal contraction signals and relating them to physiology is very limited [23,24]. The use of these methods has been mainly restricted to research studies, since they have certain disadvantages in clinical practice, especially for prolonged monitoring periods. Intraluminal manometry methods are invasive, and therefore painful and inconvenient for human subjects, although they provide “clean”, multi-site pressure recordings [29–31,33]. Operative electromyography has similar advantages and disadvantages. Surface electromyography, on the other hand, is easy to apply but provides very poor information [33]; nevertheless, research on cutaneous electromyography continues [34,35].

Bowel-sound (BS) auscultation was proposed as an alternative approach to overcome the obstacles stated above. BS are generated by the contractile activity of the human digestive organs. The main source is the propulsive movements of the small intestine, which propel the viscous contents along the gastrointestinal tract [30,32]. The mechanical energy released by the intestinal contractions is usually captured at the abdominal surface. It is measured as pressure (sound signals) or as displacement, velocity and acceleration (abdominal surface vibrations, ASV). Manometry, electromyography and ASV sensing are therefore directly related in how they monitor intestinal motility. All of them measure the effects of muscular contractions, which produce pressure variations over time and electrical myogenic activity. BS monitoring has many advantages over traditional investigative methods, including X-ray screening approaches:

- it is easier to apply, non-invasive and painless;
- it does not cause discomfort to subjects;
- it has limited influence on other psycho-physiological factors;
- it can be applied for prolonged periods.

Additionally, various sound features can be used to analyse gastrointestinal mechanical activity. These support both quantitative (e.g., power, energy, average contractions) and qualitative (pitch frequency, duration, impulsiveness) description schemes [24].

In general, two research approaches can be distinguished in BS analysis [24]. The first is usually applied to isolated BS events. It evaluates signal properties and acoustic characteristics in order to extract medical knowledge related to organic abnormalities and pathologies. However, few research efforts of this kind have been reported [36,37]. Most approaches to gastrointestinal auscultation, on the other hand, evaluate the overall contractile motor activity by determining time periods with presence or absence of BS. Within this second category, two different strategies are usually deployed [24]:

(i) BS activity is captured for prolonged periods (about 2 h or more), and signal averaging and integration are employed to provide plots of long-term level variation [23,40–43];
(ii) limited-duration recordings (e.g., 2 min recordings, usually starting 2 h after the subject's meal) are acquired as representative samples of the entire gastrointestinal motility mechanism [44–46].

Both strategies have advantages and disadvantages [24].


However, knowledge and interpretation of BS have advanced little since Cannon's pioneering work [47], and a reliable and accurate method for clinical practice is still lacking. Most researchers seem to agree that this is not due to a lack of diagnostic information in BS, but to insufficient scientific support [24,48]. ASV sensing and BS analysis involve many difficulties. These are mainly caused by the weak nature of the produced acoustic phenomena and by the peculiar characteristics of the sound-propagating medium. Satisfactory BS acquisition demands ultra-sensitive electroacoustic transducers and high-gain signal conditioning circuits, while subject safety is critical. Consequently, one of the major factors limiting the potential of BS analysis is noise contamination [48–56]. Noise removal is essential for three reasons:

- BS enhancement;
- easier and more efficient auscultation of BS by clinicians;
- unsupervised analysis of long-term ASV monitoring.

It is therefore a necessary processing stage for all the previous BS-analysis approaches [23,24,42,43,48–57].

Several aspects of the influence of interfering noise on BS analysis need to be considered. First, noise complicates the aural interpretation of the original signal, making auscultation problematic and tiresome. Second, masking effects may appear, which makes the detection and interpretation of low-amplitude signal-frequency components tricky, and various audible artefacts often arise. These issues are likely to lead to major estimation errors, because:

(i) supervised medical analysis is difficult to carry out (especially for long-term monitoring), since clinicians' aural interpretation of the recorded signals is not available;
(ii) signal-energy parameters and other audio features employed for analysis purposes are miscalculated;
(iii) numerical analysis and automated processing can produce misclassification errors.

These issues affect all the above-mentioned strategies of BS analysis. For this reason, many research efforts on robust enhancement of noise-contaminated BS have recently appeared in the literature. The main exception is the adaptive filtering approach of Mansy and Sandler [49]. Most of the other de-noising methods deal with the elimination of additive broadband noise, using automatic threshold estimation strategies combined with one of the following:

- wavelet statistics [48,50,51,53];
- wavelets and fractal dimension [54,55];
- higher-order statistics [52,56].

In fact, all the previously stated methods are based on the initial work of Coifman and Wickerhauser [9], which employs iterative wavelet processing in combination with best-basis selection. Hadjileontiadis and Panas [10] initially modified the original algorithm, keeping only the iterative structure (without the best-basis selection functionality) and providing a new automatic threshold estimation procedure. The resulting “wavelet transform-based stationary–non-stationary” (WTST–NST) filter [10] was proposed for lung-sound de-noising. This parametric approach has also been applied to BS processing [48,50,51], and implementation improvements have been suggested [53]. Hadjileontiadis [54,55] then proposed the “wavelet transform-fractal dimension” (WT-FD) based method for de-noising lung sounds and BS. It provides a novel approach to estimating wavelet thresholds while keeping the rest of the method intact. The iterative processing procedure was also implemented in a new fashion, employing higher-order statistics instead of wavelets [52,56]. The proposed Iterative Kurtosis-based Detector (IKD) is also intended for lung-sound and bowel-sound detection and analysis [52,56].

According to the results reported in these works, all variations of the iterative algorithms are very efficient when applied to explosive bowel sounds (EBS) [48,52,54–56]. EBS are also referred to as intestinal bursts (IB) [57], pops, tingles and clicks [38,39,47]. However, regularly sustained (RS) BS signals [57], with smoother attack–release envelopes and longer durations, are also very frequently found in BS recordings [38,39,47]. RS events are also referred to as borborygmi, crepitating sounds, gurgling, rumbling or growling noises, and include whistling, i.e., musical BS. Applying EBS de-noising algorithms to RS events is usually problematic. It causes either insufficient de-noising or serious destruction of the signals' morphological structure [24]. Furthermore, their use in automated long-term processing is risky, as they quite often produce de-noising artefacts. The computational cost of repeated threshold estimation and of the iterative processing scheme is another serious drawback [24]. As already stated, the current work aims to overcome these weaknesses by providing novel Wiener filtering de-noising techniques. These can be applied efficiently to both IB and RS patterns of BS, with moderate complexity and computational cost.

1.2. Problem definition

A common problem in all instrumentation set-ups that demand high amplification is the presence of so-called additive broadband background noise (ABN). Causes of ABN include:

(i) “thermal” noise induced along the data acquisition chain;
(ii) ambient acoustic noise;
(iii) quantization noise, especially when recording levels are not adjusted properly.

Let x(i) be the sampled version of a noise-contaminated signal x(t). Similarly, let s(i) and n(i) be the sampled versions of the “clean” signal s(t) and of the additive random noise n(t), which is uncorrelated with the signal:

$$x(i) = s(i) + n(i) \tag{1}$$
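Eq. (1) can be made concrete with a synthetic contaminated signal, whose input signal-to-noise ratio is then measured. The same measure can score an estimate ŝ(i) against s(i) whenever the clean signal is known, as it is for synthetic test sounds. The 50 Hz tone, the sampling rate, the noise level and the SNR definition below are illustrative assumptions, not the evaluation protocol of this paper.

```python
import math
import random


def snr_db(reference, estimate):
    """SNR of `estimate` against `reference`, in dB (undefined if they are identical)."""
    signal_energy = sum(s * s for s in reference)
    error_energy = sum((s - e) ** 2 for s, e in zip(reference, estimate))
    return 10.0 * math.log10(signal_energy / error_energy)


fs = 1000.0                      # hypothetical sampling rate, Hz
num = 1000                       # one second of samples
rng = random.Random(0)           # fixed seed so the example is repeatable

s = [math.sin(2 * math.pi * 50.0 * i / fs) for i in range(num)]   # clean s(i): 50 Hz tone
n = [rng.gauss(0.0, 0.1) for _ in range(num)]                      # white noise n(i), std 0.1
x = [si + ni for si, ni in zip(s, n)]                              # Eq. (1)

print(f"input SNR: {snr_db(s, x):.1f} dB")   # near 10*log10(0.5/0.01), i.e. about 17 dB
```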

The aim of noise reduction techniques is to recover an approximation ŝ(i) as close as possible to the original clean signal s(i) by eliminating the noise components. Two of the best-known de-noising approaches are spectral subtraction and wavelet thresholding. Methods of the first type emphasize minimal signal distortion. Wavelet thresholding strategies, on the other hand, can give coarse noise reduction results and are therefore likely to affect useful signal components as well as noise.


The current approach aims at implementing an accurate, relatively fast, universal de-noising method that can easily be applied to long-term analysis. Two issues were considered of major importance: the elimination of time discontinuities, and the avoidance of artefact-caused misinterpretation due to severe noise contamination. The proposed Wavelet Domain Wiener Filter (WDWF) method satisfies these prerequisites by combining:

- the efficiency of the classical Wiener filter [20–22];
- the bark-scale flavour of wavelets [58–63];
- the moderate computational cost of Fast Wavelet Transform algorithms [64].

2. Material and methods

Since the proposed WDWF method combines classical spectral subtraction with wavelet domain processing, both signal-processing fields are briefly reviewed before the implemented algorithms are presented.

2.1. Spectral subtraction and parametric Wiener filter

Spectral subtraction was introduced by Boll [65] in 1979. It remains one of the most popular methods for background noise reduction [66–68] and a standard reference when evaluating other noise reduction techniques [69,70]. It filters the noisy signal with a time-varying filter applied in the frequency domain. Consider again the general de-noise model of Fig. 1, as described in the previous section. Let X(k), S(k) and N(k) be the spectra of the noise-contaminated signal x(i), the original clean signal s(i) and the noise signal n(i), respectively, estimated using short-time spectral analysis such as the STFT. Then:

$$X(k) = S(k) + N(k) \;\Rightarrow\; S(k) = X(k) - N(k) \tag{2}$$

The de-noising problem is solved by formulating a filter H(k) that best approximates the spectral subtraction operation of (2). The available signal X(k) is the filter input, and its output is an estimate Ŝ(k) of the clean signal. If the noise signal is assumed to be a stationary random process, an estimate N_FP(k) of the noise spectrum (the noise footprint) can be obtained by applying the Fourier Transform to silent periods of the available signal. Following this line of reasoning, the estimate of the clean signal is given by:

$$\hat{S}(k) = X(k) - N_{FP}(k) = H(k)\cdot X(k) \tag{3}$$
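The noise footprint N_FP(k) of Eq. (3) is obtained from silent periods of the recording. The sketch below assumes that a noise-only segment has already been located. It averages the magnitude spectra of several non-overlapping frames of that segment. The rectangular window, the frame length, the magnitude averaging and the helper names are illustrative assumptions.

```python
import cmath
import math


def dft(frame):
    """Direct O(N^2) DFT; adequate for short illustrative frames."""
    N = len(frame)
    return [sum(frame[i] * cmath.exp(-2j * math.pi * k * i / N) for i in range(N))
            for k in range(N)]


def noise_footprint(noise_only, frame_len):
    """Hypothetical helper: mean magnitude spectrum |N_FP(k)| over non-overlapping,
    rectangular-window frames of a segment assumed to contain noise only."""
    frames = [noise_only[i:i + frame_len]
              for i in range(0, len(noise_only) - frame_len + 1, frame_len)]
    if not frames:
        raise ValueError("noise-only segment is shorter than one frame")
    acc = [0.0] * frame_len
    for frame in frames:
        for k, value in enumerate(dft(frame)):
            acc[k] += abs(value)
    return [total / len(frames) for total in acc]


footprint = noise_footprint([1.0] * 16, frame_len=8)   # a pure DC "noise", for illustration
```

Averaging over several frames reduces the variance of the estimate. This relies on the stationarity assumption stated above.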

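The two basic short-time spectral analysis noise reduction methods are the magnitude spectral subtraction

$$|\hat{S}(k)| = \begin{cases} |X(k)| - |N_{FP}(k)|, & |X(k)| > |N_{FP}(k)| \\ 0, & \text{otherwise} \end{cases} \tag{4}$$

and the power spectral subtraction

$$|\hat{S}(k)|^2 = \begin{cases} P_X(k) - a\cdot P_{N_{FP}}(k), & P_X(k) - a\cdot P_{N_{FP}}(k) > b\cdot P_{N_{FP}}(k) \\ b\cdot P_{N_{FP}}(k), & \text{otherwise} \end{cases} \tag{5}$$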

where

$$P_X(k) \triangleq E\{|X(k)|^2\}, \qquad P_{N_{FP}}(k) \triangleq E\{|N_{FP}(k)|^2\} \tag{6}$$

and a, b are real-valued positive parameters that control the amount of subtraction and the remaining noise floor [2,6]. Without further processing of the noisy signal's phase [4,6], the clean signal can then be estimated via the Inverse Fourier Transform (IFT):

$$\hat{s}(i) = \mathrm{IFT}\{|\hat{S}(k)|, \angle X(k)\} \tag{7}$$
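Eqs. (4), (5) and (7) can be sketched for a single frame as follows. This is a minimal illustration that assumes a rectangular window, no frame overlap, and single-frame powers in place of the expectations of Eq. (6). The defaults a = 2 and b = 0.01 are placeholders, not values recommended by the paper, and the helper names are hypothetical.

```python
import cmath
import math


def dft(x):
    N = len(x)
    return [sum(x[i] * cmath.exp(-2j * math.pi * k * i / N) for i in range(N)) for k in range(N)]


def idft(X):
    N = len(X)
    return [sum(X[k] * cmath.exp(2j * math.pi * k * i / N) for k in range(N)) / N for i in range(N)]


def magnitude_subtraction(X, nfp_mag):
    """Eq. (4): |S_hat(k)| = |X(k)| - |N_FP(k)| where positive, otherwise 0."""
    return [max(abs(Xk) - Nk, 0.0) for Xk, Nk in zip(X, nfp_mag)]


def power_subtraction(X, nfp_mag, a=2.0, b=0.01):
    """Eq. (5), with single-frame powers standing in for the expectations of Eq. (6).
    Returns magnitudes |S_hat(k)|, i.e. the square root of the subtracted power."""
    mags = []
    for Xk, Nk in zip(X, nfp_mag):
        px, pn = abs(Xk) ** 2, Nk ** 2
        power = px - a * pn if px - a * pn > b * pn else b * pn   # b*pn is the noise floor
        mags.append(math.sqrt(power))
    return mags


def resynthesize(mag, X):
    """Eq. (7): reuse the noisy phase angle(X(k)) with the new magnitudes, then invert."""
    spectrum = [m * cmath.exp(1j * cmath.phase(Xk)) for m, Xk in zip(mag, X)]
    return [v.real for v in idft(spectrum)]   # the input was real, so keep the real part


frame = [0.5, 1.0, 0.5, 0.0, -0.5, -1.0, -0.5, 0.0]   # one short noisy frame (illustrative)
noise = [0.1] * len(frame)                              # assumed |N_FP(k)| from a footprint
X = dft(frame)
s_hat = resynthesize(magnitude_subtraction(X, noise), X)
```

A footprint of zero leaves the frame unchanged. A footprint equal to |X(k)| removes it entirely. The floor b·P_NFP(k) in Eq. (5) keeps a small residual noise level, which avoids the fully zeroed bins produced by Eq. (4).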

Another classical noise reduction technique is the parametric Wiener filter. It can be shown to be equivalent to the spectral subtraction procedure when applied to time-limited signals [20,21]. For uncorrelated, zero-mean, additive noise, the Wiener filter minimizes the mean square error of the estimate's time-domain reconstruction [6]. The transfer function H_PWF of the parametric Wiener filter is:

$$H_{PWF}(k) = \begin{cases} \left[1 - \left(\dfrac{c\,|N_{FP}(k)|}{|X_T(k)|}\right)^{a}\right]^{b}, & \text{if } \left(\dfrac{c\,|N_{FP}(k)|}{|X_T(k)|}\right)^{a} \le 1 \\ 0, & \text{otherwise} \end{cases} \tag{8}$$

where a, b, c are the real-valued parameters of the filter and X_T(k) is the spectrum of the signal windowed to duration T. N_FP(k) is again calculated as the Fourier Transform of a noise-only segment of the signal (the noise footprint) [2,4,6,20–22].
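The gain of Eq. (8) is evaluated bin by bin from the magnitudes |X_T(k)| and |N_FP(k)|. In the sketch below, the defaults a = 2, b = 0.5 and c = 1 are assumptions. With them, the gain reduces to sqrt(1 − |N_FP|²/|X_T|²), a power-subtraction-like rule. With a = 2 and b = 1, it gives the familiar Wiener form 1 − |N_FP|²/|X_T|². The zero-magnitude guard is an added implementation detail.

```python
def pwf_gain(x_mag, nfp_mag, a=2.0, b=0.5, c=1.0):
    """Eq. (8): H(k) = [1 - (c|N_FP|/|X_T|)^a]^b if (c|N_FP|/|X_T|)^a <= 1, else 0."""
    gains = []
    for xk, nk in zip(x_mag, nfp_mag):
        if xk == 0.0:              # added guard: an empty bin is simply suppressed
            gains.append(0.0)
            continue
        r = (c * nk / xk) ** a
        gains.append((1.0 - r) ** b if r <= 1.0 else 0.0)
    return gains


def apply_gain(X, gains):
    """S_hat(k) = H(k) * X_T(k) on the complex spectrum (phase left untouched)."""
    return [g * Xk for g, Xk in zip(gains, X)]


print(pwf_gain([2.0, 1.0], [1.0, 2.0], b=1.0))   # [0.75, 0.0]
```

Because H(k) is real and non-negative, filtering changes only the magnitudes. This matches the phase-reuse idea of Eq. (7).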

The noise estimation procedure described so far does not consider any perceptual criteria related to the human auditory system. It is well known that the human response to audio stimuli is not spectrally uniform across the range of audible frequencies. Formulating such a psycho-acoustic factor extended the potential of the so-called Frequency-Dependent Parametric Wiener Filter [20,21].

Equivalent masking noise estimation analysis [20,21,71] has concentrated on the model of pure tones masked by white noise. This is the worst case for perceiving a desired tonal signal in the presence of broadband acoustic noise (Fig. 2). According to the results of those studies [20,21,71], the human auditory system can be considered to filter broadband acoustic noise according to the transfer function

$$A(k\cdot df) = \begin{cases} 1, & k\cdot df \le 500 \\ \dfrac{k\cdot df}{500}, & k\cdot df > 500 \end{cases} \tag{9}$$

where df = fs/N is the frequency resolution of the STFT in Hz, fs is the sampling frequency and N is the window length.

Including the above transfer function in the noise estimation procedure results in a frequency-dependent method that simulates the perception of broadband acoustic noise by the human ear [20,21]. With A(k) denoting A(k·df) from Eq. (9), the Frequency-Dependent Parametric Wiener Filter is implemented according to

$$H_{FDPWF}(k) = \begin{cases} \left[1 - \left(\dfrac{c\cdot A(k)\cdot|N_{FP}(k)|}{|X_T(k)|}\right)^{a}\right]^{b}, & \text{if } \left(\dfrac{c\cdot A(k)\cdot|N_{FP}(k)|}{|X_T(k)|}\right)^{a} \le 1 \\ 0, & \text{otherwise} \end{cases} \tag{10}$$

Fig. 2. Level (SPL) of test tones just masked by white noise of given density level L_wn, as a function of the test-tone frequency. The dashed curve indicates the threshold in quiet (Source: Zwicker and Fastl [71], p. 57).
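Eqs. (9) and (10) add a frequency-dependent weight to the gain of Eq. (8). The sketch below assumes a full N-point DFT frame. Bins above N/2 are therefore mapped back to their positive frequency with min(k, N − k), which keeps the gain symmetric; this is an implementation choice not stated in the text. The parameter defaults are again placeholders.

```python
def perceptual_weight(freq_hz):
    """Eq. (9): A = 1 up to 500 Hz, rising linearly as f/500 above it."""
    return 1.0 if freq_hz <= 500.0 else freq_hz / 500.0


def fdpwf_gain(x_mag, nfp_mag, fs, a=2.0, b=0.5, c=1.0):
    """Eq. (10): the Eq. (8) gain with the noise term scaled by A(k*df), df = fs/N."""
    N = len(x_mag)
    df = fs / N
    gains = []
    for k, (xk, nk) in enumerate(zip(x_mag, nfp_mag)):
        f = min(k, N - k) * df       # assumed mirroring of bins above N/2
        if xk == 0.0:                # added guard for empty bins
            gains.append(0.0)
            continue
        r = (c * perceptual_weight(f) * nk / xk) ** a
        gains.append((1.0 - r) ** b if r <= 1.0 else 0.0)
    return gains


g = fdpwf_gain([10.0] * 8, [1.0] * 8, fs=8000.0, b=1.0)   # df = 1000 Hz per bin
```

Because A ≥ 1 grows above 500 Hz, the same noise-to-signal ratio produces a lower gain at higher frequencies. In the example, g[0] is about 0.99 at 0 Hz and g[4] is about 0.36 at 4 kHz.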

2.2. Wavelet domain de-noising techniques

The implementation of the above Wiener filter formulas involves short-term spectral analysis, usually accomplished via the STFT. To this end, the signal is windowed into time-overlapping sequential frames prior to Wiener filtering. According to the Heisenberg uncertainty principle [58,62], there is a limit to the time–frequency resolution product:

$$\Delta t\cdot\Delta f \ge \frac{1}{4\pi} \tag{11}$$

where Δt is the time resolution and Δf the frequency resolution. Both depend on the STFT windowing operation, i.e., on the sampling frequency fs, the windowing function and the window length N [58]. The equation shows that fine time resolution can be achieved at the cost of poor frequency resolution and vice versa, but not the best resolution in both domains at the same time. Thus, once fs and N have been selected, the time and frequency resolutions are fixed and uniform along the time and frequency axes. Effective representation and analysis of audio signals, on the other hand, require fine time resolution at high frequencies and fine frequency resolution at low frequencies. This is the so-called “constant Q analysis”. The term was first introduced for filter bank implementations, in which the Q factor of each filter remains constant. The Q factor is the filter's central frequency (fc) divided by its bandwidth (BW), i.e., Q = fc/BW [58].
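The contrast between a uniform STFT grid and a constant-Q split can be illustrated with a short calculation. The sketch assumes fs = 8 kHz, N = 256 and three octave splits, all chosen only for illustration. fc is taken as the geometric mean of the band limits, the convention the paper uses for Eq. (19).

```python
import math


def stft_resolution(fs, N):
    """Uniform STFT grid: bin spacing df = fs/N (Hz) and window span N/fs (s)."""
    return fs / N, N / fs


def dyadic_bands(fs, levels):
    """(low, high) edges of a `levels`-deep octave split of [0, fs/2]:
    the detail bands from the top down, then the final approximation band."""
    bands, hi = [], fs / 2.0
    for _ in range(levels):
        bands.append((hi / 2.0, hi))
        hi /= 2.0
    bands.append((0.0, hi))
    return bands


def q_factor(lo, hi):
    """Q = fc / BW, with fc taken as the geometric mean of the band limits."""
    return math.sqrt(lo * hi) / (hi - lo)


df, span = stft_resolution(8000.0, 256)   # 31.25 Hz bins, 0.032 s window, same for every bin
bands = dyadic_bands(8000.0, 3)           # [(2000, 4000), (1000, 2000), (500, 1000), (0, 500)]
qs = [q_factor(lo, hi) for lo, hi in bands[:-1]]   # each equals sqrt(2), about 1.414
```

Every octave band has the same Q. Bandwidth therefore shrinks, and frequency resolution improves, towards low frequencies, while the STFT keeps 31.25 Hz everywhere.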

2.2.1. Wavelet transforms

The wavelet transform was introduced as a more elegant way of meeting this requirement for fine time and frequency resolution. One motive of wavelet analysis was to eliminate drawbacks of filter bank processing, namely:

- the increased amount of data;
- the computational complexity and cost;
- issues related to filter delay.

Wavelet processing offers an alternative spectral analysis approach with the flavour of fine, bark-scale resolution, promising easier implementation and lower computational complexity [58,60,62].

Wavelets are families of functions generated from a “mother” function, the mother wavelet, by scaling (dilation) and translation (time shifting). The inner products of the signal with the wavelet functions, the so-called wavelet coefficients, constitute the Continuous Wavelet Transform (CWT). Let L²(R) denote the vector space of measurable, square-integrable one-dimensional functions, and let x(t) be a one-dimensional function with x ∈ L²(R). Then the following equations hold in a Hilbert space:

$$\psi_{u,\tau}(t) = \frac{1}{\sqrt{|u|}}\,\psi\!\left(\frac{t-\tau}{u}\right), \qquad u, \tau \in \mathbb{R},\ u \ne 0 \tag{12}$$

$$\mathrm{CWT}_x(u,\tau) = \langle x, \psi_{u,\tau}\rangle = \int x(t)\,\psi^{*}_{u,\tau}(t)\,dt \tag{13}$$

where u and τ are the scale and translation parameters, respectively, ψ(t) is the mother wavelet, ψ_{u,τ}(t) is the family of wavelet functions and * denotes the complex conjugate. An analogous expression is used for the Inverse Continuous Wavelet Transform.

A full analysis of the wavelet transform is beyond the scope of this paper. We therefore focus on the Discrete Wavelet Transform (DWT) and Wavelet Packet (WP) analysis [58,60,62,72], which are employed in the current approach. The DWT was proposed to reduce the number of wavelet coefficients by restricting the parameters u and τ to discrete values. Discarding the u, τ combinations with minimal influence on analysis efficiency suggests a “dyadic grid”, similar to that used in constant-Q or octave analysis. On this grid, u = 2^{-j} and τ = k·2^{-j}, with j, k ∈ Z, so the wavelet functions of the DWT take the following form:

$$\psi_{j,k}(t) = 2^{j/2}\,\psi(2^{j}t - k), \qquad j, k \in \mathbb{Z} \tag{14}$$

Fig. 3. The 2-channel perfect reconstruction scheme using Quadrature Mirror Filters (QMF), where s(i) is the input signal, h[n] and g[n] are the impulse responses of the high-pass and low-pass analysis filters, respectively, and h̃[n] and g̃[n] are the impulse responses of the high-pass and low-pass synthesis filters, respectively.

Mathematically, this notation means that the signal is decomposed into “approximation” and “detail” components, similar to the result of iterative projection onto nested sub-spaces [58,60,62]. Real-world implementations of wavelet transforms rely on digital signal processing, so analog-to-digital conversion is required prior to any processing. In this case, the signals x(t) and ψ(t) of Eqs. (12)–(14) are replaced by their digitized versions x(i·dt) and ψ(i·dt), so that numerical processing is finally applied to the sequences {x(i)}, {ψ(i)}.
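The dyadic family of Eq. (14) and its evaluation on the sampled grid i·dt can be illustrated with the Haar wavelet. Haar is chosen here only because it has a closed form; the paper tests various mother wavelets. The sketch approximates inner products with Riemann sums to check numerically that the sampled ψ_{j,k} are orthonormal. The step dt = 1/1024 and the helper names are arbitrary choices.

```python
def haar_mother(t):
    """Haar mother wavelet: +1 on [0, 0.5), -1 on [0.5, 1), 0 elsewhere."""
    if 0.0 <= t < 0.5:
        return 1.0
    if 0.5 <= t < 1.0:
        return -1.0
    return 0.0


def psi_jk(t, j, k):
    """Eq. (14): psi_{j,k}(t) = 2^(j/2) * psi(2^j t - k)."""
    return 2.0 ** (j / 2.0) * haar_mother(2.0 ** j * t - k)


def inner_product(f, g, dt, t0=0.0, t1=1.0):
    """Riemann-sum approximation of <f, g> on [t0, t1), using samples at t0 + i*dt."""
    n = int(round((t1 - t0) / dt))
    return sum(f(t0 + i * dt) * g(t0 + i * dt) for i in range(n)) * dt


dt = 1.0 / 1024
norm_10 = inner_product(lambda t: psi_jk(t, 1, 0), lambda t: psi_jk(t, 1, 0), dt)   # ≈ 1
cross = inner_product(lambda t: psi_jk(t, 0, 0), lambda t: psi_jk(t, 1, 0), dt)     # ≈ 0
```

The factor 2^{j/2} in Eq. (14) keeps each ψ_{j,k} at unit energy as its support shrinks by 2^{-j}. Without that factor, the first inner product would not come out close to 1.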

A common way to implement the DWT quickly is to use Quadrature Mirror Filter banks (QMF). Multi-stage filtering is performed, so that the signal band is split at every stage (Fig. 3). Let h(n) and g(n) be the impulse responses of the high-pass and low-pass filters, respectively. Selecting appropriate reconstruction filters h̃(n), g̃(n) then leads to perfect reconstruction of the initial signal [3,58,60,62]. Thus, if no other processing is involved, the decomposition–reconstruction procedure produces a delayed version of the exact initial signal. This “2-channel perfect reconstruction filter bank” approach can be implemented using half-band FIR filters and is also referred to as the Fast Wavelet Transform (FWT) [62,64]. Filter responses are related to the mother wavelet or scaling function [60,62]. This relationship allows the wavelet analysis parameters to be configured from the initial theoretical foundation and then realized through the FWT. There are two basic analysis–synthesis schemes that correspond to different wavelet implementations:

- in the first, the iterative signal decomposition is applied only to the low-frequency bands (Fig. 4); this is equivalent to the DWT as expressed by Eq. (14);
- in the second, both the low- and high-frequency bands can be half-band filtered (Fig. 5); this is known as Wavelet Packet (WP) analysis [58,62].

Fig. 4. Discrete Wavelet Transform decomposition using Quadrature Mirror Filter banks (QMFs), with s(i) the input signal, h[n] the high-pass analysis filters, g[n] the low-pass analysis filters, h̃[n] the high-pass synthesis filters and g̃[n] the low-pass synthesis filters. The digits “1” and “0” correspond to the high-pass filtered data (detail coefficients) and the low-pass filtered data (approximation coefficients), respectively, while the length of the binary sequences “001…” represents the decomposition level at each node.

As the figures show, bands are conveniently indexed with combinations of “0” and “1”, where “0” denotes the low-frequency half-band and “1” the high-frequency half-band. Additionally, sub-sampling by a factor of 2 takes place after each band split, so that the total number of samples does not increase. This multi-resolution analysis does not affect the content of the original signal. The “perfect reconstruction” filter pairs h(n), h̃(n) and g(n), g̃(n), together with a counterpart up-sampling operation at every reconstruction stage, reverse any effect produced. As a result, the total number of samples over all bands never exceeds the initial number of samples of the signal to be processed.

2.2.2. Wavelet thresholding rules for noise reduction

Return now to the general noise reduction model of Fig. 1. If the DWT and WP are used for the signal transform T, the de-noising problem reduces to two tasks: finding appropriate threshold values, and applying thresholding rules to the wavelet coefficients before the inverse transformation. Many studies have focused on both tasks. Statistical rules, entropy-based criteria and perceptual approaches have been proposed for threshold estimation. The threshold is either constant or rescaled to the individual wavelet decomposition bands, and it can even be applied through iterative schemes [2,3,6,9–14,17,48,51,53–55].
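The two analysis–synthesis schemes of Section 2.2.1, which realize the transform T and its inverse for this thresholding step, can be sketched with the shortest perfect-reconstruction QMF pair, the orthonormal Haar filters. Haar is assumed here for brevity; longer wavelet filters would need convolution with boundary handling, which this sketch omits. Band labels follow the “0”/“1” convention of Figs. 4 and 5. The input length must be divisible by 2^levels. All function names are hypothetical.

```python
import math

R = 1.0 / math.sqrt(2.0)


def analysis(x):
    """One Haar QMF stage: low-pass ('0') and high-pass ('1') outputs, each down-sampled by 2."""
    low = [(x[2 * i] + x[2 * i + 1]) * R for i in range(len(x) // 2)]
    high = [(x[2 * i] - x[2 * i + 1]) * R for i in range(len(x) // 2)]
    return low, high


def synthesis(low, high):
    """Matching synthesis stage: up-sampling plus filtering that inverts `analysis` exactly."""
    out = []
    for a, d in zip(low, high):
        out.extend([(a + d) * R, (a - d) * R])
    return out


def dwt(x, levels):
    """DWT (Fig. 4): only the low band is split again. Returns {label: coefficients}."""
    bands, low, label = {}, list(x), ""
    for _ in range(levels):
        low, high = analysis(low)
        bands[label + "1"] = high          # detail band at this level, e.g. "1", "01", "001"
        label += "0"
    bands[label] = low                     # final approximation band, e.g. "000"
    return bands


def idwt(bands, levels):
    low = bands["0" * levels]
    for lev in range(levels, 0, -1):
        low = synthesis(low, bands["0" * (lev - 1) + "1"])
    return low


def wp(x, levels):
    """Wavelet packet tree (Fig. 5): both half-bands are split at every level."""
    nodes = {"": list(x)}
    for _ in range(levels):
        nodes = {lab + tag: part
                 for lab, c in nodes.items()
                 for tag, part in zip("01", analysis(c))}
    return nodes


def iwp(nodes, levels):
    for _ in range(levels):
        nodes = {lab[:-1]: synthesis(nodes[lab[:-1] + "0"], nodes[lab[:-1] + "1"])
                 for lab in nodes if lab.endswith("0")}
    return nodes[""]


x = [float(i % 5) for i in range(16)]
d_bands, p_nodes = dwt(x, 3), wp(x, 3)
```

Both trees hold exactly 16 coefficients in total, with 4 DWT bands versus 8 WP leaves. Without processing, both reconstructions return x up to rounding. Thresholding or Wiener gains would be applied to the band dictionaries between the two calls.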

Two basic strategies are used for threshold-based processing: the soft thresholding rule and the hard thresholding rule [13,62]. Both are described by the following equation:

$$x_{\text{hard-th}}(i) = \begin{cases} x(i), & \text{if } |x(i)| > t \\ 0, & \text{otherwise} \end{cases} \qquad x_{\text{soft-th}}(i) = \begin{cases} \operatorname{sign}(x(i))\cdot(|x(i)| - t), & \text{if } |x(i)| > t \\ 0, & \text{otherwise} \end{cases} \tag{15}$$

where:

- t is the threshold value; coefficients with amplitudes smaller than t are considered noise components;
- x(i) is the input signal (the wavelet coefficients in our case);
- x_hard-th(i) and x_soft-th(i) are the outputs of the hard-thresholding and soft-thresholding filters, respectively.
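The two rules of Eq. (15) translate directly into element-wise maps. The sketch below assumes that a non-negative threshold t has already been chosen by one of the estimation strategies cited above.

```python
import math


def hard_threshold(coeffs, t):
    """Eq. (15), hard rule: keep coefficients with |x| > t, zero the rest."""
    return [x if abs(x) > t else 0.0 for x in coeffs]


def soft_threshold(coeffs, t):
    """Eq. (15), soft rule: zero |x| <= t and shrink the survivors towards zero by t."""
    return [math.copysign(abs(x) - t, x) if abs(x) > t else 0.0 for x in coeffs]


c = [-3.0, -0.5, 0.2, 1.5, 4.0]
print(hard_threshold(c, 1.0))   # [-3.0, 0.0, 0.0, 1.5, 4.0]
print(soft_threshold(c, 1.0))   # [-2.0, 0.0, 0.0, 0.5, 3.0]
```

The hard rule keeps surviving amplitudes intact but is discontinuous at ±t. The soft rule is continuous but biases every surviving coefficient by t, which is one source of the signal distortion attributed to wavelet thresholding in Section 1.2.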

2.2.3. Wavelet based implementations of Wiener filter

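Recall Eqs. (8) and (10), and let X_k(w), w = 0, 1, …, W_Xk − 1, be the WT coefficients of band k. Adapting the frequency-dependent parametric Wiener filter to the wavelet domain then yields a unique transfer function H_k for each band k:

$$H_k = \begin{cases} \left[1 - \left(c\cdot A_{kw}\cdot\dfrac{\langle N_{FP,k}(w)\rangle}{\langle X_k(w)\rangle}\right)^{a}\right]^{b}, & \text{if } \left(c\cdot A_{kw}\cdot\dfrac{\langle N_{FP,k}(w)\rangle}{\langle X_k(w)\rangle}\right)^{a} \le 1 \\ 0, & \text{otherwise} \end{cases} \tag{16}$$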

where N_FP,k(w) are the W_Nk noise footprint wavelet coefficients of band k and A_kw is the “wavelet-estimated” perceptual parameter A at band k:

$$\langle N_{FP,k}(w)\rangle = \frac{1}{W_{Nk}}\sqrt{\sum_{w=0}^{W_{Nk}-1}\left[N_{FP,k}(w)\right]^2} \tag{17}$$

$$\langle X_k(w)\rangle = \frac{1}{W_{Xk}}\sqrt{\sum_{w=0}^{W_{Xk}-1}\left[X_k(w)\right]^2} \tag{18}$$

$$A_{kw} = \begin{cases} 1, & f_{c\text{-}k} \le 500 \\ \dfrac{f_{c\text{-}k}}{500}, & f_{c\text{-}k} > 500 \end{cases} \tag{19}$$

The parameter $f_{c\text{-}k}$ of Eq. (19) is the central frequency of band k, calculated as the geometric mean of the band's frequency limits. If no perceptual criteria are employed, $A_{kw}$ is disabled ($A_{kw} = A_{k0} = 1$). Following the implementation described above, if the WT coefficients of band k are processed with (16) prior to the reconstruction phase, the resulting filtered coefficients $\hat S_k(w)$ are the Wiener filter output:

$$\hat S_k(w)=H_k\cdot X_k(w)\tag{20}$$
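A sketch of the band-wise gain of Eqs. (16)–(20) is given below. It assumes that $\langle\cdot\rangle$ is the RMS value of Eqs. (17)–(18) and that the band coefficients of the noisy signal and of the noise footprint are already available as plain lists (the wavelet decomposition itself is not shown). The function names and the default a, b, c values are placeholders, not the paper's configuration.

```python
import math


def rms(values):
    """Effective (RMS) value of a band's coefficients, Eqs. (17)-(18)."""
    return math.sqrt(sum(v * v for v in values) / len(values))


def perceptual_weight(f_low, f_high):
    """A_kw of Eq. (19); f_c-k is the geometric mean of the band limits."""
    f_c = math.sqrt(f_low * f_high)
    return 1.0 if f_c <= 500.0 else f_c / 500.0


def band_wiener_gain(band, noise_band, a, b, c, a_kw=1.0):
    """Single gain H_k applied to every coefficient of band k, Eq. (16)."""
    x_rms = rms(band)
    if x_rms == 0.0:
        return 0.0
    ratio = c * a_kw * (rms(noise_band) / x_rms) ** a
    return (1.0 - ratio) ** b if ratio <= 1.0 else 0.0


def band_wiener(band, noise_band, a=2.0, b=1.0, c=1.0, a_kw=1.0):
    """Filtered band coefficients S_k(w) = H_k * X_k(w), Eq. (20)."""
    h = band_wiener_gain(band, noise_band, a, b, c, a_kw)
    return [h * x for x in band]
```

For example, a band with RMS 3 and a footprint with RMS 1 gives, for a = 2, b = c = 1, the gain 1 − (1/3)² = 8/9. Because one gain scales the whole band, the time resolution of this scheme is limited to the band length.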

Eq. (16) describes a noise-reduction process that is usually referred to as “oracle attenuation” [62]. “Oracles” simplify the estimation by providing information about the signal that is normally not available, so that a lower bound on the risk of the de-noising error is obtained [62]. In the present situation, the noise-free signal is estimated under the hypothesis that the additive noise is a stationary process whose spectral characteristics remain unchanged and can therefore be estimated from the available “noise footprint”.

The wavelet domain processing described by (20) has so far been applied successfully in many audio coding and signal enhancement applications [72,73]. However, signal windowing and overlapping strategies are necessary to achieve acceptable results and to reduce artefacts. Masking criteria and other perception-based thresholds have also been reported to increase robustness, while in audio compression various pre-echo cancellation techniques are important for maintaining a high compression ratio with an unnoticeable effect on audio quality [3,6,74]. The frequency resolution obtained from wavelet transforms depends on the total number of analysis bands formed by the decomposition scheme.

2.3. Modified wavelet domain Wiener filters

As already stated, the targets of the WDWF approach are reduced complexity, fine time–frequency resolution, easy windowing configuration and elimination of time discontinuities or other artefacts. To meet these demands, the Wiener technique had to be adapted to the wavelet domain in a different way.

2.3.1. Point-to-point WDWF type I

A point-to-point analysis scheme is proposed, in which the a-powered effective value of the signal at band k, $\langle X(k,w)\rangle^{a}$, is replaced with an alternative, more convenient parameter of equivalent significance:

$$P_{x,a}(k,w)|_{I}=d\cdot|X(k,w)|^{a}+(1-d)\cdot P_{x,a}(k,w-1),\qquad w=0,1,\dots,W_{X_k}-1\tag{21}$$

where $X(k,w)$ denotes the w-th wavelet coefficient at band k of the noisy signal x(i), instead of the $X_k(w)$ used in Section 2.2.3 (the new notation makes it easy to distinguish the two approaches). Similarly, $\hat S(k,w)$, $\hat N(k,w)$ and $N_{FP}(k,w)$ denote the wavelet coefficients of the de-noised signal $\hat s(i)$, the extracted noise $\hat n(i)$ and the noise footprint $n_{FP}(i)$, respectively.


Fig. 5. Wavelet Packet decomposition using Quadrature Mirror Filter-banks (QMFs).

Eq. (21) introduces an exponential moving average estimate of the a-powered magnitude, $P_{x,a}(k,w)|_{I}$. The real-valued momentum term d lies in the interval [0,1] and controls the amount of memory taken into account in the calculation. Similar approaches have been used in related frame-based spectral subtraction and Wiener filtering noise cancellation applications [20,21,73,75]. With this arrangement, the finest time resolution is achieved while the computational complexity is minimal. The estimate is sensitive to the preceding samples of the signal, in accordance with the Haas precedence effect of psychoacoustics [71], which offers a potential advantage for perceptual audio processing. However, signal discontinuities are likely at the start of the recursion, specifically at the 0th sample, where no previous estimate is available. To overcome this handicap, Eq. (21) is re-formulated as follows:

$$P_{x,a}(k,w)|_{I}=\begin{cases}|X(k,w)|^{a}, & w=0\\ d\cdot|X(k,w)|^{a}+(1-d)\cdot P_{x,a}(k,w-1), & w=1,\dots,W_{X_k}-1\end{cases}\tag{22}$$

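With this last arrangement, the proposed type-I WDWF allows point-to-point processing in every band:

$$H_{WDWF}(k,w)=\begin{cases}\left[1-c\cdot A_{kw}\cdot\dfrac{P_{n_{FP},a}(k)}{P_{x,a}(k,w)}\right]^{b}, & \text{if } c\cdot A_{kw}\cdot\dfrac{P_{n_{FP},a}(k)}{P_{x,a}(k,w)}\le 1\\[4pt] 0, & \text{otherwise}\end{cases}\tag{23}$$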

where $P_{x,a}(k,w)$ is the estimate $P_{x,a}(k,w)|_{I}$ of Eq. (22), $A_{kw}$ is still calculated from (19), and the noise estimate $P_{n_{FP},a}(k)$ is the average of the a-powered magnitudes $|N_{FP}(k,w)|^{a}$ of the band-k wavelet coefficients of the noise footprint:

$$P_{n_{FP},a}(k)=\frac{1}{W_{N_k}}\sum_{w=0}^{W_{N_k}-1}|N_{FP}(k,w)|^{a}\tag{24}$$

Thus, the wavelet domain, frequency-dependent, parametric Wiener filtering operation is described by the following equation:

$$\hat S(k,w)=H_{WDWF}(k,w)\cdot X(k,w),\qquad w=0,1,\dots,W_{X_k}-1\tag{25}$$

where k runs over all the analysis bands produced by the DWT or WP decomposition.
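The type-I recursion of Eqs. (22)–(25) can be sketched per band as follows. The sketch assumes that the wavelet coefficients of one band of the noisy signal and of the noise footprint are given as plain lists; the default parameters follow the WDWFI configuration reported later in this section (a = 2, b = 1, c = 3, d = 0.2), and the function names are hypothetical.

```python
def noise_footprint_power(noise_coeffs, a):
    """P_nFP,a(k) of Eq. (24): mean a-powered magnitude of the footprint coefficients."""
    return sum(abs(v) ** a for v in noise_coeffs) / len(noise_coeffs)


def type1_power(coeffs, a, d):
    """P_x,a(k,w)|I of Eq. (22): exponential moving average of |X(k,w)|^a."""
    powers = []
    for w, x in enumerate(coeffs):
        inst = abs(x) ** a
        powers.append(inst if w == 0 else d * inst + (1.0 - d) * powers[-1])
    return powers


def wdwf_gain(p_noise, p_x, a_kw, b, c):
    """H_WDWF(k,w) of Eq. (23)."""
    if p_x <= 0.0:
        return 0.0
    ratio = c * a_kw * p_noise / p_x
    return (1.0 - ratio) ** b if ratio <= 1.0 else 0.0


def wdwf_type1_band(coeffs, noise_coeffs, a=2.0, b=1.0, c=3.0, d=0.2, a_kw=1.0):
    """De-noised coefficients S(k,w) = H_WDWF(k,w) * X(k,w), Eq. (25)."""
    p_n = noise_footprint_power(noise_coeffs, a)
    return [wdwf_gain(p_n, p, a_kw, b, c) * x
            for x, p in zip(coeffs, type1_power(coeffs, a, d))]
```

Because the recursion visits each coefficient once, the cost per band is linear in the number of coefficients, and the gain can change from one coefficient to the next.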

Introducing the exponential moving average scheme for estimating the power of the noisy-signal coefficients is advantageous over both Fourier-based short-term spectral amplitude (STSA) approaches and classical wavelet de-noising. It is also beneficial compared to previous exponential moving average implementations adapted to filter banks. In contrast to its STFT Wiener filter predecessor, the WDWF approach of Eqs. (21)–(25) retains the flavour of logarithmic frequency resolution instead of the classical linear spacing. Additionally, the method features increased time resolution compared to the classical wavelet-Wiener approaches described in Section 2.2.3, while its reduced complexity and computational load are also beneficial.

Another attribute of the proposed scheme, which gives it an advantage over non-wavelet filter-bank implementations [20,21], is the multi-resolution nature of the wavelet transform. The total length of the wavelet coefficients is kept equal to the initial length of the input sequence, an arrangement that keeps the computational cost low, in contrast to the implementations of [20,21], where the total size of the band-filtered sequences is multiplied by the number of analysis bands involved. Furthermore, the multi-resolution scheme allows an identical filter configuration over all wavelet scales, whereas classical filter-bank filtering [20,21] requires band adaptation, mostly because of the influence of the memory term d, which is analyzed in the following paragraphs.

In all the previous formulas of the parametric Wiener filter implementations, the parameters a, b, c are used to adjust the filter transfer function according to the degree of noise contamination [2,20–22]. In general, the filter response is controlled so that noisy signals are strongly attenuated at low signal-to-noise ratios (SNR), while the attenuation becomes smaller at higher SNR values. In fact, the filter-magnitude curve asymptotically approaches the 0 dB level ($H_k = 1$) as the SNR increases. In this context, increasing the parameter a results in smoother filtering, since for SNR values greater than about 2 dB the filter-magnitude response moves closer to the 0 dB level [2,20,21]. The exact opposite behaviour is observed for the parameters b and c: raising their values produces rougher filtering. Specifically, the filter-response curves bend to reach about 60 dB attenuation at 0 dB SNR for typical values of b = 4 and c = 1.5 [2,20,21]. In any case, the filter attenuation as a function of SNR follows a characteristic exponential curve, which starts from very large attenuation values at low SNR (attenuation above 15–20 dB for SNR < 5 dB) and asymptotically reaches the 0 dB level at higher SNR (SNR > 15–20 dB). More information about the influence of the a, b and c parameters and the related filter attenuation curves may be found in [2,21].

As already stated, the current WDWF approach differs from parametric Wiener spectral subtraction [2,22] mostly because of the exponential moving average procedure, which is used instead of classical windowing. This introduces an additional parameter, the memory term d, which is also involved in the filter configuration. However, the influence of this parameter is not as obvious as that of a, b and c, so it is worth providing more information about the behaviour of the memory term d. Exponential averaging is used in various sound pressure level estimation techniques [76] to take advantage of the fact that most acoustic phenomena have attack–release envelopes resembling exponential curves. Additional benefits are related to perceptual attributes [71], as well as to desirable operational characteristics such as simplicity, ease of implementation and efficiency [76]. In contrast to classical window-based sound-level estimation, exponential averaging is applied by filtering the instantaneous sound power signal (proportional to the squared signal) with a first-order infinite impulse response (IIR) filter with reverse (feedback) coefficients {a0, a1} and forward coefficient {b1} [76]:

$$a_0=1,\quad a_1=-e^{-1/(f_s\cdot t_w)},\qquad b_1=1-e^{-1/(f_s\cdot t_w)}=1+a_1\tag{26}$$

In the above formulas, $f_s$ is the sampling frequency [Hz] of the input sequence and $t_w$ the equivalent duration [s] of the “exponential windowing” operation [76]. Comparing Eqs. (21) and (26), the filter coefficient $b_1$ and the memory term d are identical. Thus, the reverse and forward coefficients of the “exponential average” IIR filter are {1, d − 1} and {d}, respectively. Using Eq. (26), the equivalent windowing duration can be calculated for various values of the memory parameter d. Before doing so, let us turn our attention to the significance of the duration $t_w$ and its influence on the Wiener filtering process, which can be summarized in the general remarks that follow. The window length should be controlled according to the signal variations and the signal's morphological structure: a signal with abrupt changes (usually with high-frequency content) requires short averaging lengths, while longer durations can be used for sustained (lower-frequency) signals. Thus, the shorter the length, the more adaptive the Wiener filtering process, especially when the useful signal is not always present (as happens with most natural phenomena), which results in total elimination of the “noise-only” intervals. However, a minimum window length is required to avoid “interruptions” or erroneous power estimates of the useful signal components.

The last remark suggests that, for each signal frequency component, the duration $t_w$ should span at least one full period, in order to avoid miscalculating the expected (power) values of the signal. For example, a minimum duration of 5 ms is necessary for a 200 Hz tonal signal, which corresponds to a window length of 40 samples at a typical sampling frequency of 8 kHz. The corresponding exponential averaging requires a memory term d = 0.025, as estimated from Eq. (26). Based on these remarks, a variable memory term d(k) is clearly necessary for each frequency band k when the sampling frequency is constant, as happens with classical constant-rate filter-bank implementations [20,21]. Table 1 presents the band-edge frequency limits for classical octave and third-octave analysis [76], together with the corresponding upper bounds of the band memory terms d(k), for frequencies up to 4 kHz ($f_s$ = 8 kHz).
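Eq. (26) and its inverse can be sketched as follows. The helper max_memory_term assumes, as in Table 1, that the minimum window spans one period of the lowest band frequency $f_L$; the function names are illustrative.

```python
import math


def memory_term(t_w, fs):
    """Memory term d (= b1) for an exponential window of duration t_w [s], Eq. (26)."""
    return 1.0 - math.exp(-1.0 / (fs * t_w))


def window_duration(d, fs):
    """Equivalent duration t_w [s] of the exponential window for a memory term d
    (Eq. (26) solved for t_w)."""
    return -1.0 / (fs * math.log(1.0 - d))


def max_memory_term(f_low, fs):
    """Upper bound on d such that t_w spans at least one period of f_low."""
    return memory_term(1.0 / f_low, fs)


if __name__ == "__main__":
    fs = 8000.0
    print(round(memory_term(0.005, fs), 3))       # 200 Hz example: 0.025
    print(round(max_memory_term(44.2, fs), 3))    # octave band 1 of Table 1: 0.006
    print(round(max_memory_term(2828.4, fs), 3))  # octave band 7 of Table 1: 0.298
```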

The adoption of wavelet analysis and the corresponding multi-resolution scheme gives the WDWF method some additional advantages related to the exponential averaging procedure. In contrast to the constant-rate filter banks mentioned above [20,21], the wavelet sub-sampling procedure makes individual configuration of the memory term d(k) for each band unnecessary, because the sampling rate is adaptively reduced in each band. Thus, careful adjustment of a single global memory term d is entirely adequate to serve the above processing requirements for all the bands involved. With this arrangement, the filter retains its efficiency and de-noising capabilities, while easy configuration and reduced complexity facilitate filter adjustment and implementation.

Table 1
Required exponential-averaging lengths in constant-rate spectral band analysis (octaves, third-octaves)

Octave bands
k    fL [Hz]   fC [Hz]   fH [Hz]     tw-min [ms] (lower bound)   d(k) (upper bound)
1      44.2      62.5      88.4      22.6                        0.006
2      88.4     125.0     176.8      11.3                        0.011
3     176.8     250.0     353.6       5.7                        0.022
4     353.6     500.0     707.1       2.8                        0.043
5     707.1    1000.0    1414.2       1.4                        0.085
6    1414.2    2000.0    2828.4       0.7                        0.162
7    2828.4    4000.0    4000 (a)     0.4                        0.298

Third-octave bands
k    fL [Hz]   fC [Hz]   fH [Hz]     tw-min [ms] (lower bound)   d(k) (upper bound)
1      55.7      62.5      70.2      18.0                        0.007
2      70.2      78.7      88.4      14.3                        0.009
3      88.4      99.2     111.4      11.3                        0.011
4     111.4     125.0     140.3       9.0                        0.014
5     140.3     157.5     176.8       7.1                        0.017
6     176.8     198.4     222.7       5.7                        0.022
7     222.7     250.0     280.6       4.5                        0.027
8     280.6     315.0     353.6       3.6                        0.034
9     353.6     396.9     445.4       2.8                        0.043
10    445.4     500.0     561.2       2.2                        0.054
11    561.2     630.0     707.1       1.8                        0.068
12    707.1     793.7     890.9       1.4                        0.085
13    890.9    1000.0    1122.5       1.1                        0.105
14   1122.5    1259.9    1414.2       0.9                        0.131
15   1414.2    1587.4    1781.8       0.7                        0.162
16   1781.8    2000.0    2244.9       0.6                        0.200
17   2244.9    2519.8    2828.4       0.4                        0.245
18   2828.4    3174.8    3563.6       0.4                        0.298
19   3563.6    4000.0    4000 (a)     0.3                        0.359

(a) For the given sampling rate of 8 kHz, the upper band-edge frequency of the last band (k = 7 for the octave analysis and k = 19 for the third-octave analysis) equals 4 kHz (half the sampling rate), due to the Nyquist criterion.

With these remarks in mind, the “type-I Wavelet Domain Wiener Filter” (WDWFI) was configured based on empirical observations of BS de-noising examples. The memory term was set to d = 0.2, while the remaining parameters were set to a = 2, b = 1 and c = 3. The configuration of the parameter d for the selected analysis–synthesis topologies (5-level DWT and 17-band WPA, described next) is presented in Tables 2 and 3, where the selected value satisfies all the conditions discussed above.
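The multi-rate argument can be checked numerically. The sketch below takes the band sampling rates and reference frequencies of the 6-band DWT from Table 2 (using 62.5 Hz for k = 1, as in its footnote), computes $t_w$ for a single global d at each band's own sampling rate, and tests the condition $t_w \ge t_{w\text{-}min}$; the function names are hypothetical.

```python
import math

# Band layout of the 6-band (5-level) DWT, taken from Table 2:
# (wavelet node, band k, band sampling rate [Hz], reference frequency [Hz]).
# The reference frequency is the band's lower edge, except for k = 1, where
# the central frequency 62.5 Hz is used (Table 2, footnote a).
DWT_BANDS = [
    ("1", 6, 4000.0, 2000.0),
    ("01", 5, 2000.0, 1000.0),
    ("001", 4, 1000.0, 500.0),
    ("0001", 3, 500.0, 250.0),
    ("00001", 2, 250.0, 125.0),
    ("00000", 1, 250.0, 62.5),
]


def window_ms(d, fs_band):
    """Equivalent exponential-window duration [ms] at a band's own sampling rate (Eq. 26)."""
    return -1000.0 / (fs_band * math.log(1.0 - d))


def check_global_memory_term(d, bands=DWT_BANDS):
    """Rows (node, k, tw [ms], tw-min [ms], tw >= tw-min) for one global d."""
    rows = []
    for node, k, fs_band, f_ref in bands:
        tw = window_ms(d, fs_band)
        tw_min = 1000.0 / f_ref
        rows.append((node, k, round(tw, 2), round(tw_min, 2), tw >= tw_min))
    return rows


def max_global_memory_term(bands=DWT_BANDS):
    """Largest d satisfying tw >= tw-min in every band: d <= 1 - exp(-f_ref/fs_band)."""
    return min(1.0 - math.exp(-f_ref / fs_band) for _, _, fs_band, f_ref in bands)
```

Because every DWT detail band is sampled at twice its lower edge, the condition reduces to d ≤ 1 − e^(−1/2) ≈ 0.39 in all detail bands; only the lowest band (62.5 Hz reference at 250 Hz) tightens the bound to about 0.22, which the selected d = 0.2 satisfies.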

Comparing Tables 1–3, it is obvious that the WDWF approaches (Tables 2 and 3) offer easier adjustment and implementation than constant-rate filter-bank implementations such as those presented in Table 1.

Table 2
The influence of the exponential-averaging memory term d and the corresponding window-averaging lengths (tw), in Discrete Wavelet Transform (DWT), multi-rate analysis

Wavelet  Band  Sampling   Min-freq.  Max-freq.  tw-min     d = 0.05        d = 0.1         d = 0.2         d = 0.4
node     [k]   rate [Hz]  [Hz]       [Hz]       [ms]       tw [ms]  Cond.  tw [ms]  Cond.  tw [ms]  Cond.  tw [ms]  Cond.
1        6     4000       2000       4000       0.5        4.9      TRUE   2.4      TRUE   1.1      TRUE   0.49     FALSE
01       5     2000       1000       2000       1.0        9.7      TRUE   …               …               …
001      4     1000        500       1000       2.0        19.5     TRUE   …               …               …
0001     3      500        250        500       4.0        39.0     TRUE   …               …               …
00001    2      250        125        250       8.0        78.0     TRUE   …               …               …
00000    1      250          0        125       16.0 (a)   78.0     TRUE   …               …               …

Cond.: condition tw ≥ tw-min.
(a) To avoid dividing by 0, the tw-min value at the k = 1 band is calculated using the central frequency (62.5 Hz). Besides, BS are not considered at such low frequencies.

2.3.2. Point-to-point WDWF type II

As already stated, the Wiener filter minimizes the mean square error of the estimate's time-domain reconstruction for uncorrelated, zero-mean, additive noise [2,6,20,22], providing optimal de-noising results when both signal and noise are present. However, the signal is not always present in sound recordings, since silent periods appear very frequently. For this reason, the classical formula describing the noise contamination problem (Eq. (1)) is usually altered to the following equation [22]:

$$x(i)=b_{ns}(i)\cdot s(i)+n(i)\tag{27}$$


Table 3
The influence of the exponential-averaging memory term d and the corresponding window-averaging lengths (tw), in Wavelet Packets (WP), multi-rate analysis

Wavelet  Band  Sampling   Min-freq.  Max-freq.  tw-min     d = 0.05        d = 0.1         d = 0.2         d = 0.4
node     [k]   rate [Hz]  [Hz]       [Hz]       [ms]       tw [ms]  Cond.  tw [ms]  Cond.  tw [ms]  Cond.  tw [ms]  Cond.
111      17    1000       3500       4000       0.3        19.5     TRUE   9.5      TRUE   4.5      TRUE   2.0      TRUE
…
00001    2      250        125        250       8.0        78.0     TRUE   38.0     TRUE   17.9     TRUE   7.8      FALSE
00000    1      250          0        125       16.0 (a)   78.0     TRUE   38.0     TRUE   17.9     TRUE   7.8      FALSE

Cond.: condition tw ≥ tw-min.
(a) To avoid dividing by 0, the tw-min value at the k = 1 band is calculated using the central frequency (62.5 Hz). Besides, BS are not considered at such low frequencies.

where $b_{ns}(i)$ is a binary-valued state indicator sequence such that $b_{ns}(i) = 1$ indicates the presence of the signal s(i) and $b_{ns}(i) = 0$ indicates its absence. This model is better suited to describing noise-contaminated BS, because most gastrointestinal acoustic phenomena comprise combinations of solitary and clustered contractions, resulting in sequences of concatenated BS segments and pause intervals [57].

The Wiener filter provides significant noise suppression with minimum destruction of the useful signal components [2,6,20,22]. However, estimation and processing errors are produced in silent periods, where only ABN is present [20,21,77]. A common displeasing result is the appearance of the so-called “birdy noise” and similar noise-residual artefacts [2,78], which occur when the signal coefficients fall close to the threshold. Since no signal components are available to mask the ABN residues in that case [71,79], birdy residual noise is very annoying: it is easily audible (even at quite low recording levels) and it appears and disappears randomly. The nature of BS, as described previously, aggravates this noise-reduction artefact. Common solutions are either to post-process the de-noised signal [78] or to apply rough thresholds, which affect useful signal information as well as noise. A classical example of the second solution is hard thresholding or iterative hard thresholding [3,9,10,13,68,74,75]. For this reason, the thresholding parameter was set to c = 3 for the WDWFI module, instead of the classical spectral subtraction configuration, where c = 1.

A more interesting solution to this problem was proposed by McAulay and Malpass [77], who implemented a soft-decision algorithm capable of indicating signal presence or absence prior to the de-noising process. This method motivated the implementation of the “type-II Wavelet Domain Wiener Filter” (WDWFII), presented below. McAulay and Malpass [77] observed that the filter was unable to properly estimate the noise-only condition during silent periods. They therefore proposed a two-state modular model, in which the first-stage module decides whether the noisy segment contains signal components, and the second stage is activated when the first stage decides that signal is present, further suppressing the noise in the contaminated sequence [77]. The soft-decision algorithm uses a-priori signal-to-noise ratios for every frequency component to estimate, for each time–frequency bin, the probability that the useful signal is present or absent. Based on these SNR-dependent probabilities, the maximum likelihood estimator provides a binary output: $H_{bn} = 1$, indicating that the signal (speech in their original work) is active, and $H_{bn} = 0$, indicating a silent period [77]. The notation $H_{bn}$ is introduced here analogously to the binary sequence $b_{ns}(i)$ of Eq. (27) (the terms $H_1$ and $H_0$ were used in the original work of McAulay and Malpass for states “1” and “0”, respectively).

Let us now return to the WDWFI module of Eq. (23). For practical reasons, the term $P_{x,a}(k,w)$ is denoted $P_{x,a}(k,w)|_{I}$, so that the two approaches (types I and II) can be distinguished. Splitting Eq. (23) to isolate the “signal-presence” binary decision filter $H_{bn|I}(k,w)$ and the de-noising filter $H_{dn|I}(k,w)$, the filtering equation is adapted to a two-stage model (Eq. (28)), similar to the one proposed by McAulay and Malpass [77]. Incorporating the “masking filter” $A_{kw}$ in the following formulas leaves a choice open, since the general filtering equation below remains valid whether the perceptual “noise-masker” criteria are enabled ($A_{kw}$ given by Eq. (19)) or disabled ($A_{kw} = A_{k0} = 1$):

$$\begin{aligned}
H_{WDWF|I}(k,w)&=H_{bn|I}(k,w)\cdot H_{dn|I}(k,w),\\
H_{bn|I}(k,w)&=\mathbf{1}\left\{c\cdot A_{kw}\cdot P_{n_{FP},a}(k)\le P_{x,a}(k,w)|_{I}\right\},\\
H_{dn|I}(k,w)&=\left[1-c\cdot A_{kw}\cdot\frac{P_{n_{FP},a}(k)}{P_{x,a}(k,w)|_{I}}\right]^{b}
\end{aligned}\tag{28}$$

where $\mathbf{1}\{\cdot\}$ equals 1 when the enclosed condition holds and 0 otherwise.
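The split of Eq. (28) can be checked against Eq. (23): the product of the binary decision and the de-noising gain reproduces the type-I gain. The sketch below works on scalar power values; the names are illustrative, and the guard in split_gain only avoids evaluating $H_{dn|I}$ where $H_{bn|I}$ is already 0 (where a negative base raised to a fractional b would not be real).

```python
def type1_gain(p_noise, p_x, a_kw, b, c):
    """H_WDWF(k,w)|I as written in Eq. (23)."""
    ratio = c * a_kw * p_noise / p_x
    return (1.0 - ratio) ** b if ratio <= 1.0 else 0.0


def binary_decision(p_noise, p_x, a_kw, c):
    """H_bn|I(k,w) of Eq. (28): 1.0 when signal is deemed present, else 0.0."""
    return 1.0 if c * a_kw * p_noise <= p_x else 0.0


def denoising_gain(p_noise, p_x, a_kw, b, c):
    """H_dn|I(k,w) of Eq. (28); only meaningful where H_bn|I = 1."""
    return (1.0 - c * a_kw * p_noise / p_x) ** b


def split_gain(p_noise, p_x, a_kw, b, c):
    """Two-stage form of Eq. (28): H_bn|I * H_dn|I."""
    if binary_decision(p_noise, p_x, a_kw, c) == 0.0:
        return 0.0
    return denoising_gain(p_noise, p_x, a_kw, b, c)
```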

The “signal-presence” binary decision filter $H_{bn|I}(k,w)$ is clearly quite sensitive to noise fluctuations, especially for c = 1 and $A_{kw} = A_{k0} = 1$, since very small random variations may be taken for signal. This unwanted situation is also caused by the quasi-stationary nature of ABN, which may result in probability distributions or spectral profiles slightly different from those of the selected noise footprint. The simplest solution is to increase the thresholding parameter c, as was done for the WDWFI module (c = 3). As already stated, this may have a negative impact, suppressing useful signal components as well as noise, since the filter attenuation curve becomes sharper [2,20,21]. Another unwanted behaviour that sometimes arises is that a strong signal may cause “post-echo” phenomena: noise-residue tails may be observed for a while after the signal component has completed its cycle. This issue is related to the smooth windowing of the exponential averaging (related examples are presented in the next section). Although these artefacts are hardly audible (in most cases they can only be detected visually), their presence affects the morphology of the corresponding signal curves, which also plays an important role in automated BS analysis [24,57].

Following the approach of McAulay and Malpass [77], a more convenient solution to the state problems mentioned above would be to use varying threshold parameters c(k,w), or varying memory terms d(k,w), adapting their values to the state changes of the binary decision filter $H_{bn|I}(k,w)$. However, this arrangement would add complexity to filter implementation and adjustment, so it was abandoned. Instead, we preferred to introduce an alternative a-powered estimate $P_{x,a}(k,w)$, more adaptive to signal variations, for use in the binary decision filter $H_{bn}(k,w)$. Keeping the parametric, recursive nature of the exponential averaging procedure, we take advantage of the noise-free past values to form the “type-II a-powered estimate”:

$$P_{x,a}(k,w)|_{II}=\begin{cases}|X(k,w)|^{a}, & w=0\\ d\cdot|X(k,w)|^{a}+(1-d)\cdot P_{\hat s,a}(k,w-1), & w=1,\dots,W_{X_k}-1\end{cases}\tag{29}$$

where

$$P_{\hat s,a}(k,w)=d_{PS}\cdot|\hat S(k,w)|^{a}+(1-d_{PS})\cdot P_{\hat s,a}(k,w-1),\quad w=1,\dots,W_{X_k}-1,\ d_{PS}\to 1,\qquad\text{or}\qquad P_{\hat s,a}(k,w)=|\hat S(k,w)|^{a}\quad(d_{PS}\cong 1)\tag{30}$$

(when $d_{PS} = 1$, the exponential averaging of the de-noised power can be omitted and the second form of Eq. (30) is used, allowing rapid adaptation).

Comparing Eqs. (22) and (29), and considering that the contaminating noise is uncorrelated with the useful signal, we may make the following remarks: (i) both power estimates $P_{x,a}(k,w)|_{I}$ and $P_{x,a}(k,w)|_{II}$ contain a signal part and a noise part, the latter being smaller for $P_{x,a}(k,w)|_{II}$, since its noise part comes from the current noisy sample $|X(k,w)|^{a}$ alone; (ii) the type-II exponential power averaging is more adaptive to the morphological structure of the signal components, so the corresponding windowing operation has a shorter duration (it is less influenced by the preceding noisy samples, i.e. it exhibits a “shorter memory”); (iii) the type-II power estimate is always smaller than the type-I estimate ($P_{x,a}(k,w)|_{II} < P_{x,a}(k,w)|_{I}$). Replacing the term $P_{x,a}(k,w)|_{I}$ in the binary decision filter with $P_{x,a}(k,w)|_{II}$, we obtain the type-II signal detection filter $H_{bn|II}(k,w)$:

$$H_{bn|II}(k,w)=\mathbf{1}\left\{c\cdot A_{kw}\cdot P_{n_{FP},a}(k)\le P_{x,a}(k,w)|_{II}\right\}\tag{31}$$

From the analysis of Eqs. (29)–(31), it is concluded that the filter $H_{bn|II}(k,w)$ estimates the probability for the signal-presence binary decision, predicting forthcoming states of signal presence or absence from the previous ones. In this context, the probability of signal presence is clearly greater when the previous state was active than in the opposite, non-active, case. In other words, the binary decision filter expresses the conditional probability of a “signal-decision” state, given the previous signal “presence” or “absence” state. This is related to the fact that, once a signal is initiated, some time is needed for its attack–release cycle to be completed before a pause interval reappears. However, to avoid erroneous estimates at the beginning and end of the signal segments, current noisy samples are combined with past clean samples, with the memory term d used to control the corresponding probability density functions. Nevertheless, the physical meaning and interpretation of the parameter d in the type-II power estimate are completely different from those in classical exponential averaging. It rather behaves as a control parameter that balances previous clean samples against the current noisy ones and compares them with the noise-footprint power to decide on signal initiation.


Taking the previous interpretation into account, it is easy to form a hybrid system consisting of the $H_{bn|II}(k,w)$ and $H_{dn|I}(k,w)$ filters, which modifies the proposal of McAulay and Malpass [77] for the case of the Wavelet Domain Wiener Filter. According to the earlier analysis, the type-II binary decision filter $H_{bn|II}(k,w)$ is more accurate in detecting and rejecting the noise-only silent periods. Additionally, small signal components, usually masked by the presence of ABN, are likely to be rejected, so perceptual criteria are also incorporated into the de-noising process. This approach functions similarly to the perceptual filter $A_{kw}$, since both are based on the following fact: inside noisy sequences, weak signal components (maskee) are “hidden” by the presence of noise (masker) [4,20,21,71]. On the other hand, the de-noising filter $H_{dn|I}(k,w)$ employed during the signal-presence state remains unchanged, as in WDWFI. In fact, this modification allows the threshold parameter c to be lowered (c = 1), which is beneficial for the following reasons. Given the presence of signal, low-level noise residuals resulting from the soft suppression filtering are very likely to be masked, so they are barely annoying and sometimes not even perceptible. Such “tricks” are quite common in various perceptual approaches employed in audio compression [8,79] and audio restoration [4], with the difference that here the roles of signal components and noise residuals are reversed compared with those approaches: the signal is now the masker and the noise the maskee. Additionally, softer filtering curves are beneficial from a different point of view, since less signal distortion is introduced. We will refer to this combined method of the $H_{bn|II}(k,w)$ and $H_{dn|I}(k,w)$ filters as WDWF type “I & II” (WDWFI&II):

$$H_{WDWF|I\&II}(k,w)=H_{bn|II}(k,w)\cdot H_{dn|I}(k,w)\tag{32}$$

where the filters $H_{bn|II}(k,w)$ and $H_{dn|I}(k,w)$ are still given by Eqs. (31) and (28), respectively. The difference between WDWFI and WDWFI&II is that the parameters a, b, c, d can be configured independently for each of the two filters ($a_I, b_I, c_I, d_I$ and $a_{II}, b_{II}, c_{II}, d_{II}$). However, a configuration with identical parameters for both filters was adopted, based on empirical observations as well as on the requirements for the memory parameter d discussed in Section 2.3.1. The configuration of the WDWFI&II module resulted in the following parameters: a = 2, b = 1, c = 1 and d = 0.1. The hybrid WDWFI&II system provides robust de-noising results in both binary states (whether the signal is present or absent). Its disadvantages, however, are that it introduces more parameters to be configured and, above all, that it demands additional computations. To reduce both the filter complexity and the implementation requirements, we decided to use the type-II power estimate $P_{x,a}(k,w)|_{II}$ in the de-noising filter $H_{dn|II}(k,w)$ as well, so that both filters use similar expressions. The resulting type-II WDWF (WDWFII) was tested and configured mostly on the basis of empirical observations; however, theoretical aspects were also taken into account and are described below.

$$\begin{aligned}
H_{WDWF|II}(k,w)&=H_{bn|II}(k,w)\cdot H_{dn|II}(k,w),\\
H_{dn|II}(k,w)&=\left[1-c\cdot A_{kw}\cdot\frac{P_{n_{FP},a}(k)}{P_{x,a}(k,w)|_{II}}\right]^{b}
\end{aligned}\tag{33}$$

The WDWFII module differs from the WDWFI&II hybrid system in the de-noising filter $H_{dn}(k,w)$, which is activated when the signal is present (the binary decision state is true). From Eqs. (22), (28), (29) and (33), the type-II de-noising filter $H_{dn|II}(k,w)$ provides harder suppression of the noisy sequences than the type-I filter $H_{dn|I}(k,w)$ (for identical a, b, c and d parameters), because $P_{x,a}(k,w)|_{II} < P_{x,a}(k,w)|_{I}$. Nevertheless, when the signal is present, its components have a greater influence on the overall power (the strongest signal level determines the overall power level of the noisy signal), so the type-II power estimate approaches the corresponding type-I estimate. Thus, the type-II WDWF (with c = 1) provides softer signal suppression than the type-I WDWF (c = 3) when the signal is present, but harder suppression when signal components are absent or buried below the noise level.
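The feedback structure of Eqs. (29)–(33) can be sketched per band as follows. This is a minimal illustration under stated assumptions: the noise-footprint power $P_{n_{FP},a}(k)$ is precomputed, the value of $P_{\hat s,a}$ at w = 0 (not specified by Eq. (30)) is assumed to be $|\hat S(k,0)|^{a}$, and the defaults a = 2, b = 1, c = 1, d = 0.1, $d_{PS}$ = 1 follow the configuration reported for WDWFI&II and WDWFII. The function and argument names are hypothetical.

```python
def _gain(p_noise, p_est, a_kw, b, c):
    """[1 - c*A_kw*P_nFP,a/P]^b where c*A_kw*P_nFP,a <= P, else 0 (Eqs. 28, 33)."""
    if p_est <= 0.0 or c * a_kw * p_noise > p_est:
        return 0.0
    return (1.0 - c * a_kw * p_noise / p_est) ** b


def wdwf_band(coeffs, p_noise, variant="II", a=2.0, b=1.0, c=1.0, d=0.1,
              d_ps=1.0, a_kw=1.0):
    """Point-to-point de-noising of one band.
    variant="II":   H_bn|II * H_dn|II  (Eq. 33)
    variant="I&II": H_bn|II * H_dn|I   (Eq. 32)"""
    if variant not in ("II", "I&II"):
        raise ValueError("variant must be 'II' or 'I&II'")
    out = []
    p_type1 = p_clean = 0.0
    for w, x in enumerate(coeffs):
        inst = abs(x) ** a
        if w == 0:
            p_type1 = p_type2 = inst                          # Eqs. (22), (29)
        else:
            p_type1 = d * inst + (1.0 - d) * p_type1          # Eq. (22)
            p_type2 = d * inst + (1.0 - d) * p_clean          # Eq. (29)
        h_bn = 1.0 if c * a_kw * p_noise <= p_type2 else 0.0  # Eq. (31)
        p_dn = p_type2 if variant == "II" else p_type1
        s = h_bn * _gain(p_noise, p_dn, a_kw, b, c) * x
        s_pow = abs(s) ** a
        # Eq. (30): exponential average of the de-noised power (d_ps = 1 gives |S|^a)
        p_clean = s_pow if w == 0 else d_ps * s_pow + (1.0 - d_ps) * p_clean
        out.append(s)
    return out
```

With a footprint power of 0.01 and a noisy band such as `[0.15, -0.15] * 5 + [5.0] * 5`, both variants zero the noise-only samples after the first one: once the de-noised output is zero, $P_{x,a}(k,w)|_{II}$ falls to $d\cdot|X(k,w)|^{a}$, so the current noisy sample alone must exceed the footprint power by a factor of 1/d before signal presence is declared. During the burst, the fed-back clean power keeps $H_{bn|II}$ at 1.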

Summing up, the transfer function of the WDWF is given by Eqs. (23) and (25) for both types I and II, but with different a-powered signal estimates $P_{x,a}(k,w)$ (Eqs. (22) and (29), respectively). Although the WDWFII module was empirically motivated by WDWFI and WDWFI&II, it has some unique attributes: (i) it remains simple, with reduced implementation complexity and a limited computational load; (ii) it incorporates the soft-decision filtering suggested by McAulay and Malpass [77]; (iii) it applies perceptual criteria, eliminating the weak signal components usually masked by noise (it completely removes the barely audible components, instead of keeping a small portion of them together with many noise-residual artefacts); (iv) it continuously (point to point) balances the filter gain, so no additional smoothing is necessary to eliminate audio artefacts [78] or to prevent signal interruptions and violent transitions [73].

The WDWFII transfer function is a first-order autoregressive (AR) model for the a-powered signal $|X(k,w)|^{a}$, since the output is produced from the current input and the immediately preceding output. However, the WDWFII feedback loop results in further shrinkage of the signal (the a-powered wavelet coefficients of band k, in our case), which rules out the unstable operation to which autoregressive models are inherently prone. As the above analysis has made clear, the configuration of the WDWFII module was based on the theoretical justification of WDWFI&II and the related empirical observations, resulting in the following settings: a = 2, b = 1, c = 1 and d = 0.1, which are identical to the selections for the WDWFI&II case. Validation results on the efficiency of the selected configurations for both types I and II are presented in Section 3.
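The difference between the two power estimators can be illustrated with a toy, single-band sketch. It assumes a Lim–Oppenheim-style parametric gain $H = \max(0, 1 - b\,N_a/P)^{c/a}$, which reduces to power spectral subtraction for a = 2, b = 1, c = 1 as the text states; the exact forms of Eqs. (22), (23), (25) and (29), the perceptual term and the precise role of d may differ. Here d is assumed to weight the newest a-powered sample, and the names `wdwf_gain` and `wdwf_band` are hypothetical. The type-I-like estimator smooths the a-powered input, while the type-II-like estimator combines the current input with the previous output (an AR(1) recursion).

```python
import random

def wdwf_gain(p_x, noise_pow, a=2.0, b=1.0, c=1.0):
    """Assumed parametric gain max(0, 1 - b*N/P)^(c/a); it never exceeds 1."""
    if p_x <= 0.0:
        return 0.0
    return max(0.0, 1.0 - b * noise_pow / p_x) ** (c / a)

def wdwf_band(x, noise_pow, a=2.0, b=1.0, c=1.0, d=0.1, feedback=False):
    """Filter one band of wavelet coefficients sample by sample (hypothetical sketch).

    feedback=False: type-I-like estimate, an exponential moving average of |x|^a.
    feedback=True:  type-II-like estimate, current |x|^a plus previous |output|^a
                    (first-order autoregressive recursion).
    noise_pow is the a-powered noise level of the band.
    """
    out = []
    p_prev = 0.0  # previous smoothed input power (type-I-like)
    y_prev = 0.0  # previous output coefficient (type-II-like)
    for xn in x:
        if feedback:
            p = d * abs(xn) ** a + (1.0 - d) * abs(y_prev) ** a
        else:
            p = d * abs(xn) ** a + (1.0 - d) * p_prev
        y = wdwf_gain(p, noise_pow, a, b, c) * xn
        out.append(y)
        p_prev, y_prev = p, y
    return out

if __name__ == "__main__":
    rng = random.Random(0)
    noise = [rng.gauss(0.0, 1.0) for _ in range(4000)]  # noise-only band, unit variance
    e1 = sum(v * v for v in wdwf_band(noise, 1.0, feedback=False))
    e2 = sum(v * v for v in wdwf_band(noise, 1.0, feedback=True))
    print(f"residual noise energy: type-I-like {e1:.1f}, type-II-like {e2:.1f}")
```

Because the gain never exceeds 1, the fed-back output is never larger than the input, so the recursion stays bounded; on a noise-only band the feedback variant leaves less residual noise, in line with the harder suppression described above.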


2.3.3. 6-Band DWT implemented WDWF

The selection of the “optimal” decomposition for wavelet-based processing or analysis is a classical problem. A variety of best-basis selection strategies have been proposed for both DWT and WP, based on entropy, perceptual criteria and other cost functions [3,12]. Daubechies [80] suggested that, for an input signal of length N, optimal decomposition, in the sense of orthonormal DWT basis projection, requires M adjacent resolution scales (M − 1 decomposition levels), where:

$$M = \log_2(N) \tag{34}$$

This suggestion has been adopted in many areas of de-noising signal processing, including BS enhancement [48]. However, for single-frame processing, with the frame length selected as nb = 2048 samples in our case, this leads to 11 analysis bands, i.e. a 10-level decomposition tree. Given a sampling frequency of 8 kHz, and calculating the frequency limits of the bands produced by the half-band splitting operation, it turns out that almost half of the resulting bands lie below 100 Hz, a region in which most audio signal components, and BS in particular, are not considered. From this point of view, the adoption of M = 11 scales results in unnecessary processing overhead, an issue that was also pointed out in [53]. For these reasons, a standard 6-band (5-level) DWT has been chosen to serve as a rough de-noising process, suitable for long-term, frame-based ASV processing [24,42]. The decomposition scheme of this 6-band WDWF, presented in Fig. 6, divides the full range of 0–4 kHz into a classical octave analysis from 125 Hz to 4 kHz (upper frequency limits). Given that this type of 5-level DWT processing has been implemented in many audio applications [81], the proposed approach is best suited for band feature analysis, such as band average power or signal-to-noise ratios [57].
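The band-limit arithmetic behind the choice of decomposition depth can be checked with a short sketch. It assumes ideal (brick-wall) half-band splitting, so real Daubechies filters will show overlapping transition regions; the function name `dwt_bands` is hypothetical.

```python
import math

def dwt_bands(fs, levels):
    """Frequency limits (Hz) of an ideal dyadic DWT tree, lowest band first.

    A J-level decomposition yields J detail bands plus one approximation band.
    """
    nyquist = fs / 2.0
    # Split points: nyquist/2^J, ..., nyquist/2, nyquist
    edges = [nyquist / 2 ** j for j in range(levels, -1, -1)]
    bands = [(0.0, edges[0])]  # final approximation band
    bands += [(edges[i], edges[i + 1]) for i in range(levels)]  # detail bands
    return bands

fs = 8000
M = int(math.log2(2048))    # Eq. (34) for a 2048-sample frame: 11 scales
full = dwt_bands(fs, M - 1)  # 10-level tree
print(len(full), sum(1 for _, hi in full if hi <= 100.0))  # -> 11 5
print([hi for _, hi in dwt_bands(fs, 5)])  # -> [125.0, 250.0, 500.0, 1000.0, 2000.0, 4000.0]
```

Five of the eleven bands of the full-depth tree lie entirely below 100 Hz (in fact below 62.5 Hz), while five levels already reproduce the octave upper limits from 125 Hz to 4 kHz quoted for the 6-band scheme.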

Fig. 6. The 6-band DWT decomposition tree employed for Wavelet Domain Wiener Filtering (WDWF).

The 6-band DWT configured WDWF has been tested with various sets of parameters. As already discussed, experimentation showed that type-I WDWF with a = 2, b = 1, c = 3 and d = 0.2 and type-II WDWF with a = 2, b = 1, c = 1 and d = 0.1 are efficient configurations for BS enhancement. The Daubechies 6 mother wavelet proved to be the best selection for both types. Use of the perceptual parameter Ak led to robust enhancement with reduced artefacts for components well above the noise thresholds, while hard de-noising, which is desirable for prolonged analysis, was observed for signal components buried in noise.

2.3.4. 17-Band WP implemented WDWF

Although the results of the DWT implementations were very satisfactory, an alternative decomposition topology was also proposed: a WP decomposition, in which both approximation and detail coefficients can be further band-split. Two basic factors motivated this alternative approach. Firstly, Wiener filter efficiency increases as the frequency resolution becomes finer [20,22]. Secondly, the selection of an optimal basis matched to the critical bands of human hearing would further extend the de-noising and analysis capabilities of our approach [5,8,59,71]. Thus, the new implementation would be ideal for delicate treatment in the enhancement and medical interpretation of isolated short-term BS.

According to the Bark scale rules, the frequency resolution should feature a constant bandwidth (BW = 100 Hz) for frequencies below 500 Hz, and a bandwidth equal to 20% of the centre frequency (BW = 0.2·fc) for frequencies above 500 Hz [8,59,71,79]. This type of semi-linear, semi-logarithmic frequency spacing has been applied in many audio coding applications and has proved to be very efficient [8,79]. However, in contrast to classical, Fourier-based frequency analysis methods and filter-banks, wavelet band-split analysis does not allow arbitrary selection of the bands' frequency limits. Thus, our primary target was to select a decomposition tree with Bark-scale features, so that the number of resulting bands and the corresponding bandwidths would be as close as possible to those of the critical bands [8,59,71,79] in the frequency range 0–4 kHz. The result was the 17-band WP implementation of Fig. 7. Fig. 8 describes the relationship between the Bark-scale frequency resolution and the resolutions obtained from the 6-band DWT and the 17-band WP employed in our work.

Fig. 7. The 17-band WP decomposition tree employed for Wavelet Domain Wiener Filtering (WDWF).
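The target band count for the 0–4 kHz range can be cross-checked numerically. The sketch below stacks contiguous bands that obey the quoted Bark-scale rule (100 Hz below 500 Hz, 20% of the centre frequency above, with an arithmetic-mean centre frequency assumed), and also evaluates the widely used Zwicker–Terhardt critical-band-rate approximation, which is not part of the original text; the function names are hypothetical. Both place roughly 16 to 17 critical bands below 4 kHz, in line with the 17-band WP choice.

```python
import math

def bark_zwicker(f_hz):
    """Zwicker & Terhardt (1980) critical-band rate approximation, in Bark."""
    return 13.0 * math.atan(0.00076 * f_hz) + 3.5 * math.atan((f_hz / 7500.0) ** 2)

def bands_from_rule(f_max_hz):
    """Contiguous bands with BW = 100 Hz below 500 Hz and BW = 0.2 * fc above.

    Above 500 Hz, f_hi - f_lo = 0.2 * (f_lo + f_hi) / 2 gives f_hi = f_lo * 11 / 9.
    The last band is truncated at f_max_hz.
    """
    bands, f_lo = [], 0.0
    while f_lo < f_max_hz:
        f_hi = f_lo + 100.0 if f_lo < 500.0 else f_lo * 11.0 / 9.0
        bands.append((f_lo, min(f_hi, f_max_hz)))
        f_lo = f_hi
    return bands

print(len(bands_from_rule(4000.0)))    # -> 16 (last band truncated at 4 kHz)
print(round(bark_zwicker(4000.0), 1))  # -> 17.3
```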

The 17-band WP configured WDWF has also been tested with various configuration parameters. The experimental results again suggested the following efficient configurations: type-I WDWF with a = 2, b = 1, c = 3 and d = 0.2, and type-II WDWF with a = 2, b = 1, c = 1 and d = 0.1. The Daubechies 6 mother wavelet again proved to be the best selection for both types, while the use of the perceptual parameter Akw proved, according to the judgment of physicians and gastroenterologists, beneficial for the audible results.

It is important to mention that a full-scale 5-level WP analysis scheme with 32 bands has also been tested. The results showed that the computational overhead outweighed the improvement in signal enhancement, so this scheme was rejected. Moreover, no further improvement was needed, since the two previously described implementations proved very robust and accurate. Thus, the four WDWF approaches finally implemented are (a) the 6-band type-I WDWF (WDWFI-6), (b) the 6-band type-II WDWF (WDWFII-6), (c) the 17-band type-I WDWF (WDWFI-17) and (d) the 17-band type-II WDWF (WDWFII-17).

3. Experimental results

The implemented algorithms were primarily tested using natural BS data, isolated from the available long-duration, multi-channel ASV recordings. These signals were also used during the configuration phase to calibrate the various parameters of the four WDWF alternatives. Synthetically generated sounds were then employed to simulate original BS for the quantitative evaluation of the method's performance. Since it is difficult to approximate BS using classical test signals, such as trigonometric functions, chirps, etc., an alternative approach was followed to produce the synthetic BS (sBS). Thus, sums of weighted linear adaptive modulated Gaussian functions were employed for the BS “synthesis”, yielding deterministic sBS:

$$sBS[i] = \sum_{k=0}^{D-1} C_k \cdot h_k[i] \tag{35}$$

where the function $h_k[i]$ is defined by the parameters $a_k$, $i_k$ and $u_k$:

$$h_k[i] = (a_k \pi)^{-0.25} \exp\left\{ -\frac{[i - i_k]^2}{2 a_k} + j\left(2\pi u_k [i - i_k]\right) \right\} \tag{36}$$


Fig. 8. Comparison between critical band (CB) analysis and the selected decomposition topologies, which are the 6-band Discrete Wavelet Transform (DWT) and the

17-band wavelet packet analysis (WPA): (a) number of bands (barks) vs. frequency, (b) achieved bandwidth resolution vs. frequency. It is obvious that both of the

implemented topologies are tracking the perceptual attributes of the critical-band curves, with the WPA-17 scheme being more adaptive than the DWT-6.

Eqs. (35) and (36) describe the Adaptive Representation, also known as “matching pursuit”, one of the best-known Joint Time–Frequency Analysis (JTFA) algorithms, which, among its many advantages, features improved time and frequency resolution [82]. JTFA algorithms were also considered as alternatives to the DWT and WP implementations for the de-noising procedure, but they were abandoned due to their increased computational overhead. Eq. (35) was employed to set up test signals with easily controllable characteristics, such as time–frequency localization, duration, “impulsiveness”, etc. However, synthetic signals adapted to natural BS were also used; the parameters $a_k$, $i_k$ and $u_k$ were estimated by applying the Adaptive Transform, the counterpart of the previously stated adaptive representation, to noise-free BS [82]. Apart from these two types of test signals, natural noise-free BS, which had initially been de-noised with classical de-noising methods (different from the proposed WDWF approaches), were also used during the experimental evaluation procedure.
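Eqs. (35) and (36) translate directly into a test-signal generator. The sketch below is a minimal version under stated assumptions: i, i_k and a_k are expressed in samples (a_k being the variance of the Gaussian, in samples squared), u_k is a normalized frequency in cycles per sample, and the real part of the complex sum is kept as the test signal. The parameter values and function names are hypothetical, not taken from the original implementation.

```python
import cmath
import math

def gabor_atom(i, a_k, i_k, u_k):
    """h_k[i] of Eq. (36): unit-energy Gaussian (variance a_k) centred at i_k, modulated at u_k."""
    t = i - i_k
    return (a_k * math.pi) ** -0.25 * cmath.exp(-t * t / (2.0 * a_k) + 1j * 2.0 * math.pi * u_k * t)

def synthetic_bs(n_samples, atoms):
    """sBS[i] = sum_k C_k * h_k[i] (Eq. (35)); atoms holds (C_k, a_k, i_k, u_k) tuples.

    Assumption: the real part is returned so that the test signal is real-valued.
    """
    return [sum(c * gabor_atom(i, a, ik, u) for c, a, ik, u in atoms).real
            for i in range(n_samples)]

fs = 8000          # sampling rate used in the recordings (Hz)
sigma = 0.01 * fs  # 10 ms Gaussian spread in samples, so a_k = sigma ** 2
atoms = [(1.0, sigma ** 2, 4000, 250.0 / fs),          # 250 Hz component at 0.5 s
         (0.5, (0.004 * fs) ** 2, 6000, 630.0 / fs)]   # shorter 630 Hz component at 0.75 s
x = synthetic_bs(fs, atoms)  # 1 s of synthetic test signal
```

The $(a_k\pi)^{-0.25}$ factor makes each complex atom unit-energy, so $C_k$ directly sets the energy contributed by each component.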

3.1. Medical experimental procedure

The BS recordings utilized in the current work were captured during general experiments concerning auscultation diagnosis and medical treatment [23,24,42,43,57]. The experiments took place at the Gastroenterology Department of the Papageorgiou General District Hospital of Thessaloniki, under a protocol approved by the Hospital Ethics Committee. The proposed medical study protocol falls into the category of long-term monitoring approaches, as described in Section 1.1, where multi-site recordings are utilised for the evaluation of gastrointestinal motility [23,24,42,43]. Thus, a minimum duration of 2 h was chosen in order to analyze a full MMC cycle in the fasting-state recordings, while longer durations of up to 6 h were also included for evaluation purposes. Twenty-eight middle-aged healthy volunteers participated in the prolonged-duration examination protocol, while more than one hundred short-duration (10–30 min) recordings were carried out for experimentation purposes, including some pathological conditions (two patients with Crohn's disease, one patient with total gastrectomy, one patient with paralytic ileus, one patient with obstructive ileus and five cases of gastroenteritis).

Several sensors were considered for the detection of bowel sounds, including electronic stethoscopes, stethoscopes combined with dedicated microphones, capacitance transducers (C-Ducer) and piezoelectric transducers [23]. All these electroacoustic transducers were tested for their performance in capturing BS by means of ASV. Apart from their response attributes (mostly sensitivity and frequency response), physical characteristics such as dimensions, weight and shape, as well as the corresponding mechanisms of operation, were also considered very important for the demands of the current application. Based on the experimental observations of the evaluation procedure, contact piezoelectric transducers were selected as the most appropriate sensors. Piezoelectric transducers, also referred to as contact microphones or sound pick-ups (PU), feature many advantages: (i) they have a small size and convenient shape, so that they can be easily attached to objects, (ii) they are highly sensitive to the vibrations of the contacted surfaces and less sensitive to other, air-propagated sounds, (iii) they are passive electric elements that do not require power to operate, ensuring safe conditions for the subjects, (iv) their cost is low, in contrast to measurement-grade piezoelectric accelerometers, and (v) their poor frequency response at very low frequencies is rather beneficial, since BS are not considered at frequencies below 100–150 Hz [23].

The “K&K Sound” single-head Hot Spot piezo-transducer pick-up was finally selected, since it features outstanding performance characteristics and a very small size. It is a round-shaped instrument transducer with excellent linearity, specified over the frequency range of 15 Hz to 15 kHz, with high impedance and sensitivity. According to the manufacturer (K&K Sound Systems, Inc.), many experimental projects have been carried out with the Hot Spot, such as installing pick-ups on the fingertips of a glove or using them as “knocking” sensors in car engines. Its small head is only 1.2 in. in diameter and 1/32 in. in height, allowing unobtrusive installation. Furthermore, small areas of the instrument can be reached and the transducer can be mounted virtually invisibly. This low-cost transducer was compared for its BS sensing capabilities via auscultation experiments with an electronic stethoscope (Audioscope, Medivisio OY, Helsinki, Finland), as well as with a high-precision measurement accelerometer (Bruel & Kjær type 4506 tri-axial miniature accelerometer). According to the clinicians' judgment, the transducer features excellent sound quality, while its very small size makes it suitable for easy adaptation to the abdominal wall, especially for long-term monitoring during nocturnal sleep.

Additionally, the frequency response of the sensor was measured using the Bruel & Kjær 4809 vibration exciter and white-noise test signals, resulting in a flat frequency response, with a variation smaller than 3 dB, in the desired frequency range from 80 Hz to 4 kHz. A high-gain (>50 dB), low-noise, low-output-impedance preamplifier with a 40 Hz–50 kHz (+0/−1 dB) frequency range was also designed to meet the impedance-matching and signal-amplification demands. The preamplifier was housed in a small plastic box (5 cm × 2 cm × 2 cm) and was placed at a very small distance (<25 cm) from the sensor, since longer cables connected directly to the transducer would introduce additional noise interference. The power supply for the preamplifier is provided by rechargeable batteries, eliminating the possibility of noise interference from the mains supply (hum due to ground loops) while ensuring safe operation at the same time. No analog filters were involved in the electroacoustic chain besides the high-pass AC-coupling and low-pass anti-aliasing filters of the recording units' circuitry.

The proposed sensor can be easily attached to the abdominal wall using double-sided adhesive tape. However, a wearable absorbing abdominal vest (WAAV), containing a thin lead plate (similar to those used in radiographic medical uniforms) coated with foamy sound-absorbing material, was additionally designed to cover the sensor, aiming at the best adaptation to the vibrating abdominal surface as well as insulation from air-propagated ambient noise. The implemented sensor-adaptation module weighs about 860 g and has proved very useful, since it keeps the sensor(s) in close contact with the measuring surface, increasing the sound acquisition sensitivity, while environmental acoustic noise is strongly suppressed (an insertion loss of about 15 dB was measured). Based on the experience of the clinicians who participated in the examination procedures, the introduction of the WAAV module facilitated the preparation of the monitoring process, while fewer movement artefacts appeared, due to the tighter attachment of the sensors to the abdominal wall. According to the subjects involved, no discomfort was caused by the use of the WAAV, since the foamy coating in combination with the light construction made it very convenient to wear. Its use proved very beneficial, especially in multi-channel recordings, where many transducers had to be attached to the abdomen.

In general, any reasonable number of sensors can be employed during BS auscultation and analysis [23,24,42,43]. More than one sensor can be used in order to overcome difficulties related to the weak nature of the propagated acoustic waves, by selecting each time the sensor closest to the sound origination site, as well as to provide multi-site monitoring and sound-field topographic interpretation [24]. Within this context, a 2-channel system was initially used, with the sensors placed symmetrically at the upper-left and lower-right abdominal quadrants [23,42,43]. The extension to four monitoring sensors, symmetrically distributed over the four abdominal quadrants, proved to best serve both the increased sensitivity demands and the multi-site monitoring aspects [24]. The implemented de-noising methods were tested using both 2-channel and 4-channel BS recordings. In addition, the proposed methods could also be applied to signals captured as displacement, velocity and acceleration using tri-axial accelerometers. A hybrid arrangement, utilising both contact piezoelectric transducers and tri-axial accelerometers, has recently been introduced and is currently being evaluated for sound-field visualization purposes.

In order to satisfy the above recording demands, a multi-channel sound card was selected so that signals were captured directly in a computerised environment. Specifically, the external “PreSonus FIREPOD (24-bit/96K FireWire Recording Studio)” card was selected, enabling multi-channel recordings from up to ten simultaneous monitoring sites. Its easily controlled level-adjustment potentiometers were used in combination with the preamplifier settings, offering an overall amplification of about 50 dB. Thus, the recording level of the highest-level BS was adjusted to about −3 dB, with the unwanted additive broadband noise at about −40 dB, so that an SNR of almost 37 dB was achieved in some extreme cases (the best-case scenario). Taking into account that BS spectral content is negligible above 2–2.5 kHz [36,48], data were digitized on a PC at a sampling rate of 8 kHz, which was considered entirely adequate. A 16-bit quantization was selected to satisfy dynamic range demands, as well as to allow processing algorithms to be applied efficiently.
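The level budget quoted above can be restated as simple arithmetic. The sketch assumes all levels are expressed relative to the same full-scale reference and uses the ideal 20·log10(2^N) figure for an N-bit quantizer, which ignores dither and converter noise; the variable names are illustrative.

```python
import math

peak_bs_dbfs = -3.0   # highest-level BS after gain staging
noise_dbfs = -40.0    # additive broadband noise floor
best_case_snr_db = peak_bs_dbfs - noise_dbfs            # 37 dB

fs_hz = 8000.0
nyquist_hz = fs_hz / 2.0                                # 4 kHz, above the 2-2.5 kHz BS content
bits = 16
ideal_dynamic_range_db = 20.0 * math.log10(2 ** bits)  # about 96.3 dB for 16-bit PCM

print(best_case_snr_db, nyquist_hz, round(ideal_dynamic_range_db, 1))  # -> 37.0 4000.0 96.3
```

The quantizer's ideal range exceeds the best-case SNR by roughly 59 dB, so quantization noise lies far below the recorded noise floor.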

3.2. Quantitative and qualitative evaluators

The qualitative evaluation was based on listening to the de-noised BS, as well as on visual examination of the corresponding signal curves. This type of assessment was applied continuously during the configuration phase of the four parametric approaches, and the process was carefully monitored by physicians. The experimental procedure was based on short BS segments that were selected and isolated from the previously described long-term recordings. An appropriate software-based signal processing environment was set up using LabVIEW 7.1™.

3.2.1. Validation of the selected WDWF parameters

As already stated, the configuration of the four WDWF modules was primarily based on empirical experimentation using natural, noise-contaminated BS recordings. Synthetic BS were also employed via the “adaptive representation” method [82], transforming the natural acoustic phenomena into deterministic sequences (Eqs. (35) and (36)). Thus, we used sums of weighted linear adaptive modulated Gaussian functions as test signals, which were artificially contaminated with various types of noise. Specifically, Gaussian white noise (GWN) and pink noise (PN) were employed, with the latter considered a more difficult de-noising task, since PN has a logarithmic energy–frequency distribution (equal energy per octave), which is inherent to constant-Q analysis [76]. Tests with uniform white noise (UWN) were also performed, providing de-noising results similar (almost identical) to those for GWN. Before analysing the results of this qualitative experimental evaluation, it is important to mention that Gaussian-modulated tone signals (the terms of the summation in Eq. (35)) are not best suited to wavelet processing (in contrast to most natural acoustic phenomena), due to their purely tonal content. Linear spectral analysis, such as model-based approaches and Fourier-based techniques, is preferred for these signal types. In addition, their symmetrical attack–release envelope (refer to Figs. 9 and 10) is unusual in physical sound events (including BS), a fact that also has a negative impact due to the adopted exponential-moving-average scheme. Nevertheless, signals of Eq. (35) are easily localized in the time–frequency plane (through the configuration of the parameters $u_k$ and $i_k$), while the parameter $a_k$ can be used to control the duration or the impulsive characteristics of the signals (it determines the shape of the Gaussian bell-curve envelope) [76]. Thus, “adaptive-representation sequences” of Eq. (35) were selected in a “worst-case scenario” context, while also offering easy configuration.
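For readers reproducing the contamination step, the two noise types can be generated as sketched below. This is one common frequency-domain recipe (shaping white Gaussian noise so that its power spectral density falls as 1/f), not necessarily the generator used by the authors; the function names and the 8 kHz rate are assumptions. The band-power helper exposes the property that makes PN the harder case: equal power per octave, so the low-frequency bands carry as much noise as the high ones, whereas white noise doubles its power with each octave.

```python
import numpy as np

def white_noise(n, rng):
    """Gaussian white noise (GWN), unit variance."""
    return rng.standard_normal(n)

def pink_noise(n, rng):
    """Pink noise (PN): white Gaussian noise shaped to a 1/f power spectral density."""
    spectrum = np.fft.rfft(rng.standard_normal(n))
    k = np.arange(spectrum.size, dtype=float)
    k[0] = 1.0                # avoid dividing by zero at DC
    spectrum /= np.sqrt(k)    # amplitude ~ f^(-1/2), hence power ~ 1/f
    spectrum[0] = 0.0         # remove the DC component
    x = np.fft.irfft(spectrum, n)
    return x / np.std(x)      # normalize to unit variance

def band_power(x, fs, f_lo, f_hi):
    """Sum of periodogram bins in [f_lo, f_hi)."""
    power = np.abs(np.fft.rfft(x)) ** 2
    freqs = np.fft.rfftfreq(x.size, d=1.0 / fs)
    return power[(freqs >= f_lo) & (freqs < f_hi)].sum()

rng = np.random.default_rng(0)
fs, n = 8000, 2 ** 16
pn, gwn = pink_noise(n, rng), white_noise(n, rng)
print(band_power(pn, fs, 1000, 2000) / band_power(pn, fs, 500, 1000))    # close to 1
print(band_power(gwn, fs, 1000, 2000) / band_power(gwn, fs, 500, 1000))  # close to 2
```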

Figs. 9 and 10 present the de-noising results of various WDWF configurations in the time domain, for GWN- and PN-contaminated signals, respectively. In the first two subplots (Figs. 9a and 10a) we observe the initial noise-contaminated signal x (subplots Figs. 9a1 and 10a1), as well as the components of the noise-free signal s and the contaminating noise n (black and grey curves, respectively, in subplots Figs. 9a2 and 10a2). Subplots b1–j1 (left side of Figs. 9 and 10) present the de-noising results with the use of the perceptual masking filter Akw, while subplots b2–j2 (right side of Figs. 9 and 10) do not use perceptual masking criteria. In each of these subplots, the restored signal s* is presented as a black curve, while the extracted noise n* is plotted in grey. As a test signal we used a sequence of six Gaussian-modulated tones (D = 6 terms in the summation of Eq. (35)), located at symmetrically distributed time instances and with different impulsive and spectral characteristics (refer to Figs. 11 and 12 for the latter). The spectrographic colormaps of Figs. 11 and 12 present the same results as Figs. 9 and 10 from a different point of view, providing Joint Time Frequency Analysis (JTFA).

From a quick comparison of Fig. 9 with Fig. 10, we observe that the WDWF modules retain their efficiency for both GWN and PN contamination. Subplots b1–c1 and b2–c2 (of all Figs. 9–11) indicate that the classical approach of power spectral subtraction (type-I WDWF with c = 1) fails to eliminate birdy noise and residual artefacts when signal components are absent. The finally configured type-I WDWF (c = 3) eliminates this problem for both the 6-band and 17-band implementations (subplots d1–d2 and e1–e2, respectively). However, some “post-echo” phenomena appear (especially for the 17-band topology), causing signal components to be sustained for a while, so that their original duration is slightly extended. The combination of types I and II resolves this problem (type I&II configurations, subplots f1–f2, g1–g2), while the differences between types I&II and type II are very difficult to spot (subplots h1–h2, j1–j2). However, all the modules slightly distort the symmetrical attack–release envelope (as expected from the analysis of the previous paragraph), adapting it to the characteristics of the exponential curves. Comparing the 6-band implementations with the 17-band ones, we observe that the DWT works best when only noise is present, since the noise residual artefacts are efficiently suppressed, while the WPA approach offers more delicate de-noising and less distortion when signal is active. This is visible in the spectrographic colormaps of Figs. 11 and 12, where the 17-band WPA reduces noise spectral components even in the time bins in which signal is active. Figs. 11 and 12 also show that the WDWF modules retain their efficiency along the frequency axis (for the various spectral components of the input signal). Comparing subplots b1–j1 with the corresponding subplots b2–j2, we observe that the presence of masking criteria completely eliminates noise residual artefacts, especially for type II. However, the rougher de-noising also affects low-level frequency components that correspond to useful signal. As already stated, these perceptual approaches (including the filter Akw and the type-II configuration) provide improved audible results, while the complete elimination of birdy noise and similar residual artefacts is very useful in automated analysis [24,57].

Fig. 9. De-noising results in the time domain of Gaussian-modulated tones (test signal) contaminated with additive Gaussian white noise (GWN), for various WDWF configurations (qualitative validation): (a1) initial noisy signal x[i], (a2) initial noise-free signal s[i] and additive noise n[i]; de-noising results (s*[i], n*[i]) for (b1) WDWFI-6 (a = 2, b = 1, c = 1, d = 0.1 and Akw enabled), (b2) WDWFI-6 (a = 2, b = 1, c = 1, d = 0.1 and Akw disabled), (c1) WDWFI-17 (a = 2, b = 1, c = 1, d = 0.1 and Akw enabled), (c2) WDWFI-17 (a = 2, b = 1, c = 1, d = 0.1 and Akw disabled), (d1) WDWFI-6 (a = 2, b = 1, c = 3, d = 0.2 and Akw enabled), (d2) WDWFI-6 (a = 2, b = 1, c = 3, d = 0.2 and Akw disabled), (e1) WDWFI-17 (a = 2, b = 1, c = 3, d = 0.2 and Akw enabled), (e2) WDWFI-17 (a = 2, b = 1, c = 3, d = 0.2 and Akw disabled), (f1) WDWFI&II-6 (a = 2, b = 1, c = 1, d = 0.1 and Akw enabled), (f2) WDWFI&II-6 (a = 2, b = 1, c = 1, d = 0.1 and Akw disabled), (g1) WDWFI&II-17 (a = 2, b = 1, c = 1, d = 0.1 and Akw enabled), (g2) WDWFI&II-17 (a = 2, b = 1, c = 1, d = 0.1 and Akw disabled), (h1) WDWFII-6 (a = 2, b = 1, c = 1, d = 0.1 and Akw enabled), (h2) WDWFII-6 (a = 2, b = 1, c = 1, d = 0.1 and Akw disabled), (j1) WDWFII-17 (a = 2, b = 1, c = 1, d = 0.1 and Akw enabled), (j2) WDWFII-17 (a = 2, b = 1, c = 1, d = 0.1 and Akw disabled).

Fig. 10. De-noising results in the time domain of Gaussian-modulated tones (test signal) contaminated with additive pink noise (PN), for various WDWF configurations (qualitative validation): (a1) initial noisy signal x[i], (a2) initial noise-free signal s[i] and additive noise n[i]; de-noising results (s*[i], n*[i]) for (b1) WDWFI-6 (a = 2, b = 1, c = 1, d = 0.1 and Akw enabled), (b2) WDWFI-6 (a = 2, b = 1, c = 1, d = 0.1 and Akw disabled), (c1) WDWFI-17 (a = 2, b = 1, c = 1, d = 0.1 and Akw enabled), (c2) WDWFI-17 (a = 2, b = 1, c = 1, d = 0.1 and Akw disabled), (d1) WDWFI-6 (a = 2, b = 1, c = 3, d = 0.2 and Akw enabled), (d2) WDWFI-6 (a = 2, b = 1, c = 3, d = 0.2 and Akw disabled), (e1) WDWFI-17 (a = 2, b = 1, c = 3, d = 0.2 and Akw enabled), (e2) WDWFI-17 (a = 2, b = 1, c = 3, d = 0.2 and Akw disabled), (f1) WDWFI&II-6 (a = 2, b = 1, c = 1, d = 0.1 and Akw enabled), (f2) WDWFI&II-6 (a = 2, b = 1, c = 1, d = 0.1 and Akw disabled), (g1) WDWFI&II-17 (a = 2, b = 1, c = 1, d = 0.1 and Akw enabled), (g2) WDWFI&II-17 (a = 2, b = 1, c = 1, d = 0.1 and Akw disabled), (h1) WDWFII-6 (a = 2, b = 1, c = 1, d = 0.1 and Akw enabled), (h2) WDWFII-6 (a = 2, b = 1, c = 1, d = 0.1 and Akw disabled), (j1) WDWFII-17 (a = 2, b = 1, c = 1, d = 0.1 and Akw enabled), (j2) WDWFII-17 (a = 2, b = 1, c = 1, d = 0.1 and Akw disabled).

Fig. 11. De-noising results in the time–frequency domain of Gaussian-modulated tones (test signal) contaminated with additive Gaussian white noise (GWN), for various WDWF configurations (qualitative validation): (a1) spectrogram X[k,i] of the initial noisy signal, (a2) spectrogram S[k,i] of the noise-free signal; de-noising results (S*[k,i]) for (b1) WDWFI-6 (a = 2, b = 1, c = 1, d = 0.1 and Akw enabled), (b2) WDWFI-6 (a = 2, b = 1, c = 1, d = 0.1 and Akw disabled), (c1) WDWFI-17 (a = 2, b = 1, c = 1, d = 0.1 and Akw enabled), (c2) WDWFI-17 (a = 2, b = 1, c = 1, d = 0.1 and Akw disabled), (d1) WDWFI-6 (a = 2, b = 1, c = 3, d = 0.2 and Akw enabled), (d2) WDWFI-6 (a = 2, b = 1, c = 3, d = 0.2 and Akw disabled), (e1) WDWFI-17 (a = 2, b = 1, c = 3, d = 0.2 and Akw enabled), (e2) WDWFI-17 (a = 2, b = 1, c = 3, d = 0.2 and Akw disabled), (f1) WDWFI&II-6 (a = 2, b = 1, c = 1, d = 0.1 and Akw enabled), (f2) WDWFI&II-6 (a = 2, b = 1, c = 1, d = 0.1 and Akw disabled), (g1) WDWFI&II-17 (a = 2, b = 1, c = 1, d = 0.1 and Akw enabled), (g2) WDWFI&II-17 (a = 2, b = 1, c = 1, d = 0.1 and Akw disabled), (h1) WDWFII-6 (a = 2, b = 1, c = 1, d = 0.1 and Akw enabled), (h2) WDWFII-6 (a = 2, b = 1, c = 1, d = 0.1 and Akw disabled), (j1) WDWFII-17 (a = 2, b = 1, c = 1, d = 0.1 and Akw enabled), (j2) WDWFII-17 (a = 2, b = 1, c = 1, d = 0.1 and Akw disabled).

Fig. 12. De-noising results in the time–frequency domain of Gaussian-modulated tones (test signal) contaminated with additive pink noise (PN), for various WDWF configurations (qualitative validation): (a1) spectrogram X[k,i] of the initial noisy signal, (a2) spectrogram S[k,i] of the noise-free signal; de-noising results (S*[k,i]) for (b1) WDWFI-6 (a = 2, b = 1, c = 1, d = 0.1 and Akw enabled), (b2) WDWFI-6 (a = 2, b = 1, c = 1, d = 0.1 and Akw disabled), (c1) WDWFI-17 (a = 2, b = 1, c = 1, d = 0.1 and Akw enabled), (c2) WDWFI-17 (a = 2, b = 1, c = 1, d = 0.1 and Akw disabled), (d1) WDWFI-6 (a = 2, b = 1, c = 3, d = 0.2 and Akw enabled), (d2) WDWFI-6 (a = 2, b = 1, c = 3, d = 0.2 and Akw disabled), (e1) WDWFI-17 (a = 2, b = 1, c = 3, d = 0.2 and Akw enabled), (e2) WDWFI-17 (a = 2, b = 1, c = 3, d = 0.2 and Akw disabled), (f1) WDWFI&II-6 (a = 2, b = 1, c = 1, d = 0.1 and Akw enabled), (f2) WDWFI&II-6 (a = 2, b = 1, c = 1, d = 0.1 and Akw disabled), (g1) WDWFI&II-17 (a = 2, b = 1, c = 1, d = 0.1 and Akw enabled), (g2) WDWFI&II-17 (a = 2, b = 1, c = 1, d = 0.1 and Akw disabled), (h1) WDWFII-6 (a = 2, b = 1, c = 1, d = 0.1 and Akw enabled), (h2) WDWFII-6 (a = 2, b = 1, c = 1, d = 0.1 and Akw disabled), (j1) WDWFII-17 (a = 2, b = 1, c = 1, d = 0.1 and Akw enabled), (j2) WDWFII-17 (a = 2, b = 1, c = 1, d = 0.1 and Akw disabled).

Besides the previous qualitative analysis, a validation procedure in quantitative terms was also necessary to certify the de-noising performance of the four proposed modules. This approach was incorporated in the evaluation process for two basic reasons: firstly, to confirm the effectiveness of the filter configuration, so that the finally selected parameters would exhibit the best de-noising capabilities; secondly, to check whether different WDWF configurations would provide notable differences when applied to different types of signals, in which case an adaptive parameter-selection procedure could be involved. Based on the theoretical aspects provided in the previous sections, as well as on the fact that the influence of the parameters a, b and c is well established [2,20,21], we decided to focus on the evaluation of the memory parameter d. Besides, parameters a, b and c are mainly used to control filter attenuation at various SNR levels [2,20,21], whereas the current approach aims to evaluate de-noising results for signals of different morphological structure.

For this purpose, we decided to artificially contaminate test signals with various types of noise, such as GWN, UWN and PN. Single Gaussian-modulated tones (Eq. (35) for D = 1, i.e., first-order adaptive representation signals [82]) were used as the initial noise-free signals. By adjusting the parameters ik and uk, we were able to control the time–frequency location of the useful signal components. Thus, the generated test signals were centred within a time interval of 1 s (the total duration of the Gaussian-modulated tones), so that silence periods would be introduced at the beginning and at the end. The parameter uk was controlled to provide test signals with various frequency components, enabling a 1/3-octave analysis in the frequency range [100 Hz, 3.1 kHz], an extended bandwidth where BS components might be found. Test signal samples were generated at 16 different frequency bins, located at the centre frequencies of the classical third-octave analysis [76], as indicated in Table 1: {100 Hz, 125 Hz, 160 Hz, 200 Hz, 250 Hz, 315 Hz, 400 Hz, 500 Hz, 630 Hz, 800 Hz, 1 kHz, 1.25 kHz, 1.6 kHz, 2 kHz, 2.5 kHz, 3.15 kHz}. In addition, to control the impulsive characteristics and the duration of the generated test signals, the standard deviation of the Gaussian envelope (Eq. (35)) was restricted to the range from 0.001 to 0.2. Thus, again using a logarithmic-like scale, a grid of 8 different values was selected: {0.002, 0.005, 0.01, 0.02, 0.05, 0.1, 0.15, 0.2}. To translate the above parameters into the explosive characteristics of the signal, we used a logarithmic expression of the crest factor [75], a suitable parameter for describing impulsiveness that is also involved in BS feature selection for pattern analysis [57]:

$$\mathrm{LCF}(s)=20\cdot\log_{10}\left(\frac{\max(|s|)}{\mathrm{rms}(s)}\right)\ [\mathrm{dB}] \qquad (37)$$

where LCF(s) is the logarithmic crest factor of the signal s, rms is the root-mean-square operator, max is the maximum-value operator and |s| denotes the absolute values of the sequence s. With this setting, the LCF values of the generated Gaussian-modulated tones that correspond to the previous standard deviations are (in reverse order): {10.6, 11.5, 12.8, 15.2, 18.6, 21.3, 24.1, 27.8, 30.4 dB}. Summing up, a 2D grid was set up to generate 16 × 8 bins of test signals, with frequency and impulsiveness acting as the independent variables of the signal generation procedure (i.e., production of signals with different frequencies and impulsive characteristics).
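The test-signal grid and Eq. (37) can be sketched as follows. This is a minimal illustration, not the authors' generator: Eq. (35) is not reproduced in this section, so the tone is assumed to be a cosine under a Gaussian envelope centred in a 1 s window, the 16 kHz sampling rate is a hypothetical choice, and the names `gaussian_tone`, `lcf_db` and `test_grid` are illustrative. Because the envelope definition is assumed, the LCF values produced here need not match the figures quoted above.

```python
import numpy as np

FS = 16000  # hypothetical sampling rate in Hz (not stated in this section)
FREQS = [100, 125, 160, 200, 250, 315, 400, 500, 630, 800,
         1000, 1250, 1600, 2000, 2500, 3150]             # nominal third-octave centres, Hz
SIGMAS = [0.002, 0.005, 0.01, 0.02, 0.05, 0.1, 0.15, 0.2]  # envelope std, s


def gaussian_tone(f, sigma, fs=FS, duration=1.0):
    """Gaussian-modulated tone centred in the window (assumed form of Eq. (35), D = 1)."""
    t = np.arange(int(duration * fs)) / fs - duration / 2  # t = 0 at the window centre
    return np.exp(-t**2 / (2 * sigma**2)) * np.cos(2 * np.pi * f * t)


def lcf_db(s):
    """Logarithmic crest factor, Eq. (37): 20*log10(max|s| / rms(s))."""
    s = np.asarray(s, dtype=float)
    return 20 * np.log10(np.max(np.abs(s)) / np.sqrt(np.mean(s**2)))


def test_grid():
    """16 x 8 grid of test signals, each normalised to unit peak (0 dB)."""
    grid = {}
    for f in FREQS:
        for sigma in SIGMAS:
            s = gaussian_tone(f, sigma)
            grid[(f, sigma)] = s / np.max(np.abs(s))
    return grid


grid = test_grid()
# SIGMAS is ascending, so the LCF values below decrease: narrower envelopes are more impulsive
lcf_1k = [lcf_db(grid[(1000, sg)]) for sg in SIGMAS]
```

As a sanity check, a steady sinusoid has a crest factor of √2, i.e. about 3 dB, the lower limit that the Gaussian-modulated tones approach as the envelope widens.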

All these deterministic test signals were normalized to a 0 dB recording level according to their peak values. Noise samples of equal duration were generated at a −6 dB equivalent recording level; their effective (rms) values were set to half the amplitude of the local maxima of the test signals, so that a localized signal to noise ratio [48,55] of six decibels (LSNR = 6 dB) was achieved. This is a representative noise-contamination level for evaluating the influence of the parameter d on the de-noising results. We avoided more severe noise contamination cases, both because the 6 dB LSNR value is a typical average case and because type-II would produce significant signal suppression, due to its rougher de-noising nature. Five different values were tested for the parameter d, based on the theoretical issues discussed in the previous section (for the type-I WDWF) and the information provided in Tables 1 and 2: {0.01, 0.05, 0.1, 0.2, 0.4}.
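A possible way to produce the three broadband noise types and set their level relative to each test signal is sketched below. Assumptions: PN is taken to mean pink noise obtained by 1/f spectral shaping of white Gaussian noise, UWN is drawn from a symmetric uniform distribution, and the LSNR is computed as the ratio of the signal peak to the noise rms (the exact definition in [48,55] is not reproduced here). The function names are hypothetical.

```python
import numpy as np


def make_noise(kind, n, rng):
    """Zero-mean, unit-rms broadband noise: 'gwn', 'uwn' or 'pn' (pink, 1/f power)."""
    if kind == "gwn":
        x = rng.standard_normal(n)
    elif kind == "uwn":
        x = rng.uniform(-1.0, 1.0, n)
    elif kind == "pn":
        # shape white Gaussian noise so that its power spectrum falls as 1/f
        spec = np.fft.rfft(rng.standard_normal(n))
        f = np.arange(spec.size, dtype=float)
        f[0] = 1.0  # leave the DC bin unscaled; the mean is removed below
        x = np.fft.irfft(spec / np.sqrt(f), n=n)
    else:
        raise ValueError(f"unknown noise type: {kind}")
    x = x - x.mean()
    return x / np.sqrt(np.mean(x**2))


def contaminate(s, kind, rng):
    """Add noise whose rms is half the peak of s, i.e. LSNR = 20*log10(2), about 6.02 dB."""
    noise = 0.5 * np.max(np.abs(s)) * make_noise(kind, len(s), rng)
    return s + noise, noise
```

Setting the noise rms to half the peak is what turns the "−6 dB relative to a 0 dB peak" rule into LSNR = 20·log10(2) ≈ 6 dB.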

Given the frequency-impulsiveness grid for the test signals, as well as the candidate d values, the only unsettled issue was the quantitative measure to be employed for evaluating the de-noising process. A common choice in such cases is the Cross Correlation Index (CCI), also known as the Pearson linear correlation coefficient [48,50,55]:

$$\mathrm{CCI}=\frac{\sum_i\left[s(i)-\mathrm{mean}\{s(i)\}\right]\cdot\left[\hat{s}(i)-\mathrm{mean}\{\hat{s}(i)\}\right]}{\sqrt{\sum_i\left[s(i)-\mathrm{mean}\{s(i)\}\right]^2}\cdot\sqrt{\sum_i\left[\hat{s}(i)-\mathrm{mean}\{\hat{s}(i)\}\right]^2}} \qquad (38)$$
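Eq. (38) translates directly into a few lines of NumPy. The sketch below (the function name `cci` is illustrative) coincides with the standard Pearson coefficient returned by `np.corrcoef`.

```python
import numpy as np


def cci(s, s_hat):
    """Cross Correlation Index, Eq. (38): Pearson correlation between s and s_hat."""
    s = np.asarray(s, dtype=float)
    s_hat = np.asarray(s_hat, dtype=float)
    ds = s - s.mean()          # s(i) - mean{s(i)}
    dh = s_hat - s_hat.mean()  # s_hat(i) - mean{s_hat(i)}
    return float(np.sum(ds * dh) / (np.sqrt(np.sum(ds**2)) * np.sqrt(np.sum(dh**2))))
```

Note that the CCI is invariant to any positive gain and constant offset applied to ŝ(i), so it measures waveform shape rather than level.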

The CCI estimates the similarity between the original clean signal s(i) and the estimate ŝ(i) produced by the de-noising procedure. Being a Pearson coefficient, it can in principle take values in [−1, 1]; for de-noised versions of the same signal, its values practically range in the interval [0, 1]. The closer the CCI value is to 1, the greater the resemblance of the two signal versions and, consequently, the better the de-noising result. However, it is important to mention once again that Gaussian-modulated tones are not best suited for wavelet processing, and that these idealized signals differ in many respects from natural BS. In addition, the perceptual characteristics of the type-II WDWF, such as the strong suppression of signal components buried in noise, would degrade the de-noising results as expressed by the CCI. For the same reason, the perceptual filter Akw was not incorporated into the validation experiments. Nevertheless, this procedure provides an overview of the WDWF de-noising capabilities. Besides, the validation intends to evaluate the influence of the parameter d and not the WDWF de-noising efficiency, which is analyzed both qualitatively and quantitatively in the following paragraphs (the validation results are intended only for comparisons between the various configurations).

Fig. 13. Quantitative validation surface (Cross Correlation Index (CCI) between the initial noise-free signals and the WDWF de-noised ones), with the frequency (f [Hz]) and the impulsiveness (crest factor [dB]) as independent variables, for d = 0.1 (1st column) and d = 0.2 (2nd column); the 3rd column presents the mean CCI and the expected variation (mean ± standard deviation) over all the 2D CCI bins for each of the tested d values (d = {0.01, 0.05, 0.1, 0.2, 0.4}); the four subplots correspond to: (a) WDWFI-6 (a = 2, b = 1, c = 3), (b) WDWFI-17 (a = 2, b = 1, c = 3), (c) WDWFII-6 (a = 2, b = 1, c = 1), (d) WDWFII-17 (a = 2, b = 1, c = 1).

All four WDWF modules were tested within the above experimental procedure. The parameters a, b were set to their empirically selected values (a = 2, b = 1), which also provide the simplest configuration for the parametric Wiener filter.


Fig. 14. De-noising results for low-level noise contamination: (a) initial signal, (b) WDWFI-6, (c) WDWFI-17, (d) WDWFII-6, (e) WDWFII-17. The effective signal to noise ratio (ESNR) is indicated at each subplot for evaluation-comparison purposes. x[i] is the initial noisy signal, ŝ[i] the estimated noise-free signal and n̂[i] the extracted noise, while X[k], Ŝ[k] and N̂[k] are the corresponding power spectra, estimated via the Fast Fourier Transform.


The threshold parameter was set to c = 3 for the type-I implementations and c = 1 for the type-II ones. Test signals were artificially contaminated with multiple samples of various types of additive broadband noise. All GWN, UWN and PN profiles were employed during this procedure, following the level adjustment rules described in the previous paragraphs. Specifically, 30 different noise samples (10 × GWN, 10 × UWN and 10 × PN) were used for each test-signal bin, so that every WDWF module was run 30 times per bin and a mean CCI was estimated from the 30 de-noising results (per input signal). Tests with 300 noise samples were also performed, without any significant difference in the observed de-noising results. As a consequence, we were able to draw the 2D surface of the CCI variation across the frequency and crest-factor axes for every tested d value. The statistical properties of these parametric surfaces provide the final validation results.
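The averaging over noise realizations described above can be organized as in the following sketch, which computes one mean CCI per test-signal bin from 10 realizations each of GWN, UWN and 1/f-shaped PN, each scaled to an rms of half the signal peak. The WDWF itself is not reproduced here: `denoise` is a hypothetical placeholder callable, and the identity function is used only to exercise the harness.

```python
import numpy as np


def cci(s, s_hat):
    """Pearson correlation, as in Eq. (38)."""
    ds, dh = s - s.mean(), s_hat - s_hat.mean()
    return float(np.sum(ds * dh) / np.sqrt(np.sum(ds**2) * np.sum(dh**2)))


NOISE_MAKERS = {  # each maker returns raw noise of length n; level is set later
    "gwn": lambda n, rng: rng.standard_normal(n),
    "uwn": lambda n, rng: rng.uniform(-1.0, 1.0, n),
    "pn": lambda n, rng: np.fft.irfft(
        np.fft.rfft(rng.standard_normal(n)) / np.sqrt(np.maximum(np.arange(n // 2 + 1), 1)),
        n=n),
}


def mean_cci_surface(signals, denoise, reps_per_type=10, seed=0):
    """Mean CCI per test-signal bin over reps_per_type noise draws of each type.

    signals: dict mapping a bin key, e.g. (f, sigma), to a clean test signal.
    denoise: callable x -> s_hat (hypothetical stand-in for a WDWF module).
    """
    rng = np.random.default_rng(seed)
    surface = {}
    for key, s in signals.items():
        scores = []
        for make in NOISE_MAKERS.values():
            for _ in range(reps_per_type):
                n = make(len(s), rng)
                n = 0.5 * np.max(np.abs(s)) * n / np.sqrt(np.mean(n**2))  # LSNR ~ 6 dB
                scores.append(cci(s, denoise(s + n)))
        surface[key] = float(np.mean(scores))
    return surface
```

Evaluating this once per candidate d (with the corresponding de-noiser) yields one surface per d, whose mean and standard deviation summarize the comparison.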

In all the WDWF modules, the values d = 0.1 and d = 0.2 provided the best results among the tested values. The related surface plots are presented in Fig. 13, where the d = 0.1 results are shown in the first column and the d = 0.2 results in the second. The third column presents the overall mean value of the CCI (white curve) for all the tested d values, plus and minus the standard deviation (grey-coloured solid curve), across the entire frequency-LCF plane. Although some values are preferable for some signal types (e.g., larger d values for more impulsive signals or higher spectral components), the configurations d = 0.2 and d = 0.1, for types I and II, respectively, provide quite stable performance surfaces over the entire frequency-impulsiveness plane. Thus, the proposed settings can be used in a more generic mode, featuring almost identical de-noising capabilities over a wide range of audio signals, while retaining the basic advantage of simplicity and easy implementation. These findings confirm the theoretical expectation of universal de-noising efficiency presented in the previous section, which stems from the iterative exponential averaging of the WDWF structure and the multirate nature of the wavelet analysis.

3.2.2. Qualitative results

As already stated, the main target of the WDWF is to enhance noise-contaminated BS while affecting useful information as little as possible. Figs. 14 and 15 present the de-noising results of the four configured WDWF modules for two different BS samples. The time-domain history and the overall FFT-based power spectrum are plotted for all three involved signals, namely the noise-contaminated BS x(i), the enhanced BS ŝ(i) and the subtracted noise n̂(i), to provide a thorough representation of the de-noising results. For convenient representation, the signals are also scaled so that the initial, noise-contaminated signal is normalized to the [−1, 1] interval of the y-axis and its power spectrum has a 0 dB maximum. The same normalization factors are then applied to the remaining signals.

Before discussing qualitative results, it is essential to define a procedure for measuring the degree of noise contamination. This is usually described by the a-posteriori signal to noise ratio (SNR), i.e., the ratio between the de-noised signal power and the extracted noise power, expressed in decibels (dB). Since long silent periods are observed in BS signals, an alternative SNR definition was necessary in order to avoid overestimating the de-noising performance. The effective signal to noise ratio (ESNR), which estimates the SNR after the silent periods of the signal are removed (|s(i)| > 0), was preferred in our case to comply with the BS morphological structure. In practice, a positive threshold value (th) was used to determine the silent periods, instead of the 0 value one would expect:

$$\mathrm{ESNR}(th)=10\cdot\log_{10}\left(\frac{\sum_{i,\,|s(i)|>th}s(i)^2}{\sum_{i,\,|s(i)|>th}n(i)^2}\right)\cong 10\cdot\log_{10}\left(\frac{\sum_{i,\,|s(i)|>th}s(i)^2}{\sum_{i}n(i)^2}\right) \qquad (39)$$

where s(i) is either the original clean signal, if available, or the reconstructed noise-free one (ŝ(i)), and n(i), again, is either the original noise, if available, or the noise extracted by the de-noising procedure (n̂(i)). The right-hand approximation in Eq. (39) is introduced because noise has a more stationary nature, so that using more samples provides more accurate estimates of the expected values. Estimating an applicable threshold value was a tedious procedure; however, it was finally dealt with successfully by adopting the following rule: (i) a local maximum signal region is selected and its average power is calculated; (ii) a threshold is estimated so that signal regions above −20 dB of the calculated average power are considered non-silent. Exponential sound-level averaging proved very useful for this level estimation procedure, enabling a point-to-point comparison mode. The coefficients {1, −0.92} and {0.08} of the IIR filter (Eq. (26)) were experimentally selected, so that an ESNR = 0 dB rating corresponds to severe noise contamination (the reader may refer to Fig. 24 for some representative examples).
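The ESNR computation with exponential level averaging might look as follows. This is a simplified sketch under explicit assumptions: the IIR coefficients are read as the difference equation y[i] = 0.92·y[i−1] + 0.08·p[i] applied to the instantaneous power p[i] = s(i)², the ‘‘local maximum signal region’’ is approximated by the maximum of the smoothed power envelope, and the sums of Eq. (39) are evaluated as average powers (each divided by its number of samples), which is what the stationarity argument for the right-hand form implies. The helper names are hypothetical.

```python
import numpy as np


def exp_average(p, b0=0.08, a1=-0.92):
    """First-order IIR smoother y[i] = b0*p[i] - a1*y[i-1].

    Denominator {1, a1} = {1, -0.92} and numerator {b0} = {0.08}, as quoted in the text.
    """
    y = np.empty(len(p))
    acc = 0.0
    for i, v in enumerate(p):
        acc = b0 * v - a1 * acc
        y[i] = acc
    return y


def esnr_db(s, n, drop_db=20.0):
    """Right-hand form of Eq. (39) with a simplified, hypothetical silence detector."""
    s = np.asarray(s, dtype=float)
    n = np.asarray(n, dtype=float)
    level = exp_average(s**2)                      # smoothed power envelope of s
    th = level.max() * 10 ** (-drop_db / 10)       # 20 dB below the loudest region
    active = level > th                            # non-silent samples
    # average powers: signal over active samples, noise over all samples
    return 10 * np.log10(np.mean(s[active]**2) / np.mean(n**2))
```

For a unit-amplitude burst occupying 10% of the record, the plain SNR against noise of power 0.01 is 10 dB, whereas the ESNR is close to 20 dB; the small shortfall comes from the decaying tail of the smoothed envelope, which keeps about 55 silent samples after the burst above the threshold.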

Returning to the example of Figs. 14 and 15, it is clear that all four WDWF configurations provide favourable results with only slight differences, while the 17-band implementations treat the signal more delicately than the 6-band ones. From these specific examples it can also be observed that type-II WDWF provides visually better results, due to its ‘‘adaptive signal tracking’’ nature. The 17-band modules proved to be the most delicate, preserving signal components even when they are ‘‘buried’’ in the noise. WDWFII-6, on the other hand, provides robust, rough-cut elimination of inaudible low-level components, which usually consist of noise. It also reduces computational overheads, so it is recommended for long-term BS treatment [24].

Regarding the case of a severely noise-contaminated signal, the de-noising results are presented in Fig. 16. The effects of the involved perceptual criteria (Akw factor) on the same de-noising example are presented in Fig. 17. It can be seen that the barely audible high-frequency components masked by the ABN are eliminated when the perceptual criteria are active.

Experience showed that the incorporation of the Akw factor


Fig. 15. De-noising results for high-level noise contamination: (a) initial signal, (b) WDWFI-6, (c) WDWFI-17, (d) WDWFII-6, (e) WDWFII-17. The effective signal to noise ratio (ESNR) is indicated at each subplot for evaluation-comparison purposes. x[i] is the initial noisy signal, ŝ[i] the estimated noise-free signal and n̂[i] the extracted noise, while X[k], Ŝ[k] and N̂[k] are the corresponding power spectra, estimated via the Fast Fourier Transform.


Fig. 16. De-noising results for a severely noise-contaminated signal with the perceptual criteria (Ak) disabled: (a) initial signal, (b) WDWFI-6, (c) WDWFI-17, (d) WDWFII-6, (e) WDWFII-17. The effective signal to noise ratio (ESNR) is indicated at each subplot for evaluation-comparison purposes. x[i] is the initial noisy signal, ŝ[i] the estimated noise-free signal and n̂[i] the extracted noise, while X[k], Ŝ[k] and N̂[k] are the corresponding power spectra, estimated via the Fast Fourier Transform.


Fig. 17. De-noising results for a severely noise-contaminated signal with the perceptual criteria (Ak) enabled: (a) initial signal, (b) WDWFI-6, (c) WDWFI-17, (d) WDWFII-6, (e) WDWFII-17. The effective signal to noise ratio (ESNR) is indicated at each subplot for evaluation-comparison purposes. x[i] is the initial noisy signal, ŝ[i] the estimated noise-free signal and n̂[i] the extracted noise, while X[k], Ŝ[k] and N̂[k] are the corresponding power spectra, estimated via the Fast Fourier Transform.


Fig. 18. De-noising results for a strongly noise-contaminated signal (hum is also present besides ABN): (a) initial signal, (b) WDWFI-6, (c) WDWFI-17, (d) WDWFII-6, (e) WDWFII-17. The effective signal to noise ratio (ESNR) is indicated at each subplot for evaluation-comparison purposes. x[i] is the initial noisy signal, ŝ[i] the estimated noise-free signal and n̂[i] the extracted noise, while X[k], Ŝ[k] and N̂[k] are the corresponding power spectra, estimated via the Fast Fourier Transform.


Fig. 19. Comparison of the de-noising results with standard wavelet-based, auto-threshold strategies: (a) initial noisy signal, (b) WDWFI-6, (c) WDWFI-17, (d) WDWFII-6, (e) WDWFII-17, (f) RIGSURE soft threshold, (g) SQTWOLOG soft threshold, (h) RIGSURE hard threshold, (j) SQTWOLOG hard threshold. The effective signal to noise ratio (ESNR) is indicated at each subplot for evaluation-comparison purposes. x[i] is the initial noise-contaminated signal, ŝ[i] the estimated noise-free signal and n̂[i] the extracted noise.


Fig. 20. Comparison of the de-noising results with auto-threshold BS de-noising methods, for the case of an explosive bowel sound (EBS or IB): (a) initial noisy signal, (b) WDWFI-6, (c) WDWFI-17, (d) WDWFII-6, (e) WDWFII-17, (f) WTST–NST with Fadj = 3, (g) WTST–NST with Fadj = 4, (h) WT-FD, (j) IKD. The effective signal to noise ratio (ESNR) is indicated at each subplot for evaluation-comparison purposes. x[i] is the initial noise-contaminated signal, ŝ[i] the estimated noise-free signal and n̂[i] the extracted noise.


Fig. 21. Comparison of the de-noising results with auto-threshold BS de-noising methods, for the case of a regularly sustained BS (RS): (a) initial noisy signal, (b) WDWFI-6, (c) WDWFI-17, (d) WDWFII-6, (e) WDWFII-17, (f) WTST–NST with Fadj = 3, (g) WTST–NST with Fadj = 4, (h) WT-FD, (j) IKD. The effective signal to noise ratio (ESNR) is indicated at each subplot for evaluation-comparison purposes. x[i] is the initial noise-contaminated signal, ŝ[i] the estimated noise-free signal and n̂[i] the extracted noise.


leads to audible improvements as well as elimination of birdy-

noise and residual artefacts, while useful information is barely

affected.

In the example of Fig. 18, the signal is also contaminated with hum besides ABN (not a rare incident in such applications). The de-noising results demonstrate that the method is capable of eliminating background noise even in cases where the broadband spectral assumption is violated, as long as the appropriate noise footprint is provided.

Another issue connected with the qualitative analysis of the method's performance is the comparison with standard de-noising techniques. Two of the best-known wavelet de-noising techniques with automatic thresholds were selected for this task: the first uses the principle of Stein's Unbiased Risk Estimate (SURE) for threshold estimation, while the second, the ‘‘SQTWOLOG’’ approach, uses the universal threshold sqrt(2*log(length of x[i])) [13]. These reference methods were selected from the ‘‘ready to use’’ tools of the National Instruments Signal Processing Toolset v6.0™ [83]. For the rest of the paper, we use the notation ‘‘RIGSURE-(soft)’’ and ‘‘RIGSURE-(hard)’’ for soft and hard thresholding (Eq. (14)), respectively, where the SURE threshold is employed [83]. Similarly, the notations ‘‘SQTWOLOG-(soft)’’ and ‘‘SQTWOLOG-(hard)’’ are defined for the SQTWOLOG-based threshold estimation [83]. Fig. 19 compares the de-noising results of the WDWF modules with those of the above-mentioned wavelet auto-thresholding strategies. It is clear that the WDWF approaches combine robust de-noising with minimal effect on useful signal information. Experience showed that the WDWF approach balances between the minimal signal destruction of the RIGSURE-(soft) method and the rough de-noising of the SQTWOLOG-(hard) method. The former suffers from birdy noise and similar residual artefacts, especially at low SNRs, while the latter sweeps out most of the stationary-like signal components. On the other hand, the RIGSURE-(hard) method leaves more noise residuals than RIGSURE-(soft), while the SQTWOLOG-(soft) approach resolves this problem but usually affects useful signal components (refer to the extracted noise curve in Fig. 19g). In general, WDWF preserves the morphological structure of all regularly sustained BSs, including whistling and rumbling gastrointestinal sounds produced by prolonged sweep clusters of contractions [23,24], whereas most automated thresholding strategies seriously damage the shape of the signals.
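For reference, the two automatic thresholds and the soft/hard shrinkage rules can be sketched as below. These are textbook formulations (the universal threshold with the natural logarithm, and the SURE threshold for soft thresholding of coefficients with unit noise variance), not the National Instruments implementation used in the paper, whose internals may differ. In practice, coefficients are first divided by a noise-level estimate, such as the median-absolute-deviation estimate shown, before `sure_threshold` is applied; all function names are illustrative.

```python
import numpy as np


def soft_threshold(x, t):
    """Soft thresholding: shrink magnitudes by t, zeroing those below t."""
    x = np.asarray(x, dtype=float)
    return np.sign(x) * np.maximum(np.abs(x) - t, 0.0)


def hard_threshold(x, t):
    """Hard thresholding: keep coefficients with |x| > t, zero the rest."""
    x = np.asarray(x, dtype=float)
    return np.where(np.abs(x) > t, x, 0.0)


def mad_sigma(detail):
    """Robust noise-level estimate from finest-scale detail coefficients."""
    return float(np.median(np.abs(detail)) / 0.6745)


def universal_threshold(n, sigma=1.0):
    """'SQTWOLOG'-style universal threshold: sigma * sqrt(2 * ln(n))."""
    return float(sigma * np.sqrt(2.0 * np.log(n)))


def sure_threshold(x):
    """SURE threshold for soft thresholding of unit-noise-variance coefficients x."""
    sx2 = np.sort(np.abs(np.asarray(x, dtype=float))) ** 2
    n = sx2.size
    k = np.arange(1, n + 1)
    # Stein's unbiased risk estimate evaluated at t = k-th smallest |x|
    risks = (n - 2 * k + np.cumsum(sx2) + (n - k) * sx2) / n
    return float(np.sqrt(sx2[np.argmin(risks)]))
```

The universal threshold grows only with the signal length (about 3.72σ for 1024 coefficients), which explains the rough, level-independent cut of the SQTWOLOG variants, whereas the SURE threshold adapts to the coefficient distribution and is typically much lower.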

Related to the previous issue is the fact that all the auto-threshold de-noising algorithms that have been developed for BS processing [48,50–56] are mainly suggested for explosive BS. Nevertheless, it is interesting to compare the WDWF modules with some of those methods, such as the WTST–NST approach [10,48,50], which has been used in various BS analysis tasks [51] and has received various interpretations and implementation improvements [53], the WT-FD method [54,55] and the IKD approach [52,56], although the last filtering technique is mostly intended for signal detection purposes. Since all these approaches have been developed for EBS, it is useful to evaluate and compare them with WDWF using signals of this type. Fig. 20 provides the de-noising results of a noisy explosive BS pattern (IB) for all the WDWF modules and the standard EBS auto-threshold de-noising methods. All the tested algorithms result in robust de-noising, with the IKD approach efficiently detecting the signal onset (Fig. 20j) within a small number of iterations (L = 3). The WTST–NST method with Fadj = 3 (as proposed in its original configuration [10,48]) results in significant signal enhancement with minimum distortion (Fig. 20f). However, a weakness is spotted in the presence of noise residual artefacts, a problem that is successfully dealt with when Fadj = 4 (Fig. 20g). The latter modification also facilitates faster convergence, since L = 5 iterations are now required instead of L = 7 for the case of Fadj = 3. The WT-FD method (Fig. 20h) seems to be immune to noise residual artefacts, as the ABN is completely eliminated when the signal is absent. The filtering process is softer in the opposite case, when signal components are present, approaching noise-gate processing results; this preserves perceptual characteristics (the noise is masked by the signal) and does not cause any spectral distortion.

Comparing all the previous results with those provided by our proposed methods (Fig. 20, subplots b–e), it is important to note that all four WDWF modules perform outstandingly, with similar, if not superior, de-noising results.

Fig. 21 provides an alternative example, where all the candidate de-noising algorithms are compared for their ability to handle clustered-contraction BS events, referred to as regularly sustained (RS) [57]. Although the auto-threshold EBS de-noising methods are not intended for such types of signals, this comparison is unavoidable for two basic reasons: (i) for the completeness of the evaluation procedure and (ii) to show that the proposed WDWF techniques are very efficient in these cases, where certified de-noising methods are not available. From Fig. 21, it is obvious that all four WDWF modules retain their efficiency, providing robust enhancement with accurate noise elimination (Fig. 21, subplots b–e). This is not the case for the other methods, where we observe either insufficient de-noising (Fig. 21f) or random signal distortions and morphological destruction (Fig. 21g and h). Similarly, the IKD detection results appear very confusing (Fig. 21j). These results are quite natural since, according to the corresponding authors [48,50–56], these methods have been configured for EBS processing demands; thus, algorithm modifications or alternative configurations are needed. Based on our experimentation, WTST–NST provides better results than the other approaches, especially in its original configuration (Fig. 21f) with Fadj = 3 [10,48]. Nevertheless, a rather large amount of noise remains intact after a heavy computational workload (L = 24). The configuration with Fadj = 4 leads to faster convergence (L = 8) and stronger noise elimination (Fig. 21g), which, however, causes serious morphological destruction in most of the RS patterns. The WT-FD method (Fig. 21h) seemed quite unstable, since very small modifications to the filter configuration parameters (accuracy, acc; epsilon, e; sliding window length, WL [54,55]) lead to completely


Fig. 22. Upper and lower ESNR bounds (ESNR = 20 dB and ESNR = 0 dB) for the noise-stress quantitative evaluation procedure, using both IB and RS representative patterns; initial noise-free signals s[i] and additive GWN n[i] are presented on the left side (in black and grey, respectively), while the noise-contaminated signals x[i] are plotted on the right side of the figure.

different de-noising results. The computational load also remains heavy: besides the large number of iterations (L = 18 in Fig. 21h), the sliding ‘‘Fractal Dimension Peak Peeling Algorithm’’ (FD-PPA) [54,55] is itself very demanding, and its cost grows further when it is applied at more than one wavelet scale (which seems unavoidable in the case of RS patterns).

Summing up, the proposed WDWF de-noising techniques can be employed efficiently for both IB and RS patterns, or for any combination of segments of these types, which are very frequent in BS recordings [24]. Furthermore, the duration of the processed BS affects neither the de-noising accuracy nor the operational demands, apart from the fact that a greater amount of data has to be processed for longer signal durations. All these attributes make WDWF ideal for long-term unsupervised BS processing, where standard EBS de-noising algorithms seem problematic due to their erroneous behaviour and increased computational cost [24]. However, combining the WDWF modules with some of the auto-threshold EBS enhancement methods (with different configurations) seems very promising towards the establishment of auto-threshold WDWF de-noising. Such trials are currently being tested and evaluated for their performance and functionality.

3.2.3. Experimental evaluation procedure and quantitative performance

An experimental procedure was necessary for the quantitative evaluation of the proposed methods' results. Thus, ‘‘synthetic’’ BS and controlled noise contamination were employed to carry out this type of evaluation analysis. Since synthetic BS cannot be directly constructed, an alternative technique was devised. About 600 representative BS, well above the noise level (ESNR > 15 dB), were selected from the previously mentioned ASV recordings [24]. All BS were de-noised using standard STFT spectral subtraction, which works well at high SNR. These noise-free BS were considered as the test signals for the quantitative analysis via a noise stress procedure, as presented in [48,55]. The BS samples were classified into IB and RS patterns, and 100 samples of each class were finally selected at random for the quantitative analysis. Separate quantitative analyses for the IB and RS patterns were dictated by the issues presented in the previous paragraphs, since the available BS de-noising approaches are proposed for EBS signals.
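The STFT spectral subtraction used to pre-clean the high-ESNR recordings is not specified further in this section. A generic magnitude spectral subtraction is sketched below under assumed settings (256-sample frames, periodic Hann window with 50% overlap, noise magnitude averaged over a noise-only excerpt, a 1% spectral floor, and the noisy phase reused); it should not be read as the authors' exact configuration, and the function names are illustrative.

```python
import numpy as np


def _frames(x, frame, hop, win):
    """Split x into windowed frames of length `frame` with hop `hop`."""
    count = 1 + (len(x) - frame) // hop
    return np.stack([win * x[i * hop:i * hop + frame] for i in range(count)])


def spectral_subtraction(x, noise_ref, frame=256, floor=0.01):
    """Magnitude spectral subtraction with a periodic Hann window and 50% overlap.

    x:         noisy signal (len(x) >= frame)
    noise_ref: noise-only excerpt used to estimate the average noise magnitude
    floor:     spectral floor, as a fraction of the noisy magnitude
    """
    hop = frame // 2
    win = 0.5 - 0.5 * np.cos(2 * np.pi * np.arange(frame) / frame)  # sums to 1 at 50% overlap
    noise_mag = np.abs(np.fft.rfft(_frames(noise_ref, frame, hop, win), axis=1)).mean(axis=0)
    X = np.fft.rfft(_frames(x, frame, hop, win), axis=1)
    mag = np.abs(X)
    clean_mag = np.maximum(mag - noise_mag, floor * mag)   # subtract, keep a floor
    Y = clean_mag * np.exp(1j * np.angle(X))               # reuse the noisy phase
    y_frames = np.fft.irfft(Y, n=frame, axis=1)
    out = np.zeros(len(x))
    for i, fr in enumerate(y_frames):                      # overlap-add
        out[i * hop:i * hop + frame] += fr
    return out
```

Because the periodic Hann windows sum to one at 50% overlap, the scheme reconstructs the input exactly (away from the first and last half-frame) when the noise estimate is zero; the residual "musical" artefacts it leaves at low SNR are one reason it is only trusted here for recordings well above the noise level.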

The STFT de-noised BS test signals were artificially contaminated with additive, zero-mean, white Gaussian noise of unity variance ($\sigma_N^2=1$). Other types of noise, mainly UWN and PN, were also tested, but they did not produce any significant difference in the case of the WDWF, so the related experiments were abandoned. The test BS were also scaled to different amplitude levels relative to the noise, so as to provide ESNR values of 0–20 dB with an increasing step of 0.5 dB. Multiple noise generation was enabled for each of the 200 BS samples and for each of the 40 different ESNR levels, so that a total of 8000 different GWN profiles were generated during the quantitative analysis. The de-noising process was applied using the four proposed WDWF modules, as well as some reference wavelet thresholding approaches. Specifically, the quantitative procedure was applied to RIGSURE-(soft), SQTWOLOG-(hard), WTST–NST (Fadj = 3) and WTST–NST (Fadj = 4), which, according to the qualitative analysis of the previous paragraph and the results presented in Figs. 19–21, provide better de-noising results for both IB and RS patterns. A preliminary experimental procedure with the rest of the methods (RIGSURE-(hard), SQTWOLOG-(soft), WT-FD, IKD) confirmed the observations made during the qualitative evaluation. Before proceeding to the core of the quantitative analysis, Fig. 22 presents two typical cases of the bounding ESNR criteria (ESNR = 0 dB and ESNR = 20 dB) for both IB and RS samples, in order to give a picture of the characteristics of the noise stress procedure and to help interpret the quantitative ratings that follow.
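Scaling a clean test BS to a prescribed ESNR against a fixed unit-variance GWN profile reduces to a single gain factor, since Eq. (39) is quadratic in the signal amplitude: g = sqrt(P_n · 10^(ESNR/10) / P_s,active). The sketch below uses a hypothetical pointwise silence mask (samples within 20 dB of the peak amplitude) instead of the exponential-averaging rule described earlier, and evaluates Eq. (39) with average powers; the function names are illustrative.

```python
import numpy as np


def active_mask(s, rel_db=-20.0):
    """Hypothetical silence detector: samples whose |s| lies within rel_db of the peak."""
    s = np.asarray(s, dtype=float)
    return np.abs(s) > np.max(np.abs(s)) * 10 ** (rel_db / 20)


def esnr_db(s, n, mask):
    """Eq. (39), right-hand form, with average powers."""
    return float(10 * np.log10(np.mean(s[mask] ** 2) / np.mean(n ** 2)))


def scale_to_esnr(s, n, target_db):
    """Return (g * s, mask) with g chosen so that the ESNR against n equals target_db."""
    s = np.asarray(s, dtype=float)
    mask = active_mask(s)  # unchanged by a positive gain, so computed once
    p_active = np.mean(s[mask] ** 2)
    gain = np.sqrt(np.mean(n ** 2) * 10 ** (target_db / 10) / p_active)
    return gain * s, mask
```

Scaling the signal rather than the noise keeps every GWN profile at unit variance, as in the procedure above, so the noise statistics are identical across all ESNR levels.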

In order to express the performance of the method in quantitative terms, evaluation descriptors first had to be established. The employed performance evaluators compare the original clean signal with the de-noised one in both the time and frequency domains. As a time-domain performance evaluator, we employed the so-called signal to deviation ratio (SDR), as introduced in [84]:

$$\mathrm{SDR}=10\cdot\log_{10}\left(\frac{\sum_{i=0}^{N-1}[s(i)]^2}{\sum_{i=0}^{N-1}[\hat{s}(i)-s(i)]^2}\right)\ [\mathrm{dB}] \qquad (40)$$

with s(i) and ŝ(i) again the original and the de-noised signals, respectively.
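Eq. (40) is a one-line computation; the sketch below assumes a base-10 logarithm, consistent with the dB unit, and the function name is illustrative. For instance, a clean signal [1, 1, 1, 1] and an estimate [1.1, 0.9, 1.1, 0.9] give Σs² = 4 and Σ(ŝ − s)² = 0.04, hence SDR = 10·log10(100) = 20 dB.

```python
import numpy as np


def sdr_db(s, s_hat):
    """Signal to deviation ratio, Eq. (40), with a base-10 logarithm."""
    s = np.asarray(s, dtype=float)
    s_hat = np.asarray(s_hat, dtype=float)
    return float(10 * np.log10(np.sum(s ** 2) / np.sum((s_hat - s) ** 2)))


print(round(sdr_db([1, 1, 1, 1], [1.1, 0.9, 1.1, 0.9]), 6))  # 20.0
```

Unlike the CCI, the SDR penalizes any gain error in ŝ(i), since a scaled estimate leaves a non-zero deviation.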

A common problem that usually arises when using noise cancellation techniques is the spectral distortion of the original signal. An appropriate spectral distortion measure (SDM) was therefore introduced, following the methodology adopted in [85]. If s(i) and ŝ(i) are the original and the de-noised signal, respectively, the spectral distortion measure depends on the ESNR of the experimental noise contamination procedure and is calculated as follows:

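$$\mathrm{SDM}(s,\hat{s},\mathrm{ESNR})=\frac{1}{P}\cdot\frac{1}{256}\sum_{p=1}^{P}\sum_{k=0}^{256}20\cdot\left|\log\left(|S_{p,n}(k)|\right)-\log\left(|\hat{S}_{p,n}(k)|\right)\right| \qquad (41)$$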

where $S_{p,n}(k)$ and $\hat{S}_{p,n}(k)$ are the kth frequency components of the STFT-estimated magnitude spectrum of the pth frame of the normalized signals $s_n(i)$ and $\hat{s}_n(i)$:

$$s_n(i)=\frac{s(i)}{\|s\|}+n_0(i),\qquad \hat{s}_n(i)=\frac{\hat{s}(i)}{\|\hat{s}\|}+n_0(i) \qquad (42)$$

Eq. (42) indicates that the signals s(i) and ŝ(i) are first normalized to 0 dB energy, after which a white noise vector n0(i) with −30 dB energy is added, yielding sn(i) and ŝn(i); the added noise prevents the computation of log(0) in Eq. (41) [85].
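A possible implementation of Eqs. (41) and (42) is sketched below. Several details are assumptions not fixed by the text: 512-sample non-overlapping Hann-windowed frames (so that k runs over 0..256), base-10 logarithms, and a seeded Gaussian vector for n0(i). The ESNR argument of Eq. (41) enters only through the way ŝ(i) was produced, so it does not appear as a parameter; the function name is illustrative.

```python
import numpy as np


def sdm_db(s, s_hat, frame=512, noise_db=-30.0, seed=0):
    """Spectral distortion measure of Eqs. (41)-(42); needs len(s) >= frame."""
    s = np.asarray(s, dtype=float)
    s_hat = np.asarray(s_hat, dtype=float)
    rng = np.random.default_rng(seed)
    n0 = rng.standard_normal(s.size)
    n0 *= 10 ** (noise_db / 20) / np.linalg.norm(n0)  # white noise with -30 dB energy
    s_n = s / np.linalg.norm(s) + n0                   # Eq. (42): 0 dB energy plus n0
    sh_n = s_hat / np.linalg.norm(s_hat) + n0
    P = s.size // frame                                # number of non-overlapping frames
    win = np.hanning(frame)
    total = 0.0
    for p in range(P):
        seg = slice(p * frame, (p + 1) * frame)
        S = np.abs(np.fft.rfft(win * s_n[seg]))        # bins k = 0..256 for frame = 512
        Sh = np.abs(np.fft.rfft(win * sh_n[seg]))
        total += np.sum(20 * np.abs(np.log10(S) - np.log10(Sh)))
    return total / (P * (frame // 2))                  # the 1/P and 1/256 factors
```

Because both signals are normalized to unit energy before the spectra are compared, a de-noiser that only changes the overall level yields an SDM of (numerically) zero; the measure responds to changes in spectral shape.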

Finally, the CCI [48,55] was also employed to estimate the similarity between the original clean signal s(i) and the estimated one ŝ(i). According to Eq. (38), the CCI should approach unity for ‘‘perfect’’ de-noising results, since the de-noised signal would then be close to the original noise-free one, while values close to zero would suggest complete failure of the de-noising process.

It is important to mention that the perceptual criteria connected with the type-II tendency to suppress ‘‘badly-heard’’ low-level signal components, in order to avoid birdy noise and residual artefacts, cannot be revealed by the quantitative evaluation procedure. In fact, they are likely to degrade the quantitative evaluators, especially at low ESNR levels, where the corresponding ratings are expected to be compromised, or at least worse than those of the corresponding type-I modules. For the same reason, the perceptual criteria introduced by the filter Akw were disabled during the quantitative evaluation. Fig. 23 presents the SDR-based performance ratings of the proposed and the reference de-noising methods for both the IB (subplots a1–e1) and the RS (subplots a2–e2) samples. The first four subplots (a1–d1 and


Fig. 23. Performance evaluation based on the signal to deviation ratio—SDR [dB], as a function of the effective signal to noise ratio (ESNR), separately for intestinal

bursts (IB) and regularly sustained (RS) patterns (presented in the left and right side, respectively): (a1) WDWFI-6 (IB), (a2) WDWFI-6 (RS), (b1) WDWFI-17 (IB),

(b2) WDWFI-17 (RS), (c1) WDWFII-6 (IB), (c2) WDWFII-6 (RS), (d1) WDWFII-17 (IB), (d2) WDWFII-17 (RS), (e1) comparisons with the reference de-noising

methods (IB), (e2) comparisons with the reference de-noising methods (RS).

a2–d2) on each side (left for IB, right for RS) correspond to the four alternative WDWF modules (WDWFI-6, WDWFI-17, WDWFII-6, WDWFII-17). Specifically, the white curve presents the mean value, while the grey area expresses the expected variation (mean value ± standard deviation). The last subplots (e1, e2) provide the mean values of the reference de-noising methods (SQTWOLOG-hard: RED, RIGSURE-soft: GREEN, WTST–NST (Fadj = 3): YELLOW, WTST–NST (Fadj = 4): BLUE), while the grey area indicates the variation of the mean values of the WDWF modules. Figs. 24 and 25 follow the same layout, presenting the CCI-based and the SDM-based ratings, respectively.

It is clear that all the WDWF ratings are favourable for both the IB and RS patterns, suggesting superior performance compared to all the tested methods. All the evaluators show quite stable behaviour (minimal variance) even for very small

Fig. 24. Performance evaluation based on the Cross Correlation Index (CCI), as a function of the effective signal to noise ratio (ESNR), separately for intestinal bursts

(IB) and regularly sustained (RS) patterns (presented in the left and right side, respectively): (a1) WDWFI-6 (IB), (a2) WDWFI-6 (RS), (b1) WDWFI-17 (IB), (b2)

WDWFI-17 (RS), (c1) WDWFII-6 (IB), (c2) WDWFII-6 (RS), (d1) WDWFII-17 (IB), (d2) WDWFII-17 (RS), (e1) comparisons with the reference de-noising methods

(IB), (e2) comparisons with the reference de-noising methods (RS).

ESNR levels. It is also observed that the type-II ratings worsen

for smaller ESNR, issue that was expected according to the

analysis of the previous paragraph. According to the results of

Fig. 24, 6-band implementations seem to have greater

performance from the 17-band ones, in the case of the IB

patterns and especially for low ESNR values. On the other

hand, 17-band modules work slightly better in the RS case, as it

is concluded from Fig. 23, when comparing subplots b1, d1

with the corresponding b2, d2. Examining Fig. 25, we observe

that type-II WDWF modules provide smaller spectral distor-

tions, due to the fact that the filter attenuation is softer. All the

modules tend to approach ‘‘perfect de-noising’’, as the ESNR

level gets higher, especially for values greater than 10 dB,

where the CCI index is about 0.95 and more.
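The CCI and SDM ratings of Figs. 24 and 25 can be approximated in the same spirit. The Python sketch below assumes that the CCI is the zero-lag normalized cross-correlation between the clean and the de-noised signal, and it uses a root-mean-square log-spectral distance in dB as a stand-in for the SDM; both are assumptions, not necessarily the paper's exact definitions. The function names are hypothetical, and the direct DFT is only meant for short frames.

```python
import cmath
import math


def cci(clean, estimate):
    """Zero-lag normalized cross-correlation; 1.0 means identical waveform shape."""
    num = sum(s * e for s, e in zip(clean, estimate))
    den = math.sqrt(sum(s * s for s in clean) * sum(e * e for e in estimate))
    return num / den


def power_spectrum(x):
    """One-sided power spectrum via a direct O(N^2) DFT (adequate for short frames)."""
    n = len(x)
    return [
        abs(sum(x[k] * cmath.exp(-2j * math.pi * f * k / n) for k in range(n))) ** 2
        for f in range(n // 2 + 1)
    ]


def log_spectral_distance_db(clean, estimate, floor=1e-12):
    """RMS difference between the two log power spectra, in dB (stand-in for SDM)."""
    ps, pe = power_spectrum(clean), power_spectrum(estimate)
    # The small floor avoids log(0) in empty frequency bins.
    diffs = [10.0 * math.log10((a + floor) / (b + floor)) for a, b in zip(ps, pe)]
    return math.sqrt(sum(d * d for d in diffs) / len(diffs))
```

Because the CCI is insensitive to overall gain, it complements the SDR, which also penalizes amplitude errors; the spectral distance, in turn, ignores phase and focuses on how the spectral envelope is altered.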


Fig. 25. Performance evaluation based on the spectral distortion measure (SDM) [dB], as a function of the effective signal to noise ratio (ESNR), separately for intestinal bursts (IB) and regularly sustained (RS) patterns (presented in the left and right side, respectively): (a1) WDWFI-6 (IB), (a2) WDWFI-6 (RS), (b1) WDWFI-17 (IB), (b2) WDWFI-17 (RS), (c1) WDWFII-6 (IB), (c2) WDWFII-6 (RS), (d1) WDWFII-17 (IB), (d2) WDWFII-17 (RS), (e1) comparisons with the reference de-noising methods (IB), (e2) comparisons with the reference de-noising methods (RS).

From all three figures (Figs. 23–25), it is evident that the WDWF modules exhibit robust enhancement even under severe noise contamination, which is by far superior to that of all the reference methods, with the trade-off that threshold estimation is not fully automated. The SQTWOLOG-hard and WTST–NST (Fadj = 4) methods are very good choices for IB patterns, while RIGSURE-soft and WTST–NST (Fadj = 3) work better with RS ones. An examination of Figs. 23–25 leads to the conclusion that all WDWF approaches yield better results than the “RIGSURE”-soft and “SQTWOLOG”-hard wavelet de-noisers, as well as both of the tested WTST–NST configurations. Thus, the WDWF modules combine the advantages of robust noise elimination and minimal signal distortion.
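For reference, the two wavelet thresholding rules used as comparison methods can be sketched as follows. This assumes they follow the standard Donoho–Johnstone rules available as the `'sqtwolog'` and `'rigrsure'` options of MATLAB's `thselect` (the paper writes RIGSURE): the universal threshold σ√(2 ln N) combined with hard thresholding, and the SURE-minimizing threshold combined with soft thresholding. The noise σ is estimated from the finest-scale detail coefficients by the median absolute deviation. The function names are hypothetical, and this is not the implementation used in the evaluation.

```python
import math
import statistics


def mad_sigma(finest_detail):
    """Robust noise std estimate: median(|d|) / 0.6745 over the finest detail band."""
    return statistics.median(abs(d) for d in finest_detail) / 0.6745


def sqtwolog_threshold(n, sigma=1.0):
    """Universal threshold sigma * sqrt(2 ln n)."""
    return sigma * math.sqrt(2.0 * math.log(n))


def rigrsure_threshold(coeffs, sigma=1.0):
    """Threshold minimizing Stein's unbiased risk estimate for soft thresholding."""
    x2 = sorted((c / sigma) ** 2 for c in coeffs)  # normalized squared coefficients
    n = len(x2)
    best_risk, best_t2, cumulative = math.inf, 0.0, 0.0
    for k, v in enumerate(x2, start=1):
        cumulative += v
        # SURE at t^2 = v: n - 2*#{x^2 <= t^2} + sum(min(x^2, t^2)), divided by n
        risk = (n - 2 * k + cumulative + (n - k) * v) / n
        if risk < best_risk:
            best_risk, best_t2 = risk, v
    return sigma * math.sqrt(best_t2)


def hard(coeffs, t):
    """Keep coefficients above the threshold, zero the rest."""
    return [c if abs(c) > t else 0.0 for c in coeffs]


def soft(coeffs, t):
    """Shrink every coefficient towards zero by t."""
    return [math.copysign(max(abs(c) - t, 0.0), c) for c in coeffs]
```

In a complete de-noiser these rules would be applied band by band to the detail coefficients of a DWT, with σ taken from the finest band; the abrupt keep-or-kill decision of `hard` is the kind of behaviour that Wiener-type gains aim to soften.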


4. Discussion

This paper deals with the problem of additive broadband noise in bio-acoustic signals. Four novel Wavelet Domain Wiener Filter (WDWF) implementations have been successfully applied to and tested on abdominal vibration recordings for the enhancement of the captured bowel sounds. The new methods combine the advantages of standard wavelet domain thresholding strategies, such as robust noise elimination, with the minimal signal distortion offered by Wiener filtering approaches. Further advantages of the suggested approaches are that (a) the method can be applied to signals of any length, with the computational overhead being the only restriction, and (b) it is also ideal for long-term, simplified frame-based processing, since it avoids signal discontinuities and keeps complexity to a minimum. In fact, the overall computational cost is quite affordable for the achieved de-noising results, in contrast to other, often iterative, approaches that provide similar results. Another issue addressed is the elimination of birdy noise and related residuals by applying mild perceptual criteria. As a result, the enhancement operation barely produces artefacts, while non-audible signal components are reasonably bounded. Comparison with the related auto-threshold EBS de-noising algorithms showed that the WDWF modules work better for both IB and RS patterns. However, incorporating some of their auto-threshold capabilities seems very promising, and related combined methods are currently being examined. From the results, it can be said that the proposed methods can be applied efficiently to almost any sound signal, in contrast to previous studies of rough wavelet-based threshold de-noising that work best for specific signal types. Experimental results using tones, chirp-z signals and even noise-buried speech strengthen these prospects.
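The core idea summarized above, a Wiener-type gain applied to wavelet coefficients instead of hard or soft thresholding, can be illustrated with the following minimal Python sketch. It is not the authors' WDWF: it uses a plain multi-level Haar DWT instead of the paper's 6- or 17-band decompositions, estimates a single white-noise variance from the finest detail band, leaves the approximation band untouched and omits the perceptual filter. The gain (max(1 − α·σ²/c², 0))^β and the parameters `alpha` and `beta` are hypothetical illustrations of a parametric Wiener rule, and all function names are likewise hypothetical.

```python
import math
import statistics

SQRT2 = math.sqrt(2.0)


def haar_dwt(x, levels):
    """Orthonormal multi-level Haar DWT; len(x) must be divisible by 2**levels.

    Returns the coarsest approximation and the detail bands, finest band first.
    """
    approx, details = list(x), []
    for _ in range(levels):
        half = len(approx) // 2
        details.append([(approx[2 * i] - approx[2 * i + 1]) / SQRT2 for i in range(half)])
        approx = [(approx[2 * i] + approx[2 * i + 1]) / SQRT2 for i in range(half)]
    return approx, details


def haar_idwt(approx, details):
    """Inverse of haar_dwt (perfect reconstruction up to rounding)."""
    x = list(approx)
    for band in reversed(details):
        out = []
        for a, d in zip(x, band):
            out.extend(((a + d) / SQRT2, (a - d) / SQRT2))
        x = out
    return x


def wiener_gain(c, noise_var, alpha=1.0, beta=1.0):
    """Hypothetical parametric Wiener-type gain (1 - alpha*noise_var/c^2)_+ ** beta."""
    power = c * c
    if power == 0.0:
        return 0.0
    return max(1.0 - alpha * noise_var / power, 0.0) ** beta


def wavelet_wiener_denoise(x, levels=4, alpha=1.0, beta=1.0):
    """Scale every detail coefficient by a Wiener-type gain; keep the approximation."""
    approx, details = haar_dwt(x, levels)
    # MAD estimate of the white-noise std from the finest band; an orthonormal
    # transform keeps white noise at the same variance in every detail band.
    sigma = statistics.median(abs(c) for c in details[0]) / 0.6745
    noise_var = sigma * sigma
    filtered = [[c * wiener_gain(c, noise_var, alpha, beta) for c in band]
                for band in details]
    return haar_idwt(approx, filtered)
```

Unlike hard thresholding, this gain falls smoothly towards zero as a coefficient approaches the estimated noise level, which is one intuitive reason why Wiener-type rules tend to introduce less distortion; raising `alpha` or `beta` makes the attenuation more aggressive.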

Acknowledgments

The authors wish to thank Assist. Prof. L.J. Hadjileontiadis for his valuable contribution in providing his implemented algorithms for comparison purposes. The authors would also like to thank Dr. Marina Joannopoulou for carefully proofreading and correcting the English language and style of this paper.

References

[1] X. Yang, K. Wang, S.A. Shamma, Auditory representations of acoustic

signals, IEEE Trans. Inf. Theory 38 (March (2)) (1992) 824–839.

[2] W. Etter, G. Moschytz, Noise reduction by noise adaptive spectral

magnitude expansion, J. Audio Eng. Soc. 42 (May (5)) (1994) 341–349.

[3] J. Berger, R.D. Coifman, M.J. Goldberg, Removing noise from music

using local trigonometric bases and wavelet packets, J. Audio Eng. Soc. 42

(October (10)) (1994) 808–818.

[4] D.E. Tsoukalas, J. Mourjopoulos, G. Kokkinakis, Perceptual filters for

audio signal enhancement, J. Audio Eng. Soc. 45 (January/February (1/2))

(1997) 22–36.

[5] M. Rosa, F. Lopez, P. Jarabo, S. Maldonado, N. Ruiz, A new algorithm for

translating psycho-acoustic information to the wavelet domain, Signal

Process. 81 (2001) 519–531.

[6] P.J. Wolfe, S.J. Godsill, Perceptually motivated approaches to music

restoration, J. New Music Res. 30 (January (1)) (2001) 83–92.

[7] J. Yao, Y.-T. Zhang, Bionic wavelet transform: a new time–frequency

method based on auditory model, IEEE Trans. Biomed. Eng. 48 (August

(8)) (2001) 856–863.

[8] J. Johnston, Transform coding of audio signals using perceptual noise

criteria, IEEE J. Selected Areas Commun. 6 (February (2)) (1988) 314–323.

[9] R. Coifman, M.V. Wickerhauser, Adapted waveform ‘de-noising’ for

medical signals and images, IEEE Eng. Med. Biol. Mag. 14 (September/October

(5)) (1995) 578–586.

[10] L.J. Hadjileontiadis, S.M. Panas, Separation of discontinuous adventitious

sounds from vesicular sounds using a wavelet-based filter, IEEE Trans.

Biomed. Eng. 44 (December (12)) (1997) 1269–1281.

[11] L.J. Hadjileontiadis, S.M. Panas, A wavelet-based reduction of heart

sound noise from lung sounds, Int. J. Med. Inf. 52 (1998) 183–190.

[12] R. Coifman, M.V. Wickerhauser, Entropy based algorithms for best basis

selection, IEEE Trans. Inf. Theory 38 (March (2)) (1992) 713–718.

[13] D.L. Donoho, De-noising by soft-thresholding, IEEE Trans. Inf. Theory 41

(May (3)) (1995) 613–627.

[14] A.R. Ferreira da Silva, Bayesian wavelet denoising and evolutionary

calibration, Digital Signal Process. 14 (2004) 566–589.

[15] Y. Zheng, D.B.H. Tay, L. Li, Signal extraction and power spectrum

estimation using wavelet transform scale space filtering and Bayes

shrinkage, Signal Process. 80 (2000) 1535–1549.

[16] A. Gupta, S.D. Joshi, S. Prasad, A new method for estimating wavelet with

desired features from a given signal, Signal Process. 85 (2005) 147–161.

[17] D. Leporini, J.-C. Pesquet, Bayesian wavelet denoising: Besov priors and

non-Gaussian noise, Signal Process. 81 (2001) 55–67.

[18] H. Qiu, J. Lee, J. Lin, G. Yu, Wavelet filter-based weak signature detection

method and its application on rolling element bearing prognostics, J.

Sound Vib. 289 (4–5) (2006) 1066–1090. doi:10.1016/j.jsv.2005.03.007.

[19] Y.Y. Kim, J.C. Hong, Frequency response function estimation via a robust

wavelet de-noising method, J. Sound Vib. 244 (2001) 635–649.

doi:10.1006/jsvi.2000.3509.

[20] G. Kalliris, New Techniques to Speech and Music Audio Restoration,

Ph.D. Thesis, Aristotle University of Thessaloniki, 1995.

[21] G. Kalliris, Ch. Dimoulas, G. Papanikolaou, Broad-Band Acoustic Noise

Reduction using a Novel Frequency Depended Parametric Wiener Filter:

Implementations using Filter-bank, STFT and Wavelet Analysis/Synthesis

Techniques, Audio Engineering Society Preprint, Proceedings of the 110th

AES Convention, Amsterdam, May 2001 (Preprint 5382).

[22] S.V. Vaseghi, Advanced Digital Signal Processing and Noise Reduction,

second ed., John Wiley & Sons Ltd., New York, USA, 2000, ISBN: 0-471-62692-9.

[23] C. Pastiadis, G. Papanikolaou, Ch. Dimoulas, A. Kalampakas, Intestinal

Motility Recording and Analysis, Proceedings of the VIII Mediterranean

Conference on Medical and Biological Engineering and Computing

Medicon ’98, Lemesos, Cyprus, 1998.

[24] C.A. Dimoulas, G.M. Kalliris, G.V. Papanikolaou, A. Kalampakas, Long

term signal detection, segmentation and summarization using wavelets

and fractal dimension: a bioacoustics application in gastrointestinal

motility monitoring, Comput. Biol. Med. (2006), doi:10.1016/j.compbiomed.2006.08.013, Elsevier (special issue on wavelet-based algorithms

for medical problems), in press.

[25] D.A. Drossman, The functional gastrointestinal disorders and the Rome II

process, Gut 45 (Suppl. II) (1999) II1–II5.

[26] N.J. Talley, et al., B. Functional gastroduodenal disorders, in: D.A.

Drossman (Ed.), ROME II: The Functional Gastrointestinal Disorders,

second ed., Degnon Associates, 2000, pp. 299–350 (Chapter 6).

[27] W.G. Thompson, C. Functional bowel disorders and D. Functional

abdominal pain, in: D.A. Drossman (Ed.), ROME II: The Functional

Gastrointestinal Disorders, second ed., Degnon Associates, 2000, pp. 351–

432 (Chapter 7).

[28] G. Holtmann, P. Enck, Stress and gastrointestinal motility in humans: a

review of the literature, J. Gastrointest. Motil. 3 (4) (1991) 245–254.

[29] G. Vantrappen, J. Janssens, G. Coremans, R. Jian, Gastrointestinal motility

disorders, Dig. Dis. Sci. 31 (September (9)) (1986) 5S–25S.

[30] N.W. Weisbrodt, Motility of the small intestine, in: L.R. Johnson (Ed.),

Physiology of the Gastrointestinal Tract, second ed., vol. 1, Raven, New

York, 1987, pp. 631–663.



[31] J.E. Kellow, R.C. Gill, D.L. Wingate, Prolonged ambulant recordings of

small bowel motility demonstrate abnormalities in the irritable bowel syndrome,

Gastroenterology 98 (May (5)) (1990) 1208–1218.

[32] B.L. Bardakjian, Gastrointestinal system, in: J.D. Bronzino (Ed.), The

Biomedical Engineering Handbook, vol. 6, IEEE Press and CRC Press

Inc., 1995, pp. 57–69.

[33] S. Sharna, Intestinal manometry to evaluate intestinal motor function,

in: N.W. Read (Ed.), Gastrointestinal Motility: Which Test?, vol. 18,

Wrightson Biomedical Publishing Ltd., 1989, pp. 157–166.

[34] J. Chen, B.D. Schirmer, R.W. McCallum, Measurement of electric activity

of the human small intestine using surface electrodes, IEEE Trans.

Biomed. Eng. 40 (June (6)) (1993) 598–602.

[35] J. Garcia-Casado, J.L. Martinez-de-Juan, J.L. Ponce, Non-invasive

measurement and analysis of intestinal myoelectrical activity using

surface electrodes, IEEE Trans. Biomed. Eng. 52 (June (6)) (2005)

983–991.

[36] H. Yoshino, Y. Abe, T. Yoshino, K. Oshato, Clinical application of spectral

analysis of bowel sounds in intestinal obstruction, Dis. Colon Rectum 33

(September (9)) (1990) 753–757.

[37] C. Liatsos, L.J. Hadjileontiadis, C. Mavrogiannis, D. Patch, S.M. Panas,

A.K. Burroughs, Bowel sounds analysis: a novel non-invasive method

for diagnosis of small-volume ascites, Dig. Dis. Sci. 48 (2003) 1630–

1636.

[38] D. Dalle, G. Devroede, R. Thibault, J. Perrault, Computer analysis of bowel sounds,

Comput. Biol. Med. 4 (1975) 247–256.

[39] C. Garner, H. Ehrenreich, Non invasive topographic analysis of intestinal

activity in man on the basis of acoustic phenomena, Res. Exp. Med. (Berl.)

189 (2) (1989) 129–140.

[40] T. Tomomasa, A. Morikawa, R.H. Sandler, H.A. Mansy, H. Koneko, T.

Masahiko, P.E. Hyman, Z. Itoh, Gastrointestinal sounds and migrating

motor complex in fasted humans, Am. J. Gastroenterol. 94 (February (2))

(1999) 374–381.

[41] T. Tomomasa, A. Takahashi, Y. Nako, H. Kaneko, M. Tabata, Y. Tsuchida,

A. Morikawa, Analysis of gastrointestinal sounds in infants with pyloric

stenosis before and after pyloromyotomy, Pediatrics 104 (November (5))

(1999) e60.

[42] Ch. Dimoulas, G. Papanikolaou, G. Kalliris, C. Pastiadis, Computer aided

systems for prolonged recording and analysis of human bowel sounds, J.

Acoust. Soc. Am. 105 (February (2)) (1999) 1102 (Abstract).

[43] Ch. Dimoulas, G. Papanikolaou, G. Kalliris, C. Pastiadis, Computer aided

systems for prolonged recording and analysis of human bowel sounds, in:

Proceedings of the Joint ASA/EAA/DEGA Meeting on Acoustics, Berlin, March, 1999 (CD-ROM).

[44] B. Craine, M. Silpa, C. O’Toole, Computerized auscultation applied to

irritable bowel syndrome, Dig. Dis. Sci. 44 (9) (1999) 1887–1892.

[45] B. Craine, M. Silpa, C. O’Toole, Enterotachogram analysis to distinguish

irritable bowel syndrome from Crohn’s disease, Dig. Dis. Sci. 46 (9)

(2001) 1974–1979.

[46] B. Craine, M. Silpa, C. O’Toole, Two-dimensional positional mapping of

gastrointestinal sounds in control and functional bowel syndrome patients,

Dig. Dis. Sci. 47 (2002) 1290–1296.

[47] W.B. Cannon, Auscultation of the rhythmic sounds produced by the

stomach and intestine, Am. J. Physiol. 13 (1905) 339–353.

[48] L.J. Hadjileontiadis, C.N. Liatsos, C.C. Mavrogiannis, T.A. Rokkas, S.M.

Panas, Enhancement of bowel sounds by wavelet-based filtering, IEEE

Trans. Biomed. Eng. 47 (July (7)) (2000) 876–886.

[49] H.A. Mansy, R.H. Sandler, Bowel-sound signal enhancement using

adaptive filtering, IEEE Eng. Med. Biol. Mag. 16 (November/December

(6)) (1997) 105–117.

[50] C.N. Liatsos, L.J. Hadjileontiadis, C.C. Mavrogiannis, T.A. Rokkas, S.M.

Panas, Enhanced De-Noising of Bowel Sounds Using a Wavelet-Based

Filter, Proceedings of the VIII Mediterranean Conference on Medical and

Biological Engineering and Computing MEDICON ’98, Lemesos,

Cyprus, 1998.

[51] R. Ranta, C. Heinrich, V. Louis-Dorr, D. Wolf, F. Guillemin, Wavelet-

based bowel sounds denoising, segmentation and characterization, in:

Proceedings of 23rd Annual International Conference IEEE EMBS,

Istanbul, Turkey, October 25–28, (2001), pp. 1903–1906.

[52] L.J. Hadjileontiadis, I.T. Rekanos, Enhancement of explosive bowel

sounds using kurtosis-based filtering, in: Proceedings of 25th IEEE EMBS

2003, Cancun, Mexico, September, (2003), pp. 2479–2482.

[53] R. Ranta, C. Heinrich, V. Louis-Dorr, D. Wolf, Interpretation and

Improvement of an Iterative Wavelet-Based Denoising Method, IEEE

Signal Process. Lett. 10 (August (8)) (2003) 239–241.

[54] L.J. Hadjileontiadis, Wavelet-based enhancement of lung and bowel

sounds using fractal dimension thresholding—Part I: methodology, IEEE

Trans. Biomed. Eng. 52 (June (6)) (2005) 1143–1148.

[55] L.J. Hadjileontiadis, Wavelet-based enhancement of lung and bowel

sounds using fractal dimension thresholding—Part II: application results,

IEEE Trans. Biomed. Eng. 52 (June (6)) (2005) 1050–1064.

[56] I.T. Rekanos, L.J. Hadjileontiadis, An iterative kurtosis-based technique

for the detection of nonstationary bioacoustic signals, Signal Process. 86

(2006) 3787–3795.

[57] C. Dimoulas, G. Kalliris, G. Papanikolaou, A. Kalampakas, Abdominal

sounds pattern classification using advanced signal processing and arti-

ficial intelligence, in: Proceedings of the International Conference on

Computational Intelligence for Modelling Control and Automation

(CIMCA 2003), Vienna, February, (2003), pp. 71–82.

[58] O. Rioul, M. Vetterli, Wavelets and Signal Processing, IEEE Signal

Process. Mag. 8 (October (4)) (1991) 14–38.

[59] F.T. Agerkvist, A time–frequency auditory model using wavelet packets, J.

Audio Eng. Soc. 44 (January/February (1/2)) (1996) 37–50.

[60] M. Unser, T. Blu, Wavelet theory demystified, IEEE Trans. Signal Process.

51 (February (2)) (2003) 470–483.

[61] T. Gulzow, A. Engelsberg, U. Heute, Comparison of a discrete wavelet

transformation and a uniform polyphase filterbank applied to spectral

subtraction speech enhancement, Signal Process. (1997) 5–19.

[62] S. Mallat, A Wavelet Tour of Signal Processing, second ed., Academic

Press, Elsevier, USA, 1999, ISBN: 0-12-466606-X.

[63] R.A. Wannamaker, E.R. Vrscay, Fractal wavelet compression of audio

signals, J. Audio Eng. Soc. 45 (July/August (7/8)) (1997) 540–553.

[64] O. Rioul, P. Duhamel, Fast algorithms for discrete and continuous

wavelet transforms, IEEE Trans. Inf. Theory 38 (March (2)) (1992)

569–586.

[65] S. Boll, Suppression of acoustic noise in speech using spectral subtraction,

IEEE Trans. Acoust. Speech Signal Process. ASSP-27 (April (2)) (1979)

113–120.

[66] N. Virag, Single channel speech enhancement based on masking properties

of the human auditory system, IEEE Trans. Speech Audio Process. 7

(March (2)) (1999) 126–137.

[67] A. Spriet, M. Moonen, J. Wouters, Spatially pre-processed speech dis-

tortion weighted multi-channel Wiener filtering for noise reduction, Signal

Process. 84 (2004) 2367–2387.

[68] A. Spriet, M. Moonen, J. Wouters, The impact of speech detection errors

on the noise reduction performance of multi-channel filtering and generalized

sidelobe cancellation, Signal Process. 85 (2005) 1073–1088.

[69] S.Y. Yuen, C.K. Fong, K.L. Chan, Y.W. Leung, Fractal dimension

estimation and noise filtering using Hough transform, Signal Process.

84 (2004) 907–917.

[70] J. Ching, A.C. To, S.D. Glaser, Microseismic source

deconvolution: Wiener filter versus minimax, Fourier versus wavelets, and

linear versus nonlinear, J. Acoust. Soc. Am. 115 (June (6)) (2004) 3048–

3058.

[71] E. Zwicker, H. Fastl, Psychoacoustics: Facts and Models, Springer Series

in Information Sciences, Springer-Verlag, Berlin, 1990, ISBN: 3-540-52600-5.

[72] A. Quinquis, A few practical applications of wavelet packets, Digital

Signal Process. 8 (1998) 49–60.

[73] C.P. Chan, P.C. Ching, T. Lee, Noisy speech recognition using de-noised

multiresolution analysis acoustic features, J. Acoust. Soc. Am. 110

(November (5)) (2001) 2567–2574.

[74] M. Sablatash, T. Cooklev, Compression of high-quality audio signals,

including recent methods using wavelet packets, Digital Signal Process. 6

(1996) 96–107.

[75] J.P. Dron, F. Bolaers, I. Rasolofondraibe, Improvement of the sensitivity of

the scalar indicators (crest factor, kurtosis) using a de-noising method by

spectral subtraction: application to the detection of defects in ball bearings,

J. Sound Vib. 270 (2004) 61–73. doi:10.1016/S0022-460X(03)00483-8.

[76] National Instruments Corporation, Sound and Vibration Toolkit User

Manual, Austin, Texas, USA, April 2004 edition.

[77] R.J. McAulay, M.L. Malpass, Speech enhancement using a soft-decision

noise suppression filter, IEEE Trans. Acoust. Speech Signal Process.

ASSP-28 (April (2)) (1980) 137–145.

[78] P.K. Ramarapu, R.C. Maher, Methods for reducing audible artifacts in a

wavelet based broad-band denoising system, J. Audio Eng. Soc. 46 (March

(3)) (1998) 178–190.

[79] T. Painter, A. Spanias, Perceptual coding of digital audio, Proc. IEEE 88

(April (4)) (2000) 451–515.

[80] I. Daubechies, Orthonormal bases of compactly supported wavelets,

Commun. Pure Appl. Math. 41 (1988) 909–996.

[81] N. Ruiz, M. Rosa, F. Lopez, P. Jarabo, Adaptive wavelet-packet analysis

for audio coding purposes, Signal Process. 83 (2003) 919–929.

[82] National Instruments Corporation, Joint time frequency analysis, in:

Signal Processing Toolset Reference Manual, Austin, Texas, USA,

December 2002 edition.

[83] National Instruments Corporation, Wavelet Analysis, in: Signal Proces-

sing Toolset Reference Manual, Austin, Texas, USA, December 2002

edition.

[84] M. Kazama, M. Tohyama, Estimation of speech components by acf

analysis in a noisy environment, J. Sound Vib. 241 (1) (2001) 41–52.

doi:10.1006/jsvi.2000.3275.

[85] H.W. Lou, G.R. Hu, An approach based on simplified KLT and wavelet

transform for enhancing speech degraded by non-stationary wideband

noise, J. Sound Vib. 268 (2003) 717–729. doi:10.1016/S0022-460X(02)01556-0.