Assessment of military intercom headsets for maximum voice reproduction level in high noise...

Assessment of military intercom headsets for maximum voice

reproduction level in high noise conditions

S.M. Potirakis (a), N.-A. Tatlas (a), N. Zafeiropoulos (b), T. Ganchev (c), and M. Rangoussi (a).

a. Department of Electronics, Technological Education Institute (TEI) of Piraeus, 250 Thivon & P.

Ralli, GR-12244, Aigaleo, Athens, Greece, {spoti; ntatlas; mariar}@teipir.gr.

b. Acoustics, Audio and Video Engineering, Newton Building, University of Salford, Greater

Manchester M5 4WT, N.Zafeiropoulos@edu.salford.ac.uk.

c. Division of Electronics and Microelectronics, Faculty of Electronics, Technical University – Varna,

9010 Varna, Bulgaria, tganchev@ieee.org

Abstract

Intercom headsets are mandatory communication apparatus in high noise environments

(HNE). The headset selection in HNE, such as combat vehicles, is crucial for achieving the

objectives of communication, as it serves the needs for both noise reduction and voice

reproduction. Although military-grade intercom headsets are typically used under extreme

environmental conditions, a standard performance evaluation method exists only for the

earphone elements. In the present work we propose an integrated method for the assessment

of the electroacoustic performance of HNE headsets in conditions of maximum reproduction

level and high environmental noise, focusing on the voice communication quality. Objective

methods, such as Automatic Speech Recognition (ASR), Perceptual Evaluation of Speech

Quality (PESQ) and Speech Transmission Index (STI) are comparatively evaluated and their

results are compared to subjective scores using Multiple Stimuli with Hidden Reference and

Anchor (MUSHRA) in order to reveal the best fit metrics.

Keywords: electroacoustic assessment; military intercom headsets; headset testing;

reproduction level; distortion; communication quality; ANR; ASR; PESQ; STI; MUSHRA

*Please address correspondence to:

Stelios M. Potirakis

Assistant Professor

Dept. of Electronics Eng.,

Technological Education Institute (TEI) of Piraeus

250 Thivon & P. Ralli

GR-12244 Aigaleo - Athens

GREECE

Tel./ FAX: +30 2105381550

Mob.: +30 6947934056

office : ZB106

e-mail : spoti@teipir.gr

url : http://audio.teipir.gr/

Please cite as:

S.M. Potirakis, N.-A. Tatlas, N. Zafeiropoulos, T. Ganchev, M. Rangoussi, «Assessment of military intercom

headsets for maximum voice reproduction level in high noise conditions», Applied Acoustics 74 (2013) 870–

881, doi: 10.1016/j.apacoust.2012.12.009

S.M. Potirakis et al., High noise environment military intercom headsets assessment p. 2

1. Introduction

Intercoms provide the means of speech communication in high noise environments

(HNE), which is otherwise difficult or even impossible. The headset selection for such an

application is crucial since it serves both noise attenuation (protection) and voice

reproduction (communication) purposes. Only recently has it been acknowledged in

telecommunications [1] that headset electroacoustic measurements should be performed on a

Head and Torso Simulator (HATS). Actually, the existing standardized measurement

methods regarding telecommunication-related sensitivity and frequency characteristics [2,3],

are still used but a HATS is introduced as a measurement apparatus. However, in defense

applications, where headsets are mainly used in HNE, a standard performance evaluation

method does not exist. The single standard method used for sensitivity measurements refers

to the earphone elements only, [4], thus ignoring the impact of the acoustics of the earcup

cavity, the absorbing materials, the head/face fitting quality and the headset positioning.

It is well known that the reproduced speech level should be at least 10 dB above noise

level (without severe distortion) to achieve marginally acceptable intelligibility. Therefore,

sensitivity, maximum reproduction levels under specific distortion limit and noise attenuation

capability should be jointly assessed to evaluate a HNE intercom headset.

Inspired by the recently published work of J. Cui et al. [5] (followed by the relevant

ANSI S12.42 standard [6]), a systematic methodology for measurement and performance

evaluation of HNE headsets has been proposed in [7,8]. This is based on the use of Acoustic

Test Fixtures (ATFs), like HATS, and addresses both signal reproduction and noise reduction

issues, while standardized methods are employed and measurements are maintained as close

as possible to the existing telecommunication and military standards. Headset electroacoustic

reproduction measurements, i.e., sensitivity, maximum sound pressure level (SPL) and

harmonic distortion (HD), are proposed to be acquired as a function of frequency.

Nevertheless, beyond hearing protection issues, the most important evaluation

parameter regarding the usefulness of a HNE intercom headset is the voice communication

quality under realistic conditions. This is not straightforwardly assessed by maximum SPL

and HD measurements, especially for Active Noise Reduction (ANR) headsets. In this case,

the environmental noise has a very important role on the intelligibility and the overall

performance, as it potentially affects both reproduction frequency response and distortion.

Moreover, the ANR functionality may also be influenced by maximum SPL reproduction

conditions, leading to worse noise reduction or even higher distortion.

In this article a novel integrated method is proposed for the assessment of the

electroacoustic performance of HNE headsets in conditions of maximum reproduction level

and high environmental noise, focusing on the problem of assessing the communication

quality of a HNE intercom headset via objective measurements. Specifically, an objective

voice-based metric for the characterization of the voice communication quality in conditions

of maximum SPL reproduction and high environmental noise is suggested. In brief, given a

specific HD upper limit, the input level leading to maximum SPL is determined and the voice

reproduction of different ANR military headsets is assessed by employing widely accepted

objective evaluation methods for voice intelligibility and quality. Alternative objective

evaluation methods are examined; correlation of the results to the corresponding subjective

results is employed as eligibility criterion for the selection of the most suitable method for the

problem at hand. It should be noted that the proposed method is mainly suitable for

comparative study and quality control measurements.

2. Electroacoustic assessment of HNE headsets

2.1 Related work and motivation

The evaluation of the voice reproduced through HNE headsets has already been an

active research topic for various users / environments of interest (aircraft pilots in flight,

armored vehicle crew in operation, etc.) [9-12]. The quality of speech communication was

found to be critical for the accomplishment of the mission at hand, as well as for the

personnel safety and survival, [13]. The need to take into account both noise attenuation

capability and the electroacoustic properties, with respect to the received voice communication,

for an intercom headset, was pointed out as early as 1975 [14]. It has been

found, [15], that it is required to adequately raise the reproduced voice level above the noise

finally reaching the ear in order to achieve acceptable intelligibility of speech.

A high voice level is necessary in order to obtain an acceptable intelligibility, even for

a headset well attenuating the noise in the frequency range that is important for intelligibility.

This necessity results from the psycho-acoustic masking of the high frequency components of

the communication by the low frequency components of the noise (which are usually not

sufficiently attenuated) and by the poor quality of the transmission channel [16]. On the other

hand, in extremely noisy environments, the voice reproduction levels may become

sufficiently high to induce temporary hearing threshold shift in the user, [13]; this variable

has therefore to be restricted within the safety margin.

As a consequence, evaluation of the communication quality offered by a military

headset is both a multi-parametric task and an important factor of its electroacoustic

performance; this is especially true under the worst case scenario of maximum SPL

reproduction in the presence of extremely high environmental noise. Related studies [9,17,18]

rely on subjective and/or objective methods for the relative quality assessment of alternative

solutions, the latter being preferable from a practical point of view.

Communication quality primarily refers to the achieved speech intelligibility. The

former term is adopted here, however, in order to stress the fact that the ultimate criterion is

the end-user’s opinion about the offered communication quality and not just intelligibility.

Voice quality seems to be very important for end-users of military HNE headsets. In order to

reduce distraction and facilitate the completion of their main task during a military operation,

the cognitive load and the level of distraction due to the communication effort should be kept

low; difficulty to identify the speaker and distortions such as strange-sounding or corrupted

voice are therefore not desirable. Speaker recognition is, for example, extremely important

during wired or radio communication of the (authorized) main battle tank (MBT) crew

members to others outside the vehicle, whether on the same or between different military

hierarchy levels. Therefore, the broader feature of communication quality and not just

intelligibility is expected to be effectively offered by a HNE military headset.

2.3 The proposed approach

The proposed method for the assessment of the electroacoustic performance of HNE

headsets in conditions of maximum reproduction level and high environmental noise is

comprised of the following actions:

1. Given a specific HD upper limit, the maximum reproduced SPL is determined on

ATF as a function of frequency, using a log-swept chirp signal, at office level noise

conditions and in the presence of two different ambient noises (pink noise and MBT-like

noise), at three different levels (90, 110, and 120 dBSPL(LIN)) (cf. paragraph 3.3.1).

2. A reference anechoic voice signal (of peak amplitude equal to the corresponding

sinusoid leading to the maximum SPL for each case) is reproduced by the headset and recorded

by the ATF in the presence of the above noise environment cases (cf. paragraph 3.3.2).

3. In case of an ANR headset the above measurements are conducted both for

activated (ANR-On) and deactivated (ANR-Off) ANR.

4. All measurements are performed for both sides of the headset, in order to

investigate any differences in terms of amplitude and frequency content that can be

introduced due to variability in the acoustics and the electroacoustics at each side.

5. Communication quality in conditions of maximum SPL reproduction in the

presence of different noise types and levels is assessed using the previously recorded voice

reproduction and a voice-based objective evaluation method (cf. subsection 3.4).

Such a method would be equally valuable at the development, the production and the

quality control stage of the headsets, which is of high importance for intercom developers and

end-users.
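
As a rough illustration of the bookkeeping implied by steps 1 to 5 above, the following Python sketch enumerates the measurement conditions for one ANR headset. The dictionary keys and the label "office" are hypothetical names chosen here for illustration; the noise types, levels, ANR states and the two headset sides are taken from the list above.

```python
from itertools import product

NOISE_TYPES = ("pink", "MBT-like")
NOISE_LEVELS_DB = (90, 110, 120)      # dBSPL(LIN), as listed in step 1
ANR_STATES = ("ANR-On", "ANR-Off")    # step 3
SIDES = ("left", "right")             # step 4

def measurement_conditions():
    """List every {noise, level, ANR state, side} combination for one headset."""
    # Office-level noise has no artificial noise level attached to it.
    noise_conditions = [("office", None)]
    noise_conditions += list(product(NOISE_TYPES, NOISE_LEVELS_DB))
    return [
        {"noise": noise, "level_db": level, "anr": anr, "side": side}
        for (noise, level), anr, side in product(noise_conditions, ANR_STATES, SIDES)
    ]

conditions = measurement_conditions()
print(len(conditions))
```

Under these assumptions each ANR headset requires 28 recordings (7 noise conditions, 2 ANR states, 2 sides).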

3. Material and methods

The following subsections outline the headsets, the measurement setup employed, as

well as the measurement and the evaluation methods used in the experimental work.

3.1. Headsets

The ANR Combat Vehicle Crewman (CVC) headsets are considered state-of-the-art

and recently employed for the intercommunication of the crew members of heavy armored

vehicles by a number of modern armies. Three ANR military CVC headsets of different

vendors and different design philosophy have been employed in this study. The first ANR

headset, hereafter referred to as “HS1”, uses two separate speakerphones, one for the voice

reproduction and another for the anti-noise signal emission. On the other hand, the second

ANR headset, hereafter referred to as “HS2”, as well as the third headset, hereafter referred to

as “HS3”, use a single speakerphone for both the voice and anti-noise signals, however they

employ different ANR systems. A standard CVC headset (see footnote 1) on the HATS of the measurement

setup is shown in Fig. 1a, as well as in Fig. 2 of [8].

3.2. Measurement setup

The HATS Bruel & Kjaer type 4128C complemented with the left ear simulator Bruel

& Kjaer type 4159C was the ATF used throughout the measurements. Tests were carried out in a

double wall room (Fig. 1). The HATS, the ambient noise microphone and the noise

reproduction speakers were located inside the room, while the rest of the instruments were

located outside of it. The ambient noise microphone was monitoring the reproduced noise at

the height of the left ear, aligned to the left shoulder, to ensure the correctness of the

reproduced noise spectra.

Footnote 1: It is noted that the headset depicted in Fig. 1a was not one of the measured headsets HS1, HS2, HS3. The list

of the available CVC headsets on the market is really short; therefore the actually measured headsets are not

depicted here in order to ensure their anonymity.

Fig. 1. Test setup for the electroacoustic reproduction measurements based on HATS: (a)

instrument connections along with a typical CVC placed on the HATS, and (b) spatial

arrangement inside the noise room.

3.3. Measurement methods

Two measurement methods were employed here for the electroacoustic evaluation of

the headsets under test. The first one was intended to determine the input level leading to

maximum SPL for a specific HD upper limit in presence of different ambient noises. The

second one reproduced a standard reference voice signal through each ANR headset at

the previously determined input level and recorded the signal reaching the HATS ear in the

presence of different ambient noises.

Prior to any electroacoustic measurement, the Insertion Loss (IL) of each headset

under test was measured according to the method presented in [5], as adapted in [7], and

under the environmental noise conditions proposed in [8] (see also paragraph 3.3.1), and its

performance was noted. Both electroacoustic measurements described in this paper employed

the HATS as the main acoustic measurement apparatus; thus, the correct placement of the

headset under test on the HATS was crucial. Therefore, before conducting each

electroacoustic measurement, the correct placement of the headset under test was verified by

the achievement of its a-priori known IL. It must be noted here that HS2 was the easiest to

mount on the HATS, while HS3, which was mainly relying on a neckband for the tight

positioning, was very difficult to mount reliably.

3.3.1. Maximum reproduction level under specific HD limits

The maximum reproduction level under specific HD upper limit conditions as a

metric of the electroacoustic performance of a HNE headset was first introduced by [7]. An

improvement of this method was proposed in [8] measuring the frequency response along

with the harmonic distortion, up to the 9th harmonic, using the test setup of Fig. 1 and the

log-swept chirp-based method of [19], by increasing the reproduction level up to the highest point

at which HD remains under -15 dB (≈17.8%) throughout the whole reproduction bandwidth, defined

typically within the limits [300, 4000] Hz (toll-quality bandwidth).
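
The equivalence between the -15 dB limit and ≈17.8% follows from treating HD as an amplitude ratio (20·log10). A minimal Python sketch of that conversion and of an in-band limit check is given below; the function names and the (frequency, HD) pair representation are illustrative assumptions, not part of the method in [8].

```python
import math

HD_LIMIT_DB = -15.0          # upper HD limit used in the text
BAND_HZ = (300.0, 4000.0)    # toll-quality bandwidth quoted in the text

def hd_db_to_percent(hd_db):
    """Convert harmonic distortion from dB (amplitude ratio) to percent."""
    return 100.0 * 10.0 ** (hd_db / 20.0)

def hd_percent_to_db(hd_percent):
    """Convert harmonic distortion from percent to dB (amplitude ratio)."""
    return 20.0 * math.log10(hd_percent / 100.0)

def within_hd_limit(hd_spectrum, limit_db=HD_LIMIT_DB, band_hz=BAND_HZ):
    """True if every in-band (frequency_hz, hd_db) point is below the limit."""
    low, high = band_hz
    return all(hd < limit_db for freq, hd in hd_spectrum if low <= freq <= high)

print(round(hd_db_to_percent(-15.0), 1))  # 17.8
```

Points outside the [300, 4000] Hz band are ignored by the check, mirroring the restriction of the limit to the reproduction bandwidth.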

According to this method, the maximum reproduction level was measured at different

noise environments, including office level noise conditions, and different levels of artificially

produced noise. Pink noise and MBT–like noise (see Fig. 2) at 90, 110, and 120 dBSPL(LIN), at

the height of the ear (aligned to the shoulder) of the HATS were proposed as a realistic set of

representative high noise environments. Note that the lowest part of the MBT noise spectrum (<80 Hz)

was not reproduced, owing to the limitations of the employed loudspeakers; however, this was not

considered an important limitation.

Measurement under high level noise is mandatory for ANR headsets, for which the

environmental noise characteristics (statistics, frequency content and level) are very

important for the overall performance, as they potentially affect both reproduction frequency

response and distortion. The maximum reproduction level frequency response in parallel to

distortion measurement is feasible since the proposed measurement method is convolution

based. By this method, uncorrelated background noise is not restrictive under reasonable

signal to noise ratios, which are considered to be achieved by military headsets at maximum

reproduction level conditions.
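
For orientation, a logarithmic (exponential) sine sweep of the kind commonly used in convolution-based measurements can be generated as sketched below. This is only an assumed textbook form of such a stimulus; the exact sweep parameters and the deconvolution procedure of [19] are not reproduced here, and the start/end frequencies, duration and sample rate in the example are hypothetical.

```python
import math

def log_sweep(f_start, f_end, duration_s, sample_rate):
    """Exponential sine sweep from f_start to f_end (Hz) with unit amplitude."""
    n_samples = int(round(duration_s * sample_rate))
    log_ratio = math.log(f_end / f_start)
    # Phase chosen so that the instantaneous frequency grows exponentially.
    phase_scale = 2.0 * math.pi * f_start * duration_s / log_ratio
    return [
        math.sin(phase_scale * (math.exp(log_ratio * i / (sample_rate * duration_s)) - 1.0))
        for i in range(n_samples)
    ]

def instantaneous_frequency(t_s, f_start, f_end, duration_s):
    """Instantaneous frequency (Hz) of the sweep at time t_s."""
    return f_start * (f_end / f_start) ** (t_s / duration_s)

# Hypothetical parameters, for illustration only.
sweep = log_sweep(100.0, 8000.0, 1.0, 48000)
```

Because the frequency rises by a constant factor per unit time, every octave receives the same share of the sweep duration.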

Fig. 2. MBT-like noise third octave spectrum with an overall level of ~110 dBSPL(LIN).

3.3.2. Voice recordings for maximum reproduction level in noise environment

The voice measurements were performed by reproducing the reference anechoic

“male 2” English male voice signal, as provided with ITU-T Rec. P.501 [20]. For each

{headset under test – noise condition} combination, the stimulus level was set so that the

voice stimulus maximum peak voltage was equal to the revealed (during the log-swept chirp-

based maximum reproduction SPL measurements, for the specific upper HD limit) maximum

sinusoid peak voltage. The signal (reproduced voice and noise) reaching the HATS ear in the presence

of different ambient noises was finally recorded.
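
The peak-matching rule described above amounts to a single gain applied to the voice stimulus. A minimal sketch follows, assuming the stimulus is available as a list of sample voltages; `scale_to_peak` and the numeric values are hypothetical and only illustrate the rule.

```python
def scale_to_peak(samples, target_peak):
    """Scale samples so that their maximum absolute value equals target_peak."""
    peak = max(abs(s) for s in samples)
    if peak == 0.0:
        raise ValueError("cannot scale an all-zero signal")
    gain = target_peak / peak
    return [gain * s for s in samples]

voice = [0.0, 0.2, -0.5, 0.1]   # toy stand-in for the voice stimulus
max_sine_peak_v = 2.0           # hypothetical peak voltage found in the sweep measurement
scaled = scale_to_peak(voice, max_sine_peak_v)
```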

It is noted that, due to the high crest factor of voice, it is expected that generally lower

distortion will be present in the recorded voice compared to the HD estimated with the sinusoid during

the maximum reproduction SPL measurements. However, due to the same reason, peak-

signal equivalence was considered a better choice compared to a power-wise equivalence,

since the second one could lead to severe distortion (peak clipping) of the reproduced voice.

Nevertheless, the aim of this study is to assess communication quality through appropriate

objective metrics and not just by comparison of a long term mean (L_eq) of the maximum

reproduced voice level against the corresponding mean of the acoustic noise level reaching the ear.
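
The crest-factor argument can be checked numerically. The sketch below compares a sinusoid with a toy high-crest-factor signal standing in for voice (the toy signal is an assumption made purely for illustration): at equal peak amplitude the peakier signal carries a lower RMS level, whereas equal-RMS matching would push its peaks above those of the sinusoid.

```python
import math

def rms(samples):
    """Root-mean-square value of a sample sequence."""
    return math.sqrt(sum(s * s for s in samples) / len(samples))

def crest_factor_db(samples):
    """Peak-to-RMS ratio in dB."""
    peak = max(abs(s) for s in samples)
    return 20.0 * math.log10(peak / rms(samples))

# Ten full periods of a unit-amplitude sinusoid.
sine = [math.sin(2.0 * math.pi * i / 100.0) for i in range(1000)]
# Toy "voice-like" signal: low-level sinusoid with sparse unit-amplitude peaks.
peaky = [
    1.0 if i % 50 == 0 else 0.2 * math.sin(2.0 * math.pi * i / 100.0)
    for i in range(1000)
]

sine_cf_db = crest_factor_db(sine)      # about 3 dB for a sinusoid
peaky_cf_db = crest_factor_db(peaky)
# Peak the toy signal would reach if scaled to the same RMS as the sinusoid.
power_matched_peak = max(abs(s) for s in peaky) * rms(sine) / rms(peaky)
```

Here `power_matched_peak` exceeds the unit peak of the sinusoid, which is the clipping risk that motivates peak-wise rather than power-wise equivalence.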

3.4. Communication quality evaluation methods

As already pointed out, the focus of the proposed assessment method is on the

suggestion of an objective real voice-based metric for the characterization of the voice

communication quality in conditions of maximum SPL reproduction and high environmental

noise. The investigation involves three different objective voice evaluation metrics. All of

them are real voice-based, evaluating the sound signal, as recorded on ATF (cf. paragraph

3.3.2), in comparison to the electrical signal fed to the headphones as input. The use of real

voice as input signal stems from the idea to employ the degree of correlation of the results with

those of a subjective evaluation method as the eligibility criterion for the selection of

the most suitable objective metric. This way, all the obtained results are based on the same

measured data.

It should be noted that STI [21-23] has become the most commonly employed

objective metric for the evaluation of HNE military headsets [9-12]. Of practical interest is

the fact that STI was found [12] to yield comparable results (rank ordering of alternative

solutions evaluated) with subjective tests, e.g., the Modified Rhyme Test (MRT), across an

adequate range of environmental noise types and levels, and practically the same results when

applied for a real human head (Microphone in Real Ear – MIRE method) or an artificial head

with a headset on.

However, STI presents certain known limitations. For instance, it is reported that the

accuracy of STI is compromised when the channel distorts the speech signals non-linearly

[23], when phase-jitter or linear phase shifts are present or when peak or envelope clipping

results from speech enhancement, e.g., spectral subtraction, or other speech compression and

coding employed, [24,25]. Recently, modified versions of STI, such as the STMI, [26] and

the Artificial Neural Networks (ANN) based form, [27], have been proposed to address these

limitations. Channel evaluation under maximum reproduction level conditions is the aim

here; it is therefore expected that a certain percentage of non-linearity is inevitably present in

the channel, especially for the ANR headsets. This is manifested by the acceptable upper

limit of HD under which measurements were carried out.

Therefore, the STI, which focuses on voice intelligibility, in its real voice-driven

ANN-based form [27], was the first choice for objective assessment. However, as discussed

in Subsection 2.1, voice quality, and not just intelligibility, seems to be very important for

end-users of military HNE headsets. Moreover, untrained (“naïve”) listeners, when asked

to score speech intelligibility and not voice communication quality, usually find it difficult to

discriminate these two inextricably linked elements of one's communication capacity and/or

ability – even if a clear definition of speech intelligibility is given to them [28]. As a result,

another widely accepted voice-based objective metric, the Perceptual Evaluation of Speech

Quality (PESQ), [29], which focuses on voice quality rather than intelligibility, and presents

fundamental differences compared to STI, was employed. Note that methods like PESQ use

advanced speech perception models, based on recently developed auditory modeling, to

process the measured speech signals. Finally, we experimented with another quite different

voice-based objective method, the Automatic Speech Recognition (ASR), using a common

experimental setup based on the Hidden Markov Model ToolKit (HTK) [30-32]. Another

reason for the use of these three differently aimed objective metrics was their widespread use,

which renders our results directly comparable to future studies. Indeed, (a) STI is already a

telecommunication standard method for the assessment of voice intelligibility [21,22], (b)

PESQ is accordingly a standard voice quality method [29], and (c) ASR, despite the criticism

on its effectiveness, is also frequently employed in telecommunication quality assessment

studies, e.g., [33, and references therein], while speech recognition is often employed in

military applications, e.g., [34-36].

The results of these three quite different objective metrics were compared

with those obtained by a subjective evaluation method, namely MUSHRA [37], in a quest for

the most suitable objective metric for the assessment of the communication quality of HNE

military headsets. Specifically, the comparative investigation of the three alternatives

consisted of three parts:

1. The evaluation results from the three objective evaluation methods were obtained

and comparatively examined for each headset, to assess whether they are generally able to

provide reliable information that roughly follows the communication quality changes.

2. A subset of noise type / level conditions was selected and the subjective method

was carried out for these conditions, in order to render the subjective evaluation feasible.

Further increase of the test cases would result in fatigue and confusion of the subjects, leading

to unreliable results.

3. The results of the three alternative objective measures were correlated to those of

the subjective method under matched conditions of noise, ANR and reproduction level, in

order to select the one which follows most closely the trend of the subjective results. This

one is declared to be the most appropriate for communication quality assessment of HNE,

ANR intercom headsets, based on the particular experimental setup.
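
Part 3 above can be sketched as a correlation-based ranking. The text does not state here which correlation measure was used, so the Pearson coefficient below is an assumption, and all numbers are synthetic placeholders rather than measured scores.

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0.0 or syy == 0.0:
        raise ValueError("correlation is undefined for a constant sequence")
    return sxy / math.sqrt(sxx * syy)

def best_metric(objective_scores, subjective_scores):
    """Return (name, r) of the objective metric most correlated with the subjective scores."""
    correlations = {
        name: pearson(values, subjective_scores)
        for name, values in objective_scores.items()
    }
    name = max(correlations, key=correlations.get)
    return name, correlations[name]

# Synthetic placeholder scores under matched conditions (not measured data).
subjective = [20.0, 40.0, 60.0, 80.0]
objective = {
    "metric_a": [1.0, 2.0, 3.0, 4.0],
    "metric_b": [4.0, 1.0, 3.0, 2.0],
}
name, r = best_metric(objective, subjective)
```

A rank-based coefficient could be substituted for `pearson` without changing the selection logic if only the ordering of conditions is of interest.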

3.4.1. MUSHRA

In order to evaluate the perceptual quality for a number of cases, a MUSHRA, [37],

test was designed and performed using MUSHRAM [38]. In order to render the

subjective evaluation feasible, a subset of the headphone recordings was selected, six for

each model, while a hidden anchor using a steep low-pass filter with cut-off frequency 2000

Hz was also considered. All the reference sounds and test sounds were normalized to the

same loudness level in order to avoid loudness-related effects.

The 30 test subjects recruited as listeners for the subjective test were non-expert listeners

without any hearing loss. A short introduction regarding the procedure was

given to small groups of 4 subjects during which a clear directive to score the speech

intelligibility and not the overall speech quality was given. Furthermore, the groups were

asked to listen to the reference and test signals, before beginning the test and adjust the

volume to a comfortable level. Also, the existence of a hidden impaired anchor and a hidden

reference was pointed out, and the subjects were asked to use as much of the quality scale dynamic

range as possible. While the test phrases were in a non-native language, all the subjects included in the

test were fluent in English. Given the relatively small subject pool, the testing phase for each

subject was repeated 3 times, with random positioning in each repetition.

Results from subjects that missed the hidden reference twice or more were rejected.

The mean value over all tests per recording, as well as the standard deviation and the 95%

confidence interval were calculated.
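
A minimal sketch of this post-processing is given below. Two points are assumptions not fixed by the text: a "miss" is taken to be a hidden-reference score below a hypothetical threshold of 90, and the 95% confidence interval uses the normal approximation (z = 1.96) rather than a Student-t quantile.

```python
import math
import statistics

def keep_subject(hidden_reference_scores, miss_threshold=90, max_misses=1):
    """Keep a subject unless the hidden reference was missed twice or more.

    miss_threshold is a hypothetical definition of a "miss".
    """
    misses = sum(1 for score in hidden_reference_scores if score < miss_threshold)
    return misses <= max_misses

def summarize(scores, z=1.96):
    """Return (mean, sample standard deviation, 95% confidence interval)."""
    mean = statistics.mean(scores)
    std_dev = statistics.stdev(scores)
    half_width = z * std_dev / math.sqrt(len(scores))
    return mean, std_dev, (mean - half_width, mean + half_width)

# Toy scores for one recording, for illustration only.
mean, std_dev, ci = summarize([60, 70, 80])
```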

3.4.2. Speech Transmission Index

In order to avoid the use of the artificial probe signals of speech-like shaped noise, an

alternative STI approach based on running speech probe signals was proposed, [39]; yet,

convenience is traded for accuracy of the STI values thus obtained. In an attempt to address

the running speech STI accuracy problem, the approach proposed in [27] was adopted in the

present work. According to this method, the estimated or predicted STI value is the output of

an ANN fed by the (normalized) power spectra of signal envelopes. Herein, the ANN was

first trained on the traditionally calculated STI values resulting from a set of impulse

responses of CVC headsets, other than the ones evaluated here. The training process tunes the

ANN to produce STI estimates for the convolution of the employed reference voice (cf.

paragraph 3.3.2) with the above impulse responses, close to the corresponding traditionally

calculated STI values. As to the ANN type and architecture, the selection was experimentally

driven. Comparable results were obtained either using a Generalized Regression Neural Net

(GRNN) trained by a gradient-descent back-propagation type of minimization algorithm or

using a feedforward type of Multilayer Perceptron (MLP) of two layers with seven nodes in

the hidden layer and a single output node, trained by the Levenberg-Marquardt back-

propagation type of minimization algorithm. Eight-fold cross-validation is used for

computing the results. The results obtained by the GRNN are the ones presented in the

following.
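
The eight-fold cross-validation mentioned above partitions the training examples into eight disjoint test folds. A plain-Python sketch of such a split is shown below; it is not the authors' implementation, it omits shuffling, and the sample count in the example is hypothetical.

```python
def k_fold_indices(n_samples, k=8):
    """Return a list of (train_indices, test_indices) pairs for k-fold cross-validation."""
    if not 2 <= k <= n_samples:
        raise ValueError("k must satisfy 2 <= k <= n_samples")
    # Spread the remainder over the first folds so sizes differ by at most one.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    folds = []
    start = 0
    for size in fold_sizes:
        stop = start + size
        test = list(range(start, stop))
        train = list(range(0, start)) + list(range(stop, n_samples))
        folds.append((train, test))
        start = stop
    return folds

folds = k_fold_indices(20, k=8)   # 20 is a hypothetical number of training examples
```

Each example appears in exactly one test fold, so every STI estimate reported from such a scheme comes from a network that did not see that example during training.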

3.4.3. PESQ

The perceptual evaluation of speech quality (PESQ) [29] is an objective method for

end-to-end speech evaluation of telecommunication systems. It is designed to objectively

predict the subjective quality of 3100 Hz (narrow-band) handset telephony (end-to-end

measurements) and narrow-band speech codecs. Two error parameters are computed in a

cognitive model; these are combined to give an objective listening quality MOS (Mean

Opinion Score) on a 0.0 – 5.0 scale.

4. Results and Discussion

The maximum reproduction SPL spectra were first evaluated as described in

paragraph 3.3.1. In order to ensure the precision of results, the frequency response and

harmonic distortion, up to the 9th harmonic, were measured on the power

amplifier driving the headsets (under the worst loading conditions). The results (cf. Fig. 3 in

[8]) assure that both frequency response and harmonic distortion are orders of magnitude

better than the corresponding results expected from the headset measurements within the

frequency band of interest.

The maximum reproduction SPL for HD under -15 dB, with ANR-on at: (a) office

level noise conditions, (b) pink noise 110 dBSPL(LIN), and (c) MBT-like noise 110 dBSPL(LIN),

are shown in Fig. 3, Fig. 4 and Fig. 5 for HS1, HS2 and HS3, respectively. It is apparent from

Fig. 3 that all three cases provide practically identical measurements for HS1. The HS1

maximum reproduction level and HD spectra with ANR-on are practically not affected by the

noise type and level variation. The same is not valid for HS2 and HS3, as seen in Figs. 4

and 5, respectively. Although with different spectral details for each headset, we can

conclude that HD was increased in high noise environment conditions, while the most

disturbing noise type was pink noise, for both of them. Moreover, HS3 is also affected in

terms of reproduction spectrum. For both noise types we observe a change (ripple) in

frequency response above 1000 Hz.

The comparison of the noise characteristics and level effect on the above headset

measurements, suggests that the design philosophy of the HS1 ANR headset, which uses two

separate speakerphones, one for the voice reproduction and another for the anti-noise signal

emission, results in a less noise-dependent performance than that of HS2 and HS3 ANR

headsets, which use a single speakerphone for both the voice and anti-noise signal. The ANR

subsystem of HS1 is practically not affecting the max SPL voice reproduction in presence of

110 dBSPL(LIN), either pink or MBT-like, noise. The ANR subsystems of HS2 and HS3 are

affecting their corresponding max SPL voice reproduction, while the worst performance is

clearly that of HS3.

The maximum reproduction SPL for HD under -15 dB at 120 dBSPL(LIN) of: (a) pink

noise with ANR-off, (b) pink noise with ANR-on, (c) MBT-like noise with ANR-off, and (d)

MBT-like noise with ANR-on, are shown in Fig. 6, Fig. 7 and Fig. 8 for HS1, HS2 and HS3,

respectively.

Comparing the plots in these figures, the first result is the verification of the HS1’s

voice reproduction insensitivity to environmental acoustic noise, even as high as 120

dBSPL(LIN), either with ANR-off or with ANR-on. On the other hand, the activation of HS1

ANR is leading to generally higher reproduction levels and distortion below 2000 Hz, see

Fig. 6.

Concerning HS2, Fig. 7, it is affected over the frequency band of interest in ANR-off

operation, both in regard to SPL and HD response. However, pink noise leads to more severe

disturbance. In ANR-on operation, HS2 is slightly affected by pink noise of this level, while

for 110 dBSPL(LIN) pink noise it was not affected. Its ANR-on performance under MBT-like

noise remains the same regardless of the noise level increase.

On the other hand, HS3 (Fig. 8) presented a quite different behavior compared to the other two headsets: a further increase of the reproduced SPL led to an increase of the HD. However, in the case of HS3 a higher reproduced SPL was the only way to achieve an HD performance as close as possible to the considered limit of -15 dB in the presence of 120 dBSPL(LIN) noise. Although in the end it was not possible to keep the lower part of the spectrum, where the ANR subsystem of HS3 is possibly operating, under -15 dB HD, the reproduction levels depicted in Fig. 5 resulted in much worse HD. It is noted that noticeably different maximum responses resulted for the different noise types and ANR operation (on / off) conditions; HS3 is the headset most affected by these parameters at such high noise levels.

Judging solely from the max SPL measurements, which correspond to the maximum reproduced level under the -15 dB HD limit and capture the effect of environmental noise on the voice reproduction, but do not reflect the noise attenuation, one would expect HS1 to present the best communication quality performance, followed by HS2, with HS3 last in this comparison. Indeed, notwithstanding the relatively high variation of its SPL versus frequency, HS1 is the loudest headset, although probably too loud to use at max SPL reproduction without additional ear protection (e.g., earplugs), and the least affected by the environmental noise. Of course, the use of earplugs would degrade the communication quality; it is therefore not a recommended solution.

Fig. 3. Max SPL for HD under -15 dB, with ANR-on at: office level noise conditions (green), pink noise 110 dBSPL(LIN) (blue), and MBT-like noise 110 dBSPL(LIN) (red) for HS1 (upper panel). The corresponding HD spectra are depicted using the same color code (bottom panel). For color graphics please refer to the on-line version. Both graph windows share the same, aligned, horizontal frequency axis.

Fig. 4. Max SPL for HD under -15 dB, with ANR-on at: office level noise conditions (green), pink noise 110 dBSPL(LIN) (blue), and MBT-like noise 110 dBSPL(LIN) (red) for HS2 (upper panel). The corresponding HD spectra are depicted using the same color code (bottom panel). For color graphics please refer to the on-line version. Both graph windows share the same, aligned, horizontal frequency axis.

Fig. 5. Max SPL for HD under -15 dB, with ANR-on at: office level noise conditions (green), pink noise 110 dBSPL(LIN) (blue), and MBT-like noise 110 dBSPL(LIN) (red) for HS3 (upper panel). The corresponding HD spectra are depicted using the same color code (bottom panel). For color graphics please refer to the on-line version. Both graph windows share the same, aligned, horizontal frequency axis.

Fig. 6. Max SPL for HD under -15 dB at 120 dBSPL(LIN): pink noise ANR-off (blue), pink noise ANR-on (magenta), MBT-like noise ANR-off (red), and MBT-like noise ANR-on (green), for HS1 (upper panel). The corresponding HD spectra are depicted using the same color code (bottom panel). For color graphics please refer to the on-line version. Both graph windows share the same, aligned, horizontal frequency axis.

Fig. 7. Max SPL for HD under -15 dB at 120 dBSPL(LIN): pink noise ANR-off (blue), pink noise ANR-on (magenta), MBT-like noise ANR-off (red), and MBT-like noise ANR-on (green), for HS2 (upper panel). The corresponding HD spectra are depicted using the same color code (bottom panel). For color graphics please refer to the on-line version. Both graph windows share the same, aligned, horizontal frequency axis.

Fig. 8. Max SPL for HD under -15 dB at 120 dBSPL(LIN): pink noise ANR-off (blue), pink noise ANR-on (magenta), MBT-like noise ANR-off (red), and MBT-like noise ANR-on (green), for HS3 (upper panel). The corresponding HD spectra are depicted using the same color code (bottom panel). For color graphics please refer to the on-line version. Both graph windows share the same, aligned, horizontal frequency axis.

Regardless of the availability of data about the headset noise attenuation performance,

which of course are very important for hearing protection assessment, the communication

quality under maximum reproduction SPL conditions (in presence of noise) should be

measurable by an objective intelligibility / quality metric. As explained in paragraph 3.3.2, a

reference male voice was used as a stimulus signal, while the resultant reproduced voices for

different {headset under test – noise condition} combinations were recorded.

As described in subsection 3.4, the results from the three objective evaluation

methods were comparatively examined for each headset, to assess whether they are generally

able to provide reliable information roughly following the communication quality changes.

The results for HS1, HS2 and HS3 are depicted in Figs 9a, 9b, and 9c, respectively. Table 1

summarizes the evaluated cases. The case numbers appearing in Table 1 are used as a

reference in Fig. 9 where the analysis results are depicted. It should be noted that while PESQ

and STI scores are usually presented in 1 to 5 and 0 to 1 scales, respectively, they have been

mapped to a percentage score of 0% to 100% in order for them to be directly comparable to

the MUSHRA ones.
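The rescaling of the objective scores to a percentage can be illustrated with a short sketch. The text does not state which mapping was applied, so the linear min-max mapping below is an assumption, and the function names are hypothetical.

```python
def pesq_to_percent(score, lo=1.0, hi=5.0):
    """Linearly map a PESQ score on the 1-to-5 scale to 0-100 % (assumed mapping)."""
    if not lo <= score <= hi:
        raise ValueError("PESQ score outside the expected scale")
    return 100.0 * (score - lo) / (hi - lo)


def sti_to_percent(sti):
    """Map an STI value on the 0-to-1 scale to 0-100 % (assumed mapping)."""
    if not 0.0 <= sti <= 1.0:
        raise ValueError("STI value outside the expected scale")
    return 100.0 * sti


print(pesq_to_percent(3.0))  # 50.0
print(sti_to_percent(0.75))  # 75.0
```

With such a mapping, the mid-point of the PESQ scale (3.0) and an STI of 0.5 both land at 50%, which is what makes the two metrics directly comparable to a 0-100 MUSHRA score on one axis.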

Table 1. Different cases evaluated for all objective voice evaluation methods, corresponding to different {ANR operation – noise type – noise level} conditions for each of the headsets under test.

Case #   ANR operation   Noise type   Noise level in dBSPL(LIN)
------   -------------   ----------   -------------------------
1        ANR-off         No Noise     Office level
2        ANR-on          No Noise     Office level
3        ANR-off         MBT-like     90
4        ANR-on          MBT-like     90
5        ANR-off         MBT-like     110
6        ANR-on          MBT-like     110
7        ANR-off         MBT-like     120
8        ANR-on          MBT-like     120
9        ANR-off         pink         90
10       ANR-on          pink         90
11       ANR-off         pink         110
12       ANR-on          pink         110
13       ANR-off         pink         120
14       ANR-on          pink         120
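For scripting the analysis, the case grid of Table 1 can be generated rather than typed by hand. The sketch below reproduces the numbering of Table 1; the function name and the tuple layout are illustrative choices, not part of the described method.

```python
from itertools import product


def table1_cases():
    """Return {case number: (ANR operation, noise type, noise level)} as in Table 1."""
    anr_modes = ("ANR-off", "ANR-on")
    # Cases 1-2: office-level background only
    cases = [(anr, "No Noise", "Office level") for anr in anr_modes]
    # Cases 3-14: each noise type at 90, 110 and 120 dBSPL(LIN), ANR off then on
    for noise, level in product(("MBT-like", "pink"), (90, 110, 120)):
        for anr in anr_modes:
            cases.append((anr, noise, level))
    return {number: case for number, case in enumerate(cases, start=1)}


for number, case in table1_cases().items():
    print(number, *case)
```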

It is also noted that the ASR accuracy did not seem to provide usable information (see footnote 2) about changes in communication quality. Analyzing HS1, HS2, and HS3, we observed that the ASR scores were not well correlated with most of the analyzed cases, while there was a performance drop in the “no-noise/ANR-on” case for HS1, along with unexpectedly excellent performance under the pink noise environment, findings which are not compatible with the analysis so far. The resulting behavior of the ASR scores, being far different from that of STI and PESQ, led us to exclude ASR from further investigation. It is also noted that STI and PESQ behave in a generally compatible way for all cases and headsets (Fig. 9). Specifically, the performance deteriorates as the noise type changes from no noise, to MBT-like, to pink, and as the noise level rises from 90, to 110, to 120 dBSPL(LIN). This trend, which was roughly expected from the analysis so far, is followed more closely by PESQ.
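The sensitivity of an ASR accuracy score to a handful of errors in a short recording can be illustrated with the commonly used word-accuracy definition, (N - S - D - I) / N. That the scores discussed here were computed exactly this way is an assumption, and the word counts below are hypothetical.

```python
def word_accuracy(n_ref, substitutions=0, deletions=0, insertions=0):
    """Word accuracy in percent: (N - S - D - I) / N * 100."""
    if n_ref <= 0:
        raise ValueError("the reference must contain at least one word")
    return 100.0 * (n_ref - substitutions - deletions - insertions) / n_ref


# One misrecognized word in a short versus a longer recording (hypothetical counts)
print(word_accuracy(5, substitutions=1))   # 80.0
print(word_accuracy(50, substitutions=1))  # 98.0
```

A single error costs 20 percentage points in a five-word recording but only 2 points in a fifty-word one, so short recordings make the score coarse and volatile.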

Therefore, it was decided to present only the STI and PESQ results and compare them against those of the subjective MUSHRA evaluation method. As already explained in subsection 3.4, a subset of noise type / level conditions was selected and the subjective method was carried out for these conditions only, in order to render the subjective evaluation feasible. A further increase in the number of test cases would result in fatigue and confusion of the subjects, leading to unreliable results. Given that the two employed noise types led to similar behavior as the noise level increased, the worst case of 120 dBSPL(LIN) was the selected level, while both pink noise and MBT-like noise conditions were evaluated. The investigated conditions for each headset are summarized in Table 2. The case numbers appearing in Table 2 are used as a reference in Fig. 10, where the MUSHRA results are presented, and in Fig. 11, where the results of STI, PESQ and MUSHRA are comparatively portrayed.

Table 2. Different cases evaluated for the objective voice evaluation methods compared to the subjective one, corresponding to different {headset – ANR operation – noise type – noise level} conditions.

Case #   Headset under test   ANR operation   Noise type   Noise level in dBSPL(LIN)
------   ------------------   -------------   ----------   -------------------------
1        HS1                  ANR-off         MBT-like     120
2        HS1                  ANR-on          MBT-like     120
3        HS1                  ANR-off         No Noise     Office level
4        HS1                  ANR-on          No Noise     Office level
5        HS1                  ANR-off         Pink         120
6        HS1                  ANR-on          Pink         120
7        HS2                  ANR-off         MBT-like     120
8        HS2                  ANR-on          MBT-like     120
9        HS2                  ANR-off         No Noise     Office level
10       HS2                  ANR-on          No Noise     Office level
11       HS2                  ANR-off         Pink         120
12       HS2                  ANR-on          Pink         120
13       HS3                  ANR-off         MBT-like     120
14       HS3                  ANR-on          MBT-like     120
15       HS3                  ANR-off         No Noise     Office level
16       HS3                  ANR-on          No Noise     Office level
17       HS3                  ANR-off         Pink         120
18       HS3                  ANR-on          Pink         120

2 This behaviour of the ASR scores can be explained by the small number of words in each voice recording, which causes even slight recognition mistakes, insertions, or deletions to severely affect the accuracy score.

Fig. 10 depicts the MUSHRA test results for the test cases summarized in Table 2. Specifically, the MUSHRA average value for each test case is shown together with its 95% confidence interval. For each headset under test, the highest scores are given to the office-level noise cases, followed by the MBT-like noise and pink noise test cases. HS1 and HS2 achieve the overall highest average scores at office-level noise, while HS3 has a significantly lower score. HS2 seems to provide the highest score for both MBT-like and pink noise. Moreover, the ANR feature seems to slightly degrade the perceived intelligibility for most test cases; improvement is noticed only for MBT-like noise when using HS1 and HS3.
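A per-case average with a 95% confidence interval, of the kind reported for the MUSHRA scores, can be computed as sketched below. The normal approximation is an assumption made to keep the example within the standard library (a small listening panel would normally call for a Student-t quantile), and the ratings are hypothetical.

```python
from statistics import NormalDist, mean, stdev


def mean_ci95(scores):
    """Return (mean, lower, upper) of a 95 % confidence interval.

    Uses the normal approximation; for small panels a Student-t
    quantile would be more appropriate.
    """
    n = len(scores)
    if n < 2:
        raise ValueError("at least two ratings are needed")
    m = mean(scores)
    z = NormalDist().inv_cdf(0.975)  # about 1.96
    half_width = z * stdev(scores) / n ** 0.5
    return m, m - half_width, m + half_width


# Hypothetical MUSHRA ratings (0-100) given by five listeners to one test case
ratings = [60.0, 70.0, 80.0, 70.0, 70.0]
m, lower, upper = mean_ci95(ratings)
print(f"{m:.1f} [{lower:.1f}, {upper:.1f}]")  # 70.0 [63.8, 76.2]
```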

Fig. 11 provides a comparative presentation of the objective evaluation (STI, PESQ) and subjective evaluation (MUSHRA) results. Ideally, the subjects should have listened to the signal directly as it was reproduced by the headsets under analysis. However, this would render their evaluation, to a certain extent, incompatible with the evaluation resulting from the STI and PESQ analyses, which were performed on the HATS-recorded signal, since neither the headset fitting nor the ear transfer function could be exactly the same. Moreover, one should take into account the safety of the subjects exposed to such a high noise environment and the resulting experimental limitations.

Although the adopted procedure adds the response of the artificial ear and the response of the monitor headphones to the reproduction chain, it was considered safer than the alternatives and accurate enough, given the extreme communication conditions examined and the purpose of the subjective evaluation. Moreover, as already pointed out in subsection 3.4, it has been reported that STI measurements obtained with subjects (MIRE) and with an artificial head (HATS) yielded practically the same results (refer to Fig. 22 of [12]). In addition, high quality headphones (AKG K 240 Studio) were used to minimize their influence. Therefore, it was not expected that the adopted procedure would lead to any incompatibility between the subjective and the objective evaluation results.

The following notes can be made on the results depicted in Fig. 11:

(a) An offset in favor of STI can be consistently observed. This is expected since

PESQ and MUSHRA do not strictly judge speech intelligibility, but also take into account

quality of voice.

(b) In most cases, PESQ lies between the STI and the MUSHRA values, while persistently following MUSHRA’s trend (observe cases 1, 2: HS1, MBT-like noise).

(c) The subjects largely reject HS3, especially in the pink noise case (observe cases 17, 18: HS3, pink noise), which is compatible with the analysis of the maximum reproduction SPL results.

(d) The closest agreement of all metrics is observed in the HS2 case; this behavior could be an indication of the higher quality of the specific headset, but also an indication that MUSHRA’s results are indeed compatible with the results of the objective methods.

(e) The highest disagreement among the results of the different methods is observed in the pink noise cases (5, 6: HS1; 11, 12: HS2; and 17, 18: HS3), where the individual performance according to all metrics also seems to be the worst for all headsets.

(f) All headsets have lower scores with ANR-on (as compared to the ANR-off operation) in office noise level conditions, which is expected since the processing involved leads to a certain increase in distortion.

(g) HS2 and HS3 perform better with ANR-on in the presence of noise, clearly for MBT-like noise and marginally for pink noise. In contrast, HS1 is always worse with ANR-on than with ANR-off.

Regarding the comparison of the three headsets under evaluation in terms of communication quality, it is clear that HS2 is the best of them as an ANR headset, while HS1 is also a good headset but not as an ANR one; its ANR subsystem seems to be inefficient. PESQ, being the metric most consistent with MUSHRA’s results, seems to be the most appropriate choice for the objective evaluation of the quality of communication offered by HNE headsets at maximum SPL reproduction in a high noise environment.
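One way to quantify how closely an objective metric follows the subjective trend is to compute, over the test cases, its correlation with the MUSHRA scores together with its average offset from them. The sketch below is only an illustration of that idea: the text relies on inspection of Fig. 11, not on this computation, and the scores shown are placeholders, not measurements from this study. In practice, metric_A and metric_B would be, for example, the PESQ and STI percentage scores.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length score lists."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two lists of equal length >= 2")
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("scores must not be constant")
    return sxy / math.sqrt(sxx * syy)


def mean_offset(objective, subjective):
    """Average signed difference (objective minus subjective), in % points."""
    return sum(o - s for o, s in zip(objective, subjective)) / len(objective)


# Placeholder percentage scores per test case; NOT measurements from this study.
subjective = [70.0, 65.0, 90.0, 85.0, 40.0, 35.0]
objective = {
    "metric_A": [75.0, 72.0, 92.0, 90.0, 50.0, 44.0],
    "metric_B": [88.0, 90.0, 95.0, 93.0, 80.0, 84.0],
}
for name, scores in objective.items():
    print(name,
          round(pearson(scores, subjective), 3),
          round(mean_offset(scores, subjective), 1))
```

A high correlation indicates that a metric follows the subjective trend, while the mean offset captures a systematic bias of the kind noted in (a) for STI.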

Regarding the overall evaluation of the maximum SPL reproduction performance, it is suggested that the maximum SPL measurements and PESQ should be co-evaluated. In this way one could reach the same conclusions without having subjective evaluation results available.

Finally, it should be noted that, in addition to the experiments already described, all three headsets were evaluated by a limited number of MBT crews on the same intercom system; these subjects unanimously expressed their preference for HS2. It was declared to be the headset presenting the best overall voice reproduction behavior and the highest communication quality. However, a systematic end-user subjective evaluation test has not been performed.

Fig. 9. Analysis results after the application of the STI and PESQ voice evaluation methods for the case of: (a) HS1, (b) HS2, and (c) HS3. The results are presented normalized to 100% for easier comparison. For the description of the different cases evaluated, please refer to Table 1. For color graphics please refer to the on-line version.

Fig. 10. MUSHRA analysis results for selected conditions for HS1, HS2 and HS3. For the description of the different cases evaluated, please refer to Table 2. For color graphics please refer to the on-line version.

Fig. 11. Comparative presentation of the objective (STI, PESQ) and subjective (MUSHRA) analysis results for selected conditions for HS1, HS2 and HS3. For the description of the different cases evaluated, please refer to Table 2. For color graphics please refer to the on-line version.


4. Conclusions

In this article, a novel integrated method has been presented for the assessment of the electroacoustic / acoustic communication performance of HNE headsets in conditions of maximum reproduction level and high acoustic environmental noise. The focus was on the assessment of the voice communication quality, given that the speech reproduced by the headset is degraded by the presence of different acoustic background noise levels and spectra, and also by the distortion that is mainly introduced by the electroacoustic transducer and the ANR system of the headset when they are driven with high speech signal levels in high acoustic noise conditions.

The application of the proposed method to three commercially available military HNE intercom headsets, using three different objective evaluation methods and one subjective evaluation method for the communication quality assessment, revealed PESQ as the most suitable objective metric, on the grounds that it follows the trend of the subjective results more closely.

As the application of the method revealed, both maximum SPL measurements and

PESQ should be co-evaluated for the overall evaluation of the maximum SPL reproduction

performance. Such a co-evaluation leads to assessment results very close to the subjective

ones.

It should be noted that a separate subjective evaluation of the same headsets, all connected to the same intercom system, was carried out inside a tracked vehicle and under real high noise conditions, but with a limited number of MBT crews. This subjective evaluation highlighted HS2 as the headset with the best overall voice reproduction behavior and the highest communication quality. This is in agreement with the results obtained by the proposed assessment method, which is an encouraging indication of its effectiveness in predicting the acoustic communication performance of headsets under real high noise conditions.

Acknowledgments

The authors wish to thank Intracom Defense Electronics S.A. for the kind permission to use the facilities of their Analog Electronics and Electroacoustics Laboratory to conduct the presented measurements, as well as for organizing the subjective evaluation tests inside the tracked vehicle.


References

[1] ITU-T-REC-P.380, Electro-acoustic measurements on headsets. Geneva: International

Telecommunication Union – Telecommunication Standardization Sector, 2003.

[2] ITU-T-REC-P.64, Determination of sensitivity/frequency characteristics of local

telephone systems. Geneva: International Telecommunication Union – Telecommunication

Standardization Sector, 2007.

[3] ITU-T-REC-P.79, Calculation of loudness ratings for telephone sets. Geneva:

International Telecommunication Union – Telecommunication Standardization Sector, 2007.

[4] MIL-PRF-25670B. Earphone Elements, General Specification For, Department of

Defense USA, 2006

[5] J. Cui, A. Behar, W. Wong, H. Kunov, Insertion loss testing of active noise reduction

headsets using acoustic fixture, Applied Acoustics, 64: 1011–1031, 2003.

[6] ANSI/ASA S12.42-2010, Methods for the measurement of insertion loss of hearing protection devices in continuous or impulsive noise using microphone-in-the-real-ear or acoustic test fixture procedures, New York: American National Standards Institute, 2010.

[7] S.M. Potirakis, Y. Moisiadis, A. Varagis, Performance evaluation method for high noise environment intercom headsets, Proceedings of Acoustics 08, Paris, France, Journal of Acoustical Society of America, 123(5): 3825, 2008.

[8] S.M. Potirakis, N.A. Tatlas, M. Rangoussi, Electroacoustic measurements for high noise

environment intercom headsets, Proceeding of 128th Audio Engineering Society Convention,

London, UK, Paper Number: 8077, 2010.

[9] H. J. M. Steeneken and J. Verhave, Digitally controlled active noise reduction with integrated speech communication, Proceedings of CIOP noise conference, Kielce, Poland, 2001.

[10] H. J. M. Steeneken, Assessment and standardization of personal hearing protection including active noise reduction, The Research and Technology Organisation (RTO) of NATO, RTO HFM Lecture Series on “Personal Hearing Protection including Active Noise Reduction”, Warsaw, Poland, 25-26 October 2004; Belgium, Brussels, 28-29 October 2004; Virginia Beach, VA, USA, 9-10 November 2004, RTO-EN-HFM-111, 2004.

[11] R. B. Valimont, J. G. Casali, J. A. Lancaster, ANR vs. passive communications headsets:

Investigation of speech intelligibility, pilot workload and flight performance in an aircraft

simulator, Proceedings of the Human Factors and Ergonomics Society, 50th annual meeting,

pp. 2143-2147, 2006.

[12] A. L. Dancer, K. Buck, T. Wessling, H. J.M. Steeneken, J. Verhave, S. H. James, G.

Rood, R. McKinley, Assessment methods for personal active noise reduction validated in an

international round robin, The Research and Technology Organisation (RTO) of NATO, TR-

HFM-094, ISBN 92-837-1121-1, 2004.

[13] A. J. Brammer, D. R. Peterson, M. G. Cherniack, and S. Gullapalli, Improving the Effectiveness of Communication Headsets with Active Noise Reduction: Influence of Control Structure, The Research and Technology Organisation (RTO) of NATO, In New Directions for Improving Audio Effectiveness (pp. 6-1 – 6-8), RTO-MP-HFM-123, Paper 6, Neuilly-sur-Seine, France: RTO, 2005.

[14] R. H. Campbell, Electroacoustic properties of noise attenuating headsets, Journal of

Audio Engineering Society 23(10): 806-809, 1975.

[15] R. G. Matschke, Kommunikation und Lärm: Sprachverständnis bei Luftfahrzeugführern

mit und ohne aktive Lärmkompensation (Communication and noise. Speech intelligibility of

aircraft pilots with and without electronic compensation for noise), HNO. Hals-Nasen-

Ohrenärzte, ISSN 0017-6192, 42(8): 499-504, 1994.

[16] K. Buck and V. Zimpfer-Jost, Active hearing protector systems and their performance, The Research and Technology Organisation (RTO) of NATO, RTO HFM Lecture Series on “Personal Hearing Protection including Active Noise Reduction”, Warsaw, Poland, 25-26 October 2004; Belgium, Brussels, 28-29 October 2004; Virginia Beach, VA, USA, 9-10 November 2004, and published in RTO-EN-HFM-111, 2004.

[17] A. J. Brammer, D. R. Peterson, M. G. Cherniack, S. Gullapalli, and R.B. Crabtree,

Maintaining speech intelligibility in communication headsets equipped with active noise

control, Canadian Acoustics - Acoustique Canadienne 32(3): 132-133, 2004.

[18] S. H. James, Defining the cockpit noise hazard, aircrew hearing damage risk and the

benefits active noise reduction headsets can provide, The Research and Technology

Organisation (RTO) of NATO, RTO HFM Lecture Series on “Personal Hearing Protection

including Active Noise Reduction”, Warsaw, Poland, 25-26 October 2004; Belgium,

Brussels, 28-29 October 2004; Virginia Beach, VA, USA, 9-10 November 2004, RTO-EN-

HFM-111, 2004.

[19] A. Farina, Simultaneous measurement of impulse response and distortion with a swept-

sine technique, Proceedings of 108th Audio Engineering Society Convention, Paris, Paper

Number: 5093, 2000.

[20] ITU-T-REC-P.501, Test signals for use in telephonometry, Geneva: International

Telecommunication Union – Telecommunication Standardization Sector, 2009.

[21] ANSI S3.5-1997 (R2007), Methods for calculation of the speech intelligibility index, New York: American National Standards Institute, 1997.

[22] IEC 60268-16, Sound system equipment – Part 16: Objective rating of speech intelligibility by speech transmission index, Fourth edition, Geneva: International Electrotechnical Commission, 2011.

[23] R. L. Goldsworthy and J. E. Greenberg, Analysis of speech-based speech transmission index methods with implications for nonlinear operations, Journal of Acoustical Society of America, 116(6), 2004.

[24] J. Ma, Y. Hu, and P. Loizou, Objective measures for predicting speech intelligibility in noisy conditions based on new band-importance functions, Journal of Acoustical Society of America, 125(5): 3387-3405, 2009.

[25] J. Ma and P. Loizou, SNR loss: A new objective measure for predicting the intelligibility of noise-suppressed speech, Speech Communication, 53: 340-354, 2011.


[26] M. Elhilali, T. Chi, S. A. Shamma, A spectro–temporal modulation index (STMI) for assessment of speech intelligibility, Speech Communication, 41(2-3): 331-348, 2003.

[27] F. F. Li and T. J. Cox, Speech transmission index from running speech: A neural

network approach, Journal of Acoustical Society of America, 113 (4), Pt. 1, DOI:

10.1121/1.1558373, 2003.

[28] P. C. Doyle, Clinical evaluation in head and neck cancer: voice quality and speech intelligibility outcomes, presented at the 11th Annual Head and Neck Conference: Managing the Effects of Treatment, Baltimore, October 24, 2008.

[29] ITU-T-REC-P.862, Perceptual evaluation of speech quality (PESQ): An objective method for end-to-end speech quality assessment of narrow-band telephone networks and speech codecs, Geneva: International Telecommunication Union – Telecommunication Standardization Sector, 2001.

[30] S. Young, G. Evermann, M. Gales, T. Hain, D. Kershaw, X. Liu, G. Moore, J. Odell, D.

Ollason, D. Povey, V. Valtchev, P. Woodland, The HTK Book (for HTK Version 3.4),

Cambridge University Engineering Department, 2006.

[31] K. Vertanen, Baseline WSJ Acoustic Models for HTK and Sphinx: Training Recipes and

Recognition Experiments, Technical Report, Cavendish Laboratory, 2006.

[32] The Carnegie Mellon University Pronouncing Dictionary. On-line at

https://cmusphinx.svn.sourceforge.net/svnroot/cmusphinx/trunk/cmudict/ (last accessed:

14/12/2012).

[33] Y. Teng, Objective speech intelligibility assessment using speech recognition and

bigram statistics with application to low bit-rate codec evaluation, Ph.D. dissertation,

Department of Electrical and Computer Engineering, University of Wyoming, December,

2006.

[34] Weinstein, C. J., Opportunities for Advanced Speech Processing in Military Computer-

Based Systems, IEEE Proceedings, 79(11): 1626–1641, 1991.

[35] A. Amrouche, M. Debyeche, A. Taleb-Ahmed, J.-M. Rouvaen, M. C.E. Yagoub, An efficient speech recognition system in adverse conditions using the nonparametric regression, Engineering Applications of Artificial Intelligence, 23: 85–94, 2010.

[36] H. Kim, J. Park, Y. Oh, S. Kim, B. Kim, Voice command recognition for fighter pilots

using grammar tree, Computer and Information Science, 352: 116-119, 2012, doi:

10.1007/978-3-642-35603-2_18.

[37] ITU-R BS.1534-1, Method for the subjective assessment of intermediate quality level of

coding systems. International Telecommunication Union, Radiocommunication Assembly,

2003.

[38] E. Vincent, MUSHRAM - A Matlab interface for MUSHRA listening tests (version 1.0), http://www.elec.qmul.ac.uk/digitalmusic/downloads/#mushram, 2005.

[39] H. J. M. Steeneken and T. Houtgast, The temporal envelope spectrum and its significance in room acoustics, Proceedings of the 11th ICA, 7, Paris, pp. 85–88, 1983.


Figure Captions

Fig. 1. Test setup for the electroacoustic reproduction measurements based on HATS: (a)

instrument connections along with a typical CVC placed on the HATS, and (b) spatial

arrangement inside the noise room.

Fig. 2. MBT-like noise third octave spectrum with an overall level of ~110 dBSPL(LIN).
