Assessment of military intercom headsets for maximum voice
reproduction level in high noise conditions
S.M. Potirakis(a), N.-A. Tatlas(a), N. Zafeiropoulos(b), T. Ganchev(c), and M. Rangoussi(a).
a. Department of Electronics, Technological Education Institute (TEI) of Piraeus, 250 Thivon & P.
Ralli, GR-12244, Aigaleo, Athens, Greece, {spoti; ntatlas; mariar}@teipir.gr.
b. Acoustics, Audio and Video Engineering, Newton Building, University of Salford, Greater
Manchester M5 4WT, [email protected].
c. Division of Electronics and Microelectronics, Faculty of Electronics, Technical University – Varna,
9010 Varna, Bulgaria, [email protected]
Abstract
Intercom headsets are mandatory communication apparatus in high noise environments
(HNE). The headset selection in HNE, such as combat vehicles, is crucial for achieving the
objectives of communication, as it serves the needs for both noise reduction and voice
reproduction. Although military-grade intercom headsets are typically used under extreme
environmental conditions, a standard performance evaluation method exists only for the
earphone elements. In the present work we propose an integrated method for the assessment
of the electroacoustic performance of HNE headsets in conditions of maximum reproduction
level and high environmental noise, focusing on the voice communication quality. Objective
methods, such as Automatic Speech Recognition (ASR), Perceptual Evaluation of Speech
Quality (PESQ) and Speech Transmission Index (STI) are comparatively evaluated and their
results are compared to subjective scores obtained using Multiple Stimuli with Hidden Reference and
Anchor (MUSHRA) in order to identify the best-fitting metric.
Keywords: electroacoustic assessment; military intercom headsets; headset testing;
reproduction level; distortion; communication quality; ANR; ASR; PESQ; STI; MUSHRA
*Please address correspondence to:
Stelios M. Potirakis
Assistant Professor
Dept. of Electronics Eng.,
Technological Education Institute (TEI) of Piraeus
250 Thivon & P. Ralli
GR-12244 Aigaleo - Athens
GREECE
Tel./ FAX: +30 2105381550
Mob.: +30 6947934056
office : ZB106
e-mail : [email protected]
url : http://audio.teipir.gr/
Please cite as:
S.M. Potirakis, N.-A. Tatlas, N. Zafeiropoulos, T. Ganchev, M. Rangoussi, «Assessment of military intercom
headsets for maximum voice reproduction level in high noise conditions», Applied Acoustics 74 (2013) 870–
881, doi: 10.1016/j.apacoust.2012.12.009
S.M. Potirakis et al., High noise environment military intercom headsets assessment p. 2
1. Introduction
Intercoms provide the means of speech communication in high noise environments
(HNE), where such communication would otherwise be difficult or even impossible.
application is crucial since it serves both noise attenuation (protection) and voice
reproduction (communication) purposes. Only recently has it been acknowledged in
telecommunications [1] that headset electroacoustic measurements should be performed on a
Head and Torso Simulator (HATS). In fact, the existing standardized measurement
methods regarding telecommunication-related sensitivity and frequency characteristics [2,3]
are still used, but with a HATS introduced as the measurement apparatus. However, in defense
applications, where headsets are mainly used in HNE, no standard performance evaluation
method exists. The only standard method used for sensitivity measurements refers
to the earphone elements alone [4], thus ignoring the impact of the acoustics of the earcup
cavity, the absorbing materials, the head/face fitting quality and the headset positioning.
It is well known that the reproduced speech level should be at least 10 dB above noise
level (without severe distortion) to achieve marginally acceptable intelligibility. Therefore,
sensitivity, maximum reproduction levels under specific distortion limit and noise attenuation
capability should be jointly assessed to evaluate a HNE intercom headset.
Inspired by the recently published work of J. Cui et al. [5] (followed by the related
ANSI S12.42 standard [6]), a systematic methodology for measurement and performance
evaluation of HNE headsets has been proposed in [7,8]. This is based on the use of Acoustic
Test Fixtures (ATFs), like HATS, and addresses both signal reproduction and noise reduction
issues, while standardized methods are employed and measurements are maintained as close
as possible to the existing telecommunication and military standards. Headset electroacoustic
reproduction measurements, i.e., sensitivity, maximum sound pressure level (SPL) and
harmonic distortion (HD), are proposed to be acquired as a function of frequency.
Nevertheless, beyond hearing protection issues, the most important evaluation
parameter regarding the usefulness of a HNE intercom headset is the voice communication
quality under realistic conditions. This is not straightforwardly assessed by maximum SPL
and HD measurements, especially for Active Noise Reduction (ANR) headsets. In this case,
the environmental noise plays a very important role in the intelligibility and the overall
performance, as it potentially affects both the reproduction frequency response and the distortion.
Moreover, the ANR functionality may also be influenced by maximum SPL reproduction
conditions, leading to worse noise reduction or even higher distortion.
In this article a novel integrated method is proposed for the assessment of the
electroacoustic performance of HNE headsets in conditions of maximum reproduction level
and high environmental noise, focusing on the problem of assessing the communication
quality of a HNE intercom headset via objective measurements. Specifically, an objective
voice-based metric for the characterization of the voice communication quality in conditions
of maximum SPL reproduction and high environmental noise is suggested. In brief, given a
specific HD upper limit, the input level leading to maximum SPL is determined and the voice
reproduction of different ANR military headsets is assessed by employing widely accepted
objective evaluation methods for voice intelligibility and quality. Alternative objective
evaluation methods are examined; the correlation of their results with the corresponding subjective
results is employed as the eligibility criterion for selecting the most suitable method for the
problem at hand. It should be noted that the proposed method is mainly suitable for
comparative study and quality control measurements.
2. Electroacoustic assessment of HNE headsets
2.1 Related work and motivation
The evaluation of the voice reproduced through HNE headsets has already been an
active research topic for various users / environments of interest (aircraft pilots in flight,
armored vehicle crew in operation, etc.) [9-12]. The quality of speech communication was
found to be critical for the accomplishment of the mission at hand, as well as for the
personnel safety and survival [13]. The need to take into account both the noise attenuation
capability and the electroacoustic properties of an intercom headset, with respect to the received
voice communication, was pointed out as early as 1975 [14]. It has been
found [15] that the reproduced voice level must be raised adequately above the noise
finally reaching the ear in order to achieve acceptable speech intelligibility.
A high voice level is necessary in order to obtain an acceptable intelligibility, even for
a headset well attenuating the noise in the frequency range that is important for intelligibility.
This necessity results from the psycho-acoustic masking of the high frequency components of
the communication by the low frequency components of the noise (which are usually not
sufficiently attenuated) and by the poor quality of the transmission channel [16]. On the other
hand, in extremely noisy environments, the voice reproduction levels may become
high enough to induce a temporary hearing threshold shift in the user [13]; the voice
reproduction level therefore has to be restricted within the safety margin.
As a consequence, evaluation of the communication quality offered by a military
headset is both a multi-parametric task and an important factor of its electroacoustic
performance; this is especially true under the worst case scenario of maximum SPL
reproduction in the presence of extremely high environmental noise. Related studies [9,17,18]
rely on subjective and/or objective methods for the relative quality assessment of alternative
solutions, the latter being preferable from a practical point of view.
Communication quality primarily refers to the achieved speech intelligibility. The
former term is adopted here, however, in order to stress the fact that the ultimate criterion is
the end-user’s opinion about the offered communication quality and not just intelligibility.
Voice quality seems to be very important for end-users of military HNE headsets. In order to
reduce distraction and facilitate the completion of their main task during a military operation,
the cognitive load and the level of distraction due to the communication effort should be kept
low; difficulty to identify the speaker and distortions such as strange-sounding or corrupted
voice are therefore not desirable. Speaker recognition is, for example, extremely important
during wired or radio communication of the (authorized) main battle tank (MBT) crew
members to others outside the vehicle, whether on the same or between different military
hierarchy levels. Therefore, the broader feature of communication quality and not just
intelligibility is expected to be effectively offered by a HNE military headset.
2.3 The proposed approach
The proposed method for the assessment of the electroacoustic performance of HNE
headsets in conditions of maximum reproduction level and high environmental noise
comprises the following actions:
1. Given a specific HD upper limit, the maximum reproduced SPL is determined on
ATF as a function of frequency, using a log-swept chirp signal, at office level noise
conditions and in the presence of two different ambient noises (Pink noise and MBT–like
noise), at three different levels (90, 110, and 120 dBSPL(LIN)) (cf. paragraph 3.3.1).
2. A reference anechoic voice signal (of peak amplitude equal to that of the corresponding
sinusoid leading to the maximum SPL in each case) is reproduced by the headset and recorded
by the ATF in the presence of the above noise environment cases (cf. paragraph 3.3.2).
3. In case of an ANR headset the above measurements are conducted both for
activated (ANR-On) and deactivated (ANR-Off) ANR.
4. All measurements are performed for both sides of the headset, in order to
investigate any differences in terms of amplitude and frequency content that can be
introduced due to variability in the acoustics and the electroacoustics at each side.
5. Communication quality in conditions of maximum SPL reproduction in the
presence of different noise types and levels is assessed using the previously recorded voice
reproduction and a voice-based objective evaluation method (cf. subsection 3.4).
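As a rough bookkeeping aid, the five steps above can be expanded into an explicit run list. The Python sketch below assumes that every noise condition of step 1 is combined with both ANR states (step 3) and both headset sides (step 4); the tuple layout and the names `NOISE_CONDITIONS` and `measurement_plan` are hypothetical, and the study itself may not have run every combination.

```python
from itertools import product

# Step 1: office-level noise plus two noise types at three levels (dBSPL, LIN).
NOISE_CONDITIONS = [("office", None)] + [
    (noise, level) for noise in ("pink", "MBT-like") for level in (90, 110, 120)
]
ANR_STATES = ("ANR-On", "ANR-Off")  # step 3, ANR headsets only
SIDES = ("left", "right")           # step 4, both earcups


def measurement_plan(headset, has_anr=True):
    """Return (headset, noise, level_dBSPL, anr_state, side) tuples.

    Each tuple stands for one max-SPL measurement (step 1) followed by one
    voice recording at the input level found there (step 2).
    """
    anr_states = ANR_STATES if has_anr else ("no ANR",)
    return [
        (headset, noise, level, anr, side)
        for (noise, level), anr, side in product(NOISE_CONDITIONS, anr_states, SIDES)
    ]


if __name__ == "__main__":
    plan = measurement_plan("HS1")
    print(len(plan))  # 7 noise conditions x 2 ANR states x 2 sides = 28
    print(plan[0])
```

Step 5 would then evaluate one recording per tuple with the chosen objective metric.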
Such a method would be equally valuable at the development, the production and the
quality control stage of the headsets, which is of high importance for intercom developers and
end-users.
3. Material and methods
The following subsections outline the headsets, the measurement setup employed, as
well as the measurement and the evaluation methods used in the experimental work.
3.1. Headsets
The ANR Combat Vehicle Crewman (CVC) headsets are considered state-of-the-art
and have recently been employed for the intercommunication of the crew members of heavy armored
vehicles by a number of modern armies. Three ANR military CVC headsets of different
vendors and different design philosophy have been employed in this study. The first ANR
headset, hereafter referred to as “HS1”, uses two separate speakerphones, one for the voice
reproduction and another for the anti-noise signal emission. On the other hand, the second
ANR headset, hereafter referred to as “HS2”, as well as the third headset, hereafter referred to
as “HS3”, use a single speakerphone for both the voice and anti-noise signals, however they
employ different ANR systems. A standard¹ CVC headset on the HATS of the measurement
setup is shown in Fig. 1a, as well as in Fig. 2 of [8].
3.2. Measurement setup
The HATS Bruel & Kjaer type 4128C complemented with the left ear simulator Bruel
& Kjaer type 4159C was the ATF used throughout the measurements. Tests were carried out in a
double-wall room (Fig. 1). The HATS, the ambient noise microphone and the noise
reproduction speakers were located inside the room, while the rest of the instruments were
located outside it. The ambient noise microphone monitored the reproduced noise at
the height of the left ear, aligned to the left shoulder, to ensure the correctness of the
reproduced noise spectra.
¹ It is noted that the headset depicted in Fig. 1a was not one of the measured headsets HS1, HS2, HS3. The list
of the available CVC headsets on the market is very short; therefore, the actually measured headsets are not
depicted here in order to ensure their anonymity.
Fig. 1. Test setup for the electroacoustic reproduction measurements based on HATS: (a)
instrument connections along with a typical CVC placed on the HATS, and (b) spatial
arrangement inside the noise room.
3.3. Measurement methods
Two measurement methods were employed here for the electroacoustic evaluation of
the headsets under test. The first one was intended to determine the input level leading to
maximum SPL for a specific HD upper limit in the presence of different ambient noises. The
second reproduced a standard reference voice signal through each ANR headset at
the previously determined input level and recorded the signal reaching the HATS ear in the
presence of different ambient noises.
Prior to any electroacoustic measurement, the Insertion Loss (IL) of each headset
under test was measured according to the method presented in [5], as adapted in [7], and
under the environmental noise conditions proposed in [8] (see also paragraph 3.3.1), and its
performance was noted. Both electroacoustic measurements described in this paper employed
the HATS as the main acoustic measurement apparatus; thus, the correct placement of the
headset under test on the HATS was crucial. Therefore, before each
electroacoustic measurement, the correct placement of the headset under test was verified by
checking that its a priori known IL was achieved. It must be noted here that HS2 was the easiest to
mount on the HATS, while HS3, which relied mainly on a neckband for tight
positioning, was very difficult to mount reliably.
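The placement check described above can be expressed as a comparison of the measured per-band IL against the headset's reference IL. The Python sketch below assumes that IL is computed as the difference between the open-ear and the occluded SPL in each frequency band; the function names and the 3 dB tolerance are hypothetical choices, not values reported in this study.

```python
def insertion_loss(open_ear_db, occluded_db):
    """Per-band IL (dB): SPL at the ear without the headset minus SPL under it."""
    if len(open_ear_db) != len(occluded_db):
        raise ValueError("band lists must have equal length")
    return [o - c for o, c in zip(open_ear_db, occluded_db)]


def placement_ok(measured_il, reference_il, tol_db=3.0):
    """Accept a fitting only if every band lies within tol_db of the reference IL.

    tol_db is a hypothetical acceptance tolerance, not a value from the study.
    """
    if len(measured_il) != len(reference_il):
        raise ValueError("band lists must have equal length")
    return all(abs(m - r) <= tol_db for m, r in zip(measured_il, reference_il))
```

In use, the reference IL would come from a careful initial fitting, and the headset would be refitted until `placement_ok` returns True before any electroacoustic run.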
3.3.1. Maximum reproduction level under specific HD limits
The maximum reproduction level under specific HD upper limit conditions as a
metric of the electroacoustic performance of a HNE headset was first introduced by [7]. An
improvement of this method was proposed in [8], measuring the frequency response along
with the harmonic distortion, up to the 9th harmonic, using the test setup of Fig. 1 and the
log-swept chirp-based method of [19], by increasing the reproduction level up to the highest point
at which HD remains under -15 dB (≈17.8%) throughout the whole reproduction bandwidth, typically
defined as [300, 4000] Hz (toll-quality bandwidth).
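The -15 dB limit and its percentage equivalent follow from HD(%) = 100·10^(HD_dB/20). The Python sketch below shows this conversion, an HD estimate from harmonic amplitudes (2nd to 9th), and a simple step-up search for the highest input level whose worst in-band HD stays under the limit. The callback `measure_hd_db`, the 1 dB step and the level range are hypothetical; in practice each call would correspond to one log-swept chirp measurement restricted to the [300, 4000] Hz band.

```python
import math

HD_LIMIT_DB = -15.0


def db_to_percent(hd_db):
    """Convert an amplitude ratio in dB to percent."""
    return 100.0 * 10.0 ** (hd_db / 20.0)


def total_hd_db(fundamental, harmonics):
    """HD in dB: RMS sum of harmonic amplitudes (e.g. 2nd..9th) over the fundamental."""
    return 20.0 * math.log10(math.sqrt(sum(a * a for a in harmonics)) / fundamental)


def max_level_search(measure_hd_db, start_db, stop_db=0.0, step_db=1.0,
                     limit_db=HD_LIMIT_DB):
    """Raise the input level until the worst in-band HD exceeds limit_db.

    measure_hd_db(level) is a hypothetical callback returning the HD (dB) of
    every frequency bin inside the reproduction bandwidth at that input level.
    Returns the last level that passed, or None if even start_db fails.
    """
    best = None
    level = start_db
    while level <= stop_db:
        if max(measure_hd_db(level)) > limit_db:
            break
        best = level
        level += step_db
    return best


if __name__ == "__main__":
    print(round(db_to_percent(HD_LIMIT_DB), 1))  # 17.8
    # Toy headset model: HD grows 2 dB per dB of input level.
    print(max_level_search(lambda lvl: [2.0 * lvl, 2.0 * lvl - 6.0], start_db=-20.0))
```

A finer step near the limit (or a bisection) would trade measurement time for resolution of the maximum level.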
According to this method, the maximum reproduction level was measured in different
noise environments, including office level noise conditions, and different levels of artificially
produced noise. Pink noise and MBT–like noise (see Fig. 2) at 90, 110, and 120 dBSPL(LIN), at
the height of the ear (aligned to the shoulder) of the HATS, were proposed as a realistic set of
representative high noise environments. Note that the lowest part of the MBT noise spectrum (<80 Hz)
could not be reproduced owing to the limitations of the employed loudspeakers; however, this was not
considered an important limitation.
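Setting a noise to an overall level such as 110 dBSPL(LIN) from its third-octave band spectrum (as in Fig. 2) relies on the energetic sum L = 10·log10(Σ 10^(L_i/10)) of the unweighted band levels. The Python sketch below computes that sum and shifts a band spectrum to a target overall level; the band values in the usage lines are hypothetical and are not the MBT-like spectrum used here.

```python
import math


def overall_level_db(band_levels_db):
    """Energetic (power) sum of unweighted band levels in dB."""
    return 10.0 * math.log10(sum(10.0 ** (lvl / 10.0) for lvl in band_levels_db))


def scale_to_overall(band_levels_db, target_db):
    """Shift every band by the same offset so the overall level equals target_db."""
    offset = target_db - overall_level_db(band_levels_db)
    return [lvl + offset for lvl in band_levels_db]


if __name__ == "__main__":
    bands = [95.0, 101.0, 104.0, 99.0, 92.0]  # hypothetical third-octave levels
    for target in (90.0, 110.0, 120.0):
        print(target, round(overall_level_db(scale_to_overall(bands, target)), 2))
```

Because the shift is uniform across bands, such scaling would keep the spectral shape while changing only the overall level.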
Measurement under high level noise is mandatory for ANR headsets, for which the
environmental noise characteristics (statistics, frequency content and level) strongly
affect the overall performance, as they potentially influence both the reproduction frequency
response and the distortion. Measuring the maximum reproduction level frequency response in parallel
with the distortion is feasible since the proposed measurement method is convolution
based. With this method, uncorrelated background noise is not a restriction at reasonable
signal-to-noise ratios, which military headsets are considered to achieve under maximum
reproduction level conditions.
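Why a single convolution-based measurement yields both the frequency response and the harmonic distortion can be seen in the exponential (log-swept) sine sweep commonly used for this purpose: after deconvolution, the response to the k-th harmonic appears as a separate impulse response arriving T·ln(k)/ln(f2/f1) seconds before the linear one, while uncorrelated noise does not concentrate into a peak. The Python sketch below assumes that the method of [19] follows this standard exponential-sweep formulation; the function names and sweep parameters are illustrative.

```python
import math


def exp_sweep(f1, f2, duration, fs):
    """Exponential sine sweep from f1 to f2 Hz lasting `duration` seconds."""
    r = math.log(f2 / f1)
    k = 2.0 * math.pi * f1 * duration / r
    n = int(round(duration * fs))
    return [math.sin(k * (math.exp(r * (i / fs) / duration) - 1.0)) for i in range(n)]


def instantaneous_freq(t, f1, f2, duration):
    """Frequency (Hz) of the sweep at time t: f1 * exp(t * ln(f2/f1) / T)."""
    return f1 * math.exp(t * math.log(f2 / f1) / duration)


def harmonic_advance(order, f1, f2, duration):
    """Time (s) by which the order-th harmonic IR precedes the linear IR."""
    return duration * math.log(order) / math.log(f2 / f1)


if __name__ == "__main__":
    f1, f2, T = 20.0, 20000.0, 10.0
    for order in range(2, 10):  # 2nd..9th harmonic, as in the HD measurement
        print(order, round(harmonic_advance(order, f1, f2, T), 3))
```

Choosing T long enough that the smallest gap between consecutive advances (between the 8th and 9th harmonics) exceeds the headset's impulse response length keeps the harmonic responses from overlapping.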
Fig. 2. MBT-like noise third octave spectrum with an overall level of ~110 dBSPL(LIN).
3.3.2. Voice recordings for maximum reproduction level in noise environment
The voice measurements were performed by reproducing the reference anechoic
“male 2” English male voice signal, as provided with ITU-T Rec. P.501 [20]. For each
{headset under test – noise condition} combination, the stimulus level was set so that the maximum
peak voltage of the voice stimulus was equal to the maximum sinusoid peak voltage revealed during the
log-swept chirp-based maximum reproduction SPL measurements for the specific upper HD limit.
The signal (reproduced voice and noise) reaching the HATS ear in the presence
of different ambient noises was finally recorded.
Owing to the high crest factor of voice, generally lower distortion is expected in the recorded
voice than the HD estimated with the sinusoid during the maximum reproduction SPL
measurements. For the same reason, however, peak-signal equivalence was considered a better
choice than power-wise equivalence, since the latter could lead to severe distortion (peak
clipping) of the reproduced voice. Nevertheless, the aim of this study is to assess communication
quality through appropriate objective metrics and not just by comparing the long-term equivalent
level (L_eq) of the maximum reproduced voice against that of the acoustic noise reaching the ear.
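The trade-off between peak and power equivalence, and the L_eq comparison mentioned above, can be made concrete in a few lines of Python. The sketch assumes calibrated pressure signals in pascals for L_eq (unweighted, reference 20 µPa) and plain sample lists for voltages; `match_peak` scales a voice stimulus so that its peak equals the sinusoid's peak voltage, and `meets_10db_margin` encodes the 10 dB rule of thumb from the introduction. All names are hypothetical.

```python
import math

P_REF = 20e-6  # reference pressure, Pa


def rms(x):
    return math.sqrt(sum(v * v for v in x) / len(x))


def peak(x):
    return max(abs(v) for v in x)


def crest_factor_db(x):
    """Peak-to-RMS ratio in dB (about 3 dB for a sinusoid, much more for speech)."""
    return 20.0 * math.log10(peak(x) / rms(x))


def match_peak(x, target_peak):
    """Scale x so that its absolute peak equals target_peak (peak equivalence)."""
    gain = target_peak / peak(x)
    return [gain * v for v in x]


def leq_db(pressure_pa):
    """Unweighted equivalent continuous level of a calibrated pressure signal."""
    return 20.0 * math.log10(rms(pressure_pa) / P_REF)


def meets_10db_margin(voice_pa, noise_pa, margin_db=10.0):
    """True if the voice L_eq exceeds the noise L_eq at the ear by margin_db."""
    return leq_db(voice_pa) - leq_db(noise_pa) >= margin_db
```

Under power equivalence the voice RMS would instead be matched to the sinusoid's RMS, so the voice peaks would exceed the sinusoid peak by the difference between the two crest factors, which is what risks the peak clipping noted above.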
3.4. Communication quality evaluation methods
As already pointed out, the focus of the proposed assessment method is on the
suggestion of an objective, real voice-based metric for the characterization of the voice
communication quality in conditions of maximum SPL reproduction and high environmental
noise. The investigation involves three different objective voice evaluation metrics. All of
them are real voice-based, evaluating the sound signal, as recorded on the ATF (cf. paragraph
3.3.2), in comparison to the electrical signal fed to the headphones as input. The use of real
voice as the input signal stems from the idea of employing the degree of correlation between the
objective results and those of a subjective evaluation method as the eligibility criterion for
the selection of the most suitable objective metric. This way, all the obtained results are based
on the same measured data.
It should be noted that STI [21-23] has become the most commonly employed
objective metric for the evaluation of HNE military headsets [9-12]. Of practical interest is
the fact that STI was found [12] to yield results comparable (in terms of rank ordering of the alternative
solutions evaluated) to those of subjective tests, e.g., the Modified Rhyme Test (MRT), across an
adequate range of environmental noise types and levels, and practically the same results when
applied for a real human head (Microphone in Real Ear – MIRE method) or an artificial head
with a headset on.
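For reference, the conventional (indirect) STI route derives the modulation transfer function from an impulse response with Schroeder's relation m(F) = |Σ h²(t)·exp(-j2πFt)| / Σ h²(t), converts each m to an apparent SNR of 10·log10(m/(1-m)) clipped to ±15 dB, and maps it linearly to a transmission index in [0, 1]. The Python sketch below is a deliberately simplified, single-band version under stated assumptions: no octave-band filtering, no band weighting or redundancy correction, no masking or ambient-noise terms, and a plain average over the 14 standard modulation frequencies. It is not a compliant STI implementation.

```python
import cmath
import math

# 14 modulation frequencies (Hz), 1/3-octave spaced from 0.63 to 12.5 Hz.
MOD_FREQS = [0.63, 0.8, 1.0, 1.25, 1.6, 2.0, 2.5, 3.15, 4.0, 5.0, 6.3, 8.0, 10.0, 12.5]


def mtf_from_ir(h, fs):
    """Modulation transfer values m(F) from an impulse response (Schroeder)."""
    energy = [v * v for v in h]
    total = sum(energy)
    m = []
    for f_mod in MOD_FREQS:
        acc = sum(e * cmath.exp(-2j * math.pi * f_mod * n / fs)
                  for n, e in enumerate(energy))
        m.append(abs(acc) / total)
    return m


def transmission_index(m_values):
    """Average transmission index of a set of m values (single band)."""
    tis = []
    for m in m_values:
        m = min(max(m, 1e-12), 1.0 - 1e-12)  # keep the logarithm finite
        snr = 10.0 * math.log10(m / (1.0 - m))
        snr = min(max(snr, -15.0), 15.0)     # clip apparent SNR to +/-15 dB
        tis.append((snr + 15.0) / 30.0)
    return sum(tis) / len(tis)


def decaying_ir(t60, fs, length_s):
    """Toy exponentially decaying IR: amplitude falls 60 dB in t60 seconds."""
    return [10.0 ** (-3.0 * n / (fs * t60)) for n in range(int(length_s * fs))]


if __name__ == "__main__":
    fs = 1000
    for t60 in (0.3, 1.0, 2.0):
        h = decaying_ir(t60, fs, 3.0 * t60)
        print(t60, round(transmission_index(mtf_from_ir(h, fs)), 3))
```

A longer decay smears the speech envelope, lowers m at the higher modulation frequencies and therefore lowers the index; non-linear distortion, in contrast, is not captured by an impulse response at all, which is the limitation discussed next.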
However, STI presents certain known limitations. For instance, it is reported that the
accuracy of STI is compromised when the channel distorts the speech signals non-linearly
[23], when phase jitter or linear phase shifts are present, or when peak or envelope clipping
results from speech enhancement, e.g., spectral subtraction, or from the speech compression and
coding employed [24,25]. Recently, modified versions of STI, such as the STMI [26] and
the Artificial Neural Network (ANN) based form [27], have been proposed to address these
limitations. Channel evaluation under maximum reproduction level conditions is the aim
here; it is therefore expected that a certain degree of non-linearity is inevitably present in
the channel, especially for the ANR headsets. This is reflected in the acceptable upper
HD limit under which the measurements were carried out.
Therefore, the STI, which focuses on voice intelligibility, in its real voice-driven
ANN based form, [27] was the first choice for objective assessment. However, as discussed
in Subsection 2.1, voice quality, and not just intelligibility, seems to be very important for
end-users of military HNE headsets. Moreover, untrained (“naïve”) listeners, when asked
to score speech intelligibility and not voice communication quality, usually find it difficult to
discriminate between these two inextricably linked elements of one's communication capacity
and/or ability, even if a clear definition of speech intelligibility is given to them [28]. As a result,
another widely accepted voice-based objective metric, the Perceptual Evaluation of Speech
Quality (PESQ) [29], which focuses on voice quality rather than intelligibility and presents
fundamental differences compared to STI, was employed. Note that methods such as PESQ
use advanced speech perception models, based on recently developed auditory modeling, to
process the measured speech signals. Finally, we experimented with another quite different
voice-based objective method, Automatic Speech Recognition (ASR), using a common
experimental setup based on the Hidden Markov Model ToolKit (HTK) [30-32]. Another
reason for the use of these three differently aimed objective metrics was their widespread use,
which renders our results directly comparable to future studies. Indeed, (a) STI is already a
telecommunication standard method for the assessment of voice intelligibility [21,22], (b)
PESQ is accordingly a standard voice quality method [29], and (c) ASR, despite the criticism
on its effectiveness, is also frequently employed in telecommunication quality assessment
studies, e.g., [33, and references therein], while speech recognition is often employed in
military applications, e.g., [34-36].
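For the ASR-based metric, recognition output is usually scored the way HTK's HResults tool reports it: after aligning the recognized words with the reference transcription, percent correct is (N - D - S)/N and accuracy is (N - D - S - I)/N, with N reference words and D, S, I deletions, substitutions and insertions. The Python sketch below uses a unit-cost Levenshtein alignment, an assumption that may break ties differently from HResults; it only illustrates how the two scores are formed.

```python
def align_counts(ref, hyp):
    """Unit-cost Levenshtein alignment; returns (substitutions, deletions, insertions)."""
    n, m = len(ref), len(hyp)
    # d[i][j] = (errors, S, D, I) for ref[:i] aligned with hyp[:j]
    d = [[None] * (m + 1) for _ in range(n + 1)]
    d[0][0] = (0, 0, 0, 0)
    for i in range(1, n + 1):
        d[i][0] = (i, 0, i, 0)  # only deletions
    for j in range(1, m + 1):
        d[0][j] = (j, 0, 0, j)  # only insertions
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            e, s, dl, ins = d[i - 1][j - 1]
            sub = 0 if ref[i - 1] == hyp[j - 1] else 1
            up, left = d[i - 1][j], d[i][j - 1]
            d[i][j] = min(
                (e + sub, s + sub, dl, ins),                # match or substitution
                (up[0] + 1, up[1], up[2] + 1, up[3]),       # deletion
                (left[0] + 1, left[1], left[2], left[3] + 1),  # insertion
            )
    _, s, dl, ins = d[n][m]
    return s, dl, ins


def word_scores(ref_words, hyp_words):
    """HResults-style 'correct' and 'accuracy' as fractions of the reference length."""
    if not ref_words:
        raise ValueError("reference must contain at least one word")
    s, dl, ins = align_counts(ref_words, hyp_words)
    n = len(ref_words)
    return {"correct": (n - dl - s) / n, "accuracy": (n - dl - s - ins) / n}
```

Accuracy, unlike percent correct, penalizes insertions, so it is the stricter of the two figures when noise triggers spurious words.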
The results of these three quite different objective metrics were compared with
those obtained by a subjective evaluation method, namely MUSHRA [37], in a quest for
the most suitable objective metric for the assessment of the communication quality of HNE
military headsets. Specifically, the comparative investigation of the three alternatives
consisted of three parts:
1. The evaluation results from the three objective evaluation methods were obtained
and comparatively examined for each headset, to assess whether they are generally able to
provide reliable information that roughly follows the communication quality changes.
2. A subset of noise type / level conditions was selected and the subjective method
was carried out for these conditions, in order to render the subjective evaluation feasible.
Further increasing the number of test cases would result in fatigue and confusion of the subjects,
leading to unreliable results.
3. The results of the three alternative objective measures were correlated to those of
the subjective method under matched conditions of noise, ANR and reproduction level, in
order to select the one which follows more closely the trend of the subjective results. This
one is declared to be the most appropriate for communication quality assessment of HNE,
ANR intercom headsets, based on the particular experimental setup.
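Step 3 amounts to correlating, condition by condition, each objective score with the mean subjective score and keeping the metric with the strongest agreement. The Python sketch below offers Pearson's r and Spearman's rank correlation (Pearson's r on average ranks); which coefficient to use, and the assumption that higher scores mean better quality for every metric, are illustrative choices. The metric names and values would come from the measurements; none are reproduced here.

```python
import math


def pearson(x, y):
    """Pearson's linear correlation coefficient of two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def ranks(values):
    """1-based ranks with ties replaced by their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            result[order[k]] = (i + j) / 2.0 + 1.0
        i = j + 1
    return result


def spearman(x, y):
    """Spearman's rank correlation: Pearson's r on the ranks."""
    return pearson(ranks(x), ranks(y))


def best_metric(objective, subjective, corr=pearson):
    """Return (name, {name: r}) for the metric best correlated with subjective.

    objective maps a metric name to scores listed in the same condition order
    as subjective (e.g. mean MUSHRA scores per condition).
    """
    scores = {name: corr(vals, subjective) for name, vals in objective.items()}
    return max(scores, key=scores.get), scores
```

Spearman's coefficient only rewards agreement in ranking, which matches the comparative purpose of the method; Pearson's r additionally rewards a linear relation between the scales.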
3.4.1. MUSHRA
In order to evaluate the perceptual quality for a number of cases, a MUSHRA test [37]
was designed and performed using the MUSHRAM tool [38]. In order to render the
subjective evaluation feasible, a subset of the headphone recordings was selected, six for
each model, while a hidden anchor using a steep low-pass filter with a cut-off frequency of 2000
Hz was also considered. All the reference sounds and test sounds were normalized to the
same loudness level in order to avoid loudness-related effects.
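The hidden anchor described above (a steep low-pass at 2000 Hz) and the level alignment of the stimuli can be sketched as follows. This Python version assumes a Hamming-windowed-sinc FIR of 255 taps (the actual filter design is not specified) and, as a simplification, equalizes RMS level rather than applying a perceptual loudness model; all function names are hypothetical.

```python
import cmath
import math


def lowpass_fir(cutoff_hz, fs, num_taps=255):
    """Hamming-windowed sinc low-pass FIR (odd length, linear phase, unity DC gain)."""
    if num_taps % 2 == 0:
        raise ValueError("num_taps must be odd")
    fc = cutoff_hz / fs  # normalized cutoff (cycles/sample)
    mid = (num_taps - 1) // 2
    taps = []
    for n in range(num_taps):
        k = n - mid
        ideal = 2.0 * fc if k == 0 else math.sin(2.0 * math.pi * fc * k) / (math.pi * k)
        window = 0.54 - 0.46 * math.cos(2.0 * math.pi * n / (num_taps - 1))
        taps.append(ideal * window)
    total = sum(taps)
    return [t / total for t in taps]


def fir_filter(x, taps):
    """Direct-form FIR filtering; output has the same length as x."""
    return [sum(taps[k] * x[n - k] for k in range(min(len(taps), n + 1)))
            for n in range(len(x))]


def gain_db(taps, freq_hz, fs):
    """Magnitude response of the FIR at one frequency, in dB."""
    h = sum(t * cmath.exp(-2j * math.pi * freq_hz * n / fs) for n, t in enumerate(taps))
    return 20.0 * math.log10(abs(h))


def normalize_rms(x, target_rms):
    """Scale x to a given RMS (a simple stand-in for loudness normalization)."""
    current = math.sqrt(sum(v * v for v in x) / len(x))
    return [v * target_rms / current for v in x]
```

Normalizing after filtering matters for the anchor: removing the band above 2000 Hz lowers its level, and without re-normalization listeners could rate it on loudness rather than on the impairment itself.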
The 30 test subjects recruited as listeners for the subjective test were non-expert listeners
without any hearing loss. A short introduction regarding the procedure was
given to small groups of 4 subjects, during which a clear directive to score the speech
intelligibility and not the overall speech quality was given. Furthermore, the groups were
asked to listen to the reference and test signals before beginning the test and to adjust the
volume to a comfortable level. Also, the existence of a hidden impaired anchor and a hidden
reference was pointed out, and the subjects were asked to use as much of the dynamic range of the
quality scale as possible. While the test phrases were in a non-native language, all the subjects included in the
test were fluent in English. Given the relatively small subject pool, the testing phase for each
subject was repeated 3 times, with random positioning of the stimuli in each repetition.
Results from subjects who missed the hidden reference twice or more were rejected.
The mean value over all tests per recording, as well as the standard deviation and the 95%
confidence interval were calculated.
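The screening rule and the per-recording statistics can be sketched in Python as below. Two assumptions are made explicit: a hidden reference counts as "missed" when rated below a threshold (90 on the 0 to 100 MUSHRA scale is used here as a hypothetical value), and the 95% interval uses the normal approximation, which is reasonable for the number of ratings per recording but slightly narrower than a Student-t interval.

```python
import math
from statistics import NormalDist, mean, stdev


def rejected_subjects(hidden_ref_scores, threshold=90.0, max_misses=1):
    """Subjects whose hidden-reference ratings fell below threshold twice or more.

    hidden_ref_scores maps a subject id to that subject's hidden-reference
    ratings (one per trial). threshold is a hypothetical definition of "missed".
    """
    return {
        subject
        for subject, ratings in hidden_ref_scores.items()
        if sum(r < threshold for r in ratings) > max_misses
    }


def summarize(ratings, confidence=0.95):
    """Mean, sample standard deviation and normal-approximation confidence interval."""
    m = mean(ratings)
    sd = stdev(ratings)
    z = NormalDist().inv_cdf(0.5 + confidence / 2.0)
    half = z * sd / math.sqrt(len(ratings))
    return m, sd, (m - half, m + half)
```

The ratings of rejected subjects would be dropped before `summarize` is applied to each recording.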
3.4.2. Speech Transmission Index
In order to avoid the use of artificial probe signals of speech-like shaped noise, an
alternative STI approach based on running speech probe signals was proposed [39]; yet,
convenience is traded for accuracy of the STI values thus obtained. In an attempt to address
the running speech STI accuracy problem, the approach proposed in [27] was adopted in the
present work. According to this method, the estimated or predicted STI value is the output of
an ANN fed by the (normalized) power spectra of signal envelopes. Herein, the ANN was
first trained on the traditionally calculated STI values resulting from a set of impulse
responses of CVC headsets, other than the ones evaluated here. The training process tunes the
ANN to produce STI estimates for the convolution of the employed reference voice (cf.
paragraph 3.3.2) with the above impulse responses, close to the corresponding traditionally
calculated STI values. As to the ANN type and architecture, the selection was experimentally
driven. Comparable results were obtained either using a Generalized Regression Neural Net
(GRNN) trained by a gradient-descent back-propagation type of minimization algorithm or
using a feedforward type of Multilayer Perceptron (MLP) of two layers with seven nodes in
the hidden layer and a single output node, trained by the Levenberg-Marquardt
back-propagation type of minimization algorithm. Eight-fold cross-validation was used for
computing the results. The results obtained by the GRNN are those presented in the
following.
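A GRNN is, in essence, Gaussian-kernel (Nadaraya–Watson) regression: the STI estimate for a new feature vector is a weighted mean of the training STI values, with weights exp(-‖x - x_i‖²/(2σ²)). The Python sketch below shows that predictor together with an eight-fold cross-validation loop. It assumes that the feature vectors (e.g. the normalized envelope power spectra) are already computed, fixes the spread σ to a hypothetical value instead of tuning it, and uses interleaved folds; none of these details are taken from the study.

```python
import math


def grnn_predict(train_x, train_y, x, sigma=0.1):
    """GRNN output: Gaussian-kernel weighted mean of the training targets."""
    sq_dists = [sum((a - b) ** 2 for a, b in zip(xi, x)) for xi in train_x]
    weights = [math.exp(-d / (2.0 * sigma ** 2)) for d in sq_dists]
    total = sum(weights)
    if total == 0.0:  # every kernel underflowed: fall back to the nearest neighbour
        return train_y[min(range(len(sq_dists)), key=sq_dists.__getitem__)]
    return sum(w * y for w, y in zip(weights, train_y)) / total


def k_fold_indices(n, k=8):
    """Interleaved folds: sample i goes to fold i % k."""
    return [list(range(i, n, k)) for i in range(k)]


def cross_validate(features, targets, k=8, sigma=0.1):
    """Held-out GRNN predictions for every sample, plus their RMS error."""
    preds = [0.0] * len(features)
    for fold in k_fold_indices(len(features), k):
        held_out = set(fold)
        tx = [f for i, f in enumerate(features) if i not in held_out]
        ty = [t for i, t in enumerate(targets) if i not in held_out]
        for i in fold:
            preds[i] = grnn_predict(tx, ty, features[i], sigma)
    rmse = math.sqrt(sum((p - t) ** 2 for p, t in zip(preds, targets)) / len(targets))
    return preds, rmse
```

The spread σ is the GRNN's only free parameter; tuning it (for instance by minimizing the cross-validated error) is where a gradient-based optimizer would come in.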
3.4.3. PESQ
The perceptual evaluation of speech quality (PESQ) [29] is an objective method for
end-to-end speech evaluation of telecommunication systems. It is designed to objectively
predict the subjective quality of 3100 Hz (narrow-band) handset telephony (end-to-end
measurements) and narrow-band speech codecs. Two error parameters are computed in a
cognitive model and combined to give an objective listening quality MOS (Mean
Opinion Score) on a 0.0–5.0 scale.
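The last stage of PESQ, as we read ITU-T P.862, combines the two disturbance values linearly: PESQ = 4.5 - 0.1·d_SYM - 0.0309·d_ASYM, where d_SYM and d_ASYM are the averaged symmetric and asymmetric disturbances; ITU-T P.862.1 then maps this raw score to MOS-LQO with a logistic function. The Python sketch below covers only these two final formulas; computing the disturbances themselves requires the full perceptual model of the standard, which is not reproduced here.

```python
import math


def pesq_raw(d_sym, d_asym):
    """Raw PESQ score from the symmetric and asymmetric disturbances (P.862)."""
    return 4.5 - 0.1 * d_sym - 0.0309 * d_asym


def pesq_to_mos_lqo(raw):
    """P.862.1 logistic mapping from raw PESQ to the MOS-LQO scale."""
    return 0.999 + (4.999 - 0.999) / (1.0 + math.exp(-1.4945 * raw + 4.6607))


if __name__ == "__main__":
    for d_sym, d_asym in ((0.0, 0.0), (5.0, 10.0), (20.0, 40.0)):
        raw = pesq_raw(d_sym, d_asym)
        print(d_sym, d_asym, round(raw, 3), round(pesq_to_mos_lqo(raw), 3))
```

Since the raw score and the mapped MOS-LQO lie on different scales, a comparison across studies should state which of the two is quoted.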
4. Results and Discussion
The maximum reproduction SPL spectra were first evaluated as described in
paragraph 3.3.1. In order to ensure the precision of the results, frequency response and
harmonic distortion measurements, up to the 9th harmonic, were performed on the power
amplifier driving the headsets (under the worst loading conditions). The results (cf. Fig. 3 in
[8]) confirm that both the frequency response and the harmonic distortion of the amplifier are
orders of magnitude better than the corresponding results expected from the headset measurements
within the frequency band of interest.
The maximum reproduction SPL for HD under -15 dB, with ANR-on at: (a) office
level noise conditions, (b) pink noise 110 dBSPL(LIN), and (c) MBT-like noise 110 dBSPL(LIN),
is shown in Fig. 3, Fig. 4 and Fig. 5 for HS1, HS2 and HS3, respectively. It is apparent from
Fig. 3 that all three cases provide practically identical measurements for HS1: the HS1
maximum reproduction level and HD spectra with ANR-on are practically unaffected by the
variation of noise type and level. The same does not hold for HS2 and HS3, as is evident from Figs. 4
and 5, respectively. Although the spectral details differ for each headset, we can
conclude that HD increased in high noise environment conditions, while the most
disturbing noise type was pink noise for both of them. Moreover, HS3 is also affected in
terms of reproduction spectrum: for both noise types we observe a change (ripple) in the
frequency response above 1000 Hz.
The comparison of the effect of noise characteristics and level on the above headset
measurements suggests that the design philosophy of the HS1 ANR headset, which uses two
separate speakerphones, one for the voice reproduction and another for the anti-noise signal
emission, results in less noise-dependent performance than that of the HS2 and HS3 ANR
headsets, which use a single speakerphone for both the voice and anti-noise signals. The ANR
subsystem of HS1 practically does not affect the max SPL voice reproduction in the presence of
110 dBSPL(LIN) noise, either pink or MBT-like. The ANR subsystems of HS2 and HS3
affect their corresponding max SPL voice reproduction, the worst performance being
clearly that of HS3.
The maximum reproduction SPL for HD under -15 dB at 120 dBSPL(LIN) of: (a) pink
noise with ANR-off, (b) pink noise with ANR-on, (c) MBT-like noise with ANR-off, and (d)
MBT-like noise with ANR-on, is shown in Fig. 6, Fig. 7 and Fig. 8 for HS1, HS2 and HS3,
respectively.
Comparing the plots in these figures, the first finding is the confirmation that HS1's
voice reproduction is insensitive to environmental acoustic noise, even as high as 120
dBSPL(LIN), with either ANR-off or ANR-on. On the other hand, the activation of the HS1
ANR leads to generally higher reproduction levels and distortion below 2000 Hz; see
Fig. 6.
Concerning HS2 (Fig. 7), it is affected over the frequency band of interest in ANR-off
operation, in terms of both SPL and HD response, with pink noise causing the more severe
disturbance. In ANR-on operation, HS2 is slightly affected by pink noise at this level, while
it was not affected by 110 dBSPL(LIN) pink noise. Its ANR-on performance under MBT-like
noise remains the same regardless of the noise level increase.
On the other hand, HS3 (Fig. 8) presented quite different behavior compared to the
other two headsets: a further increase of the reproduced SPL was leading to an increase of
the HD. However, in the case of HS3, a higher SPL reproduction was the only way to achieve
an HD performance as close as possible to the considered limit of -15 dB in the presence of 120
dBSPL(LIN) noise. Although it was finally not possible to keep the lower part of the spectrum,
where the ANR subsystem of HS3 possibly operates, under -15 dB HD, the reproduction
levels depicted in Fig. 5 resulted in much worse HD. It is noted that
noticeably different maximum responses resulted for different noise types and ANR
operating conditions (on/off); HS3 is the most affected by these parameters at such high
noise levels.
Judging solely from the max SPL measurements (the maximum reproduced level under the
-15 dB HD limit), which capture the effect of environmental noise on voice reproduction but
do not reflect noise attenuation, one would expect HS1 to present the best communication
quality performance, followed by HS2, with HS3 last in this comparison. Indeed,
notwithstanding its relatively high SPL variation versus frequency, HS1 is the loudest headset,
although probably too loud to be used at max SPL reproduction without additional ear
protection (e.g., earplugs), and the one least affected by environmental noise. Of course, the
use of earplugs would degrade communication quality; therefore, it is not a recommended
solution.
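The max SPL criterion used above can be made concrete with a short sketch. Here HD is assumed to be expressed in dB relative to the fundamental, so the -15 dB limit corresponds to a harmonic-to-fundamental amplitude ratio of 10^(-15/20), about 17.8%. The sketch further assumes, hypothetically, that each headset is measured at a series of stepped drive levels, each yielding one HD spectrum, and that the max SPL curve is the response at the highest drive level whose HD stays at or below the limit over the whole measured band. The function names and data layout are illustrative only and are not taken from the measurement software used in this work.

```python
from typing import Optional, Sequence


def hd_db_to_percent(hd_db: float) -> float:
    """Convert harmonic distortion in dB (re the fundamental) to a percentage."""
    return 100.0 * 10.0 ** (hd_db / 20.0)


def max_level_under_hd_limit(
    drive_levels: Sequence[float],
    hd_db: Sequence[Sequence[float]],
    limit_db: float = -15.0,
) -> Optional[float]:
    """Return the highest drive level whose HD spectrum stays <= limit_db.

    hd_db[i] is the HD spectrum (one dB value per frequency bin) measured at
    drive_levels[i] (hypothetical layout). Levels need not be sorted and HD is
    not assumed to grow monotonically with level. Returns None when no drive
    level meets the limit at every frequency.
    """
    if len(drive_levels) != len(hd_db):
        raise ValueError("one HD spectrum is required per drive level")
    # Keep only the drive levels whose whole HD spectrum respects the limit.
    compliant = [
        level
        for level, spectrum in zip(drive_levels, hd_db)
        if all(value <= limit_db for value in spectrum)
    ]
    return max(compliant) if compliant else None
```

A strict all-frequency criterion like this returns None whenever no drive level meets the limit everywhere, as happened for the lower part of the HS3 spectrum at 120 dBSPL(LIN); handling such cases needs a band-limited or best-effort rule, which is a separate design choice.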
Fig. 3. Max SPL for HD under -15 dB, with ANR-on at: office level noise
conditions (green), pink noise 110 dBSPL(LIN) (blue), and MBT-like noise 110 dBSPL(LIN) (red)
for HS1 (upper panel). The corresponding HD spectra are depicted using the same color code
(bottom panel). For color graphics please refer to the on-line version. Both graph windows
share the same, aligned, horizontal frequency axis.
Fig. 4. Max SPL for HD under -15 dB, with ANR-on at: office level noise conditions
(green), pink noise 110 dBSPL(LIN) (blue), and MBT-like noise 110 dBSPL(LIN) (red) for
HS2 (upper panel). The corresponding HD spectra are depicted using the same color code
(bottom panel). For color graphics please refer to the on-line version. Both graph windows
share the same, aligned, horizontal frequency axis.
Fig. 5. Max SPL for HD under -15 dB, with ANR-on at: office level noise conditions
(green), pink noise 110 dBSPL(LIN) (blue), and MBT-like noise 110 dBSPL(LIN) (red) for HS3
(upper panel). The corresponding HD spectra are depicted using the same color code (bottom
panel). For color graphics please refer to the on-line version. Both graph windows share the
same, aligned, horizontal frequency axis.
Fig. 6. Max SPL for HD under -15 dB at 120 dBSPL(LIN): pink noise ANR-off (blue),
pink noise ANR-on (magenta), MBT-like noise ANR-off (red), and MBT-like noise ANR-on
(green), for HS1 (upper panel). The corresponding HD spectra are depicted using the same color
code (bottom panel). For color graphics please refer to the on-line version. Both graph
windows share the same, aligned, horizontal frequency axis.
Fig. 7. Max SPL for HD under -15 dB at 120 dBSPL(LIN): pink noise ANR-off (blue),
pink noise ANR-on (magenta), MBT-like noise ANR-off (red), and MBT-like noise ANR-on
(green), for HS2 (upper panel). The corresponding HD spectra are depicted using the same color
code (bottom panel). For color graphics please refer to the on-line version. Both graph
windows share the same, aligned, horizontal frequency axis.
Fig. 8. Max SPL for HD under -15 dB at 120 dBSPL(LIN): pink noise ANR-off (blue),
pink noise ANR-on (magenta), MBT-like noise ANR-off (red), and MBT-like noise ANR-on
(green), for HS3 (upper panel). The corresponding HD spectra are depicted using the same color
code (bottom panel). For color graphics please refer to the on-line version. Both graph
windows share the same, aligned, horizontal frequency axis.
Irrespective of the availability of data on headset noise attenuation performance, which
are of course very important for hearing protection assessment, the communication quality
under maximum reproduction SPL conditions (in the presence of noise) should be measurable
by an objective intelligibility/quality metric. As explained in paragraph 3.3.2, a reference
male voice was used as the stimulus signal, and the resulting reproduced voice was recorded
for each {headset under test – noise condition} combination.
As described in subsection 3.4, the results of the three objective evaluation methods were
examined comparatively for each headset, to assess whether they provide reliable information
that roughly tracks the changes in communication quality. The results for HS1, HS2, and HS3
are depicted in Figs. 9a, 9b, and 9c, respectively. Table 1 summarizes the evaluated cases; its
case numbers are used as labels in Fig. 9, where the analysis results are depicted. Note that
although PESQ and STI scores are usually presented on 1-to-5 and 0-to-1 scales, respectively,
they have been mapped to a 0% to 100% percentage scale so that they are directly comparable
to the MUSHRA scores.
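The text does not state the exact form of the mapping to percentages. A linear rescaling of each stated range is the simplest assumption and is sketched below; the helper names are hypothetical. MUSHRA scores are already given on a 0 to 100 scale and therefore need no mapping.

```python
def linear_to_percent(score: float, lo: float, hi: float) -> float:
    """Map a score on [lo, hi] linearly onto 0-100 %, clamping out-of-range input."""
    if hi <= lo:
        raise ValueError("hi must be greater than lo")
    clamped = min(max(score, lo), hi)
    return 100.0 * (clamped - lo) / (hi - lo)


def pesq_to_percent(pesq: float) -> float:
    # Assumes the 1-to-5 presentation scale mentioned in the text.
    return linear_to_percent(pesq, 1.0, 5.0)


def sti_to_percent(sti: float) -> float:
    # STI is defined on the 0-to-1 scale.
    return linear_to_percent(sti, 0.0, 1.0)
```

With this assumption a PESQ score of 3.0 maps to 50% and an STI of 0.75 maps to 75%. A linear map preserves the ordering and relative spacing of scores within each metric, but it does not make equal percentages on different metrics perceptually equivalent.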
Table 1. Different cases evaluated for all objective voice evaluation methods,
corresponding to different {ANR operation – noise type – noise level} conditions for each of
the headsets under test.

Case #   ANR operation   Noise type   Noise level (dBSPL(LIN))
1        ANR-off         No noise     Office level
2        ANR-on          No noise     Office level
3        ANR-off         MBT-like     90
4        ANR-on          MBT-like     90
5        ANR-off         MBT-like     110
6        ANR-on          MBT-like     110
7        ANR-off         MBT-like     120
8        ANR-on          MBT-like     120
9        ANR-off         Pink         90
10       ANR-on          Pink         90
11       ANR-off         Pink         110
12       ANR-on          Pink         110
13       ANR-off         Pink         120
14       ANR-on          Pink         120
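Table 1 amounts to a factorial design: two ANR states crossed with one no-noise condition and two noise types at three levels each, i.e. 2 x (1 + 2 x 3) = 14 cases. The hypothetical sketch below encodes the table as data while reproducing its case numbering, which can be convenient when cross-referencing the case labels of Fig. 9 in an analysis script. The class and function names are illustrative only.

```python
from typing import List, NamedTuple, Optional, Tuple


class Case(NamedTuple):
    number: int
    anr: str               # "ANR-off" or "ANR-on"
    noise_type: str        # "No noise", "MBT-like" or "Pink"
    level: Optional[int]   # dBSPL(LIN); None stands for office level


def table1_cases() -> List[Case]:
    """Enumerate the Table 1 conditions in the order in which they are numbered."""
    # No-noise (office level) first, then each noise type at increasing levels.
    conditions: List[Tuple[str, Optional[int]]] = [("No noise", None)]
    for noise_type in ("MBT-like", "Pink"):
        for level in (90, 110, 120):
            conditions.append((noise_type, level))
    cases: List[Case] = []
    for noise_type, level in conditions:
        for anr in ("ANR-off", "ANR-on"):  # ANR-off precedes ANR-on in every pair
            cases.append(Case(len(cases) + 1, anr, noise_type, level))
    return cases
```

For example, selecting the entries of `table1_cases()` whose level is 120 yields cases 7, 8, 13, and 14.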
It is also noted that the ASR accuracy did not seem to provide usable information²
about changes in communication quality. For HS1, HS2, and HS3 alike, the ASR scores did not
follow the expected quality changes across most of the analyzed cases; moreover, there was a
performance drop in the “no-noise/ANR-on” case for HS1, along with unexpectedly excellent
performance in the pink noise environment, findings that are not compatible with the
preceding analysis. Since the behavior of the ASR scores was far different from that of STI and
PESQ, ASR was excluded from further investigation. STI and PESQ, on the other hand, present
generally consistent behavior for all cases and headsets (Fig. 9). Specifically, performance
deteriorates as the noise type changes from no noise to MBT-like to pink, and as the noise level
increases from 90 to 110 to 120 dBSPL(LIN). This trend, which was roughly expected from the
preceding analysis, is followed more closely by PESQ.
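Footnote 2 attributes the erratic ASR behavior to the small number of words per recording. Assuming the reported accuracy is the word accuracy defined in the HTK Book [30], i.e. (N - D - S - I)/N with N reference words and D, S, I the deletions, substitutions, and insertions of a minimum-edit alignment, the sketch below shows why: in a four-word utterance, a single error of any kind costs 25 percentage points. It uses unit-cost alignment; HTK's own scoring tool applies its own alignment penalties, so this sketch may occasionally split errors differently. The function names and example words are illustrative only.

```python
from typing import List, Sequence, Tuple


def align_counts(ref: Sequence[str], hyp: Sequence[str]) -> Tuple[int, int, int]:
    """(substitutions, deletions, insertions) of a minimum-edit word alignment."""
    n, m = len(ref), len(hyp)
    # dp[i][j] = (errors, S, D, I) for aligning ref[:i] with hyp[:j].
    dp: List[List[Tuple[int, int, int, int]]] = [
        [(0, 0, 0, 0)] * (m + 1) for _ in range(n + 1)
    ]
    for i in range(1, n + 1):
        dp[i][0] = (i, 0, i, 0)  # reference words with no hypothesis: deletions
    for j in range(1, m + 1):
        dp[0][j] = (j, 0, 0, j)  # hypothesis words with no reference: insertions
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            e, s, d, ins = dp[i - 1][j - 1]
            if ref[i - 1] == hyp[j - 1]:
                diag = (e, s, d, ins)              # correct word
            else:
                diag = (e + 1, s + 1, d, ins)      # substitution
            e, s, d, ins = dp[i - 1][j]
            up = (e + 1, s, d + 1, ins)            # deletion
            e, s, d, ins = dp[i][j - 1]
            left = (e + 1, s, d, ins + 1)          # insertion
            dp[i][j] = min(diag, up, left)         # fewest total errors wins
    _, s, d, ins = dp[n][m]
    return s, d, ins


def word_accuracy(ref: Sequence[str], hyp: Sequence[str]) -> float:
    """HTK-style word accuracy in percent: (N - D - S - I) / N * 100."""
    if not ref:
        raise ValueError("the reference transcription must not be empty")
    s, d, ins = align_counts(ref, hyp)
    n = len(ref)
    return 100.0 * (n - d - s - ins) / n
```

For example, `word_accuracy("go to grid four".split(), "go to grid for".split())` returns 75.0, and a single inserted word in the same utterance gives the same score. Because insertions are subtracted, accuracy can even become negative for very short references.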
Therefore, it was decided to present only the STI and PESQ results and to compare them
against the results of the subjective MUSHRA evaluation. As already explained in subsection
3.4, a subset of noise type/level conditions was selected and the subjective test was carried
out only for these conditions, in order to keep the subjective evaluation feasible: increasing the
number of test cases further would cause fatigue and confusion in the subjects, leading to
unreliable results. Given that both employed noise types led to similar behavior as the noise
level increased, the worst-case level of 120 dBSPL(LIN) was selected, and both pink noise and
MBT-like noise conditions were evaluated. The investigated conditions for each headset are
summarized in Table 2. The case numbers appearing in Table 2 are used as labels in Fig. 10,
where the MUSHRA results are presented, and in Fig. 11, where the STI, PESQ, and MUSHRA
results are compared.
Table 2. Cases evaluated with the objective voice evaluation methods and compared with
the subjective one, corresponding to different {headset – ANR operation – noise type – noise
level} conditions.
² This behaviour of the ASR scores can be explained by the small number of words in each voice recording,
which causes even slight recognition mistakes, insertions, or deletions to severely affect the accuracy score.
Case #   Headset under test   ANR operation   Noise type   Noise level (dBSPL(LIN))
1        HS1                  ANR-off         MBT-like     120
2        HS1                  ANR-on          MBT-like     120
3        HS1                  ANR-off         No noise     Office level
4        HS1                  ANR-on          No noise     Office level
5        HS1                  ANR-off         Pink         120
6        HS1                  ANR-on          Pink         120
7        HS2                  ANR-off         MBT-like     120
8        HS2                  ANR-on          MBT-like     120
9        HS2                  ANR-off         No noise     Office level
10       HS2                  ANR-on          No noise     Office level
11       HS2                  ANR-off         Pink         120
12       HS2                  ANR-on          Pink         120
13       HS3                  ANR-off         MBT-like     120
14       HS3                  ANR-on          MBT-like     120
15       HS3                  ANR-off         No noise     Office level
16       HS3                  ANR-on          No noise     Office level
17       HS3                  ANR-off         Pink         120
18       HS3                  ANR-on          Pink         120
Fig. 10 shows the MUSHRA results for the test cases summarized in Table 2:
the average score of each test case together with its 95% confidence interval. For each
headset under test, the highest scores are given to the office-level noise cases, followed by the
MBT-like noise and then the pink noise cases. HS1 and HS2 achieve the overall highest average
scores at office noise level, while HS3 scores significantly lower. HS2 appears to provide the
highest score for both MBT-like and pink noise. Moreover, the ANR feature appears to slightly
degrade the perceived intelligibility in most test cases; an improvement is noticed only for
MBT-like noise with HS1 and HS3.
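The quantities plotted in Fig. 10 are a mean score and a 95% confidence interval per test case. Assuming the interval takes the Student-t form described in ITU-R BS.1534-1 [37] (mean plus or minus t times the sample standard deviation over the square root of the number of listeners), it can be computed as in the sketch below. The hard-coded t table and the normal-value fallback beyond 30 degrees of freedom are simplifications of this sketch, and the function name is hypothetical.

```python
import math
import statistics
from typing import Sequence, Tuple

# Two-sided 95% Student-t critical values for 1..30 degrees of freedom.
_T95 = (
    12.706, 4.303, 3.182, 2.776, 2.571, 2.447, 2.365, 2.306, 2.262, 2.228,
    2.201, 2.179, 2.160, 2.145, 2.131, 2.120, 2.110, 2.101, 2.093, 2.086,
    2.080, 2.074, 2.069, 2.064, 2.060, 2.056, 2.052, 2.048, 2.045, 2.042,
)


def mushra_mean_ci95(scores: Sequence[float]) -> Tuple[float, float]:
    """Return (mean, half-width of the 95% CI) for one MUSHRA test case."""
    n = len(scores)
    if n < 2:
        raise ValueError("at least two listener scores are required")
    mean = statistics.fmean(scores)
    sd = statistics.stdev(scores)  # sample standard deviation (n - 1 divisor)
    dof = n - 1
    # Beyond the table, 1.960 (the normal limit) slightly understates the interval.
    t_crit = _T95[dof - 1] if dof <= 30 else 1.960
    return mean, t_crit * sd / math.sqrt(n)
```

The half-width shrinks roughly with the square root of the panel size, which is one reason why the number of test cases, and hence listener fatigue, constrains how tight the intervals of Fig. 10 can be.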
Fig. 11 provides a comparative presentation of the objective (STI, PESQ) and subjective
(MUSHRA) evaluation results. Ideally, the subjects should have listened to the signal directly
as reproduced by the headsets under analysis. However, this would make their evaluation, to a
certain extent, incompatible with the STI and PESQ analyses, which were performed on the
HATS-recorded signal, since neither the headset fitting nor the ear transfer function could be
exactly the same. On the other hand, one has to take into account the safety of subjects
exposed to such a high-noise environment and the resulting experimental limitations.
Although the adopted procedure adds the responses of the artificial ear and of the monitor
headphones to the reproduction chain, it was considered safer than the alternatives and
accurate enough for the extreme communication conditions examined and for the purpose of
the subjective evaluation. Moreover, as already pointed out in subsection 3.4, it has been
reported that STI measurements obtained with subjects (MIRE) and with an artificial head
(HATS) yielded practically the same results (see Fig. 22 of [12]). In addition, high-quality
headphones (AKG K 240 Studio) were used to minimize their influence. Therefore, the adopted
procedure was not expected to introduce any incompatibility between the subjective and the
objective evaluation results.
The following notes can be made on the results depicted in Fig. 11:
(a) An offset in favor of STI is consistently observed. This is expected, since PESQ and
MUSHRA do not judge speech intelligibility alone but also take voice quality into account.
(b) In most cases, PESQ lies between the STI and the MUSHRA values, while consistently
following MUSHRA's trend (e.g., cases 1, 2: HS1, MBT-like noise).
(c) The subjects largely reject HS3, especially in the pink noise case (cases 17, 18: HS3,
pink noise), which is consistent with the analysis of the maximum reproduction SPL results.
(d) The closest agreement among all metrics is observed for HS2; this could indicate the
higher quality of this headset, but also that MUSHRA's results are indeed consistent with the
results of the objective methods.
(e) The greatest disagreement among the methods is observed in the pink noise cases
(5, 6: HS1; 11, 12: HS2; and 17, 18: HS3), where the individual scores of all metrics also
appear to be the worst for all headsets.
(f) All headsets score lower with ANR-on than with ANR-off at office noise level, which
is expected, since the ANR processing increases distortion to a certain degree.
(g) HS2 and HS3 perform better with ANR-on in the presence of noise, clearly so for
MBT-like noise and marginally for pink noise. In contrast, HS1 always performs worse with
ANR-on than with ANR-off.
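Notes (a) and (b) distinguish two aspects of the agreement between an objective metric and MUSHRA: a constant offset and the following of a trend. One way to quantify them, offered as a hypothetical sketch rather than a procedure described in this work, is the mean offset between the two percentage-scaled series over the cases of Table 2, together with their Pearson correlation coefficient, which is insensitive to a constant offset. The function name is illustrative only.

```python
import math
from typing import Sequence, Tuple


def offset_and_correlation(
    metric: Sequence[float], mushra: Sequence[float]
) -> Tuple[float, float]:
    """Mean offset (metric - MUSHRA) and Pearson r, both series on 0-100 % scales."""
    if len(metric) != len(mushra) or len(metric) < 2:
        raise ValueError("need two equally long series with at least two cases")
    n = len(metric)
    mean_x = sum(metric) / n
    mean_y = sum(mushra) / n
    offset = mean_x - mean_y  # equals the mean of the per-case differences
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(metric, mushra))
    sxx = sum((x - mean_x) ** 2 for x in metric)
    syy = sum((y - mean_y) ** 2 for y in mushra)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant series")
    return offset, sxy / math.sqrt(sxx * syy)
```

Under this reading, note (a) corresponds to a large positive offset for STI, and note (b) to a high correlation between PESQ and MUSHRA regardless of the offset.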
As regards the comparison of the three headsets under evaluation in terms of
communication quality, HS2 is clearly the best of them as an ANR headset, while HS1 is also a
good headset, though not as an ANR one, since its ANR subsystem appears to be ineffective.
PESQ, being the metric most consistent with MUSHRA's results, appears to be the most
appropriate choice for the objective evaluation of the communication quality offered by
HNE headsets at maximum SPL reproduction in high noise environments.
For the overall evaluation of the maximum SPL reproduction performance, it is suggested
that maximum SPL measurements and PESQ be co-evaluated. In this way, one could reach the
same conclusions without the need for a subjective evaluation.
Finally, it should be noted that, in addition to the experiments already described, all
three headsets were evaluated by a limited number of MBT crews on the same intercom
system; these subjects unanimously expressed their preference for HS2, declaring it the
headset with the best overall voice reproduction behavior and the highest communication
quality. However, a systematic end-user subjective evaluation test has not been performed.
Fig. 9. Analysis results after the application of the STI and PESQ voice evaluation
methods for the case of: (a) HS1, (b) HS2, and (c) HS3. The results are presented normalized
to 100% for easier comparison. For the description of the different cases evaluated, please
refer to Table 1. For color graphics please refer to the on-line version.
Fig. 10. MUSHRA analysis results for selected conditions for HS1, HS2 and HS3. For
the description of the different cases evaluated, please refer to Table 2. For color graphics
please refer to the on-line version.
Fig. 11. Comparative presentation of the objective (STI, PESQ) and subjective
(MUSHRA) analysis results for selected conditions for HS1, HS2 and HS3. For the
description of the different cases evaluated, please refer to Table 2. For color graphics please
refer to the on-line version.
4. Conclusions
In this article a novel integrated method for the assessment of the electroacoustic /
acoustic communication performance of HNE headsets in conditions of maximum
reproduction level and high acoustic environmental noise has been presented. The focus was
on the assessment of voice communication quality, as degraded in the speech reproduced by
the headset by the presence of different acoustic background noise levels and spectra, and
also by distortion introduced mainly by the electroacoustic transducer and the ANR system of
the headset when they are driven with high speech signal levels in high acoustic noise
conditions.
The application of the proposed method to three commercially available military
HNE intercom headsets, using three different objective evaluation methods and one subjective
evaluation method for the communication quality assessment, identified PESQ as the most
suitable objective metric, on the grounds that it most closely follows the trend of the
subjective results.
As the application of the method revealed, both maximum SPL measurements and
PESQ should be co-evaluated for the overall evaluation of the maximum SPL reproduction
performance. Such a co-evaluation leads to assessment results very close to the subjective
ones.
It should be noted that a separate subjective evaluation of the same headsets, all
connected to the same specific intercom system, was carried out inside a tracked vehicle
under real high noise conditions, but with a limited number of MBT crews. This subjective
evaluation identified HS2 as the headset with the best overall voice reproduction behavior
and the highest communication quality. This agrees with the results obtained by the proposed
assessment method, which is an encouraging indication of its effectiveness in predicting the
acoustic communication performance of the headsets under real high noise conditions.
Acknowledgments
The authors wish to thank Intracom Defense Electronics S.A. for kindly permitting the use of
the facilities of their Analog Electronics and Electroacoustics Laboratory for conducting the
presented measurements, as well as for organizing the subjective evaluation tests inside the
tracked vehicle.
References
[1] ITU-T-REC-P.380, Electro-acoustic measurements on headsets. Geneva: International
Telecommunication Union – Telecommunication Standardization Sector, 2003.
[2] ITU-T-REC-P.64, Determination of sensitivity/frequency characteristics of local
telephone systems. Geneva: International Telecommunication Union – Telecommunication
Standardization Sector, 2007.
[3] ITU-T-REC-P.79, Calculation of loudness ratings for telephone sets. Geneva:
International Telecommunication Union – Telecommunication Standardization Sector, 2007.
[4] MIL-PRF-25670B, Earphone Elements, General Specification For, Department of
Defense USA, 2006.
[5] J. Cui, A. Behar, W. Wong, H. Kunov, Insertion loss testing of active noise reduction
headsets using acoustic fixture, Applied Acoustics, 64: 1011–1031, 2003.
[6] ANSI/ASA S12.42-2010, Methods for the measurement of insertion loss of hearing
protection devices in continuous or impulsive noise using microphone-in-the-real-ear or
acoustic test fixture procedures, New York: American National Standards Institute, 2010.
[7] S.M. Potirakis, Y. Moisiadis, A. Varagis, Performance evaluation method for high noise
environment intercom headsets, Proceedings of Acoustics'08, Paris, France; Journal of the
Acoustical Society of America 123(5): 3825, 2008.
[8] S.M. Potirakis, N.A. Tatlas, M. Rangoussi, Electroacoustic measurements for high noise
environment intercom headsets, Proceedings of the 128th Audio Engineering Society Convention,
London, UK, Paper Number 8077, 2010.
[9] H. J. M. Steeneken and J. Verhave, Digitally controlled active noise reduction with
integrated speech communication, Proceedings of CIOP noise conference, Kielce Poland,
2001.
[10] H. J. M. Steeneken, Assessment and standardization of personal hearing protection
including active noise reduction, The Research and Technology Organisation (RTO) of
NATO, RTO HFM Lecture Series on “Personal Hearing Protection including Active Noise
Reduction”, Warsaw, Poland, 25-26 October 2004; Belgium, Brussels, 28-29 October 2004;
Virginia Beach, VA, USA, 9-10 November 2004, RTO-EN-HFM-111, 2004.
[11] R. B. Valimont, J. G. Casali, J. A. Lancaster, ANR vs. passive communications headsets:
Investigation of speech intelligibility, pilot workload and flight performance in an aircraft
simulator, Proceedings of the Human Factors and Ergonomics Society, 50th annual meeting,
pp. 2143-2147, 2006.
[12] A. L. Dancer, K. Buck, T. Wessling, H. J. M. Steeneken, J. Verhave, S. H. James, G.
Rood, R. McKinley, Assessment methods for personal active noise reduction validated in an
international round robin, The Research and Technology Organisation (RTO) of NATO, TR-
HFM-094, ISBN 92-837-1121-1, 2004.
[13] A. J. Brammer, D. R. Peterson, M. G. Cherniack, and S. Gullapalli, Improving the
Effectiveness of Communication Headsets with Active Noise Reduction: Influence of
Control Structure, The Research and Technology Organisation (RTO) of NATO, in New
Directions for Improving Audio Effectiveness (pp. 6-1 – 6-8), RTO-MP-HFM-123, Paper 6,
Neuilly-sur-Seine, France: RTO, 2005.
[14] R. H. Campbell, Electroacoustic properties of noise attenuating headsets, Journal of the
Audio Engineering Society 23(10): 806-809, 1975.
[15] R. G. Matschke, Kommunikation und Lärm: Sprachverständnis bei Luftfahrzeugführern
mit und ohne aktive Lärmkompensation (Communication and noise. Speech intelligibility of
aircraft pilots with and without electronic compensation for noise), HNO. Hals-Nasen-
Ohrenärzte, ISSN 0017-6192, 42(8): 499-504, 1994.
[16] K. Buck and V. Zimpfer-Jost, Active hearing protector systems and their performance,
The Research and Technology Organisation (RTO) of NATO, RTO HFM Lecture Series on
“Personal Hearing Protection including Active Noise Reduction”, Warsaw, Poland, 25-26
October 2004; Belgium, Brussels, 28-29 October 2004; Virginia Beach, VA, USA, 9-10
November 2004, published in RTO-EN-HFM-111, 2004.
[17] A. J. Brammer, D. R. Peterson, M. G. Cherniack, S. Gullapalli, and R.B. Crabtree,
Maintaining speech intelligibility in communication headsets equipped with active noise
control, Canadian Acoustics - Acoustique Canadienne 32(3): 132-133, 2004.
[18] S. H. James, Defining the cockpit noise hazard, aircrew hearing damage risk and the
benefits active noise reduction headsets can provide, The Research and Technology
Organisation (RTO) of NATO, RTO HFM Lecture Series on “Personal Hearing Protection
including Active Noise Reduction”, Warsaw, Poland, 25-26 October 2004; Belgium,
Brussels, 28-29 October 2004; Virginia Beach, VA, USA, 9-10 November 2004, RTO-EN-
HFM-111, 2004.
[19] A. Farina, Simultaneous measurement of impulse response and distortion with a swept-
sine technique, Proceedings of the 108th Audio Engineering Society Convention, Paris, Paper
Number: 5093, 2000.
[20] ITU-T-REC-P.501, Test signals for use in telephonometry, Geneva: International
Telecommunication Union – Telecommunication Standardization Sector, 2009.
[21] ANSI S3.5-1997 (R2007), Methods for calculation of the speech intelligibility index,
New York: American National Standards Institute, 1997.
[22] IEC 60268-16, Sound system equipment – Part 16: Objective rating of speech
intelligibility by speech transmission index, Fourth edition, Geneva: International
Electrotechnical Commission, 2011.
[23] R. L. Goldsworthy and J. E. Greenberg, Analysis of speech-based speech
transmission index methods with implications for nonlinear operations, Journal of the
Acoustical Society of America, 116(6), 2004.
[24] J. Ma, Y. Hu, and P. Loizou, Objective measures for predicting speech intelligibility in
noisy conditions based on new band-importance functions, Journal of the Acoustical Society of
America, 125(5): 3387-3405, 2009.
[25] J. Ma and P. Loizou, SNR loss: A new objective measure for predicting the intelligibility
of noise-suppressed speech, Speech Communication, 53: 340-354, 2011.
[26] M. Elhilali, T. Chi, S. A. Shamma, A spectro–temporal modulation index (STMI) for
assessment of speech intelligibility, Speech Communication, 41(2-3): 331-348, 2003.
[27] F. F. Li and T. J. Cox, Speech transmission index from running speech: A neural
network approach, Journal of the Acoustical Society of America, 113(4), Pt. 1, DOI:
10.1121/1.1558373, 2003.
[28] P. C. Doyle, Clinical evaluation in head and neck cancer: voice quality and speech
intelligibility outcomes, presented at the 11th Annual Head and Neck Conference: Managing the
Effects of Treatment, Baltimore, October 24, 2008.
[29] ITU-T-REC-P.862, Perceptual evaluation of speech quality (PESQ): An objective
method for end-to-end speech quality assessment of narrow-band telephone networks and
speech codecs, Geneva: International Telecommunication Union – Telecommunication
Standardization Sector, 2001.
[30] S. Young, G. Evermann, M. Gales, T. Hain, D. Kershaw, X. Liu, G. Moore, J. Odell, D.
Ollason, D. Povey, V. Valtchev, P. Woodland, The HTK Book (for HTK Version 3.4),
Cambridge University Engineering Department, 2006.
[31] K. Vertanen, Baseline WSJ Acoustic Models for HTK and Sphinx: Training Recipes and
Recognition Experiments, Technical Report, Cavendish Laboratory, 2006.
[32] The Carnegie Mellon University Pronouncing Dictionary. On-line at
https://cmusphinx.svn.sourceforge.net/svnroot/cmusphinx/trunk/cmudict/ (last accessed:
14/12/2012).
[33] Y. Teng, Objective speech intelligibility assessment using speech recognition and
bigram statistics with application to low bit-rate codec evaluation, Ph.D. dissertation,
Department of Electrical and Computer Engineering, University of Wyoming, December,
2006.
[34] C. J. Weinstein, Opportunities for Advanced Speech Processing in Military
Computer-Based Systems, Proceedings of the IEEE, 79(11): 1626–1641, 1991.
[35] A. Amrouche, M. Debyeche, A. Taleb-Ahmed, J.-M. Rouvaen, M. C. E. Yagoub, An
efficient speech recognition system in adverse conditions using the nonparametric regression,
Engineering Applications of Artificial Intelligence, 23: 85–94, 2010.
[36] H. Kim, J. Park, Y. Oh, S. Kim, B. Kim, Voice command recognition for fighter pilots
using grammar tree, Computer and Information Science, 352: 116-119, 2012, doi:
10.1007/978-3-642-35603-2_18.
[37] ITU-R BS.1534-1, Method for the subjective assessment of intermediate quality level of
coding systems. International Telecommunication Union, Radiocommunication Assembly,
2003.
[38] E. Vincent, MUSHRAM - A Matlab interface for MUSHRA listening
tests (version 1.0), http://www.elec.qmul.ac.uk/digitalmusic/downloads/#mushram, 2005.
[39] H. J. M. Steeneken and T. Houtgast, The temporal envelope spectrum and its
significance in room acoustics, Proceedings of the 11th ICA, Vol. 7, Paris, pp. 85–88, 1983.
Figure Captions
Fig. 1. Test setup for the electroacoustic reproduction measurements based on HATS: (a)
instrument connections along with a typical CVC placed on the HATS, and (b) spatial
arrangement inside the noise room.
Fig. 2. MBT-like noise third octave spectrum with an overall level of ~110 dBSPL(LIN).