www.elsevier.com/locate/apacoust
Applied Acoustics 65 (2004) 473–483
Spectral enhancement of Polish vowels to improve their identification by
hearing impaired listeners
E. Ozimek *, A. Sęk, A. Wicher, E. Skrodzka, J. Konieczny
Institute of Acoustics, A. Mickiewicz University, Poznań, Poland
Received 23 March 2003; received in revised form 10 November 2003; accepted 17 November 2003
Abstract
Abnormalities in cochlear function usually broaden the auditory filters, which
reduces speech intelligibility. A spectral enhancement algorithm was applied in an
attempt to improve the identification of Polish vowels by subjects with cochlear-based
hearing impairment. Identification scores for natural (unprocessed) vowels and
spectrally enhanced (processed) vowels were measured for hearing-impaired subjects.
Spectral enhancement improved vowel scores by about 10% for those subjects; however,
individual performance varied widely among subjects. The overall vowel identification
scores obtained were 85% for natural vowels and 96% for spectrally enhanced vowels.
© 2003 Elsevier Ltd. All rights reserved.
Keywords: Spectral enhancement; Vowels; Identification scores; Cochlear function
1. Introduction
Broadening of the auditory filters resulting from abnormalities in cochlear
function [13] is usually related to a worsening of frequency resolution and spectral
contrast (the difference in amplitude between spectral peaks and troughs of the
successive formants) of speech sounds. This smears the internal auditory
representations (perceptual spectra) of those sounds and worsens their identification.
*Corresponding author.
E-mail address: ozimaku@main.amu.edu.pl (E. Ozimek).
0003-682X/$ - see front matter © 2003 Elsevier Ltd. All rights reserved.
doi:10.1016/j.apacoust.2003.11.004
The question arises to what extent changes in the internal representations of
vowels reduce their identification by hearing-impaired subjects. It is usually
assumed that such changes affect, at least partly, the vowel recognition ability
of those subjects. The identification of vowels may be facilitated by spectral
enhancement, which increases their spectral contrast. By concentrating more of the
spectrum energy than normal around the formant frequencies, spectral enhancement
may improve recognition of vowels, because the increased spectral contrast partly
compensates for the poorer than normal frequency resolution accompanying a
sensorineural hearing impairment [18]. Some researchers have
attempted to change the spectral contrast of speech sounds by changing their for-
mant bandwidths [4,17,20]. Increasing formant bandwidths reduces formant peak-
to-trough differences and consequently reduces spectral contrast, while decreasing
formant bandwidths has the opposite effect. Boers [4] found only an insignificant effect
of narrowed bandwidths on sentence intelligibility. Van Veen and Houtgast [20]
found that decreasing formant bandwidths has a small effect on the judged similarity
of vowels. Summerfield et al. [17] varied spectral contrast by varying formant
bandwidths in the synthetic CVC syllables. They found little improvement in iden-
tification of stop consonants resulting from narrowed bandwidths. Leek et al. [12]
examined a minimum spectral contrast for identification of synthesized vowels by
normal-hearing and hearing-impaired subjects. They measured identification ability
in noise as a function of peak-to-trough differences in the spectrum of those vowels
and found that the peak-to-trough amplitude differences required for 75%
identification accuracy amounted to 1–2 dB for normal-hearing subjects and 6–7 dB for
hearing-impaired subjects. Franck et al. [9] investigated effects of the reduced dy-
namic range on vowel perception by compression and compensation of the reduced
frequency resolution by spectral enhancement. They found in some measurable
conditions that spectral enhancement produced improvements of vowel scores but
this was counteracted by deterioration of the consonant scores.
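The peak-to-trough measure of spectral contrast discussed above can be illustrated with a minimal Python sketch. The function name `peak_to_trough_db` and the toy level values are illustrative assumptions, not data or procedures from the cited studies; the sketch simply averages the levels of local spectral peaks and subtracts the average level of local troughs.

```python
import numpy as np

def peak_to_trough_db(levels_db):
    """Mean level of local peaks minus mean level of local troughs (dB).

    Hypothetical helper: only interior points of the spectrum are examined.
    """
    levels = np.asarray(levels_db, dtype=float)
    inner = levels[1:-1]
    is_peak = (inner > levels[:-2]) & (inner > levels[2:])
    is_trough = (inner < levels[:-2]) & (inner < levels[2:])
    if not is_peak.any() or not is_trough.any():
        raise ValueError("spectrum needs at least one interior peak and trough")
    return float(inner[is_peak].mean() - inner[is_trough].mean())

# Toy spectral envelopes in dB (made-up values): the second one mimics the
# flattening expected when formant peaks are smeared.
sharp_envelope = [0, 10, 2, 8, 0]
smeared_envelope = [3, 7, 4, 6, 3]

sharp_contrast = peak_to_trough_db(sharp_envelope)      # (10 + 8) / 2 - 2
smeared_contrast = peak_to_trough_db(smeared_envelope)  # (7 + 6) / 2 - 4
print(sharp_contrast, smeared_contrast)
```

With these toy values the contrast drops from 7 dB to 2.5 dB, which is the kind of reduction that spectral enhancement is meant to counteract.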
The basic purpose of this study was to examine the influence of spectral
enhancement on the identification of Polish vowels by hearing-impaired subjects. The
vowels were modified by means of the spectral enhancement algorithm developed at
Cambridge University by Baer and co-workers [2,3], which increases the
peak-to-trough differences in the spectral envelope.
2. Concept of the enhancement algorithm
The applied algorithm assumes that the vowel sounds supplied to the impaired-hearing
system should be transformed in such a way that they produce an excitation similar
to that produced by the non-transformed signal in the normal-hearing system. If
$P_N$ and $F_N$ stand for the matrices representing the excitation patterns and
auditory filters in the normal-hearing system, $P_S$ and $F_S$ for those in the
impaired-hearing system, and $X$ is the vector representing the spectrum of the
signal, then
$$P_N = F_N X \tag{1}$$
and
$$P_S = F_S X. \tag{2}$$
The signal whose spectrum would be represented by the vector $\vartheta(X)$, that is,
the signal which, after passing through the broadened filters, would stimulate the
same activity of the hearing system as the signal $X$ in the normal-hearing system,
should satisfy the equation
$$P_N = F_S \vartheta(X) \tag{3}$$
and finally
$$\vartheta(X) = F_S^{-1} F_N X. \tag{4}$$
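The relation in Eq. (4) can be checked numerically with a small Python sketch. Everything concrete here is an assumption made for illustration: the toy filter matrices built by the hypothetical helper `filter_matrix` (each channel leaks a fixed fraction of energy into its two neighbours), the leak values, and the eight-point spectrum. Real auditory filter matrices would be derived from measured filter shapes.

```python
import numpy as np

def filter_matrix(n_channels, leak):
    """Toy filterbank matrix (assumption): each channel keeps 1 - 2*leak of
    its own band and picks up `leak` from each neighbouring band."""
    return ((1 - 2 * leak) * np.eye(n_channels)
            + leak * (np.eye(n_channels, k=1) + np.eye(n_channels, k=-1)))

# Made-up spectrum with two peaks standing in for formants.
X = np.array([1.0, 4.0, 1.0, 1.0, 3.0, 1.0, 1.0, 1.0])

F_N = filter_matrix(len(X), leak=0.05)  # narrow "normal" filters
F_S = filter_matrix(len(X), leak=0.20)  # broadened "impaired" filters

P_N = F_N @ X  # Eq. (1): excitation in the normal-hearing system
P_S = F_S @ X  # Eq. (2): excitation in the impaired-hearing system

# Eq. (4): solve F_S * theta = F_N * X instead of forming the inverse explicitly.
theta = np.linalg.solve(F_S, P_N)

# Eq. (3): the transformed spectrum passed through the broadened filters
# reproduces the normal-hearing excitation pattern.
print(np.allclose(F_S @ theta, P_N))
```

Solving the linear system is used here in place of an explicit matrix inverse because it is numerically safer; in practice strongly broadened filters can make $F_S$ poorly conditioned, so the exact inverse may not be usable without regularisation.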
The principle of the algorithm is as follows. The vowels recorded on audio CD were
sampled at a 24-kHz rate and low-pass filtered at 4.5 kHz. Each vowel was divided
by a Hamming window into short time segments (256 points, 10 ms long) and
converted to the frequency domain (X vector) by the fast Fourier transform (FFT).
The short-term spectra obtained in this way were processed by the enhancement
procedure, in which only the magnitude (not phase) spectra were modified. In the
enhancement process, the spectra were subjected to a function which enhanced their
valleys. In the inverse FFT, the modified magnitude data were combined with the
original phase data and transformed back into the time domain. The resulting time
segments were added using an overlap-add procedure to get the final output signal. The
algorithm was implemented using the TDT system (Tucker-Davis Technologies) and
a PC. The exact value of the contrast weight used for the enhanced stimuli
was chosen individually for subjects with hearing impairment, depending on their
absolute threshold elevations versus frequency.
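The processing chain described above (Hamming-windowed segments, FFT, magnitude-only modification, inverse FFT with the original phase, overlap-add) can be sketched in Python as follows. This is not the Cambridge algorithm: the function `enhance_magnitude` is a hypothetical stand-in that expands the deviation of the log-magnitude spectrum from a smoothed envelope by a contrast weight. The hop size of 64 samples, the 9-bin smoothing, and the normalisation by the summed window are also assumptions, since the text does not specify them.

```python
import numpy as np

def enhance_magnitude(mag, weight, smooth_bins=9):
    """Stand-in enhancement function (assumption): scale the dB deviation
    from a moving-average envelope by `weight` (weight > 1 raises contrast)."""
    level_db = 20 * np.log10(mag + 1e-12)
    kernel = np.ones(smooth_bins) / smooth_bins
    envelope_db = np.convolve(level_db, kernel, mode="same")
    return 10 ** ((envelope_db + weight * (level_db - envelope_db)) / 20)

def spectral_enhance(x, weight=1.5, frame_len=256, hop=64):
    """Short-time processing with overlap-add; phase is left untouched."""
    window = np.hamming(frame_len)
    padded = np.concatenate([x, np.zeros(frame_len)])
    out = np.zeros(len(padded))
    window_sum = np.zeros(len(padded))
    for start in range(0, len(x), hop):
        segment = padded[start:start + frame_len] * window
        spectrum = np.fft.rfft(segment)
        magnitude, phase = np.abs(spectrum), np.angle(spectrum)
        new_spectrum = enhance_magnitude(magnitude, weight) * np.exp(1j * phase)
        out[start:start + frame_len] += np.fft.irfft(new_spectrum, n=frame_len)
        window_sum[start:start + frame_len] += window
    window_sum[window_sum < 1e-8] = 1.0  # samples not covered by any frame
    return (out / window_sum)[:len(x)]

# Synthetic vowel-like test signal (made-up harmonic amplitudes), 0.5 s at 24 kHz.
fs = 24_000
t = np.arange(int(0.5 * fs)) / fs
partials = [(200, 1.0), (400, 0.6), (800, 0.9), (2400, 0.4)]
vowel_like = sum(a * np.sin(2 * np.pi * f * t) for f, a in partials)

unchanged = spectral_enhance(vowel_like, weight=1.0)  # weight 1 leaves the signal as is
enhanced = spectral_enhance(vowel_like, weight=1.5)
```

A weight of 1 reproduces the input, which is a convenient sanity check for the windowing and overlap-add bookkeeping before any contrast weight is chosen for an individual listener.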
3. Experiment
3.1. Subjects
Three hearing-impaired subjects (age range 24–40 years) participated in the
experiment. They showed bilaterally symmetrical hearing impairment within the range
35–75 dB HL. No air-bone gaps were observed for any subject. Basic audiological tests
(air and bone conduction, speech audiometry and SISI test) indicated sensorineural
impairment of cochlear origin. The air-conduction audiograms for the hearing-impaired
subjects (1, 2, 3) participating in the study are shown in Fig. 1. Subjects 1 and 3
showed hearing losses of between 30 and 35 dB at 125 Hz and between 60 and 65 dB at
4 kHz. The hearing loss of subject 2 increased from 40 dB at 125 Hz to approximately
75 dB at 4 kHz.
For subjects with sensorineural hearing losses, characteristics of the auditory filters
centered at 1, 2 and 4 kHz were also determined using a notched-noise method [16].
It was found that the auditory filters obtained at the frequencies for which hearing
loss was observed were asymmetric, had a lower dynamic range and were much broader
than the corresponding filters of normal-hearing subjects. The broadening of the
auditory filters deteriorates the frequency selectivity of the auditory system and
impairs vowel identification. Amplification of the signal reaching the ear of a
subject with broadened filters cannot improve this identification, because the
signal-to-noise ratio remains the same.
Fig. 1. Audiograms of the three hearing-impaired subjects at octave intervals. The subjects are indicated
as 1–3.
3.2. Stimuli and procedure
Six Polish vowels (/o/, /a/, /i/, /e/, /u/, /y/) were used as the stimuli. Each vowel
had a duration of approximately 500 ms, including 50 ms linear rise-fall times. They were
stored on a computer disk and presented to subjects in random sequence. The stimuli
were presented in 8 blocks of 4 trials to each subject. Each block of trials consisted of
eight randomly ordered presentations of six vowels. Correct-answer feedback was
not given. Subjects were asked to listen to each of the blocks of stimuli a minimum of
four times on different days. After each stimulus presentation, the subject wrote on a
list which vowel was heard. All testing was completed in five sessions, each lasting
3 h. The spectral enhancement was done off-line. Subjects were seated in a
sound-treated booth and listened to the stimuli via Sennheiser 580 headphones. The
presentation level was adjusted individually so that each subject could hear the
stimuli at the most comfortable level (MCL). The MCLs were within a range from 85 to
100 dB SPL. To avoid distortions, a high-pass filter (cutoff frequency 50 Hz) and a
low-pass filter (cutoff frequency 4500 Hz) were implemented.
3.3. Results for unprocessed (natural) vowels
Fig. 2 shows example spectra of three natural (unprocessed) vowels (/i/, /e/, /y/; solid
lines) and the same vowels processed by the enhancement procedure (broken lines).
[Fig. 2 plot: three panels (/i/, /y/, /e/), each showing the natural spectrum and the enhanced (Enh) spectrum; horizontal axis FREQUENCY [kHz], 1 to 5; vertical axis LEVEL [dB], -80 to 20.]
Fig. 2. Spectra of three Polish vowels produced by a female speaker (solid line) and the same vowels processed by the
enhancement procedure (broken line).
Fig. 3. Individual and mean correct identification (in percent) for the natural vowels obtained by the
hearing-impaired subjects.
Individual data on identification scores of natural vowels by hearing-impaired
subjects are shown in Fig. 3. The mean percent correct identification of the tested
vowels is given by blank bars. The accuracy of identification is indicated by the
standard deviation. As follows from Fig. 3, the identification score varies depending
on the vowel. The best hearing-impaired subject's score, averaged across vowels, was
97% correct vowel identification and the worst subject scored 75%. A clear decrease in
identification is observed for vowel /o/. Van Tasell et al. [19], testing hearing-impaired
subjects, found that one of their three subjects identified the seven synthetic vowel
stimuli well (93%) while the other two performed with nearly 70% accuracy.
Nábělek et al. [14] reported a range of vowel-identification performance of 68–93%
for subjects with a mild-to-moderate sensorineural hearing loss.
The data in Fig. 3 display not only the individual identification scores of the
successive vowels, but also a tendency to use some responses more frequently than
others. To check this tendency, confusion matrices [6] were calculated to estimate
the effect of response bias and the pairwise discriminability of the vowels. The
data obtained are presented in Table 1. The percentage of correct identifications of
each vowel is given on the diagonal of the confusion matrix obtained for each
subject.
Table 1
Vowel confusion matrices and overall percent-correct vowel identification for natural vowels

                  Subject 1                 Subject 2                 Subject 3
Vowel         o   a   i   e   u   y     o   a   i   e   u   y     o   a   i   e   u   y
o            66   0   0  28   0   6    98   0   0   2   0   0     2   0   0  98   0   0
a             3  45   0  49   3   0     2  97   0   1   0   0     2  98   0   0   0   0
i             0   0  94   3   3   0     0   0  98   2   0   0     0   0  97   3   0   0
e             6   0   3  91   0   0     0   2   1  97   0   0     0   2   0  98   0   0
u             0   3   0   6  63  28     0   0   0   0  96   4     0   3   0   0  97   0
y             0   0   3   3   3  91     0   3   0   0   3  96     0   0   0   0   2  98
Mean (%)              75                        97                        82
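As an illustration of how such a confusion matrix can be read, the following Python sketch uses the Subject 1 block of Table 1. It assumes that rows are presented vowels and columns are responses (the rows sum to 100), and the helper names are hypothetical. It computes the mean of the diagonal, the largest off-diagonal confusion, and the column totals, which indicate how often each response was used.

```python
import numpy as np

VOWELS = ["o", "a", "i", "e", "u", "y"]

# Subject 1, natural vowels (Table 1); entries are percentages.
# Assumed layout: rows = presented vowel, columns = response.
confusions = np.array([
    [66,  0,  0, 28,  0,  6],
    [ 3, 45,  0, 49,  3,  0],
    [ 0,  0, 94,  3,  3,  0],
    [ 6,  0,  3, 91,  0,  0],
    [ 0,  3,  0,  6, 63, 28],
    [ 0,  0,  3,  3,  3, 91],
], dtype=float)

def percent_correct(matrix):
    """Mean of the diagonal: overall percent-correct identification."""
    return float(np.mean(np.diag(matrix)))

def largest_confusion(matrix, labels):
    """Return (presented, response, percent) for the largest off-diagonal cell."""
    off_diagonal = matrix.copy()
    np.fill_diagonal(off_diagonal, -np.inf)
    row, col = np.unravel_index(np.argmax(off_diagonal), off_diagonal.shape)
    return labels[row], labels[col], float(matrix[row, col])

def response_totals(matrix, labels):
    """Column sums: how often each response was used (100 means no bias)."""
    return dict(zip(labels, matrix.sum(axis=0)))

print(percent_correct(confusions))            # matches the "Mean (%)" entry, 75
print(largest_confusion(confusions, VOWELS))  # /a/ heard as /e/ in 49% of trials
print(response_totals(confusions, VOWELS))    # response /e/ totals 180
```

The column total of 180 for /e/ shows that this subject used the /e/ response far more often than the 100 expected without bias, which is the kind of response tendency the confusion matrices were meant to reveal.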
The data from Table 1 show that the most easily identified vowels were /e/ and /i/ (97%
mean value), which are characterized by a large difference in frequency between the first and
second formant. The vowels showing a small frequency difference between the formants
F1 and F2 (e.g., /o/ and /u/) were identified much worse. These results indicate that the
frequency information provided by the spectrum in the region of the first two formants
is important for recognition of vowels for subjects with moderate hearing loss.
3.4. Results for spectrally enhanced vowels
In the second stage of the experiment, identification scores were determined for
spectrally enhanced vowels whose spectra are shown in Fig. 2 (broken line). The
weight of spectral contrast used for the enhanced stimuli was chosen individually for
each subject with hearing impairment, depending on his or her threshold elevation
and auditory filter shape. As can be seen, the enhanced vowels are characterized by
larger level differences between the peaks and troughs of successive formants and better
peak resolution relative to natural (unprocessed) vowels.
Individual (hatched bars) and average (black bars) data on vowel identification are
shown in Fig. 4. The accuracy of identification is indicated by standard deviation.
As follows from Fig. 4, the subject performance for the spectrally enhanced
vowels is better across all vowels than it is for the unprocessed vowels. The
identification score averaged across vowels and hearing-impaired subjects is equal to
about 93% (83% for unprocessed vowels). The highest identification improvement due
to spectral enhancement is observed for vowel /o/. The best hearing-impaired
subject's score averaged 100% correct vowel identification (97% for
unprocessed vowels) and the worst subject scored 97% (75%).
In order to check a possible tendency among subjects to use some responses more
frequently than others, confusion matrices were calculated, as in the first part of
the experiment. The data are presented in Table 2.
The data from Table 2 show that the most frequently confused vowel was /o/,
whereas the least confused vowel was /e/. In the first stage of this experiment the
most and the least confused vowels were /o/ and /i/, respectively. The vowel /e/ was
the only one that was rarely confused with any of the others.
Fig. 4. Individual and mean correct identification (in percent) for the spectrally enhanced vowels obtained
by hearing-impaired subjects (blank bars refer to unprocessed vowels).
Table 2
Vowel confusion matrices and overall percent-correct vowel identification for spectrally enhanced vowels

                  Subject 1                 Subject 2                 Subject 3
Vowel         o   a   i   e   u   y     o   a   i   e   u   y     o   a   i   e   u   y
o           100   0   0   0   0   0   100   0   0   0   0   0    19   0   3  78   0   0
a             6  88   0   6   0   0     0 100   0   0   0   0     0  94   3   3   0   0
i             0   0 100   0   0   0     0   0 100   0   0   0     3   0  94   3   0   0
e             0   0   0 100   0   0     0   0   0 100   0   0     0   0   3  97   0   0
u             0   0   0   0 100   0     0   0   0   0 100   0     0   0   3   3  94   0
y             0   0   0   0   6  94     0   0   0   0   0 100     0   0   6   0   0  94
Mean (%)              97                       100                        82
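The per-subject means and the per-vowel effect of enhancement can be recomputed from the diagonal entries of Tables 1 and 2, as in the following Python sketch. The averaging scheme is an assumption (an unweighted mean across the three subjects), since the text does not state how the reported averages were formed.

```python
import numpy as np

VOWELS = ["o", "a", "i", "e", "u", "y"]

# Diagonal (percent-correct) entries copied from Tables 1 and 2:
# one row per subject (1-3), one column per vowel in VOWELS order.
natural = np.array([
    [66, 45, 94, 91, 63, 91],
    [98, 97, 98, 97, 96, 96],
    [ 2, 98, 97, 98, 97, 98],
], dtype=float)
enhanced = np.array([
    [100,  88, 100, 100, 100,  94],
    [100, 100, 100, 100, 100, 100],
    [ 19,  94,  94,  97,  94,  94],
], dtype=float)

# Per-subject means, comparable with the "Mean (%)" rows of the tables.
subject_means_natural = natural.mean(axis=1)
subject_means_enhanced = enhanced.mean(axis=1)

# Assumption: unweighted mean across subjects for each vowel.
gain_per_vowel = dict(zip(VOWELS, enhanced.mean(axis=0) - natural.mean(axis=0)))
overall_gain = float(enhanced.mean() - natural.mean())

for vowel, gain in gain_per_vowel.items():
    print(f"/{vowel}/: {gain:+.1f} percentage points")
print(f"overall: {overall_gain:+.1f} percentage points")
```

Rounded to whole percent, the subject means reproduce the "Mean (%)" rows of both tables, and under this averaging the gain for /o/ comes out near 18 percentage points, the gain for /i/ near 2, and the overall gain near 8.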
The results shown in Fig. 4 were subjected to an analysis of variance (ANOVA) in
order to check the statistical significance of the improvement in vowel
identification due to spectral enhancement. It was found that the combined effect of
spectral enhancement and subject was statistically significant [F(1, 30) = 6.5,
p < 0.05]. No statistically significant dependence was found between the
identification improvement due to spectral enhancement and the shape of the
auditory filters.
4. Discussion
The experimental data showed that the mean percent correct identification for
hearing-impaired subjects averaged across vowels was 85% with SD = 12%. For the
sake of comparison, a vowel identification study was also conducted for five
untrained normal-hearing subjects. It was found that the vowel identification score for
those subjects equaled 100% except for the vowel /o/, whose identification score
amounted to 99.5% (data not presented in the paper). Such a high score for
normal-hearing subjects is similar to that obtained by Van Tasell et al. [19], who found that
natural vowels were identifiable by untrained normal subjects with an average
identification score of 98.2%.
The worse performance of the hearing-impaired subjects relative to that of
the normal-hearing subjects generally supports the assumption that broadening of the
auditory filters associated with sensorineural hearing impairment reduces
peak-to-trough amplitude differences in the internal auditory representation of vowel spectra.
The low identification scores for some vowels (/o/ and /u/) are probably due to the fact
that hearing-impaired subjects have some difficulty discriminating the closely spaced
formant peaks corresponding to those vowels, which is caused by a smoothing of
their internal representation by broadened auditory filters.
In the literature, it is generally assumed that to get good vowel identification, the
internal representations of vowels should clearly exhibit spectral peaks. However,
there are some discrepancies concerning the role of the formant structure. Dubno and
Dorman [8] stressed a special role of the first formant in this identification. Coughlin
et al. [7] suggested that vowel identification is partially predicted by reduced ability
to discriminate spectral differences in the F2 region. However, Chistovich [5] found
that poorly resolved formant peaks would not necessarily predict poor vowel
identification performance. Other researchers suggest that accurate identification of
vowels depends on combined information from such acoustic properties as spectral
cues, vowel duration, and formant dynamics [1,10]. Our data suggest that the
frequency information provided by the spectrum in the region of the first two formants
is important for recognition of Polish vowels for subjects with moderate hearing loss.
In the second stage of the experiment, it was found that the spectral contrast
enhancement generally improved identification of Polish vowels for
hearing-impaired subjects. The positive effect of spectral enhancement found in this experiment
is similar to that of Leek et al. [12], who found that spectral contrast in vowels
provides a useful cue to vowel identification for persons with moderate hearing
impairment. However, it is not an obvious outcome, since it does not agree with
the finding of Klatt [11], who stated that the formant frequencies are of overriding
importance in vowel identification, whereas formant amplitude, peak-to-valley
differences and overall spectral slope are not especially important. Our data suggest
that vowel identification requires only gross estimation of formant peaks rather than
resolution of details across the spectrum.
The increase in the spectral contrast of vowels might partly compensate for the
reduced frequency resolution of hearing-impaired subjects, so as to produce an
internal representation similar to that produced in normal-hearing subjects. The
contrast enhancement was not equally successful in improving vowel
identification for the tested hearing-impaired subjects. The improvement is clearly seen for
subject 1 but less so for subject 3. Several authors have examined the effect of spectral contrast
on the identification of speech sounds and some of them failed to demonstrate
its beneficial effects for hearing-impaired subjects [4,8,17]. Boers [4] found only an
insignificant effect of narrowed bandwidths on sentence intelligibility. For
some experimental conditions, however, it was stated that increased spectral contrast resulted
in poorer speech-reception threshold scores for hearing-impaired subjects.
Summerfield et al. [17] varied spectral contrast by varying formant bandwidth in
synthetic CVC syllables and found little improvement in identification of stop
consonants resulting from narrowed bandwidths. Franck et al. [9] reported a positive
effect of spectral enhancement on vowel identification; however, it was
counteracted by a negative effect on the consonants.
These inconsistencies in the literature on vowel identification can have several
possible reasons. The spectral contrast in the internal representation can be
significantly reduced and still be sufficient for vowel identification. Leek et al. [12] showed
that normal-hearing subjects required only a 1–2 dB difference in the amplitude of
harmonics at spectral peaks and troughs to achieve greater than 75% accuracy in
identification of some synthetic vowels. Moreover, hearing-impaired subjects may
use some additional cues, such as vowel duration and pitch [15], formant transitions
[21] or some linguistic factors, which may improve identification performance. One
can also assume that vowels are characterized by unique patterns in the internal
representations. Such an assumption would help to explain why vowel identification is
not seriously impaired for mild and moderate hearing loss.
It should be added that the study presented in this paper concerns only Polish
vowels. Their specific spectral–temporal character differs significantly
from that in other languages, e.g., English or German. For instance, English
has central vowels unknown in Polish, as well as retroflex vowels. German, like
French, has a profound contribution of front vowels. Because of these differences the
amplitude and frequency relations of the formants are different in these languages
and are idiosyncratic for each language. In view of the above, the results on the effect of
the spectral enhancement procedure on identification of vowels should not be
applied to any other language, as this might lead to some errors. Besides, at this stage of
the study, aimed just at testing the enhancement procedure, we decided to make
measurements on only three hearing-impaired subjects. In further studies aimed at
implementation of the spectral enhancement algorithm in hearing aids, at least
several dozen subjects with hearing impairment of cochlear origin will participate.
Moreover, besides natural vowels, other speech tests will be used.
Generally, one can state that the results obtained in this paper indicate an
increase in vowel identification when spectral enhancement was applied. But one has
to remember that the enhancement procedure also produced some distortions that
might not be easily accepted by some people with hearing impairment. Thus further
investigation, for a much wider range of speech stimuli and a greater number of
hearing-impaired subjects, is needed before the spectral enhancement procedure finds
any application in hearing aids.
5. Conclusions
The following conclusions can be drawn from this study:
The hearing-impaired subjects showed slight-to-moderate difficulty with
identification of the natural Polish vowels. The averaged identification scores for the tested
subjects ranged from 75% to 97%.
The applied spectral enhancement algorithm improved vowel identification on
average by about 8%. In the individual results, it was found that this improvement
was strongly dependent on the type of vowel. The highest identification improvement
was observed for vowel /o/ (18%), the lowest for vowel /i/ (2%). The best
hearing-impaired subject's score was 97% averaged across vowels and the worst subject
scored 82% (99% and 75% for unprocessed vowels, respectively).
Acknowledgements
This research was supported by Grant # 8 T11E 017 17 from the State Committee
for Scientific Research (KBN). Permission from Cambridge University (B. Moore
and T. Baer) for the use of their spectral enhancement algorithm is gratefully
acknowledged.
References
[1] Andruski J, Nearey T. On the sufficiency of compound target specification of isolated vowel and
vowels in /bVb/ syllables. J Acoust Soc Am 1992;91:390–410.
[2] Baer T, Moore BCJ, Gatehouse S. Spectral contrast enhancement of speech in noise for listeners with
sensorineural hearing impairment: effects on intelligibility, quality and response times. J Rehabil Res
Dev 1993;30:49–72.
[3] Baer T, Moore BCJ. Spectral enhancement to compensate for reduced frequency selectivity. J Acoust
Soc Am 1994;95:2992.
[4] Boers PM. Formant enhancement of speech for listeners with sensorineural hearing loss. IPO Ann
Prog Rep 1980;15:21–8.
[5] Chistovich LA. Central auditory processing of peripheral vowel spectra. J Acoust Soc Am
1985;77:789–805.
[6] Clarke FR. Constant-ratio rule for confusion matrices in speech communication. J Acoust Soc Am
1957;29:715–20.
[7] Coughlin M, Kewley-Port D, Humes LE. The relation between identification and discrimination of
vowels in young and elderly listeners. J Acoust Soc Am 1998;104:3597–607.
[8] Dubno JR, Dorman FM. Effects of spectral flattening on vowel identification. J Acoust Soc Am
1987;82:1503–11.
[9] Franck BAM, Sidonne C, van Kreveld-Bos GM, Dreschler WA. Evaluation of spectral enhancement
in hearing aids combined with phonemic compression. J Acoust Soc Am 1999;106:1452–64.
[10] Jenkins JJ, Strange W, Miranda S. Vowel identification in mixed-speaker silent-center syllables. J
Acoust Soc Am 1994;95:1030–43.
[11] Klatt DH. Prediction of perceived phonetic distance from critical-band spectra: A first step. Proc
IEEE Int Conf Speech Acoust Signal Process 1982;129:1278–81.
[12] Leek MR, Dorman MF, Summerfield Q. Minimum spectral contrast for vowel identification by
normal-hearing and hearing-impaired listeners. J Acoust Soc Am 1987;81:148–54.
[13] Moore BCJ. Parallels between frequency selectivity measured psychophysically and in cochlear
mechanism. Scand Audiol Suppl 1986;25:139–52.
[14] Nábělek AK, Czyzewski Z, Krishnan LA. The influence of talker differences on vowel identification
by norm hearing and hearing impaired people. J Acoust Soc Am 1992;92:1228–46.
[15] Peterson GE, Lehiste I. Duration of syllable nuclei in English. J Acoust Soc Am 1960;32:693–703.
[16] Skrodzka E, Wicher A, Ozimek E, Sęk A. Auditory filters in sensorineural hearing-impaired subjects.
Arch Acoust 2002;27:159–74.
[17] Summerfield Q, Foster J, Tyler R, Bailey P. Influences of formant bandwidths and auditory frequency
selectivity on identification of place of articulation in stop consonants. Speech Commun 1985;4:213–
29.
[18] Tyler RS, Fernande M, Wood EJ. Masking, temporal integration and speech intelligibility in
individuals with noise-induced hearing loss. In: Taylor I, Markides A, editors. Disorder of auditory
function, vol. 3. London: Academic Press; 1980.
[19] Van Tasell D, Fabry DA, Thibodeau LM. Vowel identification and vowel masking patterns of
hearing-impaired subjects. J Acoust Soc Am 1987;81:1586–97.
[20] Van Veen TM, Houtgast T. Spectral sharpness and vowel dissimilarity. J Acoust Soc Am
1985;77:628–34.
[21] Verbrugge R, Strange W, Shankweiler D, Edman T. What information enables a listener to map a
talker's vowel space? J Acoust Soc Am 1976;60:198–212.