Correlates of varying vocal fold adduction deficiencies in perception and production: methodological...



Original Paper

Folia Phoniatr Logop 2004;56:305–320
DOI: 10.1159/000080067

Correlates of Varying Vocal Fold Adduction Deficiencies in Perception and Production: Methodological and Practical Considerations

Jacques Koreman (a), Manfred Pützer (a), Manfred Just (b)

(a) Institute of Phonetics, Saarland University, and (b) ENT Department, Caritasklinik St. Theresia, Saarbrücken, Germany

Jacques Koreman
Institute of Phonetics, Saarland University, PO Box 15 11 50
DE–66041 Saarbrücken (Germany)
Tel. +49 681 302 4690, Fax +49 681 302 4684
E-Mail [email protected]


© 2004 S. Karger AG, Basel
1021–7762/04/0565–0305$21.00/0

Accessible online at: www.karger.com/fpl

Key Words
Vocal fold adduction deficiency · Interrater agreement · Speaker sex · Acoustic measurements · Acoustic basis of percepts

Abstract
In this study the voice characteristics of normal male and female speakers are compared to those of two groups of patients with unilateral vocal fold paralysis. In order to enhance phonation, the patients in the first group compensate for the adduction deficiency which results from paralysis. The patients in the second group do not use compensatory strategies. Sustained vowels [i:, a:, u:] were produced by the speakers and scored for roughness, breathiness and hoarseness (RBH) by 8 raters. Although interrater agreement for RBH scores is only moderate on average, these percepts make consistent distinctions between the three speaker groups. Consistent but different distinctions are made between the three speaker groups for male and female speakers. The results show that male and female speakers should not be pooled in experimental studies of the pathological voice. Our results also indicate that female patients with a compensated unilateral vocal fold paralysis cannot be clinically evaluated solely on the basis of perception, because their voices cannot be distinguished from normal, healthy female speakers, despite their physiological impairment. The group distinctions made on the basis of RBH scores are supported by differences in the acoustic parameters which are derived by automatic analysis of the sustained vowels. Despite identical group distinctions for RBH scores and acoustic parameters, the acoustic basis of the percepts is not straightforward. Different acoustic predictors of the percepts were found for male compared to female speakers. Additionally, interrater differences point towards the presence of perceptual trading relations.

Copyright © 2004 S. Karger AG, Basel

Introduction

Commenting on the importance of perceptual voice evaluation in clinical practice, Gerratt and Kreiman [1] observe that a ‘clinician may judge success [of treatment] by documenting changes in laryngeal anatomy or physiology, but in general, patients are more concerned with how their voices sound after treatment’. Both for the choice of speech therapy measures and to motivate the patient it is therefore important to know what the perceptually salient characteristics of vocal fold pathologies are. It is also important to know their acoustic correlates, so that speech therapy measures can be directed at improving the perceptual voice characteristics by addressing these correlates.

The reliability of the perception of pathological voice qualities, in particular those of roughness, breathiness and hoarseness (RBH) and the related GRBAS scales, has been the topic of much discussion and varying opinions [1–3], so that the question remains open to what extent these percepts are a good basis for any decisions about voice therapy. We shall first investigate this problem by looking into the interrater agreement for the perceptual scores for listeners with different backgrounds and levels of professional training, and then evaluate the raters’ ability to distinguish pathological and normal voice types on the basis of roughness, breathiness and hoarseness.

The acoustic basis of roughness, breathiness and hoarseness has been described quite variably in the literature; in some cases descriptions are even contradictory [cf. the correlates found for breathiness in ref. 4–6]. There are several possible reasons for this. One reason is that male and female voices are often pooled in experimental studies. Since they have different acoustic characteristics [7] and are therefore likely to be perceived differently, we analyse them separately.

Another factor that may obscure the relationship between perceptual characteristics and acoustic correlates in many studies is that a wide variety of pathological voices which cover a range of physiological states are often investigated in the same study without differentiating for pathology type [1–5, 8, 9]. As Wolfe et al. [4] note, ‘the physiological factors contributing to dysphonia can be many and varied’. If, as Gerratt and Kreiman [1] argue, roughness, breathiness and hoarseness are complex psychophysical percepts, it is probable that the perception of these voice properties for different pathologies is caused by differing combinations and weightings of acoustic characteristics. Compensatory articulations and perceptual trading relations between acoustic cues can obscure the relationship(s) between perceptual characteristics on the one hand and acoustic measures on the other. A more in-depth study of single pathologies may be better suited to illuminate these relationships. In this study we therefore concentrate on voice patients with a unilateral paralysis of the recurrent nerve [10], which is a frequent cause of deficient vocal fold adduction. The patients cover a range of physiological constellations.

These issues, which have been discussed in the literature, are investigated in this paper by comparing pathological voices with varying vocal fold adduction deficiencies with normal voices. They are considered in the paper as follows. After a description of the speakers and the material, the perceptual ratings and acoustic measures derived for them are presented. Then, the results are presented in four parts. In the first part, we determine the distinguishability of the three speaker groups on the basis of perceived roughness, breathiness and hoarseness. In the second part, we look at the relationship between the three percepts. In the third part, the acoustic voice properties that distinguish the three speaker groups from each other are identified. In the fourth part, the perceptual scores are related to the acoustic voice characteristics. In the discussion, the issues brought up in this introduction are taken up again.

Material and Methods

Speakers and Material

The recordings of 100 German speakers were used for the present study. They are a subset of a larger database [11].

Besides a control group of 25 male and 25 female speakers with no known speaking or hearing problems (group 1, matched for age to the pathological speakers), all speakers with unilateral vocal fold paralysis were selected from the database. Since women more frequently undergo an operation of the thyroid gland, which is a frequent cause of paralysis of the recurrent nerve and the resulting vocal fold paralysis [12], the number of female pathological speakers in our sample exceeds that of male pathological speakers. The immobilised vocal fold takes up a paramedian position or, sometimes, an intermediate position between the normal voicing position and complete abduction; if the adduction deficiency is not compensated for, this results in a large unmodulated airflow and ineffective vocal fold vibration. The selected sample of pathological speakers was divided into two groups on the basis of the observed vocal fold adduction, judged from laryngoscopic and videostroboscopic recordings of the patients’ vocal folds during phonation by an experienced ENT physician. The clinical judgements were made during consultation. In problematical cases, the recordings were viewed again and discussed at a later stage.

The first group of pathological speakers (group 2), consisting of 4 male and 18 female speakers in the age range of 35–77, is characterised by glottal compensation for the lack of vocal fold adduction. Such compensation is achieved by moving the healthy vocal fold beyond its normal voicing position towards the immobilised vocal fold, creating a sufficiently close adduction of the vocal folds to allow the Bernoulli effect to induce strong vocal fold vibration (at a given airflow through the glottis) [6, 13, 14]. The other group of pathological speakers (group 3) consists of 13 male and 15 female speakers (42–75 years old) who do not compensate for the lack of vocal fold adduction.

The speakers were requested to produce sustained vowels [i:], [a:] and [u:] with a minimum duration of 2 s at a self-selected comfortable pitch. The total number of vowel recordings is 300 (100 speakers × 3 vowels).

Perceptual Ratings

The 300 vowel recordings were judged over headphones in a quiet room by 8 raters (1 ENT clinician, 3 speech therapists, 3 phoneticians and 1 biophysicist with experience in the area of vocal fold pathology). The raters scored roughness, breathiness and hoarseness on a four-point scale (0 = not present, 1 = slightly, 2 = quite, 3 = very; the use of intermediate values was not permitted). The vowel stimuli were presented only once in randomised order and were preceded and followed by 10 fillers. Each stimulus was preceded by a short beep and a 500-ms pause and followed by a silence of 10 s, during which each of the raters typed in his responses.

Since interrater agreement for roughness, breathiness and hoarseness scores as well as for dissimilarity judgements in general [15] is often low for pathological voices, the 8 raters in our experiment underwent a training session immediately prior to carrying out the evaluation task. The aim of the training session was to help improve interrater agreement. The raters listened to an RBH training CD containing 40 examples with feedback, compiled by Nawka and Anders [16]. Since Nawka and Anders’ [16] written instructions state that a voice must be scored as hoarse if it is either rough or breathy, our raters may have been influenced in establishing their hoarseness criteria (cf. also Fairbanks [17] quoted in Gerratt and Kreiman [1]). In our experiment, however, no specific instructions were given by the experimenter to influence the hoarseness ratings.

Table 1a. Spearman’s correlations between the 8 raters’ RBH scores for male speakers

R     PM     JK     BB     MP     ST     IP     BA
MJ    NS     NS     0.195  NS     NS     NS     NS
BA    0.587  0.435  0.646  0.459  0.447  0.470
IP    0.526  0.430  0.505  0.508  0.526
ST    0.591  0.543  0.549  0.776
MP    0.622  0.672  0.558
BB    0.590  0.520
JK    0.569

B     PM     JK     BB     MP     ST     IP     BA
MJ    0.721  0.759  0.642  0.708  0.681  0.670  0.590
BA    0.617  0.625  0.550  0.624  0.622  0.587
IP    0.730  0.696  0.626  0.717  0.710
ST    0.695  0.764  0.678  0.810
MP    0.788  0.844  0.759
BB    0.717  0.724
JK    0.813

H     PM     JK     BB     MP     ST     IP     BA
MJ    0.640  0.574  0.462  0.682  0.639  0.680  0.375
BA    0.526  0.450  0.371  0.505  0.569  0.529
IP    0.700  0.694  0.362  0.733  0.747
ST    0.720  0.639  0.488  0.775
MP    0.759  0.774  0.604
BB    0.472  0.443
JK    0.773

p < 0.05; NS = not significant.

Acoustic Measurements

For each speaker, the microphone signal was recorded in a sound-treated room, using a neckband condenser microphone (NEM 192.15, Beyerdynamic). With a headset microphone, the distance to the lips remains constant during speech, independent of head movements [18]. The signal was fed directly into a Computerised Speech Lab (CSL) station (model 4300B) at a sampling rate of 50 kHz to reduce the temporal quantisation error to 0.02 ms. Amplitude resolution was 16 bit.

For each signal, a portion of more than half a second between positive zero-crossings was selected, starting 0.5 s after the beginning of phonation. The selected portion of the signal contains more than the 20–30 pitch periods which, according to Klingholz [19], are needed to draw conclusions about voice quality.
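The sampling figure and the segment selection described above can be illustrated with a short sketch. The paper does not describe how the selection was implemented, so everything here is an assumption: the function names are hypothetical, the start of the signal is taken to be the onset of phonation, and the test tone is synthetic rather than a recording from the study.

```python
import math


def sampling_period_ms(fs_hz):
    """Spacing between successive samples in milliseconds."""
    return 1000.0 / fs_hz


def positive_zero_crossings(signal):
    """Indices i where the signal goes from negative to non-negative."""
    return [i for i in range(1, len(signal))
            if signal[i - 1] < 0.0 <= signal[i]]


def select_segment(signal, fs_hz, skip_s=0.5, min_dur_s=0.5):
    """Return (start, end) sample indices, both at positive zero-crossings.

    The segment starts at the first crossing at or after skip_s and ends at
    the first crossing that makes it at least min_dur_s long. Returns None
    when the signal is too short (assumed behaviour, not from the paper).
    """
    skip = int(round(skip_s * fs_hz))
    min_len = int(round(min_dur_s * fs_hz))
    crossings = positive_zero_crossings(signal)
    starts = [i for i in crossings if i >= skip]
    if not starts:
        return None
    start = starts[0]
    ends = [i for i in crossings if i - start >= min_len]
    if not ends:
        return None
    return start, ends[0]


# Synthetic test tone (hypothetical): 2 s of a 100 Hz sine sampled at 8 kHz.
FS = 8000
tone = [math.sin(2 * math.pi * 100 * n / FS + 0.3) for n in range(2 * FS)]
segment = select_segment(tone, FS)
```

At the 50 kHz rate quoted in the text, `sampling_period_ms(50000)` gives the 0.02 ms sample spacing. Cutting at zero-crossings of the same direction keeps a whole number of cycles for a periodic signal, which is presumably why the segment boundaries were chosen this way.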

The signal was analysed using the Multi-Dimensional Voice Program (MDVP, Kay Elemetrics model 4338). The acoustic parameters computed by the MDVP program are grouped into seven classes: Fundamental Frequency, Frequency Perturbation, Amplitude Perturbation, Tremor, Subharmonic Measurements, Spectral Energy (also called Noise-Related Measurements) and Voice Breaks [4, 20]. Although vocal fold paralysis can lead to interruptions of vocal fold vibrations under normal speaking conditions, all the sustained vowels used in this study were voiced throughout, even for the pathological speakers¹.

¹ The occurrence of Voice Breaks despite the fact that all vowels are in fact fully voiced is explained by low signal amplitudes which prevent the MDVP algorithms from detecting the voiced signal.


Table 1b. Spearman’s correlations between the 8 raters’ RBH scores for female speakers

R     PM     JK     BB     MP     ST     IP     BA
MJ    0.222  0.367  0.380  0.434  0.476  0.312  0.387
BA    0.320  0.430  0.424  0.453  0.463  0.286
IP    0.481  0.325  0.296  0.419  0.376
ST    0.508  0.529  0.500  0.790
MP    0.523  0.592  0.587
BB    0.528  0.543
JK    0.528

B     PM     JK     BB     MP     ST     IP     BA
MJ    0.368  0.562  NS     0.510  0.431  0.515  0.532
BA    0.373  0.578  NS     0.553  0.531  0.523
IP    0.389  0.551  NS     0.554  0.503
ST    0.386  0.591  NS     0.741
MP    0.554  0.699  0.159
BB    0.353  NS
JK    0.586

H     PM     JK     BB     MP     ST     IP     BA
MJ    0.455  0.530  0.277  0.533  0.457  0.457  0.378
BA    0.340  0.412  0.405  0.478  0.559  0.384
IP    0.597  0.586  0.363  0.619  0.508
ST    0.399  0.586  0.293  0.642
MP    0.559  0.673  0.411
BB    0.460  0.266
JK    0.493

p < 0.05; NS = not significant.

Results

Perceptual Distinguishability of the Three Speaker Groups

The agreement between the 8 raters was determined from Spearman’s correlations for each pair of listeners. Correlations between the raters’ RBH scores varied strongly. Most of the interrater correlations for male voices were moderate to high (table 1a), while they were only low to moderate for female voices (table 1b). This is true for roughness, breathiness and hoarseness scores. Two exceptions can be observed: rater MJ’s roughness scores did not correlate with those of the other raters for male voices, although the correlations were low to moderate for female voices; rater BB’s breathiness scores did not correlate with those of most other raters for female voices, while moderate to high correlations were found for his scores of male voices. Not only do we observe a lack of very high correlations, but individual raters also appear to evaluate aspects of voice quality very differently for male and female voices [21].
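The pairwise agreement analysis can be sketched as follows. This is a minimal illustration, not the authors' procedure: the function names and the demo scores are made up, ties (unavoidable on a four-point scale) are handled by averaging ranks, and the significance test behind the p < 0.05 threshold is left out.

```python
from itertools import combinations


def average_ranks(values):
    """Rank values from 1..n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    pos = 0
    while pos < len(order):
        end = pos
        while (end + 1 < len(order)
               and values[order[end + 1]] == values[order[pos]]):
            end += 1
        mean_rank = (pos + end) / 2.0 + 1.0
        for k in range(pos, end + 1):
            ranks[order[k]] = mean_rank
        pos = end + 1
    return ranks


def spearman_rho(x, y):
    """Spearman's rho: Pearson correlation of the tie-averaged ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    if var_x == 0.0 or var_y == 0.0:
        return float("nan")  # a rater who gave every stimulus the same score
    return cov / (var_x * var_y) ** 0.5


def pairwise_rho(scores_by_rater):
    """Map each unordered rater pair to the rho between their scores."""
    return {(a, b): spearman_rho(scores_by_rater[a], scores_by_rater[b])
            for a, b in combinations(sorted(scores_by_rater), 2)}


# Made-up scores on the 0-3 scale for six stimuli (not data from the study).
demo = {
    "rater_A": [0, 1, 1, 2, 3, 3],
    "rater_B": [0, 0, 1, 2, 2, 3],
    "rater_C": [3, 3, 2, 1, 1, 0],
}
rho = pairwise_rho(demo)
```

With 8 raters there are 28 unordered pairs, which matches the 28 cells in each triangular block of tables 1a and 1b.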


Table 2. Group distinctions (indicated by a dash) for male and female voices on the basis of RBH scores of individual raters

Rater  Percept  Male     Female
MJ     R        NS       NS
       B        1–2–3    1, 2–3
       H        1–2, 3   1, 2–3
BA     R        1–2, 3   1, 2–3
       B        1–2, 3   1, 2–3
       H        1, 2–3   1, 2–3
IP     R        1–2–3    1, 2–3
       B        1–2–3    1, 2–3
       H        1–2–3    1, 2–3
ST     R        1–2, 3   1, 2–3
       B        1–2–3    1, 2–3
       H        1–2–3    1, 2–3
MP     R        1–2, 3   1, 2–3
       B        1–2–3    1, 2–3
       H        1–2–3    1, 2–3
BB     R        1–2, 3   1–3
       B        1–2–3    1, 2–3
       H        1, 2–3   1, 2–3
JK     R        1–2, 3   1–3
       B        1–2–3    1, 2–3
       H        1–2–3    1, 2–3
PM     R        1–2–3    1, 2–3
       B        1–2–3    1, 2–3
       H        1–2–3    1, 2–3

1 = Normal; 2 = with compensation; 3 = without compensation; p < 0.05; NS = not significant.

The generally only moderate interrater agreement seems to support Kreiman and Gerratt’s [22] conclusion that raters find it difficult to assess pathological voice quality in terms of single perceptual attributes (measured on an ordinal scale) of complex stimuli. For this reason, we preferred not to average the RBH scores to evaluate whether we can distinguish the three speaker groups in this study, since it is not clear whether average scores over seemingly individual perceptual dimensions are meaningful. The distinguishability of the three speaker groups by roughness, breathiness and hoarseness scores was therefore evaluated separately for each rater. As explained in the introduction, these percepts may be based on different aspects of the speech signal in male and female voices, so that we also analysed these separately. Despite the moderate correlations between individual raters, the distinctions between speaker groups were almost identical for each rater.

For both male and female speakers, significant main effects for speaker group were found for roughness, breathiness and hoarseness (multivariate analyses of variance). Post hoc tests (Tukey’s honestly significant differences) for the perception of male voices showed that all three perception measures differ significantly for the three speaker groups (all at p < 0.05), with the exception that the roughness rating does not usually distinguish between the patients who do compensate for their adduction deficiency and those who do not (table 2). For the female speakers, the post hoc tests showed that normal speakers mostly cannot be distinguished from those with a compensated unilateral adduction deficiency (table 2). If we look at figure 1 (which shows averaged RBH scores), we first notice that the RBH scores for normal female voices are slightly higher than for male voices, while they are substantially lower for female pathological voices in comparison to male pathological voices. Female speakers with a compensated adduction deficiency are perceived as similar to normal female speakers. This is not the case for male speakers.

We also found some interrater differences. For rater MJ, whose roughness scores for male voices did not correlate with those of the other raters, no distinctions are made between male speaker groups on the basis of this property and, despite significant (low to moderate) correlations, his roughness scores for female voices do not distinguish speaker groups either. And although rater BB’s breathiness scores for female speakers did not correlate with those of the other raters, he made exactly the same group distinctions as all other raters. This shows that interrater reliability does not completely determine the group distinctions.

Fig. 1. Mean roughness, breathiness and hoarseness for three speaker groups for male and female speakers separately.

Relationship between Perceptual Scores

Intrarater correlations (Spearman’s rho) between the ratings of roughness, breathiness and hoarseness (table 3) show that only rater BB scores roughness, breathiness and hoarseness as unrelated (or only weakly related) percepts. For all other raters, hoarseness mostly correlates strongly with breathiness, especially for male voices. In addition, raters IP and PM’s hoarseness scores also correlate strongly with their roughness scores. For these 2 raters we also find the highest correlations between roughness and breathiness, so that they judge the three percepts less as distinctive qualities than the other raters do. Breathiness and roughness are judged as separate percepts by most other raters, as shown by the mostly low or non-significant correlations between them (especially for female voices).

In summary, although there is a clear general trend in the correlations between roughness, breathiness and hoarseness scores, we can also observe interrater differences.

Table 3. Spearman’s correlations between roughness and breathiness (R × B), roughness and hoarseness (R × H) and breathiness and hoarseness (B × H) for 8 raters and for male and female voices

                R × B   R × H   B × H
Male voices
MJ              NS      NS      0.772
BA              0.450   0.277   0.682
IP              0.648   0.847   0.828
ST              0.328   0.458   0.807
MP              0.432   0.522   0.840
BB              0.398   0.205   0.381
JK              0.239   0.372   0.900
PM              0.695   0.861   0.900
Female voices
MJ              NS      0.343   0.564
BA              NS      NS      0.489
IP              0.357   0.762   0.643
ST              0.163   0.252   0.760
MP              0.210   0.397   0.782
BB              NS      0.258   0.260
JK              NS      NS      0.871
PM              0.521   0.807   0.837

p < 0.05; NS = not significant.

Acoustic Distinguishability of the Three Speaker Groups

The group distinctions on the basis of acoustic parameters correspond closely to those found in the perception part of this study. Many of the parameters derived from the acoustic signals show a significant main effect for the three speaker groups, both for male and female speakers (which were again analysed separately). Post hoc tests (Tukey’s honestly significant differences) show that the same parameter often makes a different distinction between the speaker groups for male compared to female speakers, as shown in table 4. Some parameters distinguish all three speaker groups (code: 1–2–3), while others only distinguish some of the groups, e.g. the speakers with an uncompensated adduction deficiency (group 3) from the other two speaker groups (code: 1, 2–3). All significant parameters (p < 0.05) are listed according to the speaker group distinctions they make. A large majority of the parameters do not distinguish between normal speakers and speakers with a compensated vocal fold paralysis.
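The distinction codes used in tables 2 and 4 can be read mechanically. The sketch below is one interpretation of the notation, based on the examples given in the text (a dash separates groups that are distinguished, a comma joins groups that are not); the function name is hypothetical.

```python
from itertools import combinations


def distinguished_pairs(code):
    """Return the set of group pairs that a code marks as distinct.

    Groups joined by a comma are not distinguished from each other; a dash
    separates sets of groups that are. Groups absent from the code take part
    in no distinction, and "NS" means no significant distinction at all.
    """
    if code.strip() == "NS":
        return set()
    clusters = [
        [int(g) for g in part.split(",")]
        for part in code.replace("\u2013", "-").split("-")  # accept en dash
    ]
    pairs = set()
    for left, right in combinations(clusters, 2):
        for a in left:
            for b in right:
                pairs.add((min(a, b), max(a, b)))
    return pairs


three_way = distinguished_pairs("1\u20132\u20133")   # all groups differ
only_group_3 = distinguished_pairs("1, 2\u20133")    # group 3 vs. groups 1 and 2
```

Under this reading, `1, 2–3` yields the pairs (1, 3) and (2, 3) but not (1, 2), i.e. normal speakers and compensating patients are not told apart.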

There are evident differences between male and female speakers, however. In particular, none of the parameters derived from the acoustic signal distinguish normal female speakers from female speakers with a compensated adduction deficiency. Four out of 7 Frequency Perturbation measures (JITA, JITT, RAP, PPQ [20]) significantly distinguish all three male speaker groups, while in the case of female speakers these parameters only distinguish speakers who do not compensate (group 3) from the other two groups. The other Frequency Perturbation parameters (SPPQ and VF0) distinguish patients with an uncompensated adduction deficiency from the two other speaker groups, for male as well as female speakers.

From the Amplitude Perturbation measures, only SHDB and SHIM distinguish all three male speaker groups. For female speakers these parameters only distinguish patients with an uncompensated adduction deficiency from other speakers, as do APQ and SAPQ (also for male speakers).

The parameters for Tremor, Subharmonic Measurements, Spectral Energy and Voice Breaks mainly distinguish patients with an uncompensated adduction deficiency from normal speakers, and sometimes also from speakers with a compensated adduction deficiency.

Most Fundamental Frequency parameters are significantly different for female speakers with an uncompensated adduction deficiency from those for other female speakers. The clearest tendency is found in the parameters reflecting variation in F0 (STD and PFR), which is at least 2 or 3 times greater for female speakers with an uncompensated adduction deficiency than for other female speakers.

In summary, in the acoustic measures, as in the perceptual scores, there is a tendency for the three-way distinction between male speakers to be reduced to a two-way distinction between female speakers, where normal speakers and speakers with a compensated adduction deficiency cannot be distinguished.


Table 4. Significant speaker group distinctions (indicated by a dash) for male (M) and female (F) speakers on the basis of MDVP parameters

Category    1–2–3    1–2, 3    1, 2–3    1, 3–2    1–2    1–3    2–3

Fundamental Frequency
PFR (M) T0 (F) FLO (F) STD (M + F) PFR (F)
F0 (M) T0 (M) FLO (M)
F0 (F)

Frequency Perturbation
JITA (M) JITT (M) RAP (M) PPQ (M)
JITA (F) JITT (F) RAP (F) PPQ (F) SPPQ (M + F) VF0 (M + F)

Amplitude Perturbation
SHDB (M) SHIM (M)
APQ (M) SHDB (F) SHIM (F) APQ (F) SAPQ (F)
SAPQ (M) VAM (M)
VAM (F)

Tremor
FTRI (M) FTRI (F) FFTR (M) FATR (M)

Subharmonic Measurements
DSH (M) NSH (M)

Spectral Energy
NHR (F) NHR (M) SPI (M) VTI (M)

Voice Breaks
DVB (F) DUV (F) DUV (M + F) NVB (F)
DVB (M) NVB (M)
NUV (M)

p < 0.05.

Relationship between Perceptual Scores and Acoustic Measurements

Given the strong correspondence between the perception and production studies in terms of the observed group distinctions, the question of course arises whether we can relate the two (perceptual scores and acoustic characteristics).

The perception study brought to light that the raters made virtually the same group distinctions despite on average only moderate interrater agreement. Therefore, the question we have to ask is whether the raters based their RBH scores on the same acoustic signal properties. In order to investigate this, two approaches were compared. First, average RBH scores were computed across the 8 raters’ scores (R mean, B mean, H mean). This seems warranted, because each rater made almost the same group distinctions. However, since at the same time the correlations between the raters’ RBH scores were generally only moderate, the acoustic basis for the perceptual scores was also analysed separately for each individual rater.
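The first of the two approaches, averaging across raters, amounts to a per-stimulus mean. A minimal sketch, with a hypothetical function name and made-up scores, might look like this:

```python
def mean_scores(scores_by_rater):
    """Average the raters' scores stimulus by stimulus.

    scores_by_rater maps a rater label to that rater's list of scores, with
    all lists in the same stimulus order (an assumption of this sketch).
    """
    lengths = {len(scores) for scores in scores_by_rater.values()}
    if len(lengths) != 1:
        raise ValueError("every rater must score the same stimuli")
    return [sum(column) / len(column)
            for column in zip(*scores_by_rater.values())]


# Made-up roughness scores for three stimuli (not data from the study).
r_mean = mean_scores({"rater_A": [0, 2, 3], "rater_B": [1, 2, 3]})
```

The second approach simply keeps each rater's list as a separate dependent variable instead of collapsing them.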

The predictive power of the acoustic parameters derived by MDVP for the RBH scores was computed by means of linear regressions. Because many of the acoustic parameters in our study correlate strongly, the data were analysed for collinearity. The model with the greatest explained variance of the RBH score was chosen as a basis for interpretation, unless conservative threshold values for tolerance and condition index were exceeded (tolerance < 0.1; condition index > 10). Although the explained variance of the RBH score in the chosen model is not as high as it could be if the final model were always chosen, the selection of uncorrelated acoustic parameters has the advantage that the predictors’ beta values can be interpreted physiologically.
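To make the tolerance criterion concrete, here is a small sketch restricted to the two-predictor case, where the tolerance of one predictor is 1 minus its squared correlation with the other. It assumes the thresholds are tolerance below 0.1 and condition index above 10; the function names and numbers are illustrative, and computing the condition index itself (which needs the eigenvalues of the scaled design matrix) is not attempted here.

```python
def pearson_r(x, y):
    """Pearson correlation of two equally long sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def tolerance_two_predictors(x1, x2):
    """Tolerance = 1 - R^2 of one predictor regressed on the other.

    With only two predictors, that R^2 is their squared correlation.
    """
    r = pearson_r(x1, x2)
    return 1.0 - r * r


def too_collinear(tolerance, condition_index,
                  min_tolerance=0.1, max_condition_index=10.0):
    """Flag a model that violates either cut-off (assumed thresholds)."""
    return tolerance < min_tolerance or condition_index > max_condition_index


# Two made-up, nearly proportional measures (not data from the study).
tol = tolerance_two_predictors([1, 2, 3, 4, 5], [1.1, 1.9, 3.2, 3.9, 5.1])
```

Two measures that track each other this closely leave almost no independent variance, so a model containing both would be rejected under the tolerance rule.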

Roughness

For the average roughness scores of male voices (R mean), three parameters explain 52% of the variance. The main predictor is Amplitude Perturbation, represented by the variable SHIM. DUV, a Voice Breaks parameter related to unvoiced segments, further supports the importance of Amplitude Perturbation for the perception of roughness, since this parameter indicates signal sections of low-amplitude voicing in which the MDVP program was not able to detect phonation (as mentioned before, manual checks confirmed that all vowels were voiced throughout). The third predictor for roughness is SPI, a Spectral Energy parameter related to spectral tilt, but it is only relevant for 1 rater, and even for this rater its importance for the prediction of roughness (beta value) is much lower than that of SHIM. In the analyses for individual raters, Amplitude Perturbation parameters were the main predictors of the roughness scores for 6 out of 8 raters. For the remaining 2 raters, the roughness scores were mainly predicted by DUV, which, as we explained above, is also physiologically related to amplitude perturbations. For these 2 raters, SHIM is an important predictor, too. The dependence of the perception of roughness in male voices on amplitude perturbations in the signal is therefore a consistent pattern across all raters.

For female voices, the explained variance is substantially higher than for male voices (73%). JITA, DSH and DVB are the main predictors of average roughness. DSH reflects the presence of subharmonic components caused by diplophonia [23] or vocal fry, so that it underlines the importance of Frequency Perturbation (JITA) for the perception of roughness in female voices. DVB is the third predictor of roughness for female voices. Like DUV for male voices, it reflects sections of low-amplitude voicing in the signal. The Spectral Energy parameter SPI plays a minor role in the perception of roughness in female speakers: rougher voices have somewhat less spectral tilt. The acoustic parameters which predict the perception of roughness in female voices are obviously quite different from those predicting roughness in male voices. For the individual raters, Frequency Perturbation is the main predictor for 5 out of 8 raters’ perception of roughness in female voices (JITA and/or the related Subharmonic Measurements parameter DSH). DVB, reflecting low signal amplitude, plays a role for 4 out of 8 raters; for rater MP it is the main predictor of roughness. Spectral Energy parameters play more of a role in the judgement of female than of male voices; for rater MJ it is the main predictor of roughness in female voices (two Spectral Energy parameters indicating friction in the signal predict rater MJ’s roughness scores: first predictor: NHR = noise-to-harmonic ratio; third predictor: VTI = voice turbulence index [20]).

Breathiness

The average perception of breathiness (B mean) in male voices mainly depends on the Frequency Perturbation parameter PPQ (r² = 57%). Looking at individual rating behaviour, Frequency Perturbation (PPQ, VF0) is the main predictor of breathiness for 5 raters. Amplitude Perturbation parameters are the main predictor of breathiness for 2 raters, but also play a role for 3 other raters. Spectral Energy parameters, indicating spectral tilt (SPI, main predictor of breathiness for 1 rater) or the presence of friction in the signal (VTI), play a role for 4 raters. Frequency Perturbation parameters are important predictors for all but 1 rater and clearly play a prominent role.

For female speakers, 60% of the variance in the average breathiness scores is explained mainly by the Frequency Perturbation parameter PPQ (and PFR), but Spectral Energy (SPI) and Voice Breaks (NVB), indicating low amplitude of the signal, are also relevant for predicting breathiness scores. Amplitude Perturbations do not play any role here. Individually, for each of the raters the main predictor of breathiness in female voices is a Frequency Perturbation parameter (while for male voices, this was only the case for 5 out of 8 raters). Spectral Energy predicts breathiness for 6 out of 8 raters, while Voice Breaks do so for half of them.

Hoarseness

The predictors for the average hoarseness scores (H mean) in male voices (explaining 67% of the variance) combine those important for the perception of roughness and those important for breathiness, although Frequency Perturbation (VF0) and Spectral Energy (SPI) play a minor role compared to Amplitude Perturbation (SHIM) and Voice Breaks (DUV) related to stretches of low amplitude in the signal. Three of the predictors for H mean are the same as for R mean, pointing at a relationship between the two. Individual raters differ in this respect, though.

For female voices, the main predictors for average hoarseness are identical with those of breathiness. Frequency Perturbation (PPQ), Voice Breaks related to signal portions of low amplitude (NVB) and Spectral Energy (SPI = spectral tilt) explain 72% of the variance in the hoarseness scores. For individual raters Frequency Perturbation is the most important predictor of hoarseness, but Voice Breaks and Spectral Energy also predict hoarseness.

As should be expected, the acoustic predictors for hoarseness can be related to a large extent to the intrarater correlations which were observed between the RBH scores (table 3). For raters with a high correlation between hoarseness and breathiness, acoustic predictors for breathiness are also important predictors for hoarseness; this is the most general tendency observable in our data. For raters whose hoarseness scores correlate strongly with roughness, these two percepts have high beta values for similar acoustic parameters. It must be pointed out, though, that the observed tendencies are weak.
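The percentages of explained variance quoted in this section are r² values from linear regressions of perceptual scores on acoustic parameters. As a minimal sketch of what such a value expresses, the following code fits a least-squares line with a single predictor and computes r² as one minus the ratio of residual to total sum of squares. The single-predictor setup, the function names and the numbers are assumptions made for illustration; the regressions reported here use several predictors, and the values below are not data from the study.

```python
def fit_line(x, y):
    """Ordinary least-squares fit of y = intercept + slope * x."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def explained_variance(x, y):
    """r squared: share of the variance in y accounted for by the fitted line."""
    slope, intercept = fit_line(x, y)
    mean_y = sum(y) / len(y)
    ss_res = sum((b - (intercept + slope * a)) ** 2 for a, b in zip(x, y))
    ss_tot = sum((b - mean_y) ** 2 for b in y)
    return 1.0 - ss_res / ss_tot


# Made-up values for illustration only (hypothetical PPQ values and
# hypothetical breathiness scores averaged over raters).
ppq = [0.4, 0.7, 1.1, 1.6, 2.3, 3.0]
b_mean = [0.1, 0.5, 0.6, 1.4, 1.9, 2.2]
print(f"r squared = {explained_variance(ppq, b_mean):.2f}")
```

With more than one predictor the same ratio of sums of squares applies, but the fit itself requires a multiple regression routine rather than the closed-form slope used here.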

Discussion

Interrater Agreement

The distinctions which raters make between two pathological and one normal speaker group on the basis of RBH scores for sustained vowels are identical to those made by acoustic parameters derived from these vowels. In clinical practice, RBH scores can therefore be a useful tool for voice evaluation. Moreover, individual raters make almost identical group distinctions, even if the correlations between their RBH scores are only moderate on average. This seems to indicate that interrater correlations are not the most appropriate tool for evaluating perceptual impressions, as we shall argue below.

Although we find only moderate correlations between the raters, our conclusion differs from that of Kreiman and Gerratt [15, pp. 1792–1793], who conclude from the large interrater differences in a multidimensional scaling experiment that ‘observed differences among listeners in perceptual strategy were so great that the fundamental assumption of a common perceptual space for pathological voice quality must be questioned’. It must be noted that the dissimilarity ratings in their study were not necessarily based on voice characteristics like roughness, breathiness and hoarseness alone. As Kreiman and Gerratt [24] themselves pointed out, ‘instead [of multi-dimensional scaling], it may be adequate to focus quality assessment on a limited number of clinically significant perceptual dimensions’. This is the case in Nawka et al. [3], who found strong correlations between the average RBH scores of three listener groups. Besides differences in the experiment design, there is also an important difference in the analysis compared to our experiment. Using averages [in ref. 3] has a double effect: besides cancelling out some of the noise in the data, it turns ordinal RBH scores into metrical data. Ordinal ratings of percepts on a four-point scale may be too insensitive to bring similarities between raters to light or, more likely, the statistics used for ordinal ratings may be too insensitive because of the large number of ties. This is why Dejonckere et al. [2] asked their 2 raters to use (continuous) visual analogue scales for their ratings. They found high correlations of 0.87 for grade (hoarseness), 0.70 for roughness and 0.69 for breathiness. To present our data in a more positive light, we can point out that similarly high correlations do exist between the (ordinal) RBH scores for some of our raters. Together with the consistent group distinctions which we found in our study, these findings therefore seem to indicate that most listeners are able to score roughness, breathiness and hoarseness fairly reliably, although our interrater correlations (table 1a, b) also show a few exceptions to this rule.
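The point about ties can be made concrete with a small sketch. It computes Kendall's tau-b, a rank correlation with a correction for ties, for two raters who score the same voices on a four-point scale, and it counts how many pairs of voices receive the same score from one rater. Kendall's tau-b is an assumption chosen for illustration and is not necessarily the statistic used for the interrater correlations in this study; the function names and the scores are likewise hypothetical.

```python
from math import sqrt


def kendall_tau_b(x, y):
    """Kendall's tau-b for two equally long lists of ordinal scores."""
    concordant = discordant = tied_x_only = tied_y_only = 0
    for i in range(len(x)):
        for j in range(i + 1, len(x)):
            dx, dy = x[i] - x[j], y[i] - y[j]
            if dx == 0 and dy == 0:
                continue  # tied for both raters: the pair carries no information
            if dx == 0:
                tied_x_only += 1
            elif dy == 0:
                tied_y_only += 1
            elif dx * dy > 0:
                concordant += 1
            else:
                discordant += 1
    untied = concordant + discordant
    denom = sqrt((untied + tied_x_only) * (untied + tied_y_only))
    return (concordant - discordant) / denom if denom else float("nan")


def tied_pair_fraction(scores):
    """Fraction of all item pairs that receive the same score."""
    n = len(scores)
    pairs = n * (n - 1) // 2
    tied = sum(scores[i] == scores[j] for i in range(n) for j in range(i + 1, n))
    return tied / pairs


# Made-up scores (0-3) from two hypothetical raters for ten voices.
rater_a = [0, 0, 1, 1, 1, 2, 2, 2, 3, 3]
rater_b = [0, 1, 1, 1, 2, 2, 2, 3, 3, 3]
print(kendall_tau_b(rater_a, rater_b), tied_pair_fraction(rater_a))
```

With only four score categories a sizeable share of the pairs is tied however many voices are rated, which is one reason why a continuous scale or averaged scores can reveal agreement that an ordinal statistic understates.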

Perceptual and Acoustic Differences between Male and Female Voices

As the results from both the perception and the production tasks show, female speakers with a compensated adduction deficiency cannot be distinguished from normal speakers, although both groups differ from speakers with an uncompensated adduction deficiency. For male speakers, a three-way distinction can be made between the three speaker groups. The reason for this is probably that, in comparison to female speakers, ‘males tend to have a more complete glottal closure, leading to less energy loss at the glottis and less spectral tilt’ [7]. This may make listeners more tolerant to breathiness, roughness and hoarseness in female voices, so that less extreme RBH values are not necessarily considered a strong indication of a pathological voice. This shows, not surprisingly (even if it is far from common practice), how important it is to make a distinction between male and female speakers in experimental voice pathology studies.

Moreover, the results from the perception experiment indicate that speech therapy will have no effect on the perception of roughness, breathiness and hoarseness for female speakers with a compensated unilateral vocal fold paralysis, since they are indistinguishable from normal speakers anyway, at least in our data. This obviously does not imply that female speakers with a compensated vocal fold paralysis do not need treatment; we fully agree with Gerratt and Kreiman [1] as quoted in the introduction that the success of treatment depends on the laryngeal anatomy and physiology of the speaker. Since this is still clearly deviant from normal, it may lead to further voice dysfunction if the patients do not receive adequate treatment. Voice therapy is therefore important to prevent compensatory hyperfunctional voicing. Expert singing lessons have been suggested to enhance improvement of the voice [13]. Also, electrotherapy of the muscles that are not innervated has been suggested as a means of preventing atrophy, a prerequisite for achieving normal phonation if and when innervation of the paralysed vocal fold returns [14]. It is important to realise that patients with a compensated unilateral vocal fold paralysis possibly do not perceive themselves as pathological and may therefore have their doubts about the necessity and usefulness of treatment. Since the patient’s perception of her voice possibly cannot be relied on to guide and motivate her, support of the treatment for instance by videostroboscopy of the vocal folds becomes even more important.

Perceptual Trading Relations: RBH and Their Acoustic Correlates

Intrarater correlations between RBH scores show that in general, roughness and breathiness scores are independent, while hoarseness scores mainly depend on perceived breathiness, although we also observe clear differences between the raters in the interdependence between the percepts (table 3).

The low intrarater correlations between roughness and breathiness form an indication that the two percepts are based on different acoustic characteristics. But when we look at the results from linear regression analyses in which the influence of the acoustic parameters on the RBH scores is quantified, we find considerable overlap between the acoustic predictors of these percepts. The picture is further complicated by a dependency on the speaker’s sex as well as on the individual rater. But at the same time, the role of sometimes (seemingly) contradictory acoustic properties which have been highlighted in the literature is confirmed by our results.

The fairly clear dependence of hoarseness on breathiness in our data is not supported by equally clear results from our linear regression analyses: the acoustic predictors for the hoarseness scores are quite varied and difficult to interpret. The description of hoarseness as a combination of breathiness and roughness which is sometimes found in the literature is supported for some raters in our experiment, but not for others.

Roughness

Millet and Dejonckere [9] define roughness as ‘the voice quality related to [the] impression of irregular glottal pulses’. Wolfe et al. [4] found that frequency perturbation measures are more important in the evaluation of roughness than amplitude perturbation measures, while they quote other findings showing that the noise-to-harmonic ratio is the main predictor of roughness [25, 26]. The rather diverse acoustic properties which have been related to the perception of roughness are an indication of the possible presence of perceptual trading relations. In our data, Amplitude Perturbation was the main predictor of roughness in male voices, while for female voices Frequency Perturbation was more important. If we look at the correlations between these parameters, we find they are often high. Particularly, the main predictor for roughness in male voices, SHIM, and the main predictor in female voices, JITA, show a correlation of r = 0.79 for male and r = 0.78 for female voices. Seemingly different perceptual strategies can in this case at least partly be explained by the high correlation (see footnote 2) between different acoustic dimensions, allowing for slightly varying trading relations between acoustic parameters underlying the roughness percept. Signal characteristics which seem fundamentally different at first sight turn out to be related, since the presence of Frequency Perturbations often also implies the presence of Amplitude Perturbations.

Footnote 2: For each linear regression carried out, collinearity was checked and models with correlated predictors were discarded. For different linear regressions, this can lead to the selection of different, but correlated parameters as predictors.
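The collinearity check mentioned in footnote 2 is not described in detail. One simple way to screen candidate predictors, sketched here purely as an assumption, is to compute the pairwise Pearson correlation between acoustic parameters and flag pairs whose absolute correlation reaches a chosen threshold. The threshold of 0.7, the function names and the parameter values are hypothetical and do not come from the study.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson product-moment correlation of two equally long lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def collinear_pairs(predictors, threshold=0.7):
    """Return (name_a, name_b, r) for predictor pairs with |r| >= threshold."""
    names = sorted(predictors)
    flagged = []
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            r = pearson_r(predictors[a], predictors[b])
            if abs(r) >= threshold:
                flagged.append((a, b, r))
    return flagged


# Made-up values for illustration only; not data from the study.
predictors = {
    "JITA": [20.0, 35.0, 50.0, 80.0, 120.0, 150.0],
    "SHIM": [2.0, 3.1, 3.9, 6.2, 8.8, 11.5],
    "SPI": [12.0, 7.0, 15.0, 9.0, 14.0, 8.0],
}
for a, b, r in collinear_pairs(predictors):
    print(f"{a} and {b} are strongly correlated (r = {r:.2f})")
```

A model that would contain both members of a flagged pair could then be rejected, so that different regressions may end up with different but mutually correlated predictors, as the footnote describes.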

Breathiness

With regard to breathy voice, Laver [27] states that ‘by comparison with modal voice, the mode of vibration of the vocal folds is inefficient, and is accompanied by slight audible friction’. The presence of friction in breathy voices is supported by Wendler et al. [6], who define breathiness as ‘friction due to turbulence in the unmodulated airflow’. The role of audible friction for predicting breathiness is also supported by our data, although Spectral Energy parameters (especially NHR and VTI) do not predict the breathiness scores for all raters. In fact, the breathiness scores are mainly predicted by Frequency Perturbation measures (particularly PPQ) [9]. Although the term ‘inefficient’ in the quotation from Laver [27] above is probably not meant to indicate unstable vocal fold vibration and probably refers to the large volume of the unmodulated airflow, a lack of adduction leads to poorer phonation conditions, which can also lead to irregular vocal fold vibration reflected in stronger frequency perturbations. The two components Frequency Perturbations and Spectral Energy, which predict breathiness in both male and female speakers, are relatively independent of each other (correlations between them are low or non-significant), so that they constitute two complementary dimensions which predict the perception of breathiness. The evidence for perceptual trading relations between Spectral Energy and Frequency Perturbation parameters is fairly weak: for only 1 rater SPI (a Spectral Energy parameter indicating spectral tilt) is the main predictor for his breathiness judgements for male voices; in all other cases a Frequency Perturbation parameter (and in 2 the Amplitude Perturbation parameter SHIM) is the main predictor for breathiness, while Spectral Energy plays a complementary role in predicting breathiness for a large subset of the raters for both male and female voices.

The fact that the Frequency Perturbation parameter PPQ is mostly the best predictor for breathiness scores for male voices, while for 2 raters the Amplitude Perturbation parameter SHIM is the main predictor, can be explained at least in part by slightly varying trading relations, given the high correlation of r = 0.80 between SHIM and PPQ (cf. SHIM and JITA as predictors for roughness in male and female voices, above).

Hoarseness

Wendler et al. [6] define hoarseness as a combination of breathiness and roughness, with an ‘unmodulated airflow ... due to incomplete or absent glottal closure (voice breathy), or ... irregularities in the phonation concerning Frequency, Amplitude or Phase relations’. The relationship of hoarseness with roughness and especially breathiness is supported by our data, as discussed above, but it is certainly not clear-cut, with differences in the evaluation of hoarseness for male and female voices in general as well as for individual raters.

For male and female voices, hoarseness is related to Frequency Perturbation in the vowels. Among the male voices, however, differences can be observed between the raters: for some of them, the main predictors for hoarseness are the same as those for roughness (particularly Amplitude Perturbation parameters), while for others they are the same as those for breathiness (particularly Frequency Perturbation parameters).


The fairly clear picture which emerges from the intrarater correlations between the three percepts is therefore only partly maintained when we try to relate the RBH scores to acoustic parameters. Part of the reason for this lies in the high correlations between Frequency Perturbation and Amplitude Perturbation parameters, both of which are caused by the poor phonation conditions in patients with a unilateral vocal fold paralysis, leading to a general instability in the vocal fold vibrations. It remains an open question whether the correspondences between RBH scores on the one hand and acoustic parameters reflecting frequency versus amplitude perturbation on the other are more differentiated for other voice pathologies or whether these are intrinsically connected through a common factor of unstable vocal fold vibration.

Summary

This study confirms that RBH scores are useful in clinical practice. It also stresses the importance of distinguishing between male and female voices in perception and production studies, which is demonstrated by the fact that, unlike for male speakers, female speakers with a compensated vocal fold paralysis cannot be distinguished from normal speakers. Finally, it is shown that the raters sometimes differ in the acoustic parameters which predict the RBH scores. In some cases, these differences indicate different perceptual strategies, while in others they indicate the existence of trading relations.

Acknowledgements

The authors thank Bistra Andreeva, Bill Barry, Rosemarie MorganBarry and Markus Pospeschill for discussion and comments on an earlier version of this paper.

References

1 Gerratt BR, Kreiman J: Theoretical and methodological development in the study of pathological voice quality. J Phonet 2000;28:335–342.

2 Dejonckere PH, Remacle M, Fresnel-Elbaz E, Woisard V, Crevier-Buchman L, Millet B: Differentiated perceptual evaluation of pathological voice quality: Reliability and correlations with acoustic measurements. Rev Laryngol Otol Rhinol 1996;117:219–224.

3 Nawka T, Anders C, Wendler J: Die auditive Beurteilung heiserer Stimmen nach dem RBH-System. Sprache Stimme Gehör 1994;18:130–133.

4 Wolfe V, Fitch J, David M: Acoustic measures of dysphonic severity across and within voice types. Folia Phoniatr Logop 1997;49:292–299.

5 Hirano M, Hibi S, Terasawa R, Fujiu M: Relationship between aerodynamic, vibratory, acoustic and psychoacoustic correlates in dysphonia. J Phonet 1986;14:445–456.

6 Wendler J, Seidner W, Kittel G, Eysholdt U: Lehrbuch der Phoniatrie und Pädaudiologie. Stuttgart, Thieme, 1996.

7 Hanson HM, Chuang ES: Glottal characteristics of male speakers: Acoustic correlates and comparison with female data. J Acoust Soc Am 1999;106:1064–1077.

8 Dejonckere PH, Remacle M, Fresnel-Elbaz E, Woisard V, Crevier L: Reliability and clinical relevance of perceptual evaluation of pathological voices. Rev Laryngol Otol Rhinol 1998;119:247–248.

9 Millet B, Dejonckere PH: What determines the differences in perceptual rating of dysphonia between experienced raters? Folia Phoniatr Logop 1998;50:305–310.

10 Koreman J, Pützer M: Finding correlates of vocal fold adduction deficiencies; in Barry WJ, Koreman J (eds): Phonus. Saarbrücken, Institut für Phonetik, Universität des Saarlandes, 1997, vol 3, pp 155–178.

11 Pützer M, Koreman J: A German database of patterns of pathological vocal fold vibration; in Barry WJ, Koreman J (eds): Phonus. Saarbrücken, Institut für Phonetik, Universität des Saarlandes, 1997, vol 3, pp 143–153.

12 Hirano M, Mori K: Vocal fold paralysis; in Kent RD, Ball MJ (eds): Voice Quality Measurement. San Diego, Singular Publishing Group, 2000, pp 385–395.

13 Sataloff R: Professional Voice: The Science and Art of Clinical Care. San Diego, Singular Publishing Group, 1997.

14 Wirth G: Stimmstörungen. Köln, Deutscher Ärzteverlag Köln, 1995.

15 Kreiman J, Gerratt BR: The perceptual structure of pathologic voice quality. J Acoust Soc Am 1996;100:1787–1795.

16 Nawka T, Anders LCh: Die auditive Bewertung heiserer Stimmen nach dem RBH-System (Doppel-Audio-CD mit Stimmbeispielen). Stuttgart, Thieme, 1996.


17 Fairbanks G: Voice and Articulation Drillbook. New York, Harper & Brothers, 1940.

18 Titze IR, Winholtz WS: Effect of microphone type and placement on voice perturbation measurements. J Speech Hear Res 1993;36:1177–1190.

19 Klingholz F: Jitter. Sprache Stimme Gehör 1991;15:79–85.

20 Operations Manual ‘Multi-Dimensional Voice Program (MDVP), Model 4305’. Pine Brook, Kay Elemetrics Corp, 1993.

21 Koreman J, Pützer M: The usability of perceptual ratings of voice quality. Proc 6th Int Conf on Adv in Quantitative Laryngol, Voice and Speech Research (AQL), Hamburg, 2003.

22 Kreiman J, Gerratt BR: Sources of listener disagreement in voice quality assessment. J Acoust Soc Am 2000;108:1867–1876.

23 Kreiman J, Gerratt BR, Precoda K, Berke GS: Perception of supraperiodic voices. Proc 125th ASA Meeting, Ottawa, 1993.

24 Kreiman J, Gerratt BR: Measuring vocal quality; in Kent RD, Ball MJ (eds): Voice Quality Measurement. San Diego, Singular Publishing Group, 2000, pp 73–101.

25 Martin D, Fitch J, Wolfe V: Pathologic voice type and the acoustic prediction of severity. J Speech Hear Res 1995;38:765–771.

26 Eskenazi L, Childers DG, Hicks DM: Acoustic correlates of vocal quality. J Speech Hear Res 1990;33:298–306.

27 Laver J: The Phonetic Description of Voice Quality. Cambridge, Cambridge University Press, 1980.