Interactions between auditory and somatosensory feedback for voice F 0 control

9
Exp Brain Res (2008) 187:613–621 DOI 10.1007/s00221-008-1330-z 123 RESEARCH ARTICLE Interactions between auditory and somatosensory feedback for voice F 0 control Charles R. Larson · Kenneth W. Altman · Hanjun Liu · Timothy C. Hain Received: 21 September 2007 / Accepted: 18 February 2008 / Published online: 14 March 2008 Springer-Verlag 2008 Abstract Previous studies have demonstrated the impor- tance of both kinesthetic and auditory feedback for control of voice fundamental frequency (F 0 ). In the present study, a possible interaction between auditory feedback and kines- thetic feedback for control of voice F 0 was tested by admin- istering local anesthetic to the vocal folds in the presence of perturbations in voice pitch feedback. Responses to pitch- shifted voice feedback were larger when the vocal fold mucosa was anesthetized than during normal kinesthesia. A mathematical model incorporating a linear combination of kinesthesia and pitch feedback simulated the main aspects of our experimental results. This model indicates that a feasible explanation for the increase in response magnitude with vocal fold anesthesia is that the vocal motor system uses both pitch and kinesthesia to stabilize voice F 0 shortly after a perturbation of voice pitch feedback has been perceived. Keywords Vocalization · Auditory feedback · Larynx · Somatosensory · Anesthetization Introduction The role of sensory feedback for the control of voice funda- mental frequency (F 0 ) has been extensively studied for many years. Somatosensory receptors located in and near the larynx provide the CNS with information on the fre- quency of vocal fold vibration, status of the vocal folds related to quality and eVort of voice production, muscle contraction, joint movements and voice intensity (Eyzagu- irre et al. 1966; Gozaine and Clark 2005; Shiba et al. 1999; Wyke 1974, 1983). Stimulation of the superior laryngeal nerve, laryngeal mucosa or cartilages results in reXexive contraction of laryngeal muscles (Andreatta et al. 2002; Eyzaguirre et al. 1966; Kirchner and Suzuki 1968; Ludlow et al. 1992; Sasaki and Masafumi 1976; Suzuki and Sasaki 1977; Wyke 1974, 1983). Although the functional signiW- cance of these reXexes remain uncertain, some of them may be important for control of voice F 0 . Support for the role of laryngeal somatosensory recep- tors in the control of voice F 0 has been provided through anesthetization procedures in humans. Application of local anesthetic to the vocal folds or superior laryngeal nerve leads to an increase in voice perturbation measures, a decrease in Wne control of voice F 0 and a deterioration in rapid adjustments of F 0 in a tone-matching task (Leonard and Ringel 1979; Sorensen et al. 1980; Sundberg et al. 1993; Tanabe et al. 1975). It is unlikely that the application of the anesthetic agent to the vocal fold mucosa could aVect the muscles since there are no such documented eVects of which we are aware. Although these Wndings suggest that laryngeal mucosal receptors are involved in voice control, the subjects were still able to perform the vocal tasks, sug- gesting other sensory receptors or auditory feedback is involved in the control of voice F 0 . C. R. Larson (&) · H. Liu Department of Communication Sciences and Disorders, Northwestern University, 2240 Campus Drive, Evanston, IL 60208, USA e-mail: [email protected] K. W. Altman Department of Otolaryngology, Mount Sinai School of Medicine, One Gustave L. Levy Pl., Box 1189, New York, NY 10029, USA T. C. Hain Departments of Neurology, Otolaryngology, and Physical Therapy/Human Movement Sciences, Northwestern University, 645 N. Michigan, Suite 1100, Chicago, IL 60611, USA

Transcript of Interactions between auditory and somatosensory feedback for voice F 0 control

Exp Brain Res (2008) 187:613–621DOI 10.1007/s00221-008-1330-z

123

RESEARCH ARTICLE

Interactions between auditory and somatosensory feedback for voice F0 control

Charles R. Larson · Kenneth W. Altman · Hanjun Liu · Timothy C. Hain

Received: 21 September 2007 / Accepted: 18 February 2008 / Published online: 14 March 2008! Springer-Verlag 2008

Abstract Previous studies have demonstrated the impor-tance of both kinesthetic and auditory feedback for controlof voice fundamental frequency (F0). In the present study, apossible interaction between auditory feedback and kines-thetic feedback for control of voice F0 was tested by admin-istering local anesthetic to the vocal folds in the presence ofperturbations in voice pitch feedback. Responses to pitch-shifted voice feedback were larger when the vocal foldmucosa was anesthetized than during normal kinesthesia. Amathematical model incorporating a linear combination ofkinesthesia and pitch feedback simulated the main aspects ofour experimental results. This model indicates that a feasibleexplanation for the increase in response magnitude withvocal fold anesthesia is that the vocal motor system usesboth pitch and kinesthesia to stabilize voice F0 shortly after aperturbation of voice pitch feedback has been perceived.

Keywords Vocalization · Auditory feedback · Larynx · Somatosensory · Anesthetization

Introduction

The role of sensory feedback for the control of voice funda-mental frequency (F0) has been extensively studied formany years. Somatosensory receptors located in and nearthe larynx provide the CNS with information on the fre-quency of vocal fold vibration, status of the vocal foldsrelated to quality and eVort of voice production, musclecontraction, joint movements and voice intensity (Eyzagu-irre et al. 1966; Gozaine and Clark 2005; Shiba et al. 1999;Wyke 1974, 1983). Stimulation of the superior laryngealnerve, laryngeal mucosa or cartilages results in reXexivecontraction of laryngeal muscles (Andreatta et al. 2002;Eyzaguirre et al. 1966; Kirchner and Suzuki 1968; Ludlowet al. 1992; Sasaki and Masafumi 1976; Suzuki and Sasaki1977; Wyke 1974, 1983). Although the functional signiW-cance of these reXexes remain uncertain, some of them maybe important for control of voice F0.

Support for the role of laryngeal somatosensory recep-tors in the control of voice F0 has been provided throughanesthetization procedures in humans. Application of localanesthetic to the vocal folds or superior laryngeal nerveleads to an increase in voice perturbation measures, adecrease in Wne control of voice F0 and a deterioration inrapid adjustments of F0 in a tone-matching task (Leonardand Ringel 1979; Sorensen et al. 1980; Sundberg et al.1993; Tanabe et al. 1975). It is unlikely that the applicationof the anesthetic agent to the vocal fold mucosa could aVectthe muscles since there are no such documented eVects ofwhich we are aware. Although these Wndings suggest thatlaryngeal mucosal receptors are involved in voice control,the subjects were still able to perform the vocal tasks, sug-gesting other sensory receptors or auditory feedback isinvolved in the control of voice F0.

C. R. Larson (&) · H. LiuDepartment of Communication Sciences and Disorders, Northwestern University, 2240 Campus Drive, Evanston, IL 60208, USAe-mail: [email protected]

K. W. AltmanDepartment of Otolaryngology, Mount Sinai School of Medicine, One Gustave L. Levy Pl., Box 1189, New York, NY 10029, USA

T. C. HainDepartments of Neurology, Otolaryngology, and Physical Therapy/Human Movement Sciences, Northwestern University, 645 N. Michigan, Suite 1100, Chicago, IL 60611, USA

614 Exp Brain Res (2008) 187:613–621

123

Many studies have documented a clear role for auditoryfeedback to control F0. Pre and postlingually deaf speakershave abnormal levels of F0 and variations in F0 (Binnieet al. 1982; Leder et al. 1987; Osberger and Hesketh 1988).Other studies demonstrated that masking of auditory feed-back led to a deterioration in Wne control of F0 and inaccu-racy in singing musical notes (Elliott and Niemoeller 1970;Mallard et al. 1978; Mürbe et al. 2002). Presentation ofsudden perturbations in voice auditory pitch feedback whilesubjects are speaking or sustaining a vowel sound havedemonstrated compensatory changes in voice F0 production(Burnett et al. 1998; Chen et al. 2007; Donath et al. 2002;Hain et al. 2000; Jones and Munhall 2002; Kawahara 1995;Larson et al. 2001; Natke and Kalveram 2001; Sapir et al.1983; Xu et al. 2004).

Given that both somatosensation and auditory feedbackact together to stabilize F0, the next logical questions thatarise are how relatively important are the two mechanisms,and how do they interact. In this regard, Mallard et al. 1978disrupted both kinesthesia and auditory feedback and mea-sured voice F0 while subjects attempted to match speciWctones and while they were reading sentences with speciWcintonation contours. In four experimental conditions: nor-mal/control condition, auditory masking, anesthetization ofthe vocal fold mucosa and masking combined with anesthe-sia; it was found that the greatest disruption occurred withmasking alone. It was concluded that when disruptionsoccur in both modalities at the same time, the subjects maychange their strategy and attend more closely to whateverminimal auditory cues are present.

While the explanation of Mallard and associates is feasi-ble, it is diYcult to conWrm, as changes in “strategy”—non-linear processing—are generally capable of explainingnearly any experimental result. We instead hypothesizedthat both auditory and kinesthetic feedback combine in asimple linear way (as in Fig. 7). This hypothesis leads toseveral straightforward predictions permitting experimentalveriWcation regarding interactions between audition andsomatosensation.

First consider the case of our previous work where voiceauditory pitch feedback was experimentally shifted whilekinesthesia was left intact (Burnett et al. 1998; Hain et al.2000; Larson et al. 2001; Xu et al. 2004). To the extent thatkinesthetic feedback contributes to stabilization of F0,experimental inferences that neglect kinesthetic feedbackshould result in an incorrectly low estimate of the couplingbetween auditory error and F0 error (Fgain and FTc inFig. 7). This occurs because we simulated and Wt theresponse to this external auditory perturbation by adjustingthe gain and time constant (Fgain and FTc in Fig. 7) relatingauditory-error and F0-error. Both a gain and time constantare necessary to Wt the response over time. In this paradigm,because kinesthesia is unaVected by the experimental

auditory perturbation, and because the experimental pertur-bation drives vocal F0 away from intended F0, kinestheticfeedback would recognize the perturbation as an error andwould oppose it. Thus kinesthetic feedback should reducethe response of laryngeal F0 to an external auditory pertur-bation.

Second, we considered the case that would occur ifpitch-shift stimuli were presented (F0 error) after kines-thetic feedback was removed. For this situation, responsesto the stimuli should be larger because there is a loss of thestabilizing inXuence of kinesthetic input. Furthermore, theFgain and FTc derived by Wtting to this condition should reX-ect the true coupling between auditory input and output, asthey are not contaminated by the opposing eVects of kines-thesia as in case 1.

The third and last case is the normal condition whereboth F0 and kinesthetic error change together. An exampleof this situation might be a change in F0 arising from amechanical change in the vocal cord tension associatedwith Xuctuations in muscle activity. In this situation,because auditory and kinesthetic feedback signals bothchange in the same direction, one would expect a largeroverall gain (the sum of Fgain and KFgain) and a larger com-pensatory F0 response than with either of the two experi-mental conditions outlined above. This condition is muchmore diYcult to test experimentally as it would require anendogenous stimulus, such as the sudden Xuctuation ofvoice F0 of a pubescent young man.

The present study was conducted to test the predictionmade in the second case above—that eliminating kinesthe-sia would be associated with a greater response to an exter-nal perturbation of auditory F0 than the response measuredwith kinesthesia intact. We measured responses to pitch-shifted voice feedback before and during temporary anes-thetization of the vocal fold mucosa. In this situation, asoutlined above, because auditory feedback and kinestheticfeedback should be opposed to each other, we hypothesizedthat removal of kinesthesia would result in strongerresponses to auditory perturbations. Results conWrmed ourhypothesis, and responses to pitch-shifted voice feedbackwere greater when the vocal folds were anesthetized com-pared to the non-anesthetized condition. We successfullysimulated this data by extending a previously postulatedmathematical model of auditory feedback control of F0 toinclude kinesthesia.

Methods

Nineteen normal, native English speaking adults (9 males,10 females, age 22–59) with no history of neurological orspeech disorders served as subjects. Data from two subjects(2 females) were later discarded because in one there were

Exp Brain Res (2008) 187:613–621 615

123

early signs of vocal nodules, and the other one had anabnormally rough voice. This study was approved by theNorthwestern University Institutional Review BoardHuman Subjects Committee.

Subjects were seated in a medical examination chairwith AKG boom-set headphones and attached microphone(model K 270 H/C) placed on their head. The microphonewas located 1 in from their lips. The microphone signal wasampliWed (Grace Design, model 101 mic ampliWer) andprocessed for pitch shifting through an Eventide EclipseHarmonizer. The pitch-shifted signal was ampliWed (PreS-onus HP4 ampliWer) to a gain of 10 dB greater than voiceamplitude and fed back to the subject over the headphones(Fig. 1). Acoustic calibrations were made with a B&K 2250sound level meter and model 4100 in-ear microphones. Alaboratory computer running MIDI software (Max/MSP;Cycling 74) was used to control the harmonizer. Micro-phone, headphone and control signals were low passWltered at 5 kHz and digitized (12 bit) at 10 kHz andrecorded on a second computer with Chart software (ADInstruments).

Subjects were Wrst instructed to produce 5-s vocaliza-tions of the /u/ vowel at a comfortable pitch and amplitudelevel. The exact F0 level diVered for each subject, but byhaving them vocalize at their customary conversationallevel, the relative contraction levels of laryngeal muscleswas relatively constant across individuals. Also, since dur-ing data analysis (see below), the F0 contours were con-verted to the cent scale, comparisons between diVerent F0levels in absolute frequency (i.e., Hz) can be made. Duringeach of eight vocalizations, the feedback signal to the sub-ject was shifted up or down Wve times in a randomized

sequence. The interval between successive stimuli within aset was randomized between 400 and 1,000 ms. Subjectspaused between each vocalization to take a breath. In theWrst set of vocalizations, the pitch-shift stimulus was100 cents (100 cents = 1 semitone), 200 ms duration and inthe second sequence, 50 cents, 200 ms duration. The stimu-lus output of the harmonizer was calibrated on the “cent”interval because this is a logarithmic scale related to a 12-tone musical scale and is not locked to speciWc frequencies.The 50 and 100-cent stimulus magnitudes were chosen toallow comparison with previous studies in our laboratory(Burnett et al. 1998).

After the Wrst sets of trials were completed, the subjects’vocal folds were anesthetized. For the Wrst two subjects,subjects were instructed to lean forward, tilt their headupward, open their mouth and stick out their tongue. Thephysician then held the tongue with gauze and inserted acurved tube into the oral pharynx aiming at the larynx andadministered a 2-s spray of a local anesthetic—14% benzo-caine, 2% butyl aminobenzoate, 2% tetracaine (TopicalCetacaine, Cetylite Industries Inc., Pennsauken, NJ, USA).Following this, the physician inserted a Wberoptic scopethrough the nose and by touching the vocal folds with theend of the scope without eliciting a cough, conWrmed thatthe vocal folds were anesthetized.

After testing the Wrst two subjects, we became concernedthat the application of the anesthetic by the spray techniquemay have been anesthetizing mucosa throughout the laryn-geal/pharyngeal region and not restricted to the vocal folds.We therefore switched to a new technique for the remainderof the subjects. These subjects had their vocal folds anes-thetized by spraying 2 cc of xylocaine 4% topical solution(Lidocaine, AstraZeneca Pharmaceuticals, Wilmington,DE, USA) directly onto the surface of the vocal folds. Thiswas done by Wrst inserting the Wberoptic scope through asleeve (Endosheath, Vision Sciences, Natick, MA, USA),which also contained a side-tube for the anesthetic agent.The subject’s nasal epithelium was Wrst sprayed with pon-tocaine to reduce discomfort from insertion of the Wberopticscope. A 3 cc syringe containing lidocaine was thenattached to the side-tube of the sleeve, and the scope withsleeve was gently inserted through the nose to the larynx.With the scope tip in position above the vocal folds, thesubject was instructed to produce the vowel /i/ at a highpitch. During this production, the physician applied 2 cc oflidocaine directly on to the upper surface of the vocal folds.After 30 s, the physician touched the vocal folds with theWberoptic scope and by failing to elicit a cough or othergagging type of response, conWrmed the anesthetization ofthe vocal folds in all subjects. Within 2 min of the anesthe-tization of the vocal folds, the sequence of pitch-shift test-ing was repeated. Four subjects were again tested 15 minafter anesthetization of the vocal folds. Scheduling conXicts

Fig. 1 Schematic illustration of apparatus used for the pitch-shiftingof voice auditory feedback

616 Exp Brain Res (2008) 187:613–621

123

prohibited testing of the other 14 subjects a third time. Twoof the subjects who had received the Wrst form of anesthesiadelivery were tested again with the second method.

Data were analyzed by transferring the digitized Wles to acomputer running Igor (Wavemetrics Inc.) software. Thevocal signal was then transferred to a program thatextracted pulses corresponding to each cycle of vocal foldvibration (Praat). The pulses were then transferred back tothe Igor program and converted to a waveform in whichvoltage corresponded to voice F0 measured in cents. Thiswaveform, an F0 contour, was then displayed on a computerscreen. Figure 2a provides a short sample of an F0 contourof the feedback to the subject and the subject’s vocal out-put. In this sample, one upward and three downward pitch-shift stimuli are seen in the feedback trace. For the purposeof obtaining the average F0 response across several stimu-lus presentations, F0 contours with a 200 ms pre- and500 ms post-stimulus window were time-aligned with astimulus pulse on the computer screen. Figure 2b illustratesa waterfall display of the F0 contours for each of the trialsfor one subject in one condition. Inspection of the water-falls was done prior to averaging the signals to eliminatetrials in which there was an unusually large deXectionrelated to an error in the extraction of the F0 contours or ifthere was some type of vocal interruption such as a coughduring one of the trials. Averaged F0 contours based on thetrials in the waterfall display, representing the F0 responseto the pitch-shifted voice feedback, were calculated foreach subject and for each of the separate conditions; 50 or100 cent pitch-shift in the up or down direction (Fig. 2c).

Following the averaging process, the mean and standarddeviation of the F0 contour during the pre-stimulus periodwas calculated. The criteria for deWning a response were achange in the F0 contour that exceeded 2 SDs of the pre-stimulus F0 that occurred at least 60 ms following the stim-ulus onset (response latency) and lasting at least 50 msbefore returning to a value within the 2 SD boundaries ofthe pre-stimulus F0 (see Fig. 2c). Response magnitude wasdeWned as the greatest value of the averaged F0 contour fol-lowing the response onset. Measures of response magnitudeand latency were taken from the averaged waveforms andsubmitted to signiWcance testing with a repeated measuresANOVA using SPSS (v. 11.0). Assumptions of compoundsymmetry and circularity for a repeated measures ANOVAwere met. We did not test for a diVerence between the 100and 50-cent stimuli because we always tested 100 cent Wrst,and a diVerence between them could be attributed to anorder eVect.

As a test of a possible eVect of changes in voice F0 levelbetween the pre- and post-anesthesia conditions onresponse magnitude, the average F0 level for all experimen-tal conditions was measured from an FFT of the vocalwaveform for the entire set of vocalizations from all thesubjects. A Pearson product correlation coeYcient was cal-culated between this diVerence in the average F0 level(average post-anesthesia F0 level—average pre-anesthesiaF0 level) and the pitch-shift response measures.

We extended a linear model of F0 control (Hain et al.2000) using auditory feedback to include kinesthetic input.This model topology is structured in a similar way to thespeech production model of Guenther et al. (2006), havingboth a feedforward and feedback pathway, and auditory andkinesthetic feedback. The model of Guenther and associatesis a neural network model and due to the array of neuralweights intrinsic to such constructs, it contains many non-linearities and free parameters corresponding to individualsimulated neurons. Our model contains only a small num-ber of parameters, but they are suYcient to simulate a widevariety of experimental paradigms involving perturbedauditory feedback. The feasibility of our extended modelwas established by simulating our data.

Results

All the subjects reacted to the pitch-shift perturbation witha change in voice F0. Of the 136 possible responses, 121were in the direction opposite to that of the stimulus andwere categorized as opposing responses. There were tenresponses in the same direction as the stimulus (“follow-ing” responses) and Wve that did not meet our criteria of aresponse (non-response). Four of the non-responses wereproduced under the anesthesia condition.

Fig. 2 a F0 contour (bottom trace) and auditory feedback (top trace).Pitch-shift perturbations are illustrated in feedback trace. b F0 contoursfor many individual trials. The pitch-shift stimulus for each trace (notshown) were aligned at time = 0, as shown on scale at bottom. c A sin-gle averaged F0 response. Horizontal dotted lines indicate §2SDs ofthe prestimulus mean averaged F0. Vertical dashed lines aYxed toaveraged F0 response indicate onset and oVset times of the response.Horizontal dashed line indicates response magnitude. Time and direc-tion of the pitch-shift stimulus is indicated by square trace at top

Exp Brain Res (2008) 187:613–621 617

123

Figures 3 and 4 display exemplar responses for one sub-ject in the control and anesthesia conditions for bothupwards and downwards stimulus directions for 50 (Fig. 3)and 100 cent (Fig. 4) perturbations. Although there aresome diVerences in the form of the responses, all are in theopposing direction and have latencies between 100 and200 ms. Figure 5 displays the averaged responses across allsubjects and conditions along with the 95 conWdence inter-vals (shaded). Average response curves in the “anesthetize”condition are slightly larger than in the “normal” condition.For the 50-cent stimuli, responses were less variable thanwith the 100-cent stimuli. Much of the variability forresponses with the 100-cent down stimuli is attributed tovariations in response magnitude and timing of theresponses (not shown). Figure 6 displays boxplots ofresponse magnitude for the normal and anesthetic condi-tions for both upwards and downwards stimuli (opposingresponses only). There was a signiWcant main eVect forresponse magnitude between the pre and post anestheticcondition [F(1,16) = 10.453, P = 0.005] but not for stimu-lus direction [F(1,16) = 0.024, P = 0.878]. The overallmean of response magnitude in the anesthetized condition(25 § 15 cents) was larger than for the non-anesthetizedcondition (20 § 17 cents). There was no signiWcant interac-tion between stimulus direction and anesthesia state[F(1,16) = 0.144, P = 0.709]. Response magnitudes in thefour subjects that were re-tested after anesthetization proce-dures, returned to the pre-anesthetic levels with the Wnaltesting.

There was a signiWcant main eVect for stimulus directionon response latency [F(1,16) = 20.249, P = 0.000], inwhich longer latencies were associated with the upwardstimuli (147 § 72 ms) compared to the downward stimuli(104 § 49 ms). There were no signiWcant eVects for theanesthetic condition [F(1,16) = 0.804, P = 0.383] onresponse latency or interactions between anesthetic condi-tion and stimulus direction [F(1,16) = 0.534, P = 0.476].

Fig. 3 Averaged voice F0 contours in normal condition (left) and withanesthesia (right) following 50 cent pitch-shift stimuli. Solid lines rep-resent experimental F0 responses. Dotted lines are simulations usingthe mathematical model of Fig. 7. Square brackets represent timingand direction of pitch-shifted feedback. Stimulus onset is at time 0.0

Fig. 4 Voice F0 contours in normal and anesthetic condition following100 cent pitch-shift stimuli. Figure is otherwise formatted as in Fig. 3

Fig. 5 Composite averages of all responses for all subjects and condi-tions; 50 cent up, 50 cent down, 100 cent up, 100 cent down, normaland anesthetize. Solid lines represent average F0 contour. Gray shad-ing represents §95% conWdence intervals of averaged traces. Verticalshaded rectangles represent time of the pitch-shift stimulus. Horizon-tal axis is time in seconds, and vertical axis is frequency in cents

618 Exp Brain Res (2008) 187:613–621

123

Results of the correlation analysis between the F0 leveldiVerences of the pre- and post-anesthesia conditions andthe mean reXex magnitudes yielded an r < 0.0001. Thus,there was no correlation between the overall voice F0 leveland the magnitude of the response to pitch-shifted feed-back.

In an attempt to provide a quantitative hypothesis, weextended a previously presented model of F0 control incor-porating auditory feedback (Hain et al. 2000), to includekinesthetic feedback. With our model we simulated our

data and showed feasibility for a simple linear feedbackimplementation.

In our previous model, we used an auditory negativefeedback loop, incorporating suitable delays to simulateauditory feedback data, which has a delay of about 100 ms.In this adaptation, shown in Fig. 7, we added a kinestheticfeedback loop. The kinesthetic feedback loop was designedas follows. We assumed that the organization was similar tothat of the auditory feedback loop, but that the gain anddelay might diVer. Although there are substantial diVer-ences in auditory and kinesthetic signal processing, never-theless in both instances, for feedback to be accomplishedan F0 error signal must be developed and used to change thedrive to F0. We chose the simplest possible model design—feedback through a delay and simply sought to establishfeasibility.

Our earlier auditory feedback model incorporated anauditory processing delay, a low pass Wlter, and a feedbackgain. In the extended model designed to model the presentdata set, we assumed that the Wlter was part of the outputpathway and shared between the auditory and kinestheticfeedback loops. The delay and gain of the kinesthetic path-way, KFdelay and KFgain in Fig. 7, were the unknowns thatwere identiWed.

Using the optimization package of Matlab (Nantick,MA, USA), we Wrst Wt our previous model of F0, withoutany kinesthetic input, to data collected under anesthesia.This model has only two free parameters, a gain (Fgain) andtime constant (FTc), as anesthesia eliminates the kinestheticfeedback loop. A gain of 0.273, and a time constant of

Fig. 6 Boxplots illustrating response magnitudes with and withoutanesthesia of the vocal folds. Hatched bars are for downward stimuliand open bars for upward stimuli

Fig. 7 Model of audio-vocal system incorporating auditory and kines-thetic feedback. “Desired F0” is converted through a “black box” rep-resenting the entire central vocal production system (summing junctionlabeled F0 drive) into two F0 signals. “Larynx F0” is the F0 perceivedthrough kinesthesia. “Auditory F0” is the F0 perceived by the auditorysystem. An eVerent copy of desired F0 is the reference signal for both

auditory and kinesthetic feedback loops. Separate delay elements pro-cess inputs for auditory and kinesthetic input. Kinesthetic and auditoryerror is computed from the delayed diVerences between desired F0 andperceived larynx or auditory F0 as appropriate. Kinesthetic and audi-tory error are low-pass Wltered and scaled for both loops, limited to amaximum of 50 cents, and added into the F0 drive signal

Exp Brain Res (2008) 187:613–621 619

123

0.271 s provided the best Wts. Next, with these two parame-ters set to their optimal values, we optimized the kinestheticpart of the model to experimental data in which kinesthesiawas present. We allowed the optimizer to adjust the gain ofkinesthetic feedback, KFgain, and kinesthetic delay, KFdelay,as mentioned above. A low pass Wlter identical to that usedfor F0 was also applied to kinesthetic error. The data werebest Wt by KFgain = 0.80, and KFdelay = 0.020. This emer-gent result for a kinesthetic delay of only 20 ms, muchshorter than the 100 ms delay needed to simulate the resultsof auditory perturbations (Hain et al. 2000), is consistentwith known neurophysiology. The delay between sensorystimulation of the larynx and laryngeal muscle responses ison the order of 18–25 ms (Ludlow et al. 1992). The combi-nation of a higher gain and shorter delay for kinesthesia asopposed to auditory feedback parameters, suggests thatover this short time frame, kinesthetic error is weightedmore heavily than auditory error for controlling F0.

Figures 3 and 4 show experimental data for the 50 and100 cents stimuli overlaid with simulations (dashed lines).For this experimental paradigm where kinesthesia and audi-tory input are in opposition, the simulated responses withkinesthesia absent, are 1.65 times larger than those wherekinesthesia is present.

Discussion

The present study has demonstrated that temporary anes-thetization of the mucosa of the vocal folds results in alarger response to pitch-shifted voice auditory feedbackcompared with the pre-anesthetic condition. The Wndingssuggest that both kinesthesia and auditory feedback areused for the control of vocalization. Furthermore, our math-ematical simulations suggest that early in the response, kin-esthesia alone provides feedback control, but after about100 ms, auditory feedback also participates.

There have been several previous studies that have alsostudied the eVects of anesthesia of laryngeal nerves ormucosal tissues on voice F0 (Mallard et al. 1978; Sorensenet al. 1980; Sundberg et al. 1995; Tanabe et al. 1975; Yangand Chen 2005). The main eVects of anesthesia were anincrease in F0 variability and/or a decrease in accuracy ofproducing a speciWc F0. None of the study designs weredirectly comparable to the present study, but their resultsare compatible with our proposed kinesthetic feedbackloop.

To explain our results, we proposed a simple linearextension of a previous mathematical model of F0 control(Hain et al. 2000). Our model is structured in a similar wayto the speech production model of Guenther et al (2006),having both a feedforward and feedback pathways, but ourprevious implementation did not include kinesthetic input.

To simulate this set of experimental data, we added a kines-thetic negative feedback loop, conWgured to stabilize F0against external perturbations. When auditory input is per-turbed, but kinesthesia remains unchanged, kinestheticinput and auditory input are in opposition to each other, andthe stabilizing response is small due to their interference.When kinesthetic input is removed, the response to audi-tory input is not reduced by kinesthesia, and therefore it islarger. We call this the “linear interaction hypothesis”. Ourmodeling demonstrates the feasibility of this particulartopology and set of parameters but it does not exclude thepossibility that other topologies or parameter sets might besuYcient to Wt the same data.

An alternative and/or complementary hypothesis thatmight also explain our results is a central non-linear inter-action. Our data documented larger magnitude responses topitch-shifted feedback in the presence of vocal fold anes-thesia suggesting greater reliance on auditory feedback.While the simplest explanation is that there simply was adecrease in the interfering kinesthetic input, it is also possi-ble that there was an additional increase in auditory gain. Inthe non-linear interaction hypothesis, the system increasesauditory feedback gain to control voice F0 in response to adisruption of kinesthetic feedback. Changes of gain areintrinsically non-linear operations. In the context of ourmodel, this could be implemented by an increase in the gainof the F0 auditory feedback loop, Fgain. To the extent thatthis additional non-linear response was present, our simula-tions would not require as large a contribution from kines-thesia to explain our results.

While the more complex non-linear control mechanismis not necessary to explain our results, our data do notexclude it. Although generally in modeling, simpler expla-nations are preferred, this idea is worth considering for sev-eral reasons. Previous work with other senses has shownthat it is common for diVerent sensory modalities to interactnon-linearly (Horak et al. 2001; Li et al. 1999; Shimojo andShams 2001). In a system where multiple sensory modali-ties are involved in control, if one modality is lost, dis-rupted or unreliable, remaining functional modalities can bereweighted (Horak et al. 1994). These same types of inter-actions in the phonatory control system could allow theweighting of diVerent sensory modalities to be modiWed toobtain optimal regulation of voice F0. A similar non-linearinteraction involving upweighting of kinesthetic inputcould explain the ability of singers to hold a precise noteeven in the presence of high levels of competing auditoryfeedback from other voices or instruments (Sundberg1987).

The main argument against the non-linear interactionhypothesis is that it is not necessary. Our simulations showthat the simpler linear interaction hypothesis is feasible byitself. Nevertheless, a combination of both of these expla-

620 Exp Brain Res (2008) 187:613–621

123

nations is also viable. One might be able to further test thenon-linear interaction hypothesis by performing similarexperiments to those of Mallard et al. (1978) using contem-porary methodology.

Either explanation requires one to accept that there is akinesthetic signal encoding F0. Previous work suggests thatkinesthetic information is indeed available from mucosalmechanoreceptors sensitive to laryngeal vibration andmovement of the cartilages (Gozaine and Clark 2005; Shibaet al. 1999). Discharge from these receptors could providethe speaker with information related to the state of musclecontraction, position of the vocal folds and a direct estimateof the voice F0.

If we accept that both auditory and kinesthetic feedbackare used to control F0, then the question arises naturally oftheir relative importance. Auditory feedback is clearly veryimportant—the onset of deafness leads to a rapid deteriora-tion in the ability to control voice F0 and amplitude, andmuch greater eVects on F0 level control and variations in F0have been observed in deaf speakers (Binnie et al. 1982;Leder et al. 1987; Osberger and Hesketh 1988; Svirskyet al. 1992) than the results previously discussed for anes-thesia. On the other hand, our results suggest that kinesthe-sia is even more important than auditory feedback, as thebest Wt for the gain for kinesthetic error (0.803) is morethan twice that for auditory error (0.273). Although kines-thesia may be relatively more important than auditoryinput, nevertheless it is part of a redundant negative feed-back control system. As long as correct auditory input isavailable, partial loss of the kinesthetic loop gain shouldnot cause critical disruptions to pitch control. Redundantcontrol loops are common in critical biological systems, asthey make the overall system more robust.

Furthermore, the importance of auditory and kinestheticfeedback may vary with respect to the time from onset ofphonation. Mallard et al. (1978) suggested that auditory andkinesthetic control may be used to a diVerent extent as wellas at diVering times with respect to the onset of phonationand for diVering frequency ranges. Supporting this idea,unlike auditory information, kinesthetic information couldallow the speaker to know if the cartilages were in the cor-rect position, and the vocal folds had the correct length andstiVness prior to the onset of vocalization. Such informationmay provide the necessary substrate for “prephonatory tun-ing” proposed by Wyke (1974, 1983). Prephonatory tuningrefers to the preparation undertaken by a singer prior tovocalization. It involves contraction of muscles to correctlyposition the laryngeal cartilages, and to adjust vocal foldtension to produce the desired note.

In contrast, auditory feedback can provide feedback onvoice F0, quality and loudness, which the laryngeal muco-sal receptors could not. Thus, auditory and somatosensoryprovide complimentary, but in some cases diVerent types of

feedback for the control of the voice. Moreover, auditoryfeedback, being in the acoustic domain, could also be inte-grated with other acoustical variables related to the envi-ronment.

Consistent with this idea, our data and modeling sug-gested that kinesthetic error acts with a shorter latency(20 ms) compared to auditory error (100 ms). Logically, themore rapidly available kinesthetic feedback should be moreheavily weighted over the short term, while auditory feed-back should eventually dominate. In fact, because auditoryfeedback requires 100 ms, for the Wrst 100 ms of phonation,all the control should logically be kinesthetic.

In conclusion, we have shown that anesthesia of thevocal folds increases the response to an externally imposedauditory perturbation. We have also shown that a simplemodiWcation of a linear feedback model of F0 control,incorporating kinesthetic feedback, is a feasible explanationfor this eVect. Although our simulations demonstrate that acompletely linear explanation is feasible, a plausible alter-native or additional explanation of this eVect is a non-linearinteraction where auditory feedback is upweighted, whenkinesthetic input is unavailable or simply less reliable.Overall, combining our data and the literature, it seemslikely that there are diVerent roles for these two sensorychannels: auditory feedback may be used for gross controlof F0 while kinesthesia is used when auditory feedback isnot available (i.e. prephonatory tuning or singing in a verynoisy environment), and for Wne rapid control of F0.

Acknowledgment This study was supported by a grant from NIHGrant No. DC006243-01A1. We thank Dr. David Conley for his assis-tance in administering anesthetic and Mr. Chun Liang Chan for com-puter programming.

References

Andreatta RD, Mann EA, Poletto CJ, Ludlow CL (2002) MucosalaVerents mediate laryngeal adductor responses in the cat. J ApplPhysiol 93:1622–1629

Binnie CA, DaniloV RG, Buckingham HW (1982) Phonetic disintegra-tion in a Wve-year-old following sudden hearing loss. J SpeechHear Disord 47:181–189

Burnett TA, Freedland MB, Larson CR, Hain TC (1998) Voice F0 re-sponses to manipulations in pitch feedback. J Acoust Soc Am103:3153–3161

Chen SH, Liu H, Xu Y, Larson CR (2007) Voice F0 responses to pitch-shifted voice feedback during English speech. J Acoust Soc Am121:1157–1163

Donath TM, Natke U, Kalveram KT (2002) EVects of frequency-shift-ed auditory feedback on voice F0 contours in syllables. J AcoustSoc Am 111:357–366

Elliott L, Niemoeller A (1970) The role of hearing in controlling voicefundamental frequency. Int Audiol IX:47–52

Eyzaguirre C, Sampson S, Taylor JR (1966) The motor control ofintrinsic laryngeal muscles in the cat. In: Granit R (ed) Nobelsymposium I: muscular aVerents and motor control. Wiley, NewYork, pp 209–225

Exp Brain Res (2008) 187:613–621 621

123

Gozaine TC, Clark KF (2005) Function of the laryngeal mechanore-ceptors during vocalization. Laryngoscope 115:81–88

Guenther FH, Ghosh SS, Tourville JA (2006) Neural modeling andimaging of the cortical interactions underlying syllable produc-tion. Brain Lang 96:280–301

Hain TC, Burnett TA, Kiran S, Larson CR, Singh S, Kenney MK(2000) Instructing subjects to make a voluntary response revealsthe presence of two components to the audio-vocal reXex. ExpBrain Res 130:133–141

Horak FB, Earhart GM, Dietz V (2001) Postural responses to combi-nations of head and body displacements: vestibular–somatosen-sory interactions. Exp Brain Res 141:410–414

Horak FB, Shupert CL, Dietz V, Horstmann G (1994) Vestibular andsomatosensory contributions to responses to head and body dis-placements in stance. Exp Brain Res 100:93–106

Jones JA, Munhall KG (2002) The role of auditory feedback during pho-nation: studies of Mandarin tone production. J Phon 30:303–320

Kawahara H (1995) Hearing voice: transformed auditory feedbackeVects on voice pitch control. In: Computational auditory sceneanalysis and International joint conference on artiWcial intelli-gence, Montreal

Kirchner JA, Suzuki M (1968) Laryngeal reXexes and voice produc-tion. In: Annals of New York Academy of Sciences, pp 98–109

Larson CR, Burnett TA, Bauer JJ, Kiran S, Hain TC (2001) Compari-sons of voice F0 responses to pitch-shift onset and oVset condi-tions. J Acoust Soc Am 110:2845–2848

Leder SB, Spitzer JB, Kirchner JC (1987) Speaking fundamental fre-quency of postlingually profoundly deaf adult men. Ann OtolRhinol Laryngol 96:322–324

Leonard RJ, Ringel RL (1979) Vocal shadowing under conditions ofnormal and altered laryngeal sensation. J Speech Hear Res22:794–817

Li Z, Morris KF, Baekey DM, Shannon R, Lindsey BG (1999) Re-sponses of simultaneously recorded respiratory-related medullaryneurons to stimulation of multiple sensory modalities. J Neuro-physiol 82:176–187

Ludlow C, Van Pelt F, Koda J (1992) Characteristics of late responsesto superior laryngeal nerve stimulation in humans. Ann Otol Rhi-nol Laryngol 101:127–134

Mallard AR, Ringel RL, Horii Y (1978) Sensory contributions to controlof fundamental frequency of phonation. Folia Phoniatr 30:199–213

Mürbe D, Pabst F, Hofmann G, Sundberg J (2002) SigniWcance ofauditory and kinesthetic feedback to singers’ pitch control. JVoice 16:44–51

Natke U, Kalveram KT (2001) EVects of frequency-shifted auditoryfeedback on fundamental frequency of long stressed and un-stressed syllables. J Speech Lang Hear Res 44:577–584

Osberger MJ, Hesketh LJ (1988) Speech and language disorders relat-ed to hearing impairment. In: Lass NJ, McReynolds LV, NorthernJL, Yoder DE (eds) Handbood of speech-language pathology andaudiology. B.C. Decker, Philadelphia, pp 858–886

Sapir S, McClean M, Luschei ES (1983) EVects of frequency-modu-lated auditory tones on the voice fundamental frequency in hu-mans. J Acoust Soc Am 73:1070–1073

Sasaki C, Masafumi S (1976) Laryngeal reXexes in cat, dog, and man.Arch Otolaryngol 102:400–402

Shiba K, Miura T, Yuza J, Sakamoto T, Nakajima Y (1999) LaryngealaVerent inputs during vocalization in the cat. Neuroreport10:987–991

Shimojo S, Shams L (2001) Sensory modalities are not separatemodalities: plasticity and interactions. Curr Opin Neurobiol11:505–509

Sorensen D, Horii Y, Leonard R (1980) EVects of laryngeal topicalanesthesia on voice fundamental frequency perturbation. J SpeechHear Res 23:274–283

Sundberg J (1987) The science of the singing voice. Northern IllinoisUniversity Press, Dekalb

Sundberg J, Iwarsson J, Billström A-MH (1993) SigniWcance of me-chanoreceptors in the subglottal mucosa for subglottal pressurecontrol in singers. In: 22nd annual symposium care of the profes-sional voice, Philadelphia

Sundberg J, Iwarsson J, Billstrom AH (1995) SigniWcance of mechano-receptors in the subglottal mucosa for subglottal pressure controlin singers. J Voice 9:20–26

Suzuki M, Sasaki C (1977) EVect of various sensory stimuli on reXexlaryngeal adduction. J Otol Rhinol Laryngol 86:30

Svirsky MA, Lane H, Perkell JS, Wozniak J (1992) EVects of short-term auditory deprivation on speech production in adult cochlearimplant users. J Acoust Soc Am 92:1284–1300

Tanabe M, Kitajima K, Gould W (1975) Laryngeal phonatory reXex.The eVect of anesthetization of the internal branch of the superiorlaryngeal nerve: acoustic aspects. Ann Otol Rhinol Laryngol84:206–212

Wyke B (1974) Laryngeal myotatic reXexes and phonation. Folia Pho-niatr 26:249–264

Wyke B (1983) Neuromuscular control systems in voice production.In: Bless DM, Abbs JH (eds) Vocal fold physiology. College-Hill,San Diego, pp 71–76

Xu Y, Larson C, Bauer J, Hain T (2004) Compensation for pitch-shift-ed auditory feedback during the production of Mandarin tone se-quences. J Acoust Soc Am 116:1168–1178

Yang CC, Chen SH (2005) Impact of topical anesthesia on acousticcharacteristics of voice during laryngeal telescopic examination.Otolaryngol Head Neck Surg 132:110–114