Intonation and Expressivity: A Single Case Study of Classical Western Singing

8
Intonation and Expressivity: A Single Case Study of Classical Western Singing *Johan Sundberg, Filipa M. B. L~ a, and Evangelos Himonides, *Stockholm, Sweden, yAveiro, Portugal, and zLondon, United Kingdom Summary: Previous studies have shown that singers tend to sharpen phrase-peak tones as compared with equally tem- pered tuning (ETT). Here we test the hypothesis that this can serve the purpose of musical expressivity. Data were drawn from earlier recordings, where a professional baritone sang excerpts as void of musical expression as he could (Neutral) and as expressive as in a concert (Concert). Fundamental frequency averaged over tones was examined and compared with ETT. Phrase-peak tones were sharper in excited examples, particularly in the Concert versions. These tones were flattened to ETT using the Melodyne software. The manipulated and original versions were presented pairwise to a mu- sician panel that was asked to choose the more expressive version. By and large, the original versions were perceived as more expressive, thus supporting the common claim that intonation is a means for adding expressivity to a performance. Key Words: Intonation–Sharpening–Expressivity–Western classical singing–Phase-peak tones. INTRODUCTION Intonation is commonly regarded as an important aspect of mu- sic performance. Already in the 1930s, Seashore 1 measured fundamental frequency (F 0 ) in recordings of renowned singers and revealed considerable departures from equally tempered tuning, henceforth ETT. He observed that ‘‘long notes tend to begin slightly flat and are gradually corrected, as if by hear- ing. ’’ 1 (p. 265) Although he termed the frequencies of the ETT the ‘‘true’’ values, he also assumed that these deviations have an ‘‘aesthetic value in phrasing and harmonic balancing. ’’ 1 (p. 267) Rapoport 2 further developed this idea using Fast Fourier Transform analysis on selected tones sung in commercial re- cordings by famous vocal artists. He found momentary devia- tions from the mean F 0 during a tone, ‘‘microintonation,’’ and classified them into a system of ‘‘singing mode categories.’’ He hypothesized that ‘‘the singer uses microintonation in a very methodical way as his or her means of expressiveness be- yond the written score’’ (Rapoport, 1996, pp. 112–113). Singers’ F 0 accuracy was further explored in a study of 10 re- nowned singers’ commercial recordings of the song Ave Maria by Franz Schubert. 3,4 The mean F 0 values of tones were compared with the ETT of the accompaniment. The results revealed deviations from ETT, henceforth DETT, as large as ±40 cent, thus exceeding almost by an order of magnitude the just-noticeable difference for frequency discrimination. An ex- pert panel was asked to identify all tones that sounded out of tune in this material. The result showed that (1) for most tones, the tolerance zone for ‘‘in tune’’ was about ±10 cent, that is, tones outside this zone tended to be heard as out of tune and (2) the mean F 0 value of certain tones, perceived as in tune, could deviate from ETT by as much as +35 cent or 20 cent, thus widely exceeding the tolerance zone. 4 Other commercial recordings of premier singers’ perfor- mances were recently analyzed in some detail. 5 Figure 1 shows an example taken from the first Kyrie movement of Verdi’s Messa da Requiem as performed by the late tenor Jussi Bjorling. The singer started the long, top note of the phrase, henceforth phrase-peak note, at ETT, but gradually sharpened it, ending at more than +50 cent. Further measurements of other world famous tenors’ intona- tion showed that such sharpening is not specific to Bjorling. 6 For example, sharp intonation of a phrase-peak tone was noted also in Carlo Bergonzi’s, Nicolai Gedda’s, and Luciano Pavar- otti’s performances of an excerpt from Verdi’s opera Aida. The result showed a descending octave interval clearly exceeding a 2:1 ratio. Such octave stretch has been found to be preferred by musically trained listeners. 7,8 Of course, DETT in professional music performances are not specific to singers (for reviews, see refs. 9,10 ; http://www. speech.kth.se/music/performance/download/dm-download.html). Also, in the Director Musices system of musical performance rules, derived from analysis-by-synthesis, one of the formulated principles is to sharpen contrasts, and this can concern not only durational patterns but also sharpening of the pitch. 11 Although some of singers’ DETT might be made to achieve timbral effects, 12 most authors assume that such deviations are performed for the sake of expressiveness. 13 However, this assumption has not yet been formally tested. The aim of the present study was to find out to what extent intonation in singing can be perceived as contributing to expressivity. Two experiments were carried out: in the first, F 0 was ex- tracted in a material of sung examples and the averages were compared with ETT and in the second, a listening test was carried out, presenting paired stimuli with intonation differences to a panel of music experts who were asked to rate expressiveness. DATA SELECTION The material was collected from an earlier study on expressivity in singing. 14 Opera baritone H akan Hageg ard was recorded while performing a set of song excerpts in two different ways (1) as void of musical expression as he could (Neutral version) Accepted for publication November 27, 2012. From the *Department of Speech, Music and Hearing, School of Computer Science and Communication, KTH, Stockholm, Sweden; yDepartment of Communication and Arts, INET-MD, University of Aveiro, Aveiro, Portugal; and the zCulture, Communication and Media, Institute of Education, University of London, London, United Kingdom. Address correspondence and reprint requests to Johan Sundberg, Department of Speech, Music and Hearing, KTH, Se-10044, Stockholm, Sweden. E-mail: [email protected] Journal of Voice, Vol. 27, No. 3, pp. 391.e1-391.e8 0892-1997/$36.00 Ó 2013 The Voice Foundation http://dx.doi.org/10.1016/j.jvoice.2012.11.009

Transcript of Intonation and Expressivity: A Single Case Study of Classical Western Singing

Intonation and Expressivity: A Single Case Study of

Classical Western Singing

*Johan Sundberg, †Filipa M. B. L~a, and ‡Evangelos Himonides, *Stockholm, Sweden, yAveiro, Portugal, and zLondon, UnitedKingdom

Summary: Previous studies have shown that singers tend to sharpen phrase-peak tones as compared with equally tem-

AccepFrom t

CommunINET-MDand MedAddre

Music anJourna0892-1� 201http://d

pered tuning (ETT). Here we test the hypothesis that this can serve the purpose of musical expressivity. Data were drawnfrom earlier recordings, where a professional baritone sang excerpts as void of musical expression as he could (Neutral)and as expressive as in a concert (Concert). Fundamental frequency averaged over tones was examined and comparedwith ETT. Phrase-peak tones were sharper in excited examples, particularly in the Concert versions. These tones wereflattened to ETT using theMelodyne software. The manipulated and original versions were presented pairwise to a mu-sician panel that was asked to choose the more expressive version. By and large, the original versions were perceived asmore expressive, thus supporting the common claim that intonation is a means for adding expressivity to a performance.Key Words: Intonation–Sharpening–Expressivity–Western classical singing–Phase-peak tones.

INTRODUCTION

Intonation is commonly regarded as an important aspect of mu-sic performance. Already in the 1930s, Seashore1 measuredfundamental frequency (F0) in recordings of renowned singersand revealed considerable departures from equally temperedtuning, henceforth ETT. He observed that ‘‘long notes tendto begin slightly flat and are gradually corrected, as if by hear-ing.’’1 (p. 265) Although he termed the frequencies of the ETT the‘‘true’’ values, he also assumed that these deviations have an‘‘aesthetic value in phrasing and harmonic balancing.’’1 (p. 267)

Rapoport2 further developed this idea using Fast FourierTransform analysis on selected tones sung in commercial re-cordings by famous vocal artists. He found momentary devia-tions from the mean F0 during a tone, ‘‘microintonation,’’ andclassified them into a system of ‘‘singing mode categories.’’He hypothesized that ‘‘the singer uses microintonation ina very methodical way as his or her means of expressiveness be-yond the written score’’ (Rapoport, 1996, pp. 112–113).

Singers’ F0 accuracy was further explored in a study of 10 re-nowned singers’ commercial recordings of the song Ave Mariaby Franz Schubert.3,4 The mean F0 values of tones werecompared with the ETT of the accompaniment. The resultsrevealed deviations from ETT, henceforth DETT, as large as±40 cent, thus exceeding almost by an order of magnitude thejust-noticeable difference for frequency discrimination. An ex-pert panel was asked to identify all tones that sounded out oftune in this material. The result showed that (1) for most tones,the tolerance zone for ‘‘in tune’’ was about ±10 cent, that is,tones outside this zone tended to be heard as out of tune and(2) the mean F0 value of certain tones, perceived as in tune,could deviate from ETT by as much as +35 cent or �20 cent,thus widely exceeding the tolerance zone.4

ted for publication November 27, 2012.he *Department of Speech, Music and Hearing, School of Computer Science andication, KTH, Stockholm, Sweden; yDepartment of Communication and Arts,, University of Aveiro, Aveiro, Portugal; and the zCulture, Communication

ia, Institute of Education, University of London, London, United Kingdom.ss correspondence and reprint requests to Johan Sundberg, Department of Speech,d Hearing, KTH, Se-10044, Stockholm, Sweden. E-mail: [email protected] of Voice, Vol. 27, No. 3, pp. 391.e1-391.e8997/$36.003 The Voice Foundationx.doi.org/10.1016/j.jvoice.2012.11.009

Other commercial recordings of premier singers’ perfor-mances were recently analyzed in some detail.5 Figure 1 showsan example taken from the first Kyrie movement of Verdi’sMessa da Requiem as performed by the late tenor JussiBj€orling. The singer started the long, top note of the phrase,henceforth phrase-peak note, at ETT, but gradually sharpenedit, ending at more than +50 cent.

Further measurements of other world famous tenors’ intona-tion showed that such sharpening is not specific to Bj€orling.6

For example, sharp intonation of a phrase-peak tone was notedalso in Carlo Bergonzi’s, Nicolai Gedda’s, and Luciano Pavar-otti’s performances of an excerpt from Verdi’s opera Aida. Theresult showed a descending octave interval clearly exceedinga 2:1 ratio. Such octave stretch has been found to be preferredby musically trained listeners.7,8

Of course, DETT in professional music performances arenot specific to singers (for reviews, see refs.9,10; http://www.speech.kth.se/music/performance/download/dm-download.html).Also, in the Director Musices system of musical performancerules, derived from analysis-by-synthesis, one of the formulatedprinciples is to sharpen contrasts, and this can concern not onlydurational patterns but also sharpening of the pitch.11

Although some of singers’ DETT might be made to achievetimbral effects,12 most authors assume that such deviationsare performed for the sake of expressiveness.13 However,this assumption has not yet been formally tested. The aim ofthe present study was to find out to what extent intonation insinging can be perceived as contributing to expressivity.Two experiments were carried out: in the first, F0 was ex-tracted in a material of sung examples and the averageswere compared with ETT and in the second, a listening testwas carried out, presenting paired stimuli with intonationdifferences to a panel of music experts who were asked torate expressiveness.

DATA SELECTION

Thematerial was collected from an earlier study on expressivityin singing.14 Opera baritone H�akan Hageg�ard was recordedwhile performing a set of song excerpts in two different ways(1) as void of musical expression as he could (Neutral version)

FIGURE 1. F0 pattern of Jussi Bj€orling’s rendering of the first Kyrie

in Giuseppe Verdi’s Messa da Requiem from the recording DECCA

467 119-2made in 1960. The heavy horizontal lines show the F0 values

according to the ETT.

Journal of Voice, Vol. 27, No. 3, 2013391.e2

and (2) as in a public performance (Concert version). The ex-cerpts are listed in Table 1.

In the study, these recordings were presented to two expertpanels. The task for the first panel was to identify the pairsthat differed most with respect to expressivity. The excerptsthat were perceived as most differing were selected for thetest with the second panel. The aim here was to identify the pre-vailing emotional colors of the performed excerpts. The alterna-tives were Angry, Happy, Loving, Hateful, Sad, Solemn, andSecure. Five excerpts were perceived as Loving, Sad, or Secure,thus sharing a peaceful character, whereas six perceived as An-gry, Solemn, or Happy were assumed to share an excited mood.

Acoustic analysis of the Neutral and Concert versions re-vealed a number of differences, both in the excited and in thepeaceful examples: loudness, tempo, phonation type, and rateof change of sound level. However, F0 was not examined.Therefore, in the first experiment of the present study, F0 wasmeasured in the excited and peaceful examples and comparedbetween the Neutral and Concert versions.

TABLE 1.

Excited and Peaceful Examples Analyzed in the Previous Invest

Composer Piece

Excited

G. Mahler Lieder eines fahrenden Gesellen,

song # 3, b 5–11

F. Schubert Erlk€onig, b 72–79

R. Schumann Liederkreis XII, b 18–26

R. Schumann Dichterliebe VII, b 12–18

R. Strauss Zueignung, b 21–29

G. Verdi Falstaff, Ford’s monologue, b 24–31

Peaceful

F. Schubert Du bist die Ruh, b 8–15

F. Schubert Wanderers Nachtlied, b 3–14

F. Schubert N€ahe des Geliebten, b 3–8

R. Schumann Dichterliebe VI, b 31–42

F. Mendelsohn Paulus, Aria # 18, b 5–13

EXPERIMENT 1—F0 EXTRACTION

Method

The Corr module of the Soundswell signal workstation (HitechDevelopment, Solna, Sweden) was used for F0 extraction. Thismodule applies an autocorrelation strategy and shows in termsof a gray scale the correlation between the signal in two adja-cent analysis windows.15 The output is an smp file containingthe F0 curve.Using the Histogram module of the Soundswell software, the

average F0 value was determined in all examples for all tonesthat had a duration exceeding one complete vibrato period(about 170 milliseconds). Each measurement included a se-quence of complete vibrato periods. The values obtained werethen expressed in semitones using 440 Hz for the pitch of A4as the reference. However, it turned out that the singer often de-viated slightly from this reference. Therefore, the frequencythat yielded a mean DETT of 0 cent across the entire excerptwas used as a new reference for ETT. For example, ifF0A4 ¼ 440Hz yielded a mean DETT of +15 cent, the 440 Hzreference was replaced by a 15 cent sharper reference (ie,443.83 Hz). This reference was used for measuring the averageDETT also in the Neutral version.

Results

Figure 2 compares, in terms of a scatter plot, the DETT forall tones in the Neutral (DETTNeutral) and in the Concert(DETTConcert) versions. DETTConcert ranged between +56and �76 cent and DETTNeutral between +53 and �77 cent.The coefficient of determination for the data points wasR2¼ 0.106.A paired sample t test was used to assess which of the Con-

cert and Neutral versions showed a significant difference inDETT. The Falstaff example was the only one for which a sig-nificant difference was found (t(15)¼ 4.781; P < 0.05).A closer examination of the phrase-peak tones revealed a sys-

tematic trend. Table 2 lists their durations and DETT values in

igation3; Letter b Refers to Bar Number

Section Acronym

Ich hab’ ein gl€uhend Messer . Me(sser)

Mein Vater, mein Vater . Vater

Und der Mond . Sie(ist dein)

Wie Du auch strahlst . Herz(ensnacht)

Und beschworst darin die B€osen . He(ilig)

Laudata sempre sia . Cor

Du bist die Ruh . Frie(de)€Uber allen Gipfeln ist Ruh . All(e)

Ich denke dein . Mee(re)

Es schweben Blumen und Englein . Lie(be)

Gott sei mir gn€adig . Syn(der)

FIGURE 2. Comparison of the DETT for all tones measured in the

Neutral and in the Concert versions. The equation refers to the trend

line.

Johan Sundberg, et al Intonation and Expressivity 391.e3

the two versions. Tone durations did not differ significantly, butsystematic differences were observed for DETT.

Given these results, two comparisons seemed relevant foreach of the examples: (i) between excited and peaceful exam-ples and (ii) between the pairs of Concert and Neutral versions.

Table 3 lists the results of the DETT for the excited andpeaceful examples of the concert version productions. SmallerDETT were found for the peaceful examples (Figure 3): meanDETTExcited¼ 29.1 cent, standard deviation (SD)¼ 29.4 centand mean DETTPeaceful¼ 13.1 cent, SD¼ 27.6 cent; however,a paired sample t test showed that both these differences failedto reach significance.

With respect to the Concert and Neutral comparison(Figure 3), DETT were significantly greater for Concert ver-sions of the excited examples only (one sample t test:t(9)¼ 2.94; P¼ 0.017). This seemed to support the hypothesisthat these deviations were related to the expressivity of theperformance.

TABLE 2.

Durations and Mean Deviation From ETT (DETT) of the Phrase-

Concert Versions

Example Specified Word

Ford’s monologue Cor

Zueignung Hei(lig)

Ich hab’ ein gl€uhend Messer Me(sser)

Wie Du auch strahlst Herz(ensnacht)

Und der Mond final Sie (ist dein)

Syllables in bold refer to the phrase-peak tones analyzed.

EXPERIMENT 2—LISTENING TEST

Method

To test the last mentioned hypothesis that DETT between Con-cert and Neutral versions of the excited examples is related tothe expressivity of the performance, a listening experimentwas carried out. The strategy was to explore the perceptual ef-fect of artificially flattening, to ETT, the sharp phrase-peaktones in the five excited examples. The flattening was achievedby means of the Melodyne software (Celemony SoftwareGmbH, Munich, Germany), which allows the changing of F0

in audio recordings (http://en.wikiaudio.org/Melodyne: Tools,accessed January 20, 2012).

Special care was taken to find out to what extent Melodyneaffected not only F0 but also other acoustic parameters. Thiswas tested in the following way. A vowel was synthesized onthe custom made Madde software (by Svante Granqvist); itsF0, formant frequencies, and bandwidths are listed in Table 4.

The synthesized vowel was then manipulated using Melo-dyne. Two versions were thus obtained, the original and thema-nipulated. In the latter, F0 (110 Hz) was lowered by 6 Hz (or 80cent). The formant frequencies and bandwidths of these ver-sions were analyzed by inverse filtering, using the DeCapcustom-made software (by Svante Granqvist), described else-where.16 The inverse filtering analysis returned the original set-tings within an error that ranged from �4 to +20 cent forformants and �37 to +14 cent for bandwidths (Table 4). How-ever, Melodyne was found to shift not only F0 but also the for-mants; F0 shift was measured as�97 cent (rather than 80 cent),and F1 and F2 were shifted by a similar amount (89 and 88cent). Such formant shifts, which could not be avoided, wouldhave a slight effect on voice timbre. In fact, this is also implic-itly acknowledged in the Melodyne manual (http://en.wikiaudio.org/Melodyne: Tools), although with reference tothe effect of an increase rather than a decrease of F0: ‘‘Some-times in Melodyne when you move a note, say for example toa higher pitch, it may begin to sound ‘chipmonk-ish.’ Movingthe formant back down a little while keep the note in its new po-sition can alleviate this unwanted characteristic.’’ Such a ‘‘chip-monkish’’ timbre would be associated with an upward shift ofthe formant frequencies.

Peak Tones in the Excited Examples, for the Neutral and

Tone Duration [s] DETT [cent]

Neutral Concert Neutral Concert

2.01 1.90 �4 31

0.92 1.55 9 45

0.42 0.34 �7 44

0.40 0.25 51 82

0.28 0.46 28 61

TABLE 3.

F0 Data for Phrase-Peak Tones

Word Pitch F0 in ETT [Hz] Mean F0 [Hz] SD [Hz] DETT [cent]

Excited

Cor F#4 369.6 376.2 13.4 31

Me(sser) F4 350.9 359.9 17.8 44

Herz(ensnacht) B3 245.6 257.4 13.8 82

Sie (ist dein) D#4 305.0 316.0 16.9 61

H€o(rest) C#4 268.3 269.6 14.0 8

Hei(lig) F#4 363.8 373.5 13.4 45

Peaceful

Frie(den) Bb3 235.6 236.2 4.8 4

All(e) B 242.5 241.3 3.6 �9

Mee(re) G#3 205.5 206.9 5.0 12

Syn(der) D4 289.1 290.4 6.2 7

Lie(be) C4 257.8 265.2 7.5 49

‘‘Word’’ lists theword in the lyrics onwhich the phrase-peak tone appeared. Column ‘‘F0 in ETT’’ presents the F0 values of these tones in ETT; columns ‘‘Mean F0’’and ‘‘SD’’ refer to themean and its standard deviation of F0 in the singer’s Concert version. Column ‘‘DETT’’ shows howmuchMean F0 deviated fromETT, which

in the case of the excited examples equals the size of the F0 shift produced by the Melodyne manipulation.

Journal of Voice, Vol. 27, No. 3, 2013391.e4

Melodyne was then used for flattening the F0 of the phrase-peak tones in the concert versions of the excited examples,such that they matched those of ETT. The amounts of flatteningused for the different excited examples are listed in Table 3.

The original andmanipulated versions of the five concert ver-sions were then arranged in pairs. The order within the pair wasrandomized. The pairs were presented in a listening test inwhich expert listeners were asked to judge which version ineach pair sounded more expressive.

A preliminary version of this test was piloted with a group of10 musicians. It contained a total of 20 presentations of pairedstimuli, original and manipulated. Each of the five stimuluspairs was presented four times: two with the original and twowith the manipulated versions as the first in the pair.

The listeners were given a written instruction on a responsesheet, asking them to decide if they found the first or the second

FIGURE 3. DETTof the mean F0 of phrase-peak tones in the excited and p

Neutral versions, respectively.

version in the pair more expressive: ‘‘From the paired examplesin the following presentations, please tick the one that you per-ceive as most expressive.’’ The feedback from these listeners re-vealed that they found it extremely difficult to determine whichversion was more expressive.In the subsequent final version of the test, which included

a total of 20 stimulus pairs (four presentations of each of fiveexamples, two with original first and two with manipulatedfirst), 42 expert listeners, master students, and music teacherswith more than 5 years of teaching experience were asked topay special attention to the peak-phrase tone, which was givenin the response sheet in terms of the corresponding word in thelyrics. For each stimulus pair, there were two boxes in the re-sponse sheet, and the listeners gave their responses by tickingthe box corresponding to the version that they perceived asmore expressive. No scores were given. The same stimuli order

eaceful examples. Filled and unfilled columns refer to the Concert and

TABLE 4.

Effects of the Melodyne Manipulation on Formant Frequencies (Fn) and Bandwidths (Bn)

Madde Synthesis After Melodyne Modification

Fn Bn Fn Bn

Used

(Hz)

Measured

(Hz)

Difference

(Hz)

Used

(Hz)

Measured

(Hz)

Difference

(Hz)

Measured

(Hz)

Difference

(Hz)

Difference

(Cent)

Measured

(Hz)

Difference

(HZ)

F0 110 110 0 — — — 104 �6 �97 — —

F1 700 709 9 47 28 �19 665 �35 �89 40 �7

F2 1050 1057 7 55 32 �23 998 �52 �88 38 �17

F3 2500 2496 �4 114 77 �37 2480 �20 �14 80 �34

F4 2700 2702 2 108 88 �20 2699 �1 �1 87 �21

F5 2900 2920 20 116 130 14 2900 0 0 122 6

F0 refers to fundamental frequency, and F1, F2, F3, F4, F5, to formants 1, 2, 3, 4, 5. Columns ‘‘Used’’ lists the frequencies of Fn and their Bn used in the Madde

synthesis. Columns ‘‘Measured’’ show the values obtained from the inverse filter analysis and columns ‘‘Difference’’ refer to the difference between the values

listed in the ‘‘Used’’ and the ‘‘Measured’’ columns, for both Fn and Bn. The values obtained after the Melodyne F0 modification refer to those observed after the

Madde synthesis had been modified by the Melodyne software.

Johan Sundberg, et al Intonation and Expressivity 391.e5

was used for all listeners. At the end of the response form, anopen question was included asking the listeners what character-istic they found most relevant to their choices.

Results

As the data were nominally dichotomous, the results of the lis-tening test were first evaluated by means of a one-sample bino-minal test (Table 5). The underlying assumption was that theproportion of responses was the same for the original and ma-nipulated versions, that is, that the proportion would not deviatesignificantly from 0.5.

TABLE 5.

Results of One-Sample Binominal Test, Showing the Votes for

Example Appearance of Pair in Test Order in

Cor 1st Original fi

Manipula

2nd Original fi

Manipula

He(ilig) 1st Original fi

Manipula

2nd Original fi

Manipula

Me(sser) 1st Original fi

Manipula

2nd Original fi

Manipula

Herz(ensnacht) 1st Original fi

Manipula

2nd Original fi

Manipula

Sie(ist dein) 1st Original fi

Manipula

2nd Original fi

Manipula

Test value¼ 0.5. From left to right, the columns represent: examples; where the stim

first in the pair (Order in the pair); number of votes for the original as being more

* Significance level P < 0.05; n¼ 41.

Significant deviations were found in all examples, except for‘‘Sie(ist dein).’’ However, the order of the versions withinthe pair seemed relevant to the responses, see Figure 4. Whenthe original appeared as the second in the pair, it was consis-tently rated as being more expressive for ‘‘Cor’’ and ‘‘Hei(lig),’’ for both presentations of these pairs. For ‘‘Me(sser),’’ sig-nificant preference for the original version was observed onlythe first time it appeared in the test. The trend to prefer the sec-ond version in the pair was so strong in the case of ‘‘Herz(ensnacht)’’ that the second version in the pair was preferrednomatter if it presented themanipulated or the original versions.

the Original Version Being More Expressive

the Pair n Proportion [%] P (Two Tailed)

rst 21 51 1.000

ted first 28 68 0.028*

rst 23 56 0.533

ted first 29 71 0.012*

rst 15 37 0.117

ted first 37 90 <0.001*

rst 18 44 0.533

ted first 36 88 <0.001*

rst 26 63 0.117

ted first 35 85 <0.001*

rst 23 56 0.533

ted first 27 66 0.06

rst 11 27 0.004*

ted first 34 83 <0.001*

rst 12 29 0.012*

ted first 29 71 0.012*

rst 20 49 1

ted first 19 46 0.755

rst 16 39 0.211

ted first 25 61 0.211

ulus pair appeared in the test (Appearance of pair in test); what versionwas

expressive (n); proportion of these votes (Proportion); and P values.

FIGURE 4. Percentage of votes for the original version being more

expressive, when it appeared as the first and the second stimulus in the

pair.

Journal of Voice, Vol. 27, No. 3, 2013391.e6

To investigate whether there was an overall preference for theoriginal version in the pairs, the proportions of votes for theoriginal version were averaged (1) for all excerpts and (2) foreach individual excerpt. A one sample t test was applied to as-sess whether these means differed significantly from 0.5, ata confidence level of a¼ .05%. With respect to the averageacross all excerpts, significant proportion differences werefound showing that the original version was perceived as beingmore expressive than the manipulated. With respect to individ-ual excerpts, the original versions of the examples ‘‘Me(sser),’’’’Hei(lig),’’ and ’’Cor’’ were rated as significantly more expres-sive. For ‘‘Herz(ensnacht)’’ and ‘‘Sie (ist dein),’’ the preferencefor the original versions failed to reach significance. These re-sults are listed in Table 6.

As mentioned, the listeners were given the opportunity tospecify what determined their choice of version. Several re-sponses mentioned the following: ‘‘support,’’ ‘‘energy,’’ ‘‘em-phasis,’’ and ‘‘stress.’’ No responses mentioned intonation ortimbral differences.

TABLE 6.

Results of the One Sample t Test (t) and Associated P Values fo

Example Mean (SD)

All examples 0.59 (0.11)

Cor 0.61 (0.23)

Hei(lig) 0.63 (0.23)

Me(sser) 0.67 (0.21)

Herz(ensnacht) 0.52 (0.18)

Sie (ist dein) 0.48 (0.25)

Test value¼ 0.5. SD refers to the standard deviations of the proportion of votes g

* Significance level P < 0.05; n¼ 41.

DISCUSSION

Twomain findings have emerged from our investigation: (1) theF0 measurements revealed that the singer sharpened phrase-peak tones in excited but not in peaceful examples and (2) thelistening test demonstrated that expert listeners perceived thissharpening as adding to the expressivity. However, it is impor-tant to realize that the investigation is limited in certainrespects.This study was based on the recordings of a single singer, thus

calling into question the generality of the observations. On theone hand, the subject was an internationally touring opera andconcert soloist. Furthermore, as mentioned, the trend to sharpenphrase-peak tones has been observed also in commercial record-ings of other internationally renowned singers.5,6 This supportsthe assumption that our results possess some general validity.On the other hand, it would seem unlikely that all singerswould always use exactly the same expressive means ina given musical context. Rather, each performer may interpreta given piece using his or her own personal expressive code.The phrase-peak tones in the Concert versions of the excerpts

were flattened. An alternative way to test the assumption thatsharpening of phrase-peak tones adds to the expressiveness ofthe excited examples would have been to raise F0 of these tonesin the Neutral versions. However, also the Neutral versionsshowed a sharpening of phrase-peak tones, though to a lesserextent. Hence, the effect of the sharpening would then probablyhave been too subtle to produce a noticeable difference betweenthe versions.A crucial question for the validity of the listening test obvi-

ously is whetherMelodyne introduced not only the transpositionof F0 but also other audible effects. Our analyses revealed thatfor an F0 shift of 6 Hz, the software shifted also F1 and F2 by40 or 50 Hz. This is larger than the just noticeable differencefor F1 and F2, which has been reported to vary between 10and 30 Hz.17 Hence, the manipulated versions may have beenperceived as differing from the original ones with respect to tim-bre. On the other hand, our expert panel was asked to select themore expressive version in each stimulus pair; a timbral differ-ence caused by a lowering of F1 and F2 is not very likely to beperceived as an expressivity increase of excited examples. Also,no listener mentioned timbral differences as the characteristicon which they based their choice of version.

r the Five Excerpts

One Sample t Test

t(40) P

4.99 <0.001

3.04 <0.001*

3.61 <0.001*

5.36 <0.001*

0.85 0.4

�0.31 0.76

iven to each of the five original versions.

Johan Sundberg, et al Intonation and Expressivity 391.e7

Expressiveness is generally regarded as a most important,though subjective, quality of performed music. Our panel con-sisted of qualified musicians, so it is reasonable to expect thatthey were skilled in recognizing expressivity. Also, the risk ofreceiving random responses in a listening test was substantiallyreduced because (i) the test was previously piloted; (ii) the lis-teners’ attention was cued to the specific word in the lyricswhere the phrase-peak tone appeared; and (iii) these toneswere presented in a musical context.

To our knowledge, this is the first time that expressive effectsof intonation in singing are formally analyzed bymeans of a per-ceptual evaluation by expert musicians. The results support thefrequently made assumption that intonation is used as an ex-pressive mean in music performance.10 It is noteworthy thatall phrase-peak tones in the excited examples were sharpened,particularly in the Concert versions. This seems to be in accor-dance with previous measurements of musicians’ intonation.1

Incidentally, it also seems to agree with the jargon sometimesheard among musicians ‘‘Better to be sharp than out-of-tune.’’

If sharpening of phrase-peak tones constitutes a means to in-crease expressiveness, one might ask why this particular acous-tic code is used for this purpose. According to the DirectorMusices grammar of music performance, a general principleunderlying musicians’ tuning strategies is to sharpen high-pitched tones and flatten low-pitched tones.11 Although thesharpening produced by that grammar is much smaller thanthe one observed in the present study, the current findingsmay be regarded as a qualitative corroboration of this rule. Itmay also be relevant that elevated speaking F0 is a typicalsign of agitation. According to Fyk.13 players of bowedinstrument tend to expand ascending intervals, and all phrase-peak tones are obviously preceded by ascending intervals.

Seashore1 noted that the mean F0 in singing is often flattenedor sharpened, particularly for short tones, but did not specifyhow much. In the investigation of 10 singers’ intonations oflong tones in Schubert’s Ave Maria, it was found that expert lis-teners accepted some tones as ‘‘in tune,’’ although they departedfrom ETT by +35 to �20 cent.14 The greatest DETT value, 82cent was found in ‘‘Herz(ensnacht),’’ which, however, con-cerned a short tone, no more than 259 milliseconds long. Per-haps duration is a relevant factor in determining theboundaries of ‘‘out-of-tune.’’ This possibility may belong tothe explanation why significant response differences betweenoriginal and manipulated versions were obtained for only fourof the five examples.

There was a strong trend in the panel to perceive the originalversion as more expressivewhen it appeared as the second stim-ulus in the pair. A possible explanation could be that the firstversion that appeared in the pair was compared with the sub-ject’s internal tuning reference, whereas when it appeared asthe second in the pair, it was compared with the first versionthat immediately preceded it. In any event, expectations playan important role in music listening.18

When singing is perceived as out-of-tune, sample-accuratepitch manipulation softwares, such as Celemony Melodyne, An-tares Autotune (Antares Audio Technologies, Scotts Valley,CA), and Roland V-Vocal (Cakewalk by Roland, Boston,

MA), are often used either to purely control the final vocal out-put for accuracy in intonation or, perhaps, to evoke novel aes-thetic experiences. Therefore, it is risky to resort tocommercial recordings when investigating intonation in musicperformance because the F0 parameter may have beenmanipulated.

CONCLUSION

This investigation has shown that a professional singer maysharpen phrase-peak tones, sometimes even more than 50cent, an observation in agreement with previous research. Theresponses to the open question at the end of the listening testsuggested that most listeners actually failed to notice the into-nation differences as a pitch effect. Thus, they did not appearto perceive pitch as a guide to rate expressiveness. This studysuggests that such sharpening may contribute to expressiveness;the singer sharpened tones more when he sang the examples asexpressive as in a concert than when he sang them as void ofexpressivity as he could. Also, expert listeners perceived origi-nal versions as more expressive than those in which the intona-tion was in accordance with ETT. This observation strengthensthe commonly made claim that intonation can be used as an ex-pressive mean in singing.

Acknowledgments

The authors acknowledge their great gratitude to H�akan Ha-geg�ard, who brilliantly provided the data analyzed in the pres-ent study. The authors are also indebted to the musicians whoparticipated in the listening tests.

REFERENCES1. Seashore CE. Psychology of Music. New York: Dover Publications, Inc.;

1938.

2. Rapoport E. Emotional Expression Code in Opera and Lied Singing. J New

Music Res. 1996;25:109–49.

3. Prame E. Vibrato extent and intonation in professional Western lyric sing-

ing. J Acoust Soc Am. 1997;102:616–621.

4. Sundberg J, Prame E, Iwarsson J. Replicability and accuracy of pitch pat-

terns in professional singers. In: Davis P, Fletcher N, eds. Vocal Fold Phys-

iology, Controlling Complexity and Chaos. San Diego: Singular Publising

Group; 1996:291–306.

5. Sundberg J. Some observations on operatic singers’ intonation. Interdiscip

Stud Musicol. 2011;10:47–60.

6. Sundberg J. On the expressive code in music performance. In: Lang P, ed.

Klangfarbe: Vergleichend-systematische und musikhistorische Perspek-

tiven. Frankfurt am Main: Internationaler Verlag der Wissenschaften;

2011:113–127.

7. Sundberg J. In tune or not? A study of fundamental frequency in music

practise. STL-QPSR. 1982;23:49–78.

8. Sundberg JEF, Lindqvist J. Musical octaves and pitch. J Acoust Soc Am.

1973;54:922–929.

9. Gabrielson A. The performance of music. In: Deutsch D, ed. The Psychol-

ogy of Music. San Diego: Academic Press; 1999:501–602.

10. Morrison SJ, Fyk J. Intonation. In: Parncutt R,McPhersonGE, eds. The Sci-

ence and Psychology of Music Performance. Oxford: Oxford University

Press; 2002:183–197.

11. Friberg A. A Quantitative Rule System for Musical Expression, Music

Acoustics. Stockholm: KTH Royal Institute of Technology; 1995.

12. Vurma A, RajuM, Kuuda A. Does timbre affect pitch?: Estimations by mu-

sicians and non-musicians. Psychol Music. 2010;39:291–306.

13. Fyk J. Melodic Intonation, Psychoacoustics, and the Violin. Zielona G�ora:Organ Publishing House; 1995.

Journal of Voice, Vol. 27, No. 3, 2013391.e8

14. Sundberg J, Iwarsson J, Hageg�ard J. A singer’s expression of emo-

tions in sung performance. In: Fujimura O, Hirano M, eds. Vocal

Fold Physiology: Voice Quality Control. San Diego: Singular Press;

1995.

15. Granqvist S, Hammarberg B. The correlogram: a visual display of period-

icity. J Acoust Soc Am. 2003;114:2934–2945.

16. Sundberg J, L~a FMB, Gill BP. Professional male singers’ formant strategies

for the vowel a. Logopedics Phoniatrics Vocology. 2011;36:156–167.

17. Stevens KN. Acoustic Phonetics. Cambride, Massachusetts: The MIT

Press; 1998.

18. Huron D. Sweet Anticipation: Music and the Psychology of Expectation.

Cambridge: MIT Press; 2006.