Temporal variability in speech segments of Spanish: context and speaker related differences


E. Mendoza a,*, G. Carballo a, A. Cruz a, M.D. Fresneda a, J. Muñoz a, V. Marrero b

a Departamento de Personalidad, Evaluación y Tratamiento Psicológico, Facultad de Psicología, Universidad de Granada, Campus de Cartuja, s/n 18071 Granada, Spain

b Departamento de Lengua Española, U.N.E.D., 28040 Madrid, Spain

Received 7 November 2000; received in revised form 30 November 2001; accepted 21 May 2002

Abstract

This article reports on segmental duration measurements of eight selected consonants (voiceless obstruents, nasals and liquids) and three vowels in 192 disyllabic (CVCe) nonsense words with stress on the first syllable, spoken in isolation by 12 Spanish speakers. Durations measured from acoustic discontinuities are discussed together with speaker variability. The intrinsic and context-dependent durations of the consonants /f, θ, x, s, m, n, l, r/ and the vowels /a, i, u/, as well as the inter-speaker variability of these phonemes, were analysed. Results show sizable differences in the duration of consonants (voiceless fricatives are longer than voiced fricatives) and vowels (/a/ is longer than /i/ and /u/). With regard to contextual effects, vowel duration decreases markedly before voiceless fricatives and increases markedly before sonorants. These effects are present in all speakers. Our results on durational effects indicate that (a) the initial consonants /x, s/ and /r/ show larger differences among speakers; (b) effects for the vowel /a/ are greater than for the vowels /i/ and /u/; and (c) voiceless fricative consonants in medial position show greater intra-speaker idiosyncrasy than voiced consonants. The effects of anticipatory consonant-to-vowel coarticulation are discussed, as well as differences in segmental duration among speakers.

© 2002 Elsevier Science B.V. All rights reserved.

1. Introduction

The duration of speech segments depends on a large number of factors. Intrinsic differences exist according to the type of segment (Crystal and House, 1988a,b; House and Crystal, 1997; O'Shaughnessy, 1981, 1984; Quilis et al., 1979; Martínez Celdrán, 1989, among others), phonetic context (van Santen, 1992; van Santen et al., 1992), stress, final position in the utterance, and so on. Lehiste (1970), Umeda (1977), Crystal and House (1988a,b) and van Santen (1992) offer comprehensive studies of the factors involved in the temporal variation of American English speech segments. Similar studies of French have been carried out by O'Shaughnessy (1981, 1984) and Bartkova (1988), while Laeufer (1992) has compared segment durations in English and French. Farnetani and Recasens (1993) have analysed Italian speech, Dutch has been studied by van den Heuvel et al. (1994) and Jongman (1998), and German by, among others, Braunschweiler (1997) and Hertrich and Ackermann (1995, 1999). For Spanish, data are available in the pioneering studies by Navarro Tomás (1918) and in the cross-linguistic studies by Zimmerman and Sapon (1958) on English and Spanish, and Delattre (1965) on English, German, French and Spanish. Quilis et al. (1979), Borzone and Signorini (1983), Martínez Celdrán (1989) and, more recently, Guspí Saiz (1993), Marín (1994–1995), Cuenca (1996–1997), and Del Barrio and Torner (1999) have reported further data on segmental duration in Spanish.

*Corresponding author.

E-mail address: [email protected] (E. Mendoza).

0167-6393/02/$ - see front matter © 2002 Elsevier Science B.V. All rights reserved.

doi:10.1016/S0167-6393(02)00086-9

Speech Communication 40 (2003) 431–447, www.elsevier.com/locate/specom

Studies of temporal aspects of speech segments have focussed on determining the duration of each segment and the effect of coarticulation, or "the influence of one speech segment upon the other" (Daniloff and Hammarberg, 1973, p. 239). In general, research into temporal aspects of coarticulation has concentrated on anticipatory consonant-to-vowel coarticulation and, more specifically, on the temporal influence of stop consonants on the preceding vowel. Nearly all the evidence reveals a decrease in the duration of vowels preceding a voiceless stop, and an increase in the duration of vowels preceding a voiced stop.

Two hypotheses have been proposed to explain the lengthening of vowels preceding voiced stop consonants and the inverse phenomenon. The first hypothesis, sometimes known as "temporal compensation" (Port et al., 1980), describes a tendency towards a constant duration of vowel + stop closure (Kohler, 1984), assuming a relatively fixed VC duration in the context of both voiced and voiceless consonants. Recall that voiced (lenis) stops are on the whole shorter than voiceless (tense) ones in coda position. The speaker adjusts the durations of the vowel and the closure according to the voicing of the consonant, so that the total duration of the syllable remains similar.

The second hypothesis attributes these effects to phonological rules or to auditory feedback processes used actively by speakers in order to generate differences (Kluender et al., 1988; Walsh and Parker, 1981; Braunschweiler, 1997). Daniloff et al. (1980) consider both hypotheses viable in the coarticulation process: passive phonetic accommodation operates in carryover (perseverative, or left-to-right) coarticulation, while higher levels of phonological processing are responsible for anticipatory (right-to-left) coarticulation.

These studies may be of interest both for phonetics and for a more psychological approach to speaker identity and to the way such common phenomena vary between speakers. With a view to integrating phonetic and psychological research, we ask whether stop consonants are the most appropriate consonantal contexts for studying the lengthening of the preceding vowel, given that, owing to their shorter duration, they reveal less inter-speaker variance (Johnson et al., 1984; O'Shaughnessy, 1987).

van den Heuvel et al. (1994) studied the duration of the vowels /a, i, u/ preceding or following the consonants /p, t, k, d, s, m, n, r/, in an attempt to determine contextual effects and to identify which vocalic segments are realised with more or less speaker-specificity within a given phonetic context (isolated /CVCc/ nonsense words). The interest of their study goes beyond a strict analysis of speaker idiosyncrasy: the authors claim that such idiosyncrasy can have a significant effect on research results.

The present study is similar in approach to that of van den Heuvel et al. (1994), but uses different consonantal contexts and pursues different objectives. We have selected isolated /CVCe/ nonsense words, retaining the vowels /a, i, u/ but replacing the consonants used by van den Heuvel et al. (1994) with /f, θ, x, s, m, n, l, r/. The principal reason for selecting these phonemes was their potential value for speaker identification; we therefore selected those phonemes which, according to previous studies, have the most speaker-dependent realisations. We selected the fricatives /f, θ, x/ and /s/ because they are consonants of long duration and, according to various authors (for example, O'Shaughnessy, 1984; van den Heuvel et al., 1994), long-duration sounds may be more speaker-dependent. These sounds present few acoustic discontinuities and are therefore easy to identify and analyse (Hoole et al., 1993).

According to O'Shaughnessy (1987), the most speaker-dependent sounds are, in descending order, vowels, nasals and fricatives. For this reason we have selected two of the Spanish nasal consonants, /m, n/. The consonant /r/ was selected for two reasons: (a) the apicoalveolar realisation of this phoneme in Spanish (Carballo, 1995; Carballo et al., 1997); and (b) the lengthening of vowels preceding /r/ (Nooteboom and Slis, 1972), a phenomenon that van den Heuvel et al. (1994) consider an obligatory rule in Dutch.

Finally, we have included the liquid consonant /l/ in order to distinguish two groups of consonants, the voiceless /f, θ, x/ and /s/ versus the voiced /m, n, l/ and /r/, and, within the latter group, the nasals /m, n/ versus the liquids /l/ and /r/. Of the five Spanish vowels /a, e, i, o, u/, we have selected only /a, i, u/, partly because they are the most extreme and partly to follow the design of van den Heuvel et al. (1994) more closely. The palatal consonants have been excluded because they present much longer transitions than other Spanish phonemes. We have centered our study on a CV syllable structure in order to follow the procedure of van den Heuvel et al. (1994).

The objectives of the study were: (1) to determine the intrinsic duration of the vowels /a, i, u/ and of the consonants /f, θ, x, s, m, n, l, r/, and the duration of these consonants according to their position in the word (initial position, C1, and medial position, C2); (2) to study coarticulation, that is, whether the duration of a phoneme is affected by the phoneme or phonemes preceding it (carryover coarticulation) or following it (anticipatory coarticulation); (3) to determine whether the absolute or relative duration of phonemes is similar across speakers or whether, on the contrary, there are cues that would enable us to distinguish between speakers; and (4) to show which phonemes in which positions present greater variance between speakers or, to put it another way, which phonemes show greater speaker idiosyncrasy.

An objection to many of the studies we have reviewed on segment duration and coarticulation is that they are based on a small number of subjects or just a few observations. If inter-speaker variance and intra-speaker stability exist (the hypotheses on which studies of speaker identity are based), a larger number of subjects and observations is needed to confirm them. For this reason we have worked with more subjects and, above all, with many more observations and measurements, in order to address the question of variance and speaker identity with greater precision.

2. Method

2.1. Subjects

Twelve subjects took part in the study (six male and six female), aged between 26 and 46 years. All subjects were native to Granada (Spain) and had lived in Granada all their lives. No subject reported a history of speech or hearing problems, and at the time of recording no subject was suffering from a cold or a respiratory tract infection.

2.2. Speech sample

The speech sample consisted of 192 phonetic sequences /C1VC2e/ (24 real words and 168 nonsense words), made up as follows: eight initial-position consonants (C1), /f, θ, x, s, m, n, l/ and /r/; three vowels (V), /a, i/ and /u/; and eight medial-position, syllable-initial consonants (C2), /f, θ, x, s, m, n, l, r/. Multiplying the possible combinations (8 × 3 × 8) gives a total of 192 words or non-words, all stressed on the first syllable. All sequences ended in the vowel /e/ (e.g., ruse, jane, lase, jine...). Once formed, the combinations were randomised and presented to the subjects as shown in Appendix A. Each consonant appears 24 times in C1 and 24 times in C2, and each vowel appears 64 times.
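The factorial design above can be checked mechanically. The sketch below (Python, standard library only) enumerates every /C1VC2e/ combination and confirms the counts stated in the text. It works on phoneme symbols rather than on the Spanish spellings of Appendix A, and the shuffling seed is an arbitrary, hypothetical choice, not the authors' randomisation.

```python
import random
from collections import Counter
from itertools import product

# Inventories from Section 2.2; "θ" is the voiceless interdental fricative.
C1 = ["f", "θ", "x", "s", "m", "n", "l", "r"]
VOWELS = ["a", "i", "u"]
C2 = ["f", "θ", "x", "s", "m", "n", "l", "r"]


def build_items():
    """Return every /C1 V C2 e/ sequence as a tuple of phoneme symbols."""
    return [(c1, v, c2, "e") for c1, v, c2 in product(C1, VOWELS, C2)]


items = build_items()
c1_counts = Counter(item[0] for item in items)
v_counts = Counter(item[1] for item in items)
c2_counts = Counter(item[2] for item in items)

print(len(items))               # 192
print(set(c1_counts.values()))  # {24}: each C1 occurs 3 * 8 times
print(set(v_counts.values()))   # {64}: each vowel occurs 8 * 8 times
print(set(c2_counts.values()))  # {24}: each C2 occurs 8 * 3 times

# Hypothetical presentation order; the seed is arbitrary, not the authors'.
rng = random.Random(2003)
presentation_order = items[:]
rng.shuffle(presentation_order)
```

Because every cell of the 8 × 3 × 8 design is filled exactly once, the design is balanced: each level of C1, V and C2 is paired equally often with every level of the other two factors.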

2.3. Procedure

Each subject was required to read some filler words before the 192 experimental words, with a short pause (approximately 2 or 3 s) between words. The filler words served to familiarise the subjects with the corpus used in the experiment and to train them to avoid the falling intonation characteristic of list endings (the "list effect"), since the target words were to be read in isolation. The words were arranged in columns (one column of 15 words per page) and typed in a 24-point font. Subjects were asked to read at their normal rate and with normal intonation. If they made a reading error, they were asked to repeat the misread word.

Speech samples were recorded using an AKG D 222 EB microphone with a flat response and a Sony 77 ES digital audiotape recorder with a sampling frequency of 48 kHz. The recording level was set between −30 and −20 dB. All recordings took place in the Voice Laboratory of the Psychology Department of the University of Granada. Speech samples were digitised at a 48 kHz sampling frequency and later transferred to a CSL 4300b (Computer Speech Lab, Kay Elemetrics Corp.) for subsequent analysis and interpretation.

2.4. Acoustic analysis

Acoustic analysis focused on the temporal segmentation of segments C1, V and C2. The vowel /e/, which ends all the words in the corpus, was excluded from the analysis. Segmentation followed a methodology similar to that described by van den Heuvel et al. (1994), using simultaneously the waveform of the signal, the spectrogram, four formants fitted by LPC analysis and the intensity profile. Time cursors were used to select the segments of interest, and their placement was checked auditorily. The duration of each segment was measured and its value (in ms) was labelled by hand. The C1V and VC2 transitions were assigned to the vowel. An example of the segmentation procedure is plotted in Fig. 1.
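The segmentation convention can be made explicit with a small sketch: given four hand-placed boundary times for one token, both the C1V and the VC2 transitions fall inside the vowel interval. The boundary names and the example times below are hypothetical, chosen only for illustration; they are not measurements from this study.

```python
from dataclasses import dataclass


@dataclass
class Boundaries:
    """Hypothetical hand-placed boundary times (ms) for one /C1VC2e/ token."""
    c1_onset: int   # start of C1
    c1_offset: int  # end of C1 steady portion = start of the C1V transition
    c2_onset: int   # end of the VC2 transition = start of C2 steady portion
    c2_offset: int  # end of C2 = start of the final /e/ (excluded from analysis)


def segment_durations(b: Boundaries) -> dict:
    """Durations (ms) with both C1V and VC2 transitions counted as vowel."""
    if not (b.c1_onset < b.c1_offset < b.c2_onset < b.c2_offset):
        raise ValueError("boundaries must be strictly increasing")
    return {
        "C1": b.c1_offset - b.c1_onset,
        "V": b.c2_onset - b.c1_offset,  # includes both transitions
        "C2": b.c2_offset - b.c2_onset,
    }


token = Boundaries(c1_onset=50, c1_offset=210, c2_onset=345, c2_offset=525)
print(segment_durations(token))  # {'C1': 160, 'V': 135, 'C2': 180}
```

Assigning the transitions to the vowel means that any boundary uncertainty at the consonant edges is absorbed into V, which is worth keeping in mind when comparing these vowel durations with studies that split transitions differently.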

The durations of the vowels and of the consonants C1 and C2 were submitted to analyses of variance (ANOVA), since we were interested in two segment-bound factors in our data set, namely intrinsic duration and phonetic context.

3. Results

This section is divided into subsections corresponding to the planned objectives of this paper. First, we describe the durations found for the speech segments within each context and their durations relative to position in the context. Second, we present our findings on contextual effects on duration, in order to analyse the existence of coarticulation. Third, we give the results obtained on the speaker dependence of the durations of the phonemes considered in this work. Finally, we describe the phonemes that seem to show greater speaker idiosyncrasy, that is, a stronger speaker dependence.

3.1. Segment durations in each context

Tables 1–4 and Figs. 2–6 show means and standard deviations (in ms) of the segment durations across contexts and in relation to context. Specifically, we have considered the vowels /a, i, u/ and the consonants /f, θ, x, s, m, n, l/ and /r/. Three two-factor ANOVAs were carried out on the following variables: (a) duration of the initial consonant, with factors C1 (8 levels) and speaker (S) (12 levels); (b) duration of the vowel, with 3 levels in factor V and 12 in factor S; (c) duration of C2, with 8 levels in factor C2 and 12 levels in factor S. Speaker was treated as a random factor in all analyses. Results were as follows:

ANOVA 1: Significant differences were found for C1 (F(7, 77) = 16.683, p < 0.001, η² = 0.603) and S (F(11, 77) = 26.604, p < 0.001, η² = 0.792), and for the interaction C1 × S (F(77, 2208) = 4.421, p < 0.001, η² = 0.134). A Tukey HSD post-hoc comparison (α = 0.05) established the following homogeneous subsets: [l, m, n] and [x, r, θ, f, s], with longer durations in subset 2.

Fig. 1. Spectrographic representation of the non-word "sile".

ANOVA 2: Vowel duration showed differences in factors V (F(2, 22) = 35.551, p < 0.001, η² = 0.764) and S (F(11, 22) = 75.789, p < 0.001, η² = 0.974), and in the interaction of both factors (F(22, 2268) = 1.900, p < 0.01, η² = 0.018). A Tukey HSD post-hoc comparison (α = 0.05) revealed two homogeneous subsets, [u, i] and [a], with longer durations for /a/.

Table 1
Means and standard deviations of the initial consonant (C1) duration, absolute and related to the following vowel (in ms)

C1       Absolute          Related to /a/    Related to /i/    Related to /u/
         Mean     SD       Mean     SD       Mean     SD       Mean     SD
/f/      165.56   68.03    162.78   93.97    178.06   55.77    155.84   42.21
/θ/      162.57   44.16    157.49   39.97    161.29   46.70    168.94   45.20
/x/      160.56   46.02    157.61   40.74    175.40   51.78    148.66   41.01
/s/      171.37   39.80    164.45   46.93    172.41   44.89    177.25   44.93
/m/      128.95   35.17    130.64   41.08    136.17   43.52    120.05   32.69
/n/      134.35   37.61    132.87   39.85    129.87   32.49    140.29   39.65
/l/      127.05   35.17    123.11   34.09    129.57   36.35    128.47   35.07
/r/      161.82   46.72    169.17   43.85    161.37   49.15    154.92   46.42
Total    151.52   49.32    149.77   53.28    155.52   49.36    149.30   44.78

Table 2
Means and standard deviations of the medial consonant (C2) duration, absolute and related to the preceding vowel (in ms)

C2       Absolute          Related to /a/    Related to /i/    Related to /u/
         Mean     SD       Mean     SD       Mean     SD       Mean     SD
/f/      183.43   34.40    187.41   34.13    183.53   33.46    179.33   35.46
/θ/      184.48   36.02    189.27   37.20    183.46   36.78    180.69   33.85
/x/      169.07   33.96    173.99   31.98    169.17   33.95    164.06   35.48
/s/      179.48   39.79    183.13   35.94    177.52   43.05    177.77   40.22
/m/      111.47   23.24    107.60   17.81    119.35   27.33    107.47   21.74
/n/      103.41   54.14    105.55   88.63    102.15   20.89    102.52   23.54
/l/       98.71   19.29     97.41   18.29    100.85   20.42     97.87   19.14
/r/      146.42   26.87    149.17   29.37    146.17   25.97    143.93   25.11
Total    147.05   49.40    149.20   56.20    147.78   45.67    144.21   45.51

Table 3
Means and standard deviations of the vowel duration (V), absolute and related to the preceding consonant (C1) (in ms)

V            Absolute  /f/     /θ/     /x/     /s/     /m/     /n/     /l/     /r/
/a/    Mean  143.54    145.97  147.25  142.38  148.17  139.88  144.00  137.65  143.05
       SD     29.72     32.66   29.68   30.55   32.35   27.77   25.70   28.98   29.00
/i/    Mean  131.21    134.53  134.14  132.45  131.82  125.24  128.78  131.34  131.36
       SD     27.61     32.65   27.58   27.42   27.41   25.38   25.44   25.35   28.80
/u/    Mean  129.39    129.01  130.36  130.94  131.53  122.39  133.38  126.90  130.64
       SD     26.83     28.35   25.93   26.63   28.66   23.18   27.62   27.38   26.23
Total  Mean  134.71    136.50  137.24  135.25  137.17  129.16  135.39  131.96  135.02
       SD     28.77     31.97   28.61   28.61   30.45   26.55   26.95   27.54   28.52

Columns /f/ to /r/ give the preceding consonant (C1).

ANOVA 3: Duration of C2 showed significant differences in factors C2 (F(7, 77) = 76.778, p < 0.001, η² = 0.875) and S (F(11, 77) = 18.418, p < 0.001, η² = 0.725), and in the interaction C2 × S (F(77, 2208) = 4.556, p < 0.001, η² = 0.137). A Tukey HSD post-hoc comparison (α = 0.05) gave the following homogeneous subsets, in order of shorter to longer duration: subset 1, /l, n/; subset 2, /n, m/; subset 3, /r/; subset 4, /x/; and subset 5, /s, f, θ/.

Applying a Bonferroni t-test (p < 0.05), the consonants /f, θ/ and /x/ show longer durations in C2 than in C1, while the opposite occurs with the consonants /n/ and /l/. No durational differences between the two positions were found for the consonants /s, m, r/.
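The position effects can be summarised from the absolute means in Tables 1 and 2. The sketch below only computes descriptive differences (C2 mean minus C1 mean); the significance decisions quoted above come from the authors' Bonferroni tests on the individual tokens, which cannot be reproduced from the means alone.

```python
# Absolute mean durations (ms) transcribed from Table 1 (C1) and Table 2 (C2).
C1_MEAN = {"f": 165.56, "θ": 162.57, "x": 160.56, "s": 171.37,
           "m": 128.95, "n": 134.35, "l": 127.05, "r": 161.82}
C2_MEAN = {"f": 183.43, "θ": 184.48, "x": 169.07, "s": 179.48,
           "m": 111.47, "n": 103.41, "l": 98.71, "r": 146.42}


def position_differences(c1_means, c2_means):
    """Return C2 mean minus C1 mean (ms) for each consonant, rounded to 0.01 ms."""
    return {c: round(c2_means[c] - c1_means[c], 2) for c in c1_means}


diffs = position_differences(C1_MEAN, C2_MEAN)
# Print from the largest lengthening in medial position to the largest shortening.
for cons, d in sorted(diffs.items(), key=lambda kv: kv[1], reverse=True):
    print(f"/{cons}/ {d:+.2f} ms")
```

All four voiceless fricatives come out longer in C2 and all four voiced consonants shorter. The raw differences do not, however, reproduce the test outcomes reported above: /x/ differs by about +8.5 ms and was significant, while /m/ differs by about −17.5 ms and was not, so this summary is descriptive only.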

Fig. 2. Comparison of the initial consonant (C1) durations (in ms) related to the following vowels /a/, /i/ and /u/.

Fig. 3. Durational values (in ms) of the medial consonant (C2) related to the preceding vowels /a/, /i/ and /u/.

Table 4
Means and standard deviations of the vowel duration (V), absolute and related to the following consonant (C2) (in ms)

V            Absolute  /f/     /θ/     /x/     /s/     /m/     /n/     /l/     /r/
/a/    Mean  143.54    133.68  132.53  130.86  137.88  148.63  156.52  157.60  150.66
       SD     29.72     26.59   25.31   23.02   25.81   30.24   33.36   31.20   26.90
/i/    Mean  131.21    122.96  119.19  123.88  125.55  133.94  137.65  140.33  146.14
       SD     27.61     22.07   23.39   24.77   25.94   27.17   28.71   28.43   28.37
/u/    Mean  129.39    121.78  123.38  122.43  124.32  135.09  137.19  137.17  133.78
       SD     26.83     23.01   24.29   24.03   22.47   29.74   28.04   29.16   27.53
Total  Mean  134.71    126.14  125.03  125.72  129.25  139.22  143.79  145.04  143.52
       SD     28.77     24.48   24.89   24.15   25.45   29.74   31.35   30.85   28.42

Columns /f/ to /r/ give the following consonant (C2).
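The anticipatory effect in Table 4 can be condensed into two numbers. This sketch averages the pooled-vowel means (the Total row) over the four voiceless fricatives and over the four voiced consonants. Because every C2 occurs equally often in the design, this unweighted average of cell means equals the pooled token mean, assuming no tokens were discarded; it is a descriptive summary of ours, not an analysis reported by the authors.

```python
from statistics import mean

# Mean vowel duration (ms), all three vowels pooled (Total row of Table 4),
# keyed by the following consonant C2.
V_BEFORE_C2 = {"f": 126.14, "θ": 125.03, "x": 125.72, "s": 129.25,
               "m": 139.22, "n": 143.79, "l": 145.04, "r": 143.52}
VOICELESS_FRICATIVES = ("f", "θ", "x", "s")
VOICED_SONORANTS = ("m", "n", "l", "r")
ABSOLUTE_MEAN = 134.71  # absolute vowel duration, Total row of Table 4


def group_mean(consonants):
    """Unweighted mean of the tabled vowel means for a group of following consonants."""
    return mean(V_BEFORE_C2[c] for c in consonants)


before_fricatives = group_mean(VOICELESS_FRICATIVES)
before_sonorants = group_mean(VOICED_SONORANTS)

print(f"before voiceless fricatives: {before_fricatives:.2f} ms "
      f"({before_fricatives - ABSOLUTE_MEAN:+.2f} vs. overall)")
print(f"before voiced sonorants:     {before_sonorants:.2f} ms "
      f"({before_sonorants - ABSOLUTE_MEAN:+.2f} vs. overall)")
```

The two group means lie roughly 8 ms below and 8 ms above the overall vowel mean of 134.71 ms, a spread of about 16 ms between the two consonant classes.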

Fig. 4. Comparison between the durational values (in ms) of the vowel /a/ related to the preceding (C1) and following (C2) consonants /f, θ, x, s, m, n, l, r/.

Fig. 5. Comparison between the durational values (in ms) of the vowel /i/ related to the preceding (C1) and following (C2) consonants /f, θ, x, s, m, n, l, r/.

3.2. Contextual effects in duration: temporal coarticulation

Effects of C1 on vowel duration (V): Table 5 shows the main effects and interactions of factors S, C1 and V. Factors C1 and V were considered fixed, and speaker was treated as a random factor. C1 was nested in V. All three main factors were significant (p < 0.001), as was the interaction S × V (p < 0.01). The interaction S × C1 was not significant. The correlation ratio η², given by the formula η² = SS_factor/SS_total, was used to determine the strength of association for each factor.

As may be seen from the table, the greatest effect corresponds to factor V, followed by factor S, while C1 shows the smallest effect. Tukey HSD post-hoc comparisons (α = 0.05) were used to determine homogeneous subsets in vowel duration according to the preceding consonant. This resulted in two subsets, in order of shorter to longer vowel duration: subset 1, /m, l, r, x, n/, and subset 2, /l, r, x, n, f, s, θ/. There is a large overlap between the two subsets.

Effects of C2 on vowel duration: Table 6 shows the main effects and interactions of factors S, C2 and V. Factors C2 and V were considered fixed, while speaker was treated as random. C2 was nested in V. The three main factors were significant (p < 0.001), as were the interactions S × V (p < 0.001) and S × C2 (p < 0.05). The greatest effect corresponds to factor V, followed by factor C2 and factor S. Tukey HSD post-hoc comparisons (α = 0.05) established two homogeneous subsets comprising the following consonants, in order of shorter to longer vowel duration: subset 1, /θ, x, f, s/, and subset 2, /m, r, n, l/. This effect is similar for all vowels except before the nasals: the vowel /a/ is longer before /n/ than before /m/.

Fig. 6. Comparison between the durational values (in ms) of the vowel /u/ related to the preceding (C1) and following (C2) consonants /f, θ, x, s, m, n, l, r/.

Table 5
Degrees of freedom, F-ratios and η² values for the speaker (S), initial consonant (C1) and vowel (V) factors and their interactions on the vowel duration variable

Effect   df effect   MS effect    df error   MS error    F           η²
S        11          88,729.28    2016       600.690     147.71***   0.4463
C1       21          1640.36      231        657.806     2.49***     0.1848
V        2           41,621.41    22         1170.737    35.55***    0.7637
S × C1   231         657.81       2016       600.690     1.10        0.1112
S × V    22          1170.74      2016       600.690     1.95**      0.0208

** p < 0.01, *** p < 0.001.

Table 6
Degrees of freedom, F-ratios and η² values for the speaker (S), final consonant (C2) and vowel (V) factors and their interactions on the vowel duration variable

Effect   df effect   MS effect    df error   MS error    F           η²
S        11          88,729.29    2016       525.176     168.95***   0.4796
C2       21          9146.85      231        634.427     14.41***    0.5672
V        2           41,621.41    22         1170.737    35.55***    0.7637
S × C2   231         634.43       2016       525.176     1.21*       0.1215
S × V    22          1170.74      2016       525.176     2.23***     0.0237

* p < 0.05, *** p < 0.001.
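Each F-ratio in Tables 5–8 is the ratio of the two mean squares in its row, so the tables can be checked by hand. The sketch below recomputes F and an effect size for Table 5. As an assumption on our part, it uses the partial form SS_effect / (SS_effect + SS_error), with SS = MS × df; for most rows this reproduces the printed η² column to four decimals, which suggests that the tabled values are partial η² rather than SS_factor/SS_total as stated in the text.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Row:
    effect: str
    df_effect: int
    ms_effect: float
    df_error: int
    ms_error: float


# Values transcribed from Table 5 (vowel duration; factors S, C1 nested in V, V).
TABLE5 = [
    Row("S", 11, 88729.28, 2016, 600.690),
    Row("C1", 21, 1640.36, 231, 657.806),
    Row("V", 2, 41621.41, 22, 1170.737),
    Row("SxC1", 231, 657.81, 2016, 600.690),
    Row("SxV", 22, 1170.74, 2016, 600.690),
]


def f_ratio(r: Row) -> float:
    """F = MS_effect / MS_error, using the error term listed in the same row."""
    return r.ms_effect / r.ms_error


def partial_eta_squared(r: Row) -> float:
    """Partial eta squared: SS_effect / (SS_effect + SS_error), with SS = MS * df."""
    ss_effect = r.ms_effect * r.df_effect
    ss_error = r.ms_error * r.df_error
    return ss_effect / (ss_effect + ss_error)


for r in TABLE5:
    print(f"{r.effect:5s} F = {f_ratio(r):7.2f}   partial eta^2 = {partial_eta_squared(r):.4f}")
```

Two details are visible in the output. The error term for V is the S × V mean square and the error term for C1 is the S × C1 mean square, as expected when speaker is a random factor. For the S × C1 row, the partial form gives about 0.1115 against a printed 0.1112, a small discrepancy that cannot be resolved from the table alone.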

Effects of the vowel on C1 duration: Table 7 shows the main effects and interactions of factors S, C1 and V. Again, factors C1 and V were considered fixed, and S was treated as random. V was nested in C1. The three main factors were significant (p < 0.001), as was the interaction S × C1 (p < 0.001). The interaction S × V was not significant. The greatest effect corresponds to factor C1, followed by factor S and factor V. Tukey HSD post-hoc comparisons (α = 0.05) established two homogeneous subsets in C1 duration according to the following vowel: subset 1, /u, a/, and subset 2, /a, i/. The initial consonant is shorter in subset 1. The vowel /i/ may lengthen the preceding consonant relative to /u/, even though the two vowels have a similar intrinsic duration. This effect was observed with the consonants /f, x, m/ only.

Effects of the vowel on C2 duration: Table 8 shows the main effects and interactions of factors S, C2 and V. Factors C2 and V were considered fixed, and speaker was treated as random. V was nested in C2. Only the speaker and consonant (C2) main effects were significant (p < 0.001); the interaction S × V was also significant (p < 0.05), as was S × C2 (p < 0.001). The main effect of the vowel on the duration of C2 was not significant.

3.3. Inter-speaker differences

A specific analysis was undertaken of the variables that showed a significant interaction with the speaker factor. We considered three interactions related to the intrinsic durations of the analysed segments (S × C1 on the duration of C1, S × V on the duration of V, and S × C2 on the duration of C2), and two interactions related to contextual effects (S × V on the duration of C1, and S × C2 on the duration of V). The following results were obtained:

(i) Interaction S × C1 on duration of C1: Table 9 shows the values of F and η² used to estimate the magnitude of the effect. Both the statistical significance of F (p < 0.05) and the magnitude of the effect are very low in S7, indicating that this speaker presents few differences in the duration of the initial consonants in our study. The greatest effects, and hence the greatest temporal differentiation of initial-position consonants, correspond to S4 and S8. Analysis of the durational data for each speaker reveals that S5 presents a very long /r/ (x̄ = 242 ms, SD = 32.51 ms); the opposite is true for S9 (x̄ = 171.03 ms, SD = 43.37 ms). In the post-hoc analysis for each speaker, only S4 conformed to the general durational pattern of initial-position consonants.

Table 7
Degrees of freedom, F-ratios and η² values for the speaker (S), initial consonant (C1) and vowel (V) factors and their interactions on the initial consonant duration variable

Effect   df effect   MS effect     df error   MS error    F           η²
S        11          151,173.40    2016       1210.890    124.84***   0.4051
C1       7           94,796.80     77         5682.318    16.68***    0.6026
V        16          6588.80       176        1657.137    3.98***     0.2654
S × C1   77          5682.30       2016       1210.890    4.69***     0.1519
S × V    176         1657.10       2016       1210.890    1.37**      0.1062

** p < 0.01, *** p < 0.001.

Table 8
Degrees of freedom, F-ratios and η² values for the speaker (S), final consonant (C2) and vowel (V) factors and their interactions on the final consonant duration variable

Effect   df effect   MS effect     df error   MS error    F           η²
S        11          94,606.00     2016       1102.915    85.78***    0.3188
C2       7           394,428.80    77         5136.575    76.79***    0.8746
V        16          1658.90       176        1361.293    1.22        0.0997
S × C2   77          5136.60       2016       1102.915    4.66***     0.1510
S × V    176         1361.30       2016       1102.915    1.23*       0.0972

* p < 0.05, *** p < 0.001.

(ii) Interaction S × V on duration of V: Table 10 shows the values of F and η² for vowel duration in the different speakers studied. S5 and S7 do not show significant differences, indicating that in these speakers the durations of the vowels /a, i, u/ are similar. The greatest effects correspond to S8 and S9. Post-hoc analysis of the subjects with significant differences revealed that they all conform to the general pattern of vowel duration: the vowel /a/ is longer than both /i/ and /u/.

(iii) Interaction S × C2 on duration of C2: Table 11 shows F and η² values for each subject. As may be observed, differences are significant in all speakers. Post-hoc analysis of each speaker showed that none conformed to the general durational pattern, although they all show a similar tendency.

(iv) Interaction S × V on duration of C1: As can be seen in Table 12, initial consonant duration is similar for all vowels in all speakers except S2. Post-hoc analysis of S2 showed that the initial consonant is significantly longer before /u/ than before /a/. S9 tends towards longer C1 durations before /i/ than before /u/ and /a/; this effect is significant for the consonants /f, x, m/ only.

(v) Interaction S × C2 on duration of V: Table 13 shows the values of F and η² for each subject's vowel duration in relation to the following consonant. In all speakers, significant differences were found in the duration of the vowel related to C2. In general, vowel duration is shortened before voiceless fricative consonants and increased before voiced consonants in all speakers. We note that the vowel /a/ may be longer when followed by /n/ than when followed by /m/, although this effect was shown in only two (S9, S12) of the twelve subjects.

Table 9
F-ratios and η² values for each speaker on the initial consonant (C1) duration

Speaker   F(7, 184)    η²
S1        15.468***    0.370
S2        10.273***    0.281
S3        9.273***     0.259
S4        29.315***    0.527
S5        24.077***    0.478
S6        7.404***     0.220
S7        2.111*       0.074
S8        29.165***    0.526
S9        11.448***    0.303
S10       12.745***    0.327
S11       5.432***     0.171
S12       10.281***    0.281

* p < 0.05, *** p < 0.001.

Table 10
F-ratios and η² values for each speaker on the vowel (V) duration

Speaker   F(2, 189)    η²
S1        9.537***     0.092
S2        7.944***     0.078
S3        5.780***     0.058
S4        19.946***    0.174
S5        2.547        0.026
S6        6.325***     0.063
S7        2.757        0.028
S8        36.913***    0.281
S9        23.285***    0.198
S10       10.746***    0.102
S11       16.683***    0.150
S12       19.571***    0.172

*** p < 0.001.

Table 11
F-ratios and η² values for each speaker on the medial consonant (C2) duration

Speaker   F(7, 184)    η²
S1        181.905***   0.874
S2        161.187***   0.860
S3        4.483***     0.146
S4        98.447***    0.789
S5        134.665***   0.837
S6        159.004***   0.858
S7        155.558***   0.855
S8        209.590***   0.889
S9        72.929***    0.735
S10       86.436***    0.767
S11       70.744***    0.729
S12       100.047***   0.792

*** p < 0.001.

Analysis of the lengthening effect in each vowel in relation to the following consonant gave the following results:

Vowel /a/: Anticipatory coarticulation was not present in S3. The greatest effects were shown in S6, S9 and S12.

Vowel /i/: No coarticulation effect was shown in S1, nor did post-hoc analysis establish differences in S2 or S9. The greatest effects were shown in S6, S8 and S12.

Vowel /u/: No coarticulation effect was shown in S1, S4, S7 or S11. Post-hoc analysis did not establish differences in S5. Except in S6, effects were smaller than for the other two vowels.

3.4. Analysis of temporal segments presenting greatest inter-speaker differentiation

Table 14 shows the one-factor ANOVA values of F and η² for the following variables: duration of C1, duration of V and duration of C2. The speaker factor is considered as the independent variable. With regard to C1 duration, we note that the greatest effects correspond to the consonants /x/, /s/ and /r/. In vowel duration, the greatest effects correspond to the vowel /a/, followed by /i/. With regard to the duration of medial consonants, the table shows that voiceless fricative consonants present greater inter-speaker differentiation, or in other words, a more idiosyncratic realisation in each speaker, with greater effects. Voiced consonants show a lower differential value, with the effects for the nasal consonants and the trill being particularly reduced.

Table 12
F-ratios and η² values for each speaker on the duration of C1 as a function of the following vowel

Speaker   F(2, 189)    η²
S1        0.280        0.003
S2        3.124*       0.032
S3        0.023        0.000
S4        2.589        0.027
S5        2.619        0.075
S6        1.174        0.012
S7        0.282        0.003
S8        2.280        0.024
S9        3.296*       0.034
S10       0.402        0.004
S11       0.192        0.002
S12       0.608        0.005
*p < 0.05.

Table 13
F-ratios and η² values for each speaker on the duration of the vowel as a function of the following consonant

Speaker   F(7, 184)    η²
S1         5.065***    0.162
S2        10.237***    0.280
S3         6.062***    0.187
S4         4.399***    0.143
S5         6.645***    0.202
S6        22.415***    0.460
S7         4.485***    0.146
S8         7.968***    0.233
S9         5.969***    0.185
S10       10.539***    0.286
S11        5.767***    0.180
S12       11.5597***   0.305
***p < 0.001.

Table 14
F-ratios and η² values for each analysed temporal segment (C11 … C18; V1 … V3; C21 … C28) on the differentiation among subjects

Initial consonant (C1)          Vowel (V)                        Medial consonant (C2)
Segment  F(11, 276)   η²        Segment  F(11, 756)   η²         Segment  F(11, 276)   η²
C11 /f/   6.274***   0.200      V1 /a/   100.047***   0.593      C21 /f/  103.233***   0.804
C12 /θ/  18.016***   0.416      V2 /i/    83.715***   0.549      C22 /θ/   85.686***   0.774
C13 /x/  37.315***   0.598      V3 /u/    22.351***   0.245      C23 /x/   48.860***   0.661
C14 /s/  34.674***   0.580                                       C24 /s/   88.449***   0.779
C15 /m/  21.792***   0.465                                       C25 /m/    3.064**    0.109
C16 /n/  17.370***   0.409                                       C26 /n/    2.726**    0.098
C17 /l/  20.644***   0.451                                       C27 /l/   19.289***   0.435
C18 /r/  32.025***   0.561                                       C28 /r/    3.578***   0.125
**p < 0.01, ***p < 0.001.
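The speaker-factor analysis behind Table 14 is a one-way ANOVA with the 12 speakers as groups. The sketch below computes F, the degrees of freedom and η² = SS_between / SS_total from raw durations. It is a plain-Python illustration under our own assumptions (function name and data are hypothetical), not the authors' analysis code. Assuming each speaker read each item once, as the degrees of freedom suggest, each initial or medial consonant contributes 24 tokens per speaker (12 × 24 = 288 observations, df2 = 288 − 12 = 276), and each vowel contributes 64 (12 × 64 = 768, df2 = 756).

```python
from statistics import fmean
from typing import Dict, List, Tuple


def one_way_anova(groups: Dict[str, List[float]]) -> Tuple[float, int, int, float]:
    """One-way ANOVA with a single between-groups factor (e.g. speaker).

    Returns (F, df_between, df_within, eta_squared), where
    eta_squared = SS_between / (SS_between + SS_within).
    """
    all_values = [x for g in groups.values() for x in g]
    grand_mean = fmean(all_values)
    # Variation of the group means around the grand mean.
    ss_between = sum(len(g) * (fmean(g) - grand_mean) ** 2 for g in groups.values())
    # Variation of each token around its own group mean.
    ss_within = sum((x - fmean(g)) ** 2 for g in groups.values() for x in g)
    df_between = len(groups) - 1
    df_within = len(all_values) - len(groups)
    f = (ss_between / df_between) / (ss_within / df_within)
    eta_sq = ss_between / (ss_between + ss_within)
    return f, df_between, df_within, eta_sq


# Hypothetical medial /s/ durations (ms) for three speakers; illustrative only,
# not data from this study.
durations = {
    "speaker_a": [150.0, 158.0, 162.0, 155.0],
    "speaker_b": [120.0, 118.0, 126.0, 121.0],
    "speaker_c": [140.0, 138.0, 145.0, 142.0],
}
f, df1, df2, eta = one_way_anova(durations)
print(f"F({df1}, {df2}) = {f:.2f}, eta^2 = {eta:.3f}")
```

With a single factor, the returned η² equals F·df1 / (F·df1 + df2), which is why the effect sizes in the tables can be checked directly from the reported F-ratios.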

4. Discussion

The following findings result from our study. Significant differences exist in the duration of the consonants /f, θ, x, s, m, n, l, r/ in initial and medial positions of nonsense words. In both positions, the longest durations correspond to the voiceless fricatives /f, θ, x, s/ and to the trill /r/, although the differences are greater in medial-position consonants. Temporal differences were also detected in the vowels /a, i, u/: the vowel /a/ has a longer duration than /i/ and /u/. There is significant inter-speaker variance in the duration variables studied, as well as in the speaker/segment interaction.

With regard to contextual effects, there is a notable decrease in vocalic duration preceding voiceless fricative consonants, and an increase in vocalic duration preceding voiced consonants; this effect is similar for all the vowels. With regard to nasals, our results show that the vowel /a/ may be longer before /n/ than before /m/; however, this effect was produced in speakers S9 and S12 only. Effects of the initial consonant on vowel duration are weaker, with a strong overlap in the subsets obtained. Nevertheless, it is interesting to note that the vowel /u/ was longer in duration when preceded by /n/ than when preceded by /m/, although this effect was present in S5 only.

The study shows that, relative to /u/, the vowel /i/ may lengthen the duration of a preceding /f/, /x/ or /m/, in spite of the fact that the intrinsic durations of the two vowels are very similar. This effect was present (p < 0.05) only in S2 and S9. No vocalic effects on following consonants were found.

Concerning inter-speaker variability, our data show differences in the duration of the initial consonant, although only S4 conforms to the general model for the duration of initial-position consonants ([l, m, n] [x, r, θ, s]). In vowel duration, all speakers except S5 and S7 pronounce /a/ with longer duration than the other vowels. All speakers conform to the general durational model for medial-position consonants.

Anticipatory consonant-to-vowel coarticulation in the duration of the vowel, related to the consonant following it, is present in all speakers: vowels preceding voiced consonants are longer, and vowels followed by voiceless fricatives are shorter, although this effect is not the same for all the vowels. The effect is greatest for the vowel /a/, as it was shown in all subjects except S3. Next comes the vowel /i/, while the vowel /u/ shows the least effect. Speaker S6 presents the greatest coarticulation effect in all vowels, while in S1 it is present only for the vowel /a/.

Turning to the effect of the duration of each segment on the degree of inter-speaker differentiation, our results indicate that (a) the initial-position consonants /x, s, r/ show the most differences between speakers; (b) effects are greater for the vowel /a/ than for the vowels /i, u/; and (c) voiceless fricative consonants in medial position show greater inter-speaker variability than voiced consonants.

Generally speaking, the results obtained are in accord with previously published findings. We have detected intrinsic differences in the duration of different speech segments, and also an anticipatory consonant-to-vowel coarticulation effect. In addition, we have found notable differences between speakers with regard to the temporal variables in our analysis. The most significant of these findings are discussed below.

4.1. Intrinsic durations of segments C1, V and C2

The comparison of our data with the durational values of the corresponding consonants or vowels found by previous works in Spanish (Navarro Tomás, 1918; Borzone and Signorini, 1983; Martínez Celdrán, 1989; Guspí Saiz, 1993; Carballo, 1995; Del Barrio and Torner, 1999) and in other European languages (Laeufer, 1992; Farnetani and Recasens, 1993; Antoniades and Strube, 1984; van den Heuvel et al., 1994) is not straightforward, because the intrinsic durations of speech segments depend on a large number of factors, including context (position in the word), speaking style (reading versus spontaneous speech, reading words versus reading texts, reading with or without a carrier phrase, etc.), and the language concerned. Nevertheless, keeping this in mind, let us highlight the observations made by a number of pertinent works.

Del Barrio and Torner (1999) have studied the durations of the consonants /f, x, θ, s, r/ for two Spanish speakers reading a text. They found values somewhat shorter than our present data for the same items, as one would expect because the words in our analysis are read in isolation. Carballo (1995) finds a longer duration for the initial-position trill /r/ in children. The difference may again be explained by differences in procedure: she deals with a group of children who had to name a drawing of an object containing the phoneme in initial position (e.g., rana 'frog'), while in our study duration was determined through reading isolated nonsense words.

For Spanish vowels, Marín (1994–1995) and Cuenca (1996–1997) found that /a/ is longer than /i/, and that /i/ is longer than /u/. The data of the present work show the same durational order for these three vowels, but our values for the duration of the corresponding vowels are longer.

Farnetani and Recasens (1993) have noted a shorter duration of vowels in connected speech than in isolated words in Italian, and we may assume that the same phenomenon occurs with consonants. van den Heuvel et al. (1994) gave temporal values for Dutch speakers for some of the consonants we have analysed (namely /s, m, n/), obtaining lower durations than ours for the corresponding Spanish consonants. For French voiceless fricatives, Laeufer (1992) found an average duration similar to the value we obtained. For the German vowels /a, i, u/, Antoniades and Strube (1984) gave longer durations than our study gives for the corresponding Spanish vowels.

4.2. Context-related duration of speech segments

Our findings indicate an influence of phonetic context, or coarticulation, principally anticipatory coarticulation. Effects of carryover coarticulation are small: they were detected in only one speaker (S5), in one vowel (/u/), and in one initial-position consonant (/n/ relative to /m/). Hoole et al. (1993) suggest that carryover coarticulation is more readily shown in spectral measurements, while temporal measurements such as ours are more sensitive to anticipatory coarticulation. It is true that van den Heuvel et al. (1996), following the same experimental procedure as van den Heuvel et al. (1994), find consonant-to-vowel carryover coarticulation, but this involves taking the spectral measurement of F2 as the dependent variable. Given that all the measurements we have used indicated only a minimal effect in a single speaker, we may conclude that our temporal measurements are not sufficiently sensitive to isolate this effect.

The chief finding has been a contextual effect of anticipatory coarticulation in the duration of the vowel related to the consonant following it. In general, vowels increase their duration if they are followed by voiced consonants and reduce their duration if they are followed by voiceless fricative consonants. This finding may be explained in two ways: (1) anticipatory coarticulation corresponds to a compensation effect, given that voiceless fricative consonants, which reduce the duration of the preceding vowel, are longer than voiced consonants, which lengthen the duration of the vowel; (2) alternatively, anticipatory coarticulation corresponds to an effect of sonority. The first case posits an automatic temporal adjustment mechanism (Port et al., 1980; Kohler, 1984; Farnetani and Recasens, 1993); the second brings us to the hypothesis of the existence of phonological rules (such as the voiced/voiceless contrast) operating in the phenomenon of anticipatory coarticulation (Daniloff et al., 1980; Kluender et al., 1988; Walsh and Parker, 1981; Braunschweiler, 1997). Both hypotheses might appear to be viable, until we come to consider the phoneme /r/.

The spectrogram of the phoneme /r/ presents successive trill movements (generally two or three), formed by periods of closure or silence and by periods of aperture, or vocalic elements, in which formants can be seen (Carballo and Mendoza, 2000). It is a voiced phoneme which, owing to the sequence of closing and opening periods, presents a relatively long duration. According to our findings (Table 2), the duration of /r/ in medial position is closer to the duration of the voiceless fricatives than to that of other voiced phonemes.

According to the temporal compensation hypothesis, we would expect vowels preceding the trill /r/ to shorten their duration. However, this does not happen. Instead, vowels preceding /r/ are lengthened, as also occurs with vowels preceding the other voiced consonants (see Table 2). Furthermore, as the Tukey HSD post-hoc analysis of pre-consonantal vowel duration for each speaker shows, the trill /r/ falls in the subsets that most increase the duration of the preceding vowel in speakers S1, S3, S4, S6, S7, S8, S11 and S12. Our data do not, therefore, support the hypothesis of temporal compensation, but rather the phonological hypothesis, based on the presence or absence of voicing, which the speaker has to anticipate in preceding sounds. That is, the voicing of the consonant is the important factor, driven by the necessity to change the global setting to produce an upcoming voiceless consonant.
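The argument above turns on a single case where the two hypotheses make opposite predictions. The sketch below encodes each medial consonant with two qualitative features taken from the discussion (voiced, and long relative to the other voiced consonants) and shows that the compensation and voicing hypotheses disagree only for /r/, where the observed lengthening matches the voicing account. The feature coding and names are our own illustrative assumptions, not measured values from the study.

```python
# Qualitative coding drawn from the discussion (not measured values):
# (voiced, long) for each medial consonant C2. /r/ is voiced, but its
# closure/aperture cycles give it a duration close to the voiceless fricatives.
FEATURES = {
    "f": (False, True), "θ": (False, True), "x": (False, True), "s": (False, True),
    "m": (True, False), "n": (True, False), "l": (True, False), "r": (True, True),
}

# Direction of the change in the preceding vowel's duration, as reported in
# the text: shorter before voiceless fricatives, longer before voiced consonants.
OBSERVED = {
    "f": "shorter", "θ": "shorter", "x": "shorter", "s": "shorter",
    "m": "longer", "n": "longer", "l": "longer", "r": "longer",
}


def compensation_prediction(long_c2: bool) -> str:
    """Temporal compensation: a long C2 should shorten the preceding vowel."""
    return "shorter" if long_c2 else "longer"


def voicing_prediction(voiced_c2: bool) -> str:
    """Phonological (voicing) account: a voiced C2 lengthens the preceding vowel."""
    return "longer" if voiced_c2 else "shorter"


# Consonants for which the two hypotheses make different predictions.
diverging = [c for c, (voiced, long_c2) in FEATURES.items()
             if compensation_prediction(long_c2) != voicing_prediction(voiced)]

for c in diverging:
    voiced, long_c2 = FEATURES[c]
    print(f"/{c}/: compensation -> {compensation_prediction(long_c2)}, "
          f"voicing -> {voicing_prediction(voiced)}, observed -> {OBSERVED[c]}")
```

Only the diverging case can discriminate between the hypotheses; for every other consonant both accounts predict the observed direction.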

As Braunschweiler (1997) states, prior to the emission of a voiced consonant, the speaker must have some kind of information enabling him to execute the motor programme for adaptation to the characteristics of the consonant, rather than acting automatically as a mechanism of temporal compensation, as claimed by Farnetani and Recasens (1993). The /r/ effect shows that not all long consonants shorten the duration of the preceding vowel: in spite of its long duration, /r/ lengthens the duration of the preceding vowel. As stated by Daniloff et al. (1980), “Anticipatory coarticulation can occur only if the speaker can ‘look ahead’ in time and anticipate oncoming sound. RL (right-to-left) coarticulation must reflect a high-level, central type of phonological-phonetic processing, since an entire utterance must be scanned in order for anticipation to be deliberately programmed.” (p. 324). See also Öhman (1966); Wahlen (1990); Fowler and Brancazio (2000).

With a methodology very similar to ours, van den Heuvel et al. (1994) find that vowels preceding the Dutch phoneme /r/ (as in “tirc” or “turc”) lengthen their duration. The Spanish trill /r̄/ is a phoneme of greater duration than Dutch /r/ owing to its cyclic repetition of periods of closure and aperture, yet in spite of the difference in duration between the two phonemes, the effects of lengthening the preceding vowel are similar. Again, we may interpret this fact as supporting the argument that anticipatory coarticulation corresponds to the characteristics of the phonemes /r̄/ and /r/ and not to their duration.

Our data therefore amount to a confirmation of the existence of anticipatory consonant-to-vowel coarticulation that depends neither specifically on the duration of the consonant nor on a supposedly automatic mechanism of temporal compensation related to motor control, but on higher-level phonological-phonetic processing. Hertrich and Ackermann (1999) and Fowler and Brancazio (2000) have confirmed the existence of anticipatory coarticulation in ataxic patients with deteriorated motor control. Similarly, Baum (1998) has found that anticipatory coarticulation remains intact both in fluent aphasics and in non-fluent aphasics whose motor control is affected.

The study of anticipatory coarticulation and its mechanisms offers a highly interesting line of research into various pathologies of speech and reading. We may speculate that, whereas anticipatory coarticulation is maintained in speech pathologies in which motor control is altered, it would be diminished or perhaps not even present in phonological dyslexics, in whom the phonological access route to the lexicon is damaged; see e.g. Defior (1996). In future studies it would be interesting to confirm this point, which would lend further support to the hypothesis of anticipatory coarticulation as an aspect of phonological-phonetic processing.

Although it is less consistent, we have also detected an anticipatory coarticulation effect on the preceding consonant (C1) in two speakers: S2 and S9. As previously described, this effect involves a lengthening of the consonants /f, x, m/ preceding the vowel /i/ as compared with /u/. It is not easy to interpret these data, particularly as most previous studies have concentrated on consonant-to-vowel anticipatory coarticulation, with the two segments belonging to different syllables, and not on two segments within the same syllable. It would be interesting to study the phenomenon in depth; the interpretation we tentatively suggest here requires confirmation.

In accordance with the hypothesis of the effect of “articulatory distance” on duration, Farnetani and Recasens (1993) consider that the shorter duration of the vowel /i/ as compared with /a/ may be viewed as an automatic consequence of the very short articulatory distance from the configuration of /i/ to the configuration of the coronal consonants used in their research: /t, d, z, �, l/. It is also probable that, during the production of a vowel, the vowel has to match the characteristics of the consonant with which it forms a syllable, in order to shorten its duration and carry out rapid articulatory adjustments. This is certainly the case for the coronal consonants (θ, s, n, l, r) in our study. Yet it is precisely with the non-coronal consonants /f, x, m/ that the effect is significant and that the lengthening of the consonant preceding the vowel /i/ is produced. This phenomenon requires further research.

4.3. Temporal segments with greatest speaker-idiosyncrasy

All the speech segments in our analysis are realised in a specific way by different speakers, all of whom show significant effects. For this reason, our discussion here centres on greater or lesser differentiation, rather than on its presence or absence. In general, our study does not fully confirm previous findings that longer-duration segments present a more speaker-specific realisation, as has been suggested by, among others, O'Shaughnessy (1984) and van den Heuvel et al. (1994). We have found this to be true only for medial-position voiceless fricatives. In this position, fricative consonants present the most idiosyncratic realisation for each speaker, while the effect is reduced for the nasal consonants and for /r/, in spite of the relatively long duration of this phoneme in Spanish. This could be specific to Spanish.

Although initial-position consonants also present inter-speaker variance, the effects are smaller, particularly for /f/. The lesser differentiation between speakers for this phoneme may be due to measurement error, given that it presents very little energy in initial position, which on occasion may have made it difficult to determine the signal's onset. Table 1 shows that the standard deviation of the duration of this phoneme is greater than for the others, which may be due to this circumstance. With regard to the vowels, /u/ is realised least specifically to each speaker, in spite of the fact that its intrinsic duration is similar to that of /i/.

In our view, the relations between “speaker-specificity” and “segment duration” are highly complex, and cannot be established by means of the intrinsic durations of each segment, but rather by considering coarticulation effects. Thus the vowel /u/, whose realisation is least speaker-specific, presents the least anticipatory coarticulation effect. The opposite occurs with the vowel /i/, which has an intrinsic duration similar to /u/ yet presents a greater coarticulation effect.

4.4. Inter-speaker variance in segment duration

Our findings indicate that all speakers participating in the study produce both initial-position and medial consonants with different intrinsic durational values, with voiceless fricative consonants being longer than the voiced consonants, except /r/. The same is not true of vowel duration, given that S5 and S7 produce the three vowels analysed with the same duration. We may consider that the intrinsic duration of consonants is relatively stable across speakers, while this does not apply to the duration of vowels in Spanish.

With regard to contextual effects, our findings indicate that all speakers show anticipatory consonant-to-vowel coarticulation, but not in all the vowels. Its extent and type vary considerably from speaker to speaker. The coarticulation effect is most stable for the vowel /a/, given that it is shown in a greater number of speakers, and less stable for /i/ and /u/. This fact has been observed previously by Crystal and House (1988c), who state that anticipatory coarticulation is much smaller in short vowels than in long ones. The finding cannot be extrapolated exactly to Spanish, which does not have long and short vowels with phonological value; however, it does have vowels of greater duration (such as /a/) and of lesser duration (such as /i/ and /u/).

The design and results of our study show that the speaker factor is very strong in the variables analysed and in some of the interactions found. We can see that certain temporal characteristics exist in speech segments, whether phonetic or phonological in nature, that operate in some speakers only: for example, intrinsic differences in the duration of Spanish vowels (/a/ being longer than /i, u/), which are not shown in S5 and S7; or the nasal phoneme coarticulation differential in S9 and S12. Others are operative in all speakers, such as the effect of anticipatory consonant-to-vowel coarticulation. However, even in this last case, the magnitude and direction of the effect differ between speakers; in other words, not all speakers present the same magnitude and direction of coarticulation. We believe this finding to be of great interest as an indicator of speaker idiosyncrasy.

Our interest here has been to demonstrate the existence of temporal differences in speech segments related to phonetic context and speaker, in the hope that future studies will proceed with a methodology permitting a more precise analysis of the differences and interactions found. With 192 items and 12 subjects, measuring three segments in each, our total of 6912 measurements makes the identification of a “temporal profile” for each speaker excessively complex. However, such a profile would be of great interest, particularly in the field of forensic acoustics.

What we do wish to emphasise here is the fact that some phonetic studies are carried out with too few subjects or using very few observations. It is highly probable that many standardised criteria concerning duration, configuration, distance of formant frequencies, transitions and so on would be different if they were obtained using more speakers. As Pisoni (1990) states: “Linguistic theory, with its primary emphasis on speech as an idealised representation abstracted away from the physical medium, has basically ignored the problem of talker variability … One of the traditional ways of coping with stimulus variability in speech has been to simply view it as ‘noise’ in the signal that needs to be stripped away in order to get at the symbolic representation of the linguistic message that has been encoded in the speech waveform” (p. 171).

Studying the “noise”, or speaker-idiosyncrasy, forces us to reconsider many of the so-called “linguistic universals” and shows that many obligatory rules may in fact be optional, not observed by all speakers in all contexts.

5. Summary and conclusions

We have examined context (contiguous phoneme) and speaker influences on the duration of speech segments of eight selected consonants (voiceless obstruents, nasals and liquids) and three vowels in a set of 192 disyllabic words and non-words in Spanish with stress on the first syllable, spoken as isolated citation forms by 12 speakers of a southern variety of Peninsular Spanish.

The contextual effects upon segment duration were analysed in terms of anticipatory coarticulation effects, and compared with similar data from other European languages. Speaker differences are examined with a view to the potential of segment duration as an index of speaker identity for applications such as forensic phonetics.

We have obtained a body of data on contextual effects due to adjacent phoneme types as well as on intrinsic segmental durations. We would like to contribute to finding a systematic principle that may govern the intrinsic durational pattern, if any, in relation to the speaker's strategy of temporal organization of utterances (Ladefoged, 1993). Our work illustrates that a substantially larger body of data is needed to understand speaker-to-speaker variability and the contextual effects of various utterance factors.

Methodologically we have followed the lines of van den Heuvel et al. (1994), although with no repeated measurements on items across testing occasions. However, we have tested a greater number of items, drawn from all possible combinations of consonants and vowels in the CVCe frame.
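Crossing the 8 initial consonants, 3 vowels and 8 medial consonants gives 8 × 3 × 8 = 192 items. The sketch below enumerates that design. The spelling rules (θ written Z before a/u and C before i or e, word-initial trill R, intervocalic trill RR, /x/ written J) are our own inference from the word list in Appendix A, not rules stated by the authors, and the function names are hypothetical; under these assumptions the generated set appears to reproduce the appendix list.

```python
from itertools import product

# Phoneme inventory of the study; C1 and C2 use the same eight consonants.
CONSONANTS = ["f", "θ", "x", "s", "m", "n", "l", "r"]
VOWELS = ["a", "i", "u"]

# Spellings inferred from the Appendix A word list (an assumption).
BASIC = {"f": "F", "x": "J", "s": "S", "m": "M", "n": "N", "l": "L"}


def spell_onset(c: str, v: str) -> str:
    """Spelling of the word-initial consonant C1 before vowel v."""
    if c == "θ":
        return "C" if v == "i" else "Z"   # <ci>, but <za>, <zu>
    if c == "r":
        return "R"                        # word-initial <r> is the trill
    return BASIC[c]


def spell_medial(c: str) -> str:
    """Spelling of the medial consonant C2, always followed by final <e>."""
    if c == "θ":
        return "C"                        # <ce>
    if c == "r":
        return "RR"                       # intervocalic trill: <rr>
    return BASIC[c]


def cvce_items() -> list:
    """Every C1 x V x C2 combination in the CVCe frame."""
    return [spell_onset(c1, v) + v.upper() + spell_medial(c2) + "E"
            for c1, v, c2 in product(CONSONANTS, VOWELS, CONSONANTS)]


items = cvce_items()
print(len(items), items[:4])  # 192 ['FAFE', 'FACE', 'FAJE', 'FASE']
```

Because every item is a distinct combination, the full crossing lets each C1, V and C2 effect be estimated over a balanced set of contexts (24 items per consonant, 64 per vowel, for each speaker).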

A number of research problems that would naturally continue the present work include (i) the study of the potential effects of the difference between familiar or known words and non-words, and (ii) the spectral analysis (formants, center of gravity) of the segments of our speech sample, because this information might be more appropriate for studying speaker-specific aspects of articulation such as carryover coarticulation (Hoole et al., 1993). Temporal measurements are much more sensitive to anticipatory coarticulation, although they might be more prone to style and rate effects.

The raw data of this work are available to interested researchers on request.

Acknowledgement

This work was partially supported by the Junta de Andalucía (HUM-605).

Appendix A. Relation of words and non-words utilized in the experiment

RUSE JURRE JUFE FASE
FUME LARRE MICE FUSE
RAFE FUCE SILE NAME
SUFE SUCE SAFE JILE
ZARRE JAME MUNE ZUSE
NANE MALE ZASE JULE
NARRE NIFE RURRE ZUCE
NASE RILE MUME SINE
NIJE MAME ZUNE JARRE
LISE NURRE NUCE NINE
ZULE RINE LUME RISE
JINE SURRE SIFE SACE
FIME FULE JIFE LIJE
MASE NICE LALE NULE
MUFE CINE RARRE JUSE
MIFE RIRRE FICE FIJE
RUNE ZUME JAFE FALE
NUJE LUJE JACE MURRE
LICE LUSE NISE FISE
SISE NACE FAFE ZURRE
SIRRE SIME JIJE SASE
RUFE JISE SIJE LUNE
FILE FINE RALE JAJE
SUNE NUNE CIRRE SALE
MIRRE RASE CIME MANE
JANE SAJE JASE JICE
FAJE NUME JUME FURRE
MINE LAME LAFE LURRE
MARRE JUNE LIME LILE
MUJE NAFE MIME MIJE
JIME MAJE ZAJE MISE
FUJE NALE FARRE RIJE
FUNE NAJE RUCE ZAFE
JALE LASE FUFE RUME
MAFE LUFE JUJE RANE
SAME FAME RICE FANE
ZACE JIRRE FIRRE NUSE
RUJE ZAME CIJE LUCE
LIFE ZALE NIRRE MUSE
NUFE SUSE LINE MILE
CISE RAJE LACE JUCE
LANE CICE LIRRE LAJE
SANE SARRE SULE RIME
CIFE NIME ZUJE RACE
MUCE LULE FIFE MULE
FACE SUJE CILE RULE
SUME NILE ZUFE SICE
RIFE ZANE MACE RAME

References

Antoniades, Z., Strube, H.W., 1984. Untersuchungen zur spezifischen Dauer deutscher Vokale. Phonetica 41, 72–87.

Bartkova, K., 1988. On the use of segmental duration in speaker-independent speech recognition systems. In: Proceedings of the 7th FASE Symposium, Edinburgh, pp. 763–770.

Baum, S.R., 1998. Anticipatory coarticulation in aphasia: effects of utterance complexity. Brain and Language 63, 357–380.

Borzone, A.M., Signorini, A., 1983. Segmental duration and rhythm in Spanish. Journal of Phonetics 11, 117–128.

Braunschweiler, N., 1997. Integrated cues of voicing and vowel length in German: a production study. Language and Speech 40, 353–376.

Carballo, G., 1995. Estudio de las adquisiciones fonológicas. Análisis acústico del fonema /r̄/. Unpublished doctoral dissertation, University of Granada, Spain, pp. 103–107.

Carballo, G., Mendoza, E., Valencia-Naranjo, N., 1997. Interobserver agreement of perceived intelligibility of /r̄/ in children. Perceptual and Motor Skills 84, 1099–1104.

Carballo, G., Mendoza, E., 2000. Acoustic characteristics of trill productions by groups of Spanish children. Clinical Linguistics & Phonetics 14 (8), 587–601.

Crystal, Th.M., House, A.S., 1988a. The duration of American-English vowels: an overview. Journal of Phonetics 16, 263–284.

Crystal, Th.M., House, A.S., 1988b. The duration of American-English stop consonants: an overview. Journal of Phonetics 16, 285–294.

Crystal, Th.M., House, A.S., 1988c. Segmental durations in connected-speech signal. Journal of the Acoustical Society of America 85, 1553–1573.

Cuenca, M.H., 1996–1997. Análisis instrumental de la duración de las vocales en español. Philologia Hispalensis 11, 295–307.


Daniloff, R.G., Hammarberg, R.E., 1973. On defining coartic-

ulation. Journal of Phonetics 1, 239–248.

Daniloff, R., Schuckers, G., Feth, L., 1980. The Physiology of

Speech and Hearing. Prentice-Hall, Inc., New Jersey, pp.

219–366.

Del Barrio, L., Torner, S., 1999. La duraci�oon conson�aantica encastellano. Ling€uu�ııstica Espa~nnola Actual, XXI 1, 99–126.

Defior, S., 1996. Las Dificultades de Aprendizaje: Un Enfoque

Cognitivo. M�aalaga, Aljibe, pp. 63–107.

Delattre, P., 1965. Comparing the Phonetic Features of English,

German, Spanish and French. Julius Groos Verlag, Heidel-

berg.

Farnetani, E., Recasens, D., 1993. Anticipatory consonant-to-

vowel coarticulation in the production of VCV sequences in

italian. Language and Speech 36, 279–302.

Fowler, C.A., Brancazio, L., 2000. Coarticulation resistance of

American English consonants and its effects on transconso-

nantal vowel-to-vowel coarticulation. Language and Speech

43, 1–41.

Gusp�ıı Saiz, M., 1993. Estudi de la duraci�oo de les consonants en

el context de final i principi de paraules en castell�aa i encatal�aa. Estudios de Fon�eetica Experimental, V (Barcelona),

189–221.

Hertrich, I., Ackermannn, H., 1995. Coarticulation in slow

speech: durational and spectral analysis. Language and

Speech 38, 157–187.

Hertrich, I., Ackermann, H., 1999. Temporal and spectral aspects of coarticulation in ataxic dysarthria: an acoustic analysis. Journal of Speech, Language and Hearing Research 42, 367–381.

Hoole, P., Nguyen-Trong, N., Hardcastle, W., 1993. A comparative investigation of coarticulation in fricatives: electropalatographic, electromagnetic and acoustic data. Language and Speech 36, 235–260.

House, A.S., Crystal, Th.M., 1997. A note on the durations of American English consonants. In: Kiritani, A., Hirose, H., Fujisaki, H. (Eds.), Speech Production and Language: In Honor of Osamu Fujimura. Mouton de Gruyter, Berlin.

Johnson, Ch., Hollien, H., Hicks, J.W., 1984. Speaker identification utilizing selected temporal speech features. Journal of Phonetics 12, 319–326.

Jongman, A.J., 1998. Effects of vowel length and syllable structure on segmental duration in Dutch. Journal of Phonetics 26, 207–222.

Kluender, K., Diehl, R., Wright, B., 1988. Vowel-length differences before voiced and voiceless consonants: an auditory explanation. Journal of Phonetics 16, 153–169.

Kohler, K.J., 1984. Phonetic explanation in phonology. The feature fortis/lenis. Phonetica 41, 150–174.

Ladefoged, P., 1993. A Course in Phonetics. Harcourt Brace, New York.

Laeufer, Ch., 1992. Patterns of voicing-conditioned vowel duration in French and English. Journal of Phonetics 20, 411–440.

Lehiste, I., 1970. Suprasegmentals. MIT Press, Cambridge, MA.

Marín, R., 1994–1995. La duración vocálica en español. Estudios de Lingüística 10, 213–226.

Martínez Celdrán, E., 1989. Cantidad e intensidad en los sonidos obstruyentes del castellano: Hacia una caracterización acústica de los sonidos aproximantes. Estudios de Fonética Experimental, I (Barcelona), 73–129.

Navarro Tomás, T., 1918. Diferencias de duración entre las consonantes españolas. Revista de Filología Española, V, 367–393.

Nooteboom, S.G., Slis, I.H., 1972. The phonetic feature of vowel length in Dutch. Language and Speech 15, 301–316.

Öhman, S., 1966. Coarticulation in VCV utterances: spectrographic measurements. Journal of the Acoustical Society of America 39, 151–168.

O'Shaughnessy, D., 1981. A study of French vowel and consonant durations. Journal of Phonetics 9, 385–406.

O'Shaughnessy, D., 1984. A multispeaker analysis of durations in French paragraphs. Journal of the Acoustical Society of America 76, 1664–1672.

O'Shaughnessy, D., 1987. Speech Communication: Human and Machine. Addison-Wesley Publishing Company, pp. 39–127.

Pisoni, D., 1990. Effects of talker variability on speech perception: implications for current research and theory. Research on Speech Perception, Progress Report 16. Indiana University, pp. 169–191.

Port, R.F., Al-Ani, S., Maeda, S., 1980. Temporal compensation and universal phonetics. Phonetica 37, 235–252.

Quilis, A., Esgueva, M., Gutiérrez, M.L., Cantarero, M., 1979. Características acústicas de las consonantes laterales españolas. Lingüística Española Actual 1, 233–343.

Umeda, N., 1977. Consonant duration in American English. Journal of the Acoustical Society of America 61, 846–858.

van den Heuvel, H., Cranen, B., Rietveld, T., 1996. Speaker variability in the coarticulation of /a, i, u/. Speech Communication 18, 113–130.

van den Heuvel, H., Rietveld, T., Cranen, B., 1994. Methodological aspects of segment- and speaker-related variability. A study of segmental durations in Dutch. Journal of Phonetics 22, 389–406.

van Santen, J.P.H., 1992. Contextual effects on vowel duration. Speech Communication 11, 513–546.

van Santen, J.P.H., Coleman, J.S., Randolph, M.A., 1992. Effects of postvocalic voicing on the time course of vowels and diphthongs. Journal of the Acoustical Society of America 2, 2444.

Walsh, T., Parker, F., 1981. Vowel length and “voicing” in a following consonant. Journal of Phonetics 9, 305–308.

Whalen, D.H., 1990. Coarticulation is largely planned. Journal of Phonetics 18, 3–35.

Zimmerman, S.A., Sapon, S.M., 1958. Note on vowel duration seen cross-linguistically. Journal of the Acoustical Society of America 30, 152–153.
