Temporal variability in speech segments of Spanish: context and speaker related differences
E. Mendoza a,*, G. Carballo a, A. Cruz a, M.D. Fresneda a, J. Muñoz a, V. Marrero b
a Departamento de Personalidad, Facultad de Psicología, Evaluación y Tratamiento Psicológico, Universidad de Granada,
Campus de Cartuja, s/n 18071 Granada, Spain
b Departamento de Lengua Española, U.N.E.D., 28040 Madrid, Spain
Received 7 November 2000; received in revised form 30 November 2001; accepted 21 May 2002
Abstract
This article reports on segmental duration measurements of eight selected consonants (voiceless obstruents, nasals
and liquids) and three vowels in 192 disyllabic (CVCe) nonsense words with stress on the first syllable, spoken in
isolation by 12 Spanish speakers. Durations, measured from acoustic discontinuities, are discussed along with
speaker variability. The intrinsic and context-dependent durations of the consonants /f, h, x, s, m, n, l, r/ and the vowels /a, i, u/,
as well as the inter-speaker variability of these phonemes, were analysed. Results show sizable differences in the duration
of consonants (voiceless fricatives are longer than voiced fricatives) and vowels (/a/ has a longer duration than /i/ and
/u/). With regard to contextual effects, there is a remarkable decrease and increase in vowel durations preceding
voiceless fricatives and sonorants, respectively. These effects are present in all speakers. Our results on durational effects
indicate that (a) the initial consonants /x, s/ and /r/ show larger differences among speakers; (b) effects for the vowel /a/
are greater than for the vowels /i/ and /u/; and (c) voiceless fricative consonants in medial position show greater
intra-speaker idiosyncrasy than voiced consonants. The effects of anticipatory consonant-to-vowel coarticulation are
discussed, as well as differences in segmental duration among speakers.
© 2002 Elsevier Science B.V. All rights reserved.
*Corresponding author.
E-mail address: emendoza@ugr.es (E. Mendoza).
0167-6393/02/$ - see front matter © 2002 Elsevier Science B.V. All rights reserved.
doi:10.1016/S0167-6393(02)00086-9
Speech Communication 40 (2003) 431–447
www.elsevier.com/locate/specom
1. Introduction
The duration of speech segments depends on a large number of factors. Intrinsic differences exist
according to the type of segment (Crystal and House, 1988a,b; House and Crystal, 1997;
O'Shaughnessy, 1981, 1984; Quilis et al., 1979; Martínez Celdrán, 1989, among others); to phonetic
context (van Santen, 1992; van Santen et al., 1992); to stress, to the final position in the
utterance, and so on. Lehiste (1970), Umeda (1977), Crystal and House (1988a,b) and van Santen
(1992) offer a comprehensive study of the factors involved in temporal variation of American
English speech segments. Similar studies in French have been done by O'Shaughnessy (1981, 1984)
and Bartkova (1988), while Laeufer (1992) has compared the duration of segments in English and
French. Farnetani and Recasens (1993) have analysed Italian speech, Dutch has been studied
by van den Heuvel et al. (1994) and Jongman (1998), and German has been studied by, among
others, Braunschweiler (1997) and Hertrich and Ackermann (1995, 1999). For Spanish, data are
available in pioneering studies by Navarro Tomás (1918) and in the cross-linguistic studies by
Zimmerman and Sapon (1958) in English and Spanish, and Delattre (1965) in English, German,
French and Spanish. Also, Quilis et al. (1979), Borzone and Signorini (1983), Martínez Celdrán
(1989) and, more recently, Guspí Saiz (1993), Marín (1994–1995), Cuenca (1996–1997), and Del
Barrio and Torner (1999) have reported further data about segmental duration in Spanish.
Studies of temporal aspects of speech segments have focussed on the determination of the
duration of each segment and the effect of coarticulation, or "the influence of one speech segment
upon the other" (Daniloff and Hammarberg, 1973, p. 239). In general, research into temporal
aspects of coarticulation has concentrated on anticipatory consonant-to-vowel coarticulation and,
more specifically, on the temporal influence of stop consonants on the preceding vowel.
Furthermore, nearly all the evidence reveals a decrease in duration of vowels preceding a
voiceless stop, and an increase in duration of vowels preceding a voiced stop.
Two hypotheses have been proposed to explain the lengthening effect in vowels preceding voiced
stop consonants and the inverse phenomenon. The first hypothesis, sometimes known as "temporal
compensation" (Port et al., 1980), describes a tendency towards a constant duration of the
vowel + stop closure (Kohler, 1984), assuming a relatively fixed VC duration in the context of
both voiced and voiceless consonants. Recall that voiced (lenis) stops are on the whole shorter
than voiceless (tense) ones in coda position. The speaker adjusts the duration of the vowel and
closure according to the sonority of the consonant, so that the total duration of the syllable
remains similar. The second hypothesis attributes these effects to phonological rules or to
auditory feedback processes used actively by speakers in order to generate differences (Kluender
et al., 1988; Walsh and Parker, 1981; Braunschweiler, 1997). Daniloff et al. (1980) consider both
hypotheses to be viable in the coarticulation process: the passive adjustment of phonetic
accommodation operates in carryover (perseverative, or left-to-right) coarticulation, while the
higher levels of phonological processing are responsible for anticipatory (right-to-left)
coarticulation.
These studies may be of interest both for phonetics and for a more psychological approach to
speaker identity and the variations which may occur in such common phenomena between different
speakers. To integrate phonetic and psychological research, we ask whether stop consonants are
the most appropriate consonantal contexts for studying the lengthening of the preceding vowel,
given that, due to their shorter duration, they reveal less inter-speaker variance (Johnson et
al., 1984; O'Shaughnessy, 1987).
van den Heuvel et al. (1994) have studied the duration of vowels /a, i, u/ preceding or following
consonants /p, t, k, d, s, m, n, r/, in an attempt to determine contextual effects and to identify
which vocalic segments are realised with more or less speaker-specificity within a given phonetic
context (isolated /CVCc/ nonsense words). The interest of their study goes beyond a strict
analysis of speaker idiosyncrasy, with the authors claiming that such idiosyncrasy can have a
significant effect on research results.
The present study is similar in approach to that by van den Heuvel et al. (1994), but uses
different consonantal contexts and pursues different objectives. We have selected isolated /CVCe/
nonsense words, retaining the vowels /a, i, u/ but modifying the consonants used by van den
Heuvel et al. (1994) to a study of /f, h, x, s, m, n, l, r/. The principal reason for selecting
these phonemes has been their potential value for speaker identification; therefore, we have
selected those phonemes which, according to previous studies, present the most speaker-dependent
realisation. We have selected fricatives /f, h, x/, and /s/ because they are consonants of long
duration, and according to various authors (for example, O'Shaughnessy, 1984; van den Heuvel et
al., 1994), long-duration sounds may be more speaker-dependent. These sounds present few acoustic
discontinuities and are therefore easy to identify and analyse (Hoole et al., 1993).
According to O'Shaughnessy (1987), the more speaker-dependent sounds are vowels, nasals and
fricatives, in descending order. For this reason we have selected two of the Spanish nasal
consonants: /m, n/. The trill /r/ is selected for two reasons: (a) the apicoalveolar realisation
of the phoneme in Spanish (Carballo, 1995; Carballo et al., 1997); and (b) the lengthening of the
vowels preceding /r/ (Nooteboom and Slis, 1972), a phenomenon considered by van den Heuvel et al.
(1994) to be an obligatory rule in Dutch.
Finally, we have included the liquid consonant /l/ in order to differentiate two groups of
consonants: the voiceless /f, h, x/ and /s/ with respect to the voiced /m, n, l/ and /r/, and
within this last group, the nasal sounds /m, n/ with respect to the liquid sounds /l/ and /r/. Of
the five vowels in Spanish, /a, e, i, o, u/, we have selected only /a, i, u/, partly because they
are the most extreme, and partly to follow the design of van den Heuvel et al.
(1994) rather more closely. The palatal consonants
(/ , ffi/ and /ð/) have been excluded because they present much longer transitions than other
Spanish phonemes. We have centered our study on a syllabic structure CV in order to follow the
procedure of van den Heuvel et al. (1994).
Objectives of the study were as follows: (1) to determine the intrinsic duration of the vowels
/a, i, u/ and of the consonants /f, h, x, s, m, n, l, r/, and the duration of these consonants
relative to their position in a word (initial position, C1, and medial position, C2,
respectively); (2) to study the existence of coarticulation, that is, whether the duration of a
phoneme is affected by the phoneme or phonemes preceding it (carryover coarticulation) or
following it (anticipatory coarticulation); (3) to determine whether the absolute or relative
duration of phonemes is similar in different speakers or whether, on the contrary, there exist
cues which would enable us to distinguish between different speakers; and (4) to show which
phonemes in different positions present greater variance between speakers or, to put it another
way, which phonemes show greater speaker idiosyncrasy.
An objection to many of the studies we have reviewed concerning the duration of speech segments
and coarticulation is that they are based on a small number of subjects or just a few
observations. If variance exists between speakers alongside intra-speaker stability, the
hypothesis on which studies of speaker identity are based, a larger number of subjects and
observations is needed in order to confirm such identity. For this reason we have worked with
more subjects and, above all, with many more observations and measurements in order to address
the question of variance/speaker identity with greater precision.
2. Method
2.1. Subjects
Twelve subjects took part in the study (six male and six female), with ages ranging from 26 to 46
years old. All subjects were native to Granada
(Spain) and had lived in Granada all their lives.
No subject reported a history of speech or auditory problems, and at the time of recording
no subject was suffering from a cold or an infection of the respiratory tract.
2.2. Speech sample
The speech sample consisted of 192 phonetic sequences /C1VC2e/ (24 words and 168 nonsense
words), with the following breakdown: eight initial-position consonants (C1), /f, h, x, s, m, n,
l, r/; three vowels (V), /a, i, u/; and eight medial-position, syllable-initial consonants (C2),
/f, h, x, s, m, n, l, r/. Multiplying the possible combinations (8 × 3 × 8) gives a total of 192
words or non-words with stress on the first syllable. All sequences ended in the vowel /e/ (e.g.,
ruse, jane, lase, jine...). Once the combinations were formed they were randomised and presented
to the subjects as shown in Appendix A. Each consonant appears 24 times, both in C1 and in C2,
and each vowel appears 64 times.
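The counts stated above can be checked by enumerating the design. The following is a minimal sketch, not the procedure used to build Appendix A: the variable and function names are illustrative assumptions, and the consonant symbols are copied exactly as they are printed in the text.

```python
from collections import Counter
from itertools import product

# Segment inventories as printed in the text (symbols copied verbatim).
CONSONANTS = ["f", "h", "x", "s", "m", "n", "l", "r"]
VOWELS = ["a", "i", "u"]


def build_design():
    """Return every (C1, V, C2) triple of the /C1VC2e/ design."""
    return list(product(CONSONANTS, VOWELS, CONSONANTS))


design = build_design()

# How often each segment occurs in each slot of the sequence.
c1_counts = Counter(c1 for c1, _, _ in design)
v_counts = Counter(v for _, v, _ in design)
c2_counts = Counter(c2 for _, _, c2 in design)

print(len(design))              # 192 sequences
print(set(c1_counts.values()))  # {24}: each consonant 24 times as C1
print(set(c2_counts.values()))  # {24}: each consonant 24 times as C2
print(set(v_counts.values()))   # {64}: each vowel 64 times
```

The enumeration is fully crossed, which is why every consonant occurs equally often in both positions and every vowel equally often overall.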
2.3. Procedure
Each subject was required to read some filler words before the 192 experimental words, with a
short pause (approximately 2 or 3 s) between words. The function of the filler words was to
familiarise the subjects with the corpus used in the experiment and to train them to avoid the
tonal descent characteristic of list endings (the "list effect"), since the target words were
read in isolation. The words were arranged in columns (one column per page and 15 words in each
column), typed in a 24-point font. Subjects were asked to read at their normal rhythm and with
normal intonation. If they made a reading error they were asked to repeat the misread word.
Speech samples were recorded using an AKG D 222 EB microphone with flat response, and a Sony
77 ES digital audiotape recorder with a sampling frequency of 48 kHz. Volume was set between −30
and −20 dB. All recordings took place in the Voice Laboratory of the Psychology Department of the
University of Granada. Speech samples were digitised at a 48 kHz sampling frequency, and later
transferred to a CSL 4300b device (Computer Speech Lab, Kay Elemetrics Corp.) for subsequent
analysis and interpretation.
2.4. Acoustic analysis
Acoustic analysis focused on the temporal segmentation of segments C1, V and C2. The vowel /e/,
which ends all the words in the corpus, was excluded from the analysis. Segmentation was done
using a methodology similar to that described by van den Heuvel et al. (1994), involving
simultaneous use of the waveform of the signal, the spectrogram, four formants fitted by LPC
analysis and the intensity profile. Time cursors were used to select the segments of interest,
and the selections were checked auditorily. The duration of each segment was measured and its
value (in ms) was labelled by hand. Transitions C1V and VC2 were assumed to form part of the
vowel. An example of the segmentation procedure is plotted in Fig. 1.
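To make the measurement step concrete, the sketch below converts four cursor positions into the three durations. It is an illustration only: the segmentation in this study was done by hand in the CSL environment, the function name and the cursor values are hypothetical, and the only figure taken from the text is the 48 kHz sampling frequency.

```python
SAMPLE_RATE_HZ = 48_000  # sampling frequency stated in the text


def segment_durations_ms(c1_on, v_on, c2_on, c2_off, sample_rate_hz=SAMPLE_RATE_HZ):
    """Convert four cursor positions (in samples) into C1, V and C2 durations in ms.

    v_on is placed at the start of the C1-V transition and c2_on at the end of
    the V-C2 transition, so that both transitions are counted as part of the vowel.
    """
    if not c1_on < v_on < c2_on < c2_off:
        raise ValueError("cursor positions must be strictly increasing")
    return {
        "C1": (v_on - c1_on) * 1000.0 / sample_rate_hz,
        "V": (c2_on - v_on) * 1000.0 / sample_rate_hz,
        "C2": (c2_off - c2_on) * 1000.0 / sample_rate_hz,
    }


# Hypothetical cursor positions for one token (not measured data).
durations = segment_durations_ms(4_800, 12_960, 19_200, 24_000)
print(durations)
```

The final /e/ is not given a boundary pair because it was excluded from the analysis.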
The vowel durations as well as the duration of the consonants C1 and C2 were submitted to an
analysis of variance (ANOVA) since we were interested in the two segment-bound factors in our
data set (namely, intrinsic duration and phonetic context).
3. Results
This section is divided into four subsections, corresponding to the planned objectives of this
paper. First, we describe the durations found for the speech segments within each context and
their duration relative to position in the context. Second, we show our findings regarding the
contextual effects on duration in order to analyse the existence of coarticulation. Third, we
give the results obtained about the speaker dependence of the durations of the phonemes
considered in this work. Finally, we describe the phonemes which seem to have a greater speaker
idiosyncrasy, that is, which have a stronger speaker dependence.
3.1. Segment durations in each context
Tables 1–4 and Figs. 2–6 show means and standard deviations (in ms) of the segment durations
across contexts, and of durations in relation to context. Specifically we have considered the
vowels /a, i, u/ and the consonants /f, h, x, s, m, n, l/ and /r/. Three two-factor ANOVAs were
carried out for the following variables: (a) duration of initial position consonant, with factors
C1 (8 levels) and speaker (S) (12 levels); (b) duration of vowel, with 3 levels in factor V and
12 in factor S; (c) duration of C2, with 8 levels in factor C2 and 12 levels in factor S.
Speakers were randomized in all analyses. Results were as follows:
ANOVA 1: Significant differences were found in variables C1 (F(7, 77) = 16.683, p < 0.001,
η² = 0.603) and S (F(11, 77) = 26.604, p < 0.001, η² = 0.792), and also in the interaction
C1 × S (F(77, 2208) = 4.421, p < 0.001, η² = 0.134). A Tukey HSD post-hoc comparison (α = 0.05)
established the following homogeneous subsets: [l, m, n] and [x, r, h, f, s], with longer
durations in subset 2.
ANOVA 2: Vowel duration showed differences in factors V (F(2, 22) = 35.551, p < 0.001,
η² = 0.764), S (F(11, 22) = 75.789, p < 0.001, η² = 0.974) and in the interaction of both factors
(F(22, 2268) = 1.900, p < 0.01, η² = 0.018). Tukey HSD post-hoc comparison (α = 0.05) revealed
two homogeneous subsets: [u, i] and [a], with longer durations for /a/.
Fig. 1. Spectrographic representation of the non-word "sile".
Table 1
Means and standard deviations of the initial consonant (C1) duration, in absolute values and relative to the following vowel (in ms)
            Absolute duration   Related to /a/      Related to /i/      Related to /u/
Segment     Mean      SD        Mean      SD        Mean      SD        Mean      SD
C1_1 /f/    165.56    68.03     162.78    93.97     178.06    55.77     155.84    42.21
C1_2 /h/    162.57    44.16     157.49    39.97     161.29    46.70     168.94    45.20
C1_3 /x/    160.56    46.02     157.61    40.74     175.40    51.78     148.66    41.01
C1_4 /s/    171.37    39.80     164.45    46.93     172.41    44.89     177.25    44.93
C1_5 /m/    128.95    35.17     130.64    41.08     136.17    43.52     120.05    32.69
C1_6 /n/    134.35    37.61     132.87    39.85     129.87    32.49     140.29    39.65
C1_7 /l/    127.05    35.17     123.11    34.09     129.57    36.35     128.47    35.07
C1_8 /r/    161.82    46.72     169.17    43.85     161.37    49.15     154.92    46.42
C1 Total    151.52    49.32     149.77    53.28     155.52    49.36     149.30    44.78
Table 2
Means and standard deviations of the medial consonant (C2) duration, in absolute values and relative to the preceding vowel (in ms)
            Absolute duration   Related to /a/      Related to /i/      Related to /u/
Segment     Mean      SD        Mean      SD        Mean      SD        Mean      SD
C2_1 /f/    183.43    34.40     187.41    34.13     183.53    33.46     179.33    35.46
C2_2 /h/    184.48    36.02     189.27    37.20     183.46    36.78     180.69    33.85
C2_3 /x/    169.07    33.96     173.99    31.98     169.17    33.95     164.06    35.48
C2_4 /s/    179.48    39.79     183.13    35.94     177.52    43.05     177.77    40.22
C2_5 /m/    111.47    23.24     107.60    17.81     119.35    27.33     107.47    21.74
C2_6 /n/    103.41    54.14     105.55    88.63     102.15    20.89     102.52    23.54
C2_7 /l/    98.71     19.29     97.41     18.29     100.85    20.42     97.87     19.14
C2_8 /r/    146.42    26.87     149.17    29.37     146.17    25.97     143.93    25.11
C2 Total    147.05    49.40     149.20    56.20     147.78    45.67     144.21    45.51
Table 3
Means and standard deviations of the vowel duration (V), in absolute values and relative to the preceding consonant (C1) (in ms)
              Absolute  C1_1 /f/  C1_2 /h/  C1_3 /x/  C1_4 /s/  C1_5 /m/  C1_6 /n/  C1_7 /l/  C1_8 /r/
V1 /a/  Mean  143.54    145.97    147.25    142.38    148.17    139.88    144.00    137.65    143.05
        SD    29.72     32.66     29.68     30.55     32.35     27.77     25.70     28.98     29.00
V2 /i/  Mean  131.21    134.53    134.14    132.45    131.82    125.24    128.78    131.34    131.36
        SD    27.61     32.65     27.58     27.42     27.41     25.38     25.44     25.35     28.80
V3 /u/  Mean  129.39    129.01    130.36    130.94    131.53    122.39    133.38    126.90    130.64
        SD    26.83     28.35     25.93     26.63     28.66     23.18     27.62     27.38     26.23
V Total Mean  134.71    136.50    137.24    135.25    137.17    129.16    135.39    131.96    135.02
        SD    28.77     31.97     28.61     28.61     30.45     26.55     26.95     27.54     28.52
Preceding consonants: C1_1 ... C1_8.
ANOVA 3: Duration of C2 showed significant differences in factors C2 (F(7, 77) = 76.778,
p < 0.001, η² = 0.875), S (F(11, 77) = 18.418, p < 0.001, η² = 0.725), and the interaction
C2 × S (F(77, 2208) = 4.556, p < 0.001, η² = 0.137). Tukey HSD post-hoc comparison (α = 0.05)
gave the following homogeneous subsets, in order of shorter to longer duration: subset 1: /l, n/;
subset 2: /n, m/; subset 3: /r/; subset 4: /x/ and subset 5: /s, f, h/.
Applying the Bonferroni t-test (p < 0.05), consonants /f, h/ and /x/ show longer duration in C2
than in C1, while the opposite occurs with consonants /n/ and /l/. No durational differences
between the two contexts were found with consonants /s, m, r/.
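The degrees of freedom reported for the three two-factor ANOVAs follow from the design if every speaker produced all 192 items with no missing observations; that completeness is an assumption here, not something stated in the text. The sketch below (function name illustrative) reproduces them. The printed values are consistent with each main effect being tested against the segment × speaker interaction, and the interaction against the within-cell term.

```python
def two_factor_dfs(levels_a, levels_s, n_obs):
    """Degrees of freedom for a fully crossed A x S design with replication.

    Returns (df_A, df_S, df_AxS, df_within).
    """
    df_a = levels_a - 1
    df_s = levels_s - 1
    df_axs = df_a * df_s
    df_within = n_obs - levels_a * levels_s
    return df_a, df_s, df_axs, df_within


# Assumption: 192 items read once by each of the 12 speakers, none missing.
N_OBS = 192 * 12

print(two_factor_dfs(8, 12, N_OBS))  # consonant (C1 or C2) x speaker: (7, 11, 77, 2208)
print(two_factor_dfs(3, 12, N_OBS))  # vowel x speaker: (2, 11, 22, 2268)
```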
Fig. 2. Comparison of the initial consonant (C1) durations (in
ms) related to the following vowels /a/, /i/ and /u/.
Fig. 3. Durational values (in ms) of the medial consonant (C2)
related to the preceding vowels /a/, /i/ and /u/.
Table 4
Means and standard deviations of the vowel duration (V), in absolute values and relative to the following consonant (C2) (in ms)
              Absolute  C2_1 /f/  C2_2 /h/  C2_3 /x/  C2_4 /s/  C2_5 /m/  C2_6 /n/  C2_7 /l/  C2_8 /r/
V1 /a/  Mean  143.54    133.68    132.53    130.86    137.88    148.63    156.52    157.60    150.66
        SD    29.72     26.59     25.31     23.02     25.81     30.24     33.36     31.20     26.90
V2 /i/  Mean  131.21    122.96    119.19    123.88    125.55    133.94    137.65    140.33    146.14
        SD    27.61     22.07     23.39     24.77     25.94     27.17     28.71     28.43     28.37
V3 /u/  Mean  129.39    121.78    123.38    122.43    124.32    135.09    137.19    137.17    133.78
        SD    26.83     23.01     24.29     24.03     22.47     29.74     28.04     29.16     27.53
V Total Mean  134.71    126.14    125.03    125.72    129.25    139.22    143.79    145.04    143.52
        SD    28.77     24.48     24.89     24.15     25.45     29.74     31.35     30.85     28.42
Following consonants: C2_1 ... C2_8.
Fig. 4. Comparison between the durational values (in ms) of
the vowel /a/ related to the preceding (C1) and following (C2)
consonants /f, h, x, s, m, n, l, r/.
Fig. 5. Comparison between the durational values (in ms) of
the vowel /i/ related to the preceding (C1) and following (C2)
consonants /f, h, x, s, m, n, l, r/.
3.2. Contextual effects in duration: temporal coarticulation
Effects of C1 on vowel duration (V): Table 5 shows principal effects and interactions of factors
S, C1 and V. Factors C1 and V were considered fixed, and the speaker was treated as a random
factor. C1 was nested in V. All three principal factors were shown to be significant
(p < 0.001), as well as the interaction S × V (p < 0.01). Interaction S × C1 was not significant.
The correlation ratio η², given by the formula η² = SS_factor / SS_total, was used to determine
the strength of association for each factor.
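As a consistency check, the η² values printed next to the F-ratios in Section 3.1 can be recovered from F and its degrees of freedom through η² = F·df_effect / (F·df_effect + df_error), which is algebraically SS_effect / (SS_effect + SS_error). The sketch below applies this to three of the reported tests. Whether the authors computed their values this way is an assumption; the function name is illustrative.

```python
def eta_squared_from_f(f_ratio, df_effect, df_error):
    """Effect size implied by an F-ratio: SS_effect / (SS_effect + SS_error)."""
    scaled = f_ratio * df_effect
    return scaled / (scaled + df_error)


# (label, F, df_effect, df_error, eta squared as printed in Section 3.1)
reported = [
    ("C1 on C1 duration", 16.683, 7, 77, 0.603),
    ("S on C1 duration", 26.604, 11, 77, 0.792),
    ("V on vowel duration", 35.551, 2, 22, 0.764),
]

for label, f_ratio, df_effect, df_error, printed in reported:
    recovered = eta_squared_from_f(f_ratio, df_effect, df_error)
    print(f"{label}: recovered {recovered:.3f}, printed {printed:.3f}")
```

Because the denominator is the effect plus its own error term rather than the grand total, the values for different factors in one analysis need not sum to 1.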
As may be seen from the table, the greatest effect corresponds to factor V, followed by factor S,
while C1 shows the smallest effect. Tukey HSD post-hoc comparisons (α = 0.05) were used to
determine homogeneous subsets in vowel duration according to the preceding consonant. This
resulted in the following two subsets, in order of shorter to longer duration: subset 1:
/m, l, r, x, n/ and subset 2: /l, r, x, n, f, s, h/. A large overlap exists between the two
subsets.
Effects of C2 on vowel duration: Table 6 shows principal effects and interactions of factors S,
C2 and V. Factors C2 and V were considered fixed, while the speaker was treated as random. C2 was
nested in V. The three principal factors were shown to be significant (p < 0.001), as were the
interactions S × V (p < 0.001) and S × C2 (p < 0.05). The greatest effect corresponds to factor
V, followed by factor C2 and factor S. Tukey HSD post-hoc comparisons (α = 0.05) established two
homogeneous subsets comprising the following consonants, in order of shorter to longer vowel
duration: subset 1: /h, x, f, s/ and subset 2: /m, r, n, l/. This effect is similar in all vowels
except with nasals: the vowel /a/ is longer before /n/ than before /m/.
Fig. 6. Comparison between the durational values (in ms) of
the vowel /u/ related to the preceding (C1) and following (C2)
consonants /f, h, x, s, m, n, l, r/.
Table 5
Degrees of freedom, F-ratios and η²-values for the speaker (S), initial consonant (C1) and vowel (V) factors and their interactions on the vowel duration variable
Effect    df effect  MS effect    df error  MS error   F           η²
S         11         88,729.28    2016      600.690    147.71***   0.4463
C1        21         1640.36      231       657.806    2.49***     0.1848
V         2          41,621.41    22        1170.737   35.55***    0.7637
S × C1    231        657.81       2016      600.690    1.10        0.1112
S × V     22         1170.74      2016      600.690    1.95**      0.0208
**p < 0.01, ***p < 0.001.
Table 6
Degrees of freedom, F-ratios and η²-values for the speaker (S), medial consonant (C2) and vowel (V) factors and their interactions on the vowel duration variable
Effect    df effect  MS effect    df error  MS error   F           η²
S         11         88,729.29    2016      525.176    168.95***   0.4796
C2        21         9146.85      231       634.427    14.41***    0.5672
V         2          41,621.41    22        1170.737   35.55***    0.7637
S × C2    231        634.43       2016      525.176    1.21*       0.1215
S × V     22         1170.74      2016      525.176    2.23***     0.0237
*p < 0.05, ***p < 0.001.
Effects of the vowel on C1 duration: Table 7 shows principal effects and interactions of factors
S, C1 and V. Again, factors C1 and V were considered fixed factors, and S was treated as random.
V was nested in C1. The three principal factors were shown to be significant (p < 0.001), as was
the interaction S × C1 (p < 0.001). Interaction S × V was not significant. The greatest effect
corresponds to factor C1, followed by factor S and factor V. Tukey HSD post-hoc comparisons
(α = 0.05) established two homogeneous subsets in the duration of C1 related to the vowel which
follows: subset 1: /u, a/ and subset 2: /a, i/. Duration of the initial consonant is shorter in
subset 1. It was observed that the vowel /i/ may lengthen the duration of the preceding consonant
relative to /u/, in spite of the fact that the two vowels have a similar intrinsic duration. This
effect was observed with the consonants /f, x, m/ only.
Effects of the vowel on C2 duration: Table 8 shows the principal effects and interactions of
factors S, C2 and V. Factors C2 and V were considered fixed, and the speaker was treated as
random. V was nested in C2. Only the factors speaker and vowel were shown to be significant
(p < 0.001), as also the interaction of the two factors (p < 0.05). The effect of the vowel on
the duration of C2 was not shown to be significant.
Table 7
Degrees of freedom, F-ratios and η²-values for the speaker (S), initial consonant (C1) and vowel (V) factors and their interactions on the initial consonant duration variable
Effect    df effect  MS effect    df error  MS error   F           η²
S         11         151,173.40   2016      1210.890   124.84***   0.4051
C1        7          94,796.80    77        5682.318   16.68***    0.6026
V         16         6588.80      176       1657.137   3.98***     0.2654
S × C1    77         5682.30      2016      1210.890   4.69        0.1519
S × V     176        1657.10      2016      1210.890   1.37**      0.1062
**p < 0.01, ***p < 0.001.
Table 8
Degrees of freedom, F-ratios and η²-values for the speaker (S), medial consonant (C2) and vowel (V) factors and their interactions on the medial consonant duration variable
Effect    df effect  MS effect    df error  MS error   F           η²
S         11         94,606.00    2016      1102.915   85.78***    0.3188
C2        7          394,428.80   77        5136.575   76.79***    0.8746
V         16         1658.90      176       1361.293   1.22        0.0997
S × C2    77         5136.60      2016      1102.915   4.66***     0.1510
S × V     176        1361.30      2016      1102.915   1.23*       0.0972
*p < 0.05, ***p < 0.001.
3.3. Inter-speaker differences
A specific analysis of variables which showed significant interaction with the factor speaker was
undertaken. We have considered three interactions related to intrinsic durations of the analysed
segments (interaction S × C1 on the duration of C1, interaction S × V on the duration of V, and
interaction S × C2 on the duration of C2), and two interactions related to contextual effects
(interaction S × V on the duration of C1 and interaction S × C2 on the duration of V).
The following results were obtained:
(i) Interaction S × C1 on duration of C1: Table 9 shows values of F and η² to estimate the
magnitude of the effect. Both the statistical significance of F (p < 0.05) and the magnitude of
the effect are very low in S7, indicating that this speaker presents few differences in duration
of the initial consonants in our study. The greatest effects, and hence the greatest temporal
differentiation in initial-position consonants, correspond to S4 and S8. Analysis of durational
data of each speaker reveals that S5 presents a very long duration in /r/ (mean = 242 ms,
SD = 32.51 ms); the opposite is true for S9 (mean = 171.03 ms, SD = 43.37 ms). In the post-hoc
analysis for each speaker, only S4 was shown to conform to the general duration expectation of
initial-position consonants.
(ii) Interaction S × V on duration of V: Table 10 shows values of F and η² for vowel duration in
the different speakers studied. S5 and S7 do not show significant differences, indicating that in
these speakers duration of the vowels /a, i, u/ is similar. The greatest effects correspond to S8
and S9. Post-hoc analysis of subjects with significant differences revealed that they all conform
to the general expectation of vowel duration: the vowel /a/ is longer than both /i/ and /u/.
(iii) Interaction S × C2 on duration of C2: Table 11 shows F and η² values for each subject. As
may be observed, differences are significant in all speakers. Post-hoc analysis of each speaker
showed that none conformed to the general durational expectation, although they all show a
similar tendency.
(iv) Interaction S × V on duration of C1: As can be seen in Table 12, initial consonant duration
is similar for all vowels in all speakers, except in S2. Post-hoc analysis of S2 has shown that
the duration of C1 before /u/ is significantly longer than before /a/. S9 has a tendency towards
longer duration of C1 before /i/ than before /u/ and /a/. This effect is significant for
consonants /f, x, m/ only.
(v) Interaction S × C2 on duration of V: Table 13 shows values of F and η² for each subject's
vocalic duration in relation to the following consonant. In all speakers significant differences
were found in the duration of the vowel related to C2. In general, duration of vowels is
shortened before voiceless fricative consonants and increases before voiced consonants in all
speakers. We note that duration of vowel /a/ may be longer when followed by /n/ than when
followed by /m/, although this effect was shown only in two (S9, S12) out of twelve subjects.
Analysis of the effect of lengthening in each vowel in relation to the ensuing consonant gave
the following results:
Vowel /a/: Anticipatory coarticulation was not present in S3. Greatest effects were shown in S6,
S9 and S12.
Vowel /i/: No coarticulation effect was shown in S1, nor did post-hoc analysis establish
differences in S2 or in S9. Greatest effects were shown in S6, S8 and S12.
Vowel /u/: No coarticulation effect was shown in S1, S4, S7 or S11. Post-hoc analysis did not
establish differences in S5. Except in S6, effects were lesser than in the other two vowels.
Table 9
F-ratios and η² values for each speaker on the initial consonant (C1) duration
Speaker   F(7, 184)     η²
S1        15.468***     0.370
S2        10.273***     0.281
S3        9.273***      0.259
S4        29.315***     0.527
S5        24.077***     0.478
S6        7.404***      0.220
S7        2.111*        0.074
S8        29.165***     0.526
S9        11.448***     0.303
S10       12.745***     0.327
S11       5.432***      0.171
S12       10.281***     0.281
*p < 0.05, ***p < 0.001.
Table 10
F-ratios and η² values for each speaker on the vowel (V) duration
Speaker   F(2, 189)     η²
S1        9.537***      0.092
S2        7.944***      0.078
S3        5.780***      0.058
S4        19.946***     0.174
S5        2.547         0.026
S6        6.325***      0.063
S7        2.757         0.028
S8        36.913***     0.281
S9        23.285***     0.198
S10       10.746***     0.102
S11       16.683***     0.150
S12       19.571***     0.172
***p < 0.001.
Table 11
F-ratios and η² values for each speaker on the medial consonant (C2) duration
Speaker   F(7, 184)     η²
S1        181.905***    0.874
S2        161.187***    0.860
S3        4.483***      0.146
S4        98.447***     0.789
S5        134.665***    0.837
S6        159.004***    0.858
S7        155.558***    0.855
S8        209.590***    0.889
S9        72.929***     0.735
S10       86.436***     0.767
S11       70.744***     0.729
S12       100.047***    0.792
***p < 0.001.
3.4. Analysis of temporal segments presenting
greatest inter-speaker differentiation
Table 14 shows one-factor ANOVA values F
and g2 for the following variables: duration of C1,duration of V and duration of C2 respectively. The
speaker factor is considered as an independentvariable. With regard to C1 duration we note that
the greatest effects correspond to consonants /x/,
/s/ and /r/. In vowel duration, greatest effects cor-
respond to the vowel /a/, followed by /i/. With re-
gard to duration of medial consonants, the table
Table 12
F-ratios and g2-values for each speaker on the related-to-the-following-vowel duration of C1
Speaker F ð2; 189Þ g2
S1 0. 280 0.003
S2 3.124� 0.032
S3 0.023 0.000
S4 2.589 0.027
S5 2.619 0.075
S6 1.174 0.012
S7 0.282 0.003
S8 2.280 0.024
S9 3.296� 0.034
S10 0.402 0.004
S11 0.192 0.002
S12 0.608 0.005
* p < 0:05.
Table 13
F-ratios and g2-values for each speaker on the related-to-the-following-consonant duration of the vowel
Speaker F ð7; 184Þ g2
S1 5.065��� 0.162
S2 10.237��� 0.280
S3 6.062��� 0.187
S4 4.399��� 0.143
S5 6.645��� 0.202
S6 22.415��� 0.460
S7 4.485��� 0.146
S8 7.968��� 0.233
S9 5.969��� 0.185
S10 10.539��� 0.286
S11 5.767��� 0.180
S12 11.5597��� 0.305
*** p < 0:001.
Table 14
F-ratios and η² values for each analysed temporal segment (C1₁ ... C1₈; V₁ ... V₃; C2₁ ... C2₈) on the differentiation among subjects
Initial consonant (C1)         Vowel (V)                      Medial consonant (C2)
Segment   F(11, 276)   η²      Segment   F(11, 756)   η²      Segment   F(11, 276)   η²
C1₁ /f/   6.274***     0.200   V₁ /a/    100.047***   0.593   C2₁ /f/   103.233***   0.804
C1₂ /θ/   18.016***    0.416   V₂ /i/    83.715***    0.549   C2₂ /θ/   85.686***    0.774
C1₃ /x/   37.315***    0.598   V₃ /u/    22.351***    0.245   C2₃ /x/   48.860***    0.661
C1₄ /s/   34.674***    0.580                                  C2₄ /s/   88.449***    0.779
C1₅ /m/   21.792***    0.465                                  C2₅ /m/   3.064**      0.109
C1₆ /n/   17.370***    0.409                                  C2₆ /n/   2.726**      0.098
C1₇ /l/   20.644***    0.451                                  C2₇ /l/   19.289***    0.435
C1₈ /r/   32.025***    0.561                                  C2₈ /r/   3.578***     0.125
** p < 0.01, *** p < 0.001.
440 E. Mendoza et al. / Speech Communication 40 (2003) 431–447
shows that voiceless fricative consonants present greater inter-speaker
differentiation, that is, a more idiosyncratic realisation by each speaker,
with larger effects. Voiced consonants show less differentiation, with the
effects for the nasal consonants and the trill being particularly small.
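As a cross-check on the tables above, the eta-squared effect sizes can be recovered from the F-ratios and their degrees of freedom. The following is a minimal sketch, assuming that the reported effect size is the classical eta squared of a one-factor design, i.e. F·df_effect / (F·df_effect + df_error). The function name `eta_squared` is ours and is not taken from the study; the rows are copied from Tables 13 and 14.

```python
def eta_squared(f_ratio: float, df_effect: int, df_error: int) -> float:
    """Eta squared implied by a one-factor ANOVA F-ratio and its degrees of freedom.

    Assumption: eta squared = SS_effect / (SS_effect + SS_error), which for a
    single factor equals F * df_effect / (F * df_effect + df_error).
    """
    explained = f_ratio * df_effect
    return explained / (explained + df_error)


# (label, F, df_effect, df_error, reported eta squared), copied from the tables.
reported = [
    ("Table 13, S1", 5.065, 7, 184, 0.162),
    ("Table 13, S6", 22.415, 7, 184, 0.460),
    ("Table 14, vowel /a/", 100.047, 11, 756, 0.593),
    ("Table 14, medial /f/", 103.233, 11, 276, 0.804),
]

for label, f_ratio, df1, df2, eta in reported:
    recomputed = eta_squared(f_ratio, df1, df2)
    print(f"{label}: reported {eta:.3f}, recomputed {recomputed:.3f}")
```

Under this assumption the recomputed values agree with the reported ones to three decimals for the rows shown, which makes the relation a convenient sanity check when transcribing the tables.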
4. Discussion
The following findings result from our study.
Significant differences exist in the duration of
consonants /f, θ, x, s, m, n, l, r/ in initial and medial positions of nonsense words. In both positions,
longest durations correspond to the voiceless fricatives /f, θ, x, s/ and to the trill /r/, although differences are greater in medial-position consonants.
Temporal differences were also detected in the
vowels /a, i, u/: the vowel /a/ has a longer duration
than /i/ and /u/. There exists significant inter-
speaker variance in the duration variables studied,
as well as in speaker/segment interaction.
With regard to contextual effects, there is a notable decrease in vocalic duration preceding
voiceless fricative consonants, and an increase in
vocalic duration preceding voiced consonants; this
effect is similar for all the vowels. With regard to
nasals, our results show that the vowel /a/ may be
longer before /n/ than before /m/; however, this
effect was produced in speakers S9 and S12 only.
Effects of the initial consonant on vowel duration are weaker, with a strong
overlap in the subsets obtained. Nevertheless, it is interesting to note that
the vowel /u/ was longer when preceded by /n/ than when preceded by /m/,
although this effect was present in S5 only.
The study shows that, relative to /u/, the vowel /i/ may lengthen the
preceding /f/, /x/ and /m/, in spite of the fact that the intrinsic durations
of the two vowels are very similar. This effect was present (p < 0.05) only in
S2 and S9. No vocalic effects on following consonants were found.
Concerning inter-speaker variability, our data show differences in the
duration of the initial consonant, although only S4 conforms to the general
model for the duration of initial-position consonants
([l, m, n] < [x, r, θ, s]). In vowel duration, all speakers except S8 and S9
pronounce /a/ with longer duration than the other vowels. All speakers conform
to the general durational model for medial-position consonants.

Anticipatory consonant-to-vowel coarticulation, that is, the dependence of
vowel duration on the following consonant, is present in all speakers: vowels
are longer before voiced consonants and shorter before voiceless fricatives,
although the effect is not the same for all vowels. The effect is greatest for
the vowel /a/, as it was shown by all subjects except S3. Next comes the vowel
/i/, while the vowel /u/ shows the least effect. Speaker S6 presents the
greatest coarticulation effect in all vowels, while in S1 it is present only
for the vowel /a/.
Turning to the degree of inter-speaker differentiation shown by each segment,
our results indicate that (a) the initial-position consonants /x, s, r/ show
the largest differences between speakers; (b) effects are greater for the
vowel /a/ than for the vowels /i, u/; and (c) voiceless fricative consonants
in medial position show greater inter-speaker variability than voiced
consonants.
Generally speaking, the results obtained are in accord with previously
published findings. We have detected intrinsic differences in the duration of
different speech segments, and also an anticipatory consonant-to-vowel
coarticulation effect. In addition, we have found notable differences between
speakers with regard to the temporal variables in our analysis. The most
significant of these findings are discussed below.
4.1. Intrinsic durations of segments C1, V and C2
The comparison of our data with the durational values of the corresponding
consonants or vowels reported in previous work on Spanish (Navarro Tomás,
1918; Borzone and Signorini, 1983; Martínez Celdrán, 1989; Guspí Saiz, 1993;
Carballo, 1995; Del Barrio and Torner, 1999) and on other European languages
(Laeufer, 1992; Farnetani and Recasens, 1993; Antoniades and Strube, 1984;
van den Heuvel et al., 1994) is not straightforward, because the intrinsic
durations of speech segments depend on a large number of factors, including
context (position in the word), speaking style
(reading versus spontaneous speech, reading words versus reading texts,
reading with or without a carrier phrase, etc.), and the language concerned.
Nevertheless, keeping this in mind, let us highlight the observations made in
a number of pertinent works.
Del Barrio and Torner (1999) studied the durations of the consonants
/f, x, θ, s, r/ for two Spanish speakers reading a text. They found values
somewhat shorter than our present data for the same items, as one would expect
because the words in our analysis were read in isolation. Carballo (1995)
finds a longer duration for the initial-position trill /r/ in children. The
difference may again be explained by differences in procedure: she deals with
a group of children who had to name a drawing of an object containing the
phoneme in initial position (e.g., rana, 'frog'), while in our study duration
was determined through reading isolated nonsense words.

For Spanish vowels, Marín (1994–1995) and Cuenca (1996–1997) found that /a/ is
longer than /i/, and that /i/ is in turn longer than /u/. The data of the
present work show the same durational order for these three vowels, but our
values for the duration of the corresponding vowels are longer.
Farnetani and Recasens (1993) noted shorter vowel durations in connected
speech than in isolated words in Italian, and we may assume that the same
phenomenon occurs with consonants. van den Heuvel et al. (1994) gave temporal
values for Dutch speakers for some of the consonants we have analysed (namely
/s, m, n/), obtaining shorter durations than ours for the corresponding
Spanish consonants. For French voiceless fricatives, Laeufer (1992) found an
average duration similar to the value we obtained. For the German vowels
/a, i, u/, Antoniades and Strube (1984) gave longer durations than our study
gives for the corresponding Spanish vowels.
4.2. Context-related duration of speech segments
Our findings indicate an influence of phonetic context, or coarticulation,
principally anticipatory coarticulation. Effects of carryover coarticulation
are small: they were detected in only one speaker (S5), in one vowel (/u/),
and for one initial-position consonant (/n/ relative to /m/). Hoole et al.
(1993) suggest that carryover coarticulation is more readily shown in spectral
measurements, while temporal measurements such as ours are more sensitive to
anticipatory coarticulation. It is true that van den Heuvel et al. (1996),
following the same experimental procedure as van den Heuvel et al. (1994),
find consonant-to-vowel carryover coarticulation, but this involves taking the
spectral measurement of F2 as the dependent variable. Given that our
measurements indicated only a minimal effect in a single speaker, we may
conclude that our temporal measurements are not sufficiently sensitive to
isolate this effect.
The chief finding has been a contextual effect of anticipatory coarticulation:
the duration of the vowel depends on the consonant following it. In general,
vowels are longer if they are followed by voiced consonants and shorter if
they are followed by voiceless fricative consonants. This finding may be
explained in two ways: (1) anticipatory coarticulation corresponds to a
compensation effect, given that voiceless fricative consonants, which reduce
the duration of the preceding vowel, are longer than voiced consonants, which
lengthen the duration of the vowel; (2) alternatively, anticipatory
coarticulation corresponds to an effect of sonority. The first case posits an
automatic temporal adjustment mechanism (Port et al., 1980; Kohler, 1984;
Farnetani and Recasens, 1993); the second brings us to the hypothesis that
phonological rules (such as the voiced/voiceless contrast) operate in the
phenomenon of anticipatory coarticulation (Daniloff et al., 1980; Kluender et
al., 1988; Walsh and Parker, 1981; Braunschweiler, 1997). Both hypotheses
might appear to be viable, until we come to consider the phoneme /r/.
The spectrogram of the phoneme /r/ presents
successive trilled movements (generally two or
three), formed by periods of closure or silence, and
by periods of aperture, or vocalic elements, in
which formants can be seen (Carballo and Mendoza, 2000). It is a voiced phoneme which, owing
to the sequence of closing and opening periods,
presents a relatively long duration. According to
our findings (Table 2), the duration of /r/ in medial
position is closer to the duration of the voiceless
fricatives than to other voiced phonemes.
According to the temporal compensation hypothesis, we would expect vowels
preceding the trill /r/ to shorten. However, this does not happen. Instead,
vowels preceding /r/ are lengthened, as also occurs with vowels preceding the
other voiced consonants (see Table 2). Furthermore, as we can see from the
Tukey HSD post-hoc analysis of pre-consonantal vowel duration for each
speaker, the trill /r/ is found in the subsets which most increase the
duration of the preceding vowel in speakers S1, S3, S4, S6, S7, S8, S11 and
S12. Our data do not, therefore, support the hypothesis of temporal
compensation, but rather the phonological hypothesis, based on the presence or
absence of voicing, which the speaker has to anticipate in preceding sounds.
That is, the voicing of the consonant is the important factor, driven by the
necessity to change the global setting to produce an upcoming voiceless
consonant.
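The argument above can be laid out as a small comparison of what each hypothesis predicts for the vowel preceding each consonant class. This is only an illustrative sketch: the class labels, the boolean `long` flag and the function names are our own simplifications, and the "observed" directions are the qualitative ones stated in the text, not measured values.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConsonantClass:
    label: str
    voiced: bool
    long: bool  # qualitative label: relatively long in medial position


def compensation_prediction(c: ConsonantClass) -> str:
    """Temporal compensation: a long consonant shortens the preceding vowel."""
    return "shorter" if c.long else "longer"


def voicing_prediction(c: ConsonantClass) -> str:
    """Phonological account: a voiced consonant lengthens the preceding vowel."""
    return "longer" if c.voiced else "shorter"


# Qualitative classification, following the description in the text.
CLASSES = [
    ConsonantClass("voiceless fricatives", voiced=False, long=True),
    ConsonantClass("nasals", voiced=True, long=False),
    ConsonantClass("lateral", voiced=True, long=False),
    ConsonantClass("trill /r/", voiced=True, long=True),
]

# Direction of the effect on the preceding vowel, as reported in the text.
OBSERVED = {
    "voiceless fricatives": "shorter",
    "nasals": "longer",
    "lateral": "longer",
    "trill /r/": "longer",
}


def mismatches(predict) -> list[str]:
    """Return the consonant classes for which a hypothesis contradicts the data."""
    return [c.label for c in CLASSES if predict(c) != OBSERVED[c.label]]


print("compensation fails for:", mismatches(compensation_prediction))
print("voicing fails for:", mismatches(voicing_prediction))
```

With these qualitative labels, the compensation account is contradicted only by the trill, while the voicing account is consistent with every class, which is the reasoning the discussion relies on.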
As Braunschweiler (1997) states, prior to the emission of a voiced consonant,
the speaker must have some kind of information enabling them to execute the
motor programme for adaptation to the characteristics of the consonant, rather
than acting automatically through a mechanism of temporal compensation, as
claimed by Farnetani and Recasens (1993). The /r/ effect shows that not all
long consonants shorten the preceding vowel: in spite of its long duration,
/r/ lengthens the preceding vowel. As stated by Daniloff et al. (1980),
“Anticipatory coarticulation can occur only if the speaker can ‘look ahead’ in
time and anticipate oncoming sound. RL (right-to-left) coarticulation must
reflect a high-level, central type of phonological-phonetic processing, since
an entire utterance must be scanned in order for anticipation to be
deliberately programmed.” (p. 324). See also Öhman (1966), Whalen (1990), and
Fowler and Brancazio (2000).
With a methodology very similar to ours, van den Heuvel et al. (1994) find
that vowels preceding the phoneme /r̄/ (as in “tirc” or “turc”) lengthen their
duration. The trill /r/ is a phoneme of greater duration than /r̄/ owing to its
cyclic repetition of periods of closure and aperture, yet in spite of the
difference in duration between the two phonemes, their lengthening effects on
the preceding vowel are similar. Again, we may interpret this fact as
supporting the argument that anticipatory coarticulation corresponds to the
characteristics of the phonemes /r̄/ and /r/ and not to their duration.

Our data therefore amount to a confirmation of the existence of anticipatory
consonant-to-vowel coarticulation that depends neither specifically on the
duration of the consonant nor on a mechanism of temporal compensation,
supposedly automatic in character and related to motor control, but rather on
higher-level phonological-phonetic processing.

Hertrich and Ackermann (1999) and Fowler and Brancazio (2000) have confirmed
the existence of anticipatory coarticulation in ataxic patients with
deteriorated motor control. Similarly, Baum (1998) has found that anticipatory
coarticulation remains intact both in fluent aphasics and in non-fluent
aphasics whose motor control is affected.
The study of anticipatory coarticulation and its mechanisms offers a highly
interesting line of research into various pathologies of speech and reading.
We may speculate that, since it is maintained in speech pathologies where
motor control is altered, it would be diminished or perhaps even absent in
phonological dyslexics, in whom the phonological access route to the lexicon
is damaged; see e.g. Defior (1996). In future studies it would be interesting
to confirm this point, which would lend further support to the hypothesis of
anticipatory coarticulation as an aspect of phonological-phonetic processing.
Although it is less consistent, we have also detected an anticipatory
coarticulation effect on the preceding consonant (C1) in two speakers: S2 and
S9. As previously described, this effect involves a lengthening of the
consonants /f, x, m/ preceding the vowel /i/ as compared with /u/. It is not
easy to interpret these data, particularly as most previous studies have
concentrated on consonant-to-vowel anticipatory coarticulation, with the two
segments belonging to different syllables, and not on two segments within the
same syllable. It would be interesting to study the phenomenon in depth; the
interpretation we tentatively suggest here requires confirmation.
In accordance with the hypothesis of the effect
of “articulatory distance” on duration, Farnetani
and Recasens (1993) consider that the shorter duration of the vowel /i/ as
compared with /a/ may be viewed as an automatic consequence of the very short
articulatory distance from the configuration of /i/ to the configuration of
the coronal consonants used in their research: /t, d, z, �, l/. It is also
probable that a vowel, while being produced, has to adapt to the
characteristics of the consonant with which it forms a syllable, so as to
shorten its duration and allow rapid articulatory adjustments. This is
certainly the case for the coronal consonants (/θ, s, n, l, r/) in our study.
Yet it is precisely with the non-coronal consonants /f, x, m/ that the effect
is significant and that the consonant preceding the vowel /i/ is lengthened.
This phenomenon requires further research.
4.3. Temporal segments with greatest speaker-idiosyncrasy
All the speech segments in our analysis are realised in a specific way by
different speakers, all of whom show significant effects. For this reason, our
discussion here centres on greater or lesser differentiation, rather than on
its presence or absence. In general, our study does not fully confirm previous
findings that longer-duration segments present a more speaker-specific
realisation, as has been suggested by, among others, O'Shaughnessy (1984) and
van den Heuvel et al. (1994). We have found this to be true only in
medial-position voiceless fricatives. In this position fricative consonants
present the most idiosyncratic realisation for each speaker, while the effect
is reduced for nasal consonants and for /r/, in spite of the relatively long
duration of this phoneme in Spanish. This could be specific to Spanish.

Although initial-position consonants also present inter-speaker variance, the
effects are smaller, particularly in /f/. The lesser differentiation between
speakers for this phoneme may be due to measurement error, given that it
presents very little energy in initial position, which on occasions may have
made it difficult to determine the onset of the signal. Table 1 shows that the
standard deviation in the duration of this phoneme is greater than in the
others, which may be due to this circumstance. With regard to the vowels, /u/
is realised least specifically by each speaker, in spite of the fact that its
intrinsic duration is similar to that of /i/.

In our view, the relations between “speaker-specificity” and “segment
duration” are highly complex, and cannot be established from the intrinsic
durations of each segment, but only by considering coarticulation effects.
Thus the vowel /u/, whose realisation is least speaker-specific, presents the
least anticipatory coarticulation effect. The opposite occurs with the vowel
/i/, which has an intrinsic duration similar to /u/ yet presents a greater
coarticulation effect.
4.4. Inter-speaker variance in segment duration
Our findings indicate that all speakers participating in the study produce
both initial-position and medial consonants with different intrinsic
durational values, with voiceless fricative consonants being longer than
voiced consonants, with the exception of /r/. The same is not true of vowel
duration, given that S5 and S7 produce the three vowels analysed with the same
duration. We may consider that the intrinsic duration of consonants is
relatively stable among speakers, while this does not apply to the duration of
vowels in Spanish.
With regard to contextual effects, our findings indicate that all speakers
show anticipatory consonant-to-vowel coarticulation, but not in all the
vowels. Extent and type vary considerably from speaker to speaker. The
coarticulation effect is most stable for the vowel /a/, given that it is shown
by a greater number of speakers, and less stable for /i/ and /u/. This fact
has been observed previously by Crystal and House (1988c), who state that
anticipatory coarticulation is much smaller in short vowels than in long ones.
The finding cannot be extrapolated exactly to Spanish, which does not have
phonologically contrastive long and short vowels; however, it does have vowels
of greater duration (such as /a/) and of lesser duration (such as /i/ and
/u/).
The design and results of our study show that
the speaker factor is very strong in the variables analysed and in some of the interactions found.
We can see that certain temporal characteristics of speech segments, whether
phonetic or phonological in nature, operate in some speakers only: for
example, the intrinsic differences in the duration of Spanish vowels (/a/
being longer than /i, u/), which are not shown by S5 and S7, or the difference
in coarticulation between the nasal phonemes found in S9 and S12. Others are
operative in all speakers, such as the effect of anticipatory
consonant-to-vowel coarticulation. However, even in this last case, the
magnitude and direction of the effect differ between speakers. We believe this
finding to be of great interest as an indicator of speaker idiosyncrasy.
Our interest here has been to demonstrate the existence of temporal
differences in speech segments related to phonetic context and speaker, in the
hope that future studies will adopt a methodology permitting a more precise
analysis of the differences and interactions found. With 192 items and 12
subjects, and three segments measured in each item, our total of 6912
measurements renders the identification of a “temporal profile” for each
speaker excessively complex. However, such a profile would be of great
interest, particularly in the field of forensic acoustics.
What we do wish to emphasise here is the fact that some phonetic studies are
carried out with too few subjects or with very few observations. It is highly
probable that many standardised criteria concerning duration, configuration,
distance of formant frequencies, transitions and so on would be different if
they were obtained using more speakers. As Pisoni (1990) states: “Linguistic
theory, with its primary emphasis on speech as an idealised representation
abstracted away from the physical medium, has basically ignored the problem of
talker variability... One of the traditional ways of coping with stimulus
variability in speech has been to simply view it as ‘noise’ in the signal that
needs to be stripped away in order to get at the symbolic representation of
the linguistic message that has been encoded in the speech waveform” (p. 171).
Studying the “noise”, or speaker-idiosyncrasy, forces us to reconsider many of
the so-called “linguistic universals” and shows that many obligatory rules may
in fact be optional, not observed by all speakers in all contexts.
5. Summary and conclusions
We have examined the influence of context (contiguous phoneme) and speaker on
the duration of eight selected consonants (voiceless obstruents, nasals and
liquids) and three vowels in a set of 192 disyllabic words and non-words in
Spanish with stress on the first syllable, spoken as isolated citation forms
by 12 speakers of a southern variety of Peninsular Spanish.

The contextual effects on segment duration were analysed in terms of
anticipatory coarticulation, and compared with similar data from other
European languages. Speaker differences were examined with a view to the
potential of segment duration as an index of speaker identity for applications
such as forensic phonetics.
We have provided a body of data on contextual effects due to adjacent phoneme
types as well as on intrinsic segmental durations. We hope to contribute to
finding a systematic principle that may govern the intrinsic durational
pattern, if any, in relation to the speaker's strategy for the temporal
organization of utterances (Ladefoged, 1993). Our work illustrates that a
substantially larger body of data is needed for understanding
speaker-to-speaker variability and the contextual effects of various utterance
factors.

Methodologically we have followed the lines of van den Heuvel et al. (1994),
although with no repeated measurements on items across testing occasions.
However, we have tested a greater number of items, drawn from all possible
combinations of consonants and vowels in the CVCe frame.
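The size of the item set and of the resulting data set follows directly from this full crossing. The sketch below enumerates the C1 × V × C2 combinations and counts the measurements; the spelling rules in `spell` are a hypothetical reconstruction inferred from the forms listed in Appendix A (for example Z or C for the interdental fricative, J for /x/, RR for the medial trill), not a procedure described by the authors, and the symbol "T" is our placeholder for the interdental fricative.

```python
from itertools import product

# Eight consonants and three vowels; "T" is a placeholder for the interdental fricative.
CONSONANTS = ["f", "T", "x", "s", "m", "n", "l", "r"]
VOWELS = ["a", "i", "u"]

# Hypothetical orthographic mapping, inferred from the Appendix A word list.
ONSET = {"f": "F", "x": "J", "s": "S", "m": "M", "n": "N", "l": "L", "r": "R"}
MEDIAL = {"f": "F", "T": "C", "x": "J", "s": "S",
          "m": "M", "n": "N", "l": "L", "r": "RR"}


def spell(c1: str, v: str, c2: str) -> str:
    """Spell one CVCe item; the initial interdental is Z before a/u and C before i."""
    if c1 == "T":
        first = "C" if v == "i" else "Z"
    else:
        first = ONSET[c1]
    return first + v.upper() + MEDIAL[c2] + "E"


# Full crossing of initial consonant, vowel and medial consonant.
items = [spell(c1, v, c2) for c1, v, c2 in product(CONSONANTS, VOWELS, CONSONANTS)]

n_speakers = 12   # speakers in the study
n_segments = 3    # C1, V and C2 are measured in every item
total_measurements = len(items) * n_speakers * n_segments

print(len(items), total_measurements)  # 192 6912
```

The counts reproduce the 192 items and 6912 measurements mentioned in the discussion, and forms such as ZARRE, CINE and RURRE generated this way also appear in Appendix A.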
A number of research problems that would naturally continue the present work
include (i) the study of the potential effects of the difference between
familiar or known words and non-words, and (ii) the spectral analysis
(formants, center of gravity) of the segments of our speech sample, because
this information might be more appropriate for studying speaker-specific
aspects of articulation such as carryover coarticulation (Hoole et al., 1993).
Temporal measurements are much more
sensitive to anticipatory coarticulation, although
they might be more prone to style and rate effects.
The raw data of this work are available to
interested researchers on request.
Acknowledgement
This work was partially supported by the Junta de Andalucía (HUM-605).
Appendix A. List of words and non-words used in the experiment
RUSE JURRE JUFE FASE
FUME LARRE MICE FUSE
RAFE FUCE SILE NAME
SUFE SUCE SAFE JILE
ZARRE JAME MUNE ZUSE
NANE MALE ZASE JULE
NARRE NIFE RURRE ZUCE
NASE RILE MUME SINE
NIJE MAME ZUNE JARRE
LISE NURRE NUCE NINE
ZULE RINE LUME RISE
JINE SURRE SIFE SACE
FIME FULE JIFE LIJE
MASE NICE LALE NULE
MUFE CINE RARRE JUSE
MIFE RIRRE FICE FIJE
RUNE ZUME JAFE FALE
NUJE LUJE JACE MURRE
LICE LUSE NISE FISE
SISE NACE FAFE ZURRE
SIRRE SIME JIJE SASE
RUFE JISE SIJE LUNE
FILE FINE RALE JAJE
SUNE NUNE CIRRE SALE
MIRRE RASE CIME MANE
JANE SAJE JASE JICE
FAJE NUME JUME FURRE
MINE LAME LAFE LURRE
MARRE JUNE LIME LILE
MUJE NAFE MIME MIJE
JIME MAJE ZAJE MISE
FUJE NALE FARRE RIJE
FUNE NAJE RUCE ZAFE
JALE LASE FUFE RUME
MAFE LUFE JUJE RANE
SAME FAME RICE FANE
ZACE JIRRE FIRRE NUSE
RUJE ZAME CIJE LUCE
LIFE ZALE NIRRE MUSE
NUFE SUSE LINE MILE
CISE RAJE LACE JUCE
LANE CICE LIRRE LAJE
SANE SARRE SULE RIME
CIFE NIME ZUJE RACE
MUCE LULE FIFE MULE
FACE SUJE CILE RULE
SUME NILE ZUFE SICE
RIFE ZANE MACE RAME
References
Antoniades, Z., Strube, H.W., 1984. Untersuchungen zur
spezifischen Dauer deutscher Vokale. Phonetica 41, 72–87.
Bartkova, K., 1988. On the use of segmental duration in
speaker-independent speech recognition systems. In: Proceedings
of the 7th FASE Symposium, Edinburgh, pp. 763–770.
Baum, S.R., 1998. Anticipatory coarticulation in aphasia:
effects of utterance complexity. Brain and Language 63,
357–380.
Borzone, A.M., Signorini, A., 1983. Segmental duration and
rhythm in Spanish. Journal of Phonetics 11, 117–128.
Braunschweiler, N., 1997. Integrated cues of voicing and vowel
length in German: a production study. Language and Speech
40, 353–376.
Carballo, G., 1995. Estudio de las adquisiciones fonológicas.
Análisis acústico del fonema /r̄/. Unpublished doctoral
dissertation. University of Granada, Spain, pp. 103–107.
Carballo, G., Mendoza, E., Valencia-Naranjo, N., 1997.
Interobserver agreement of perceived intelligibility of /r̄/ in
children. Perceptual and Motor Skills 84, 1099–1104.
Carballo, G., Mendoza, E., 2000. Acoustic characteristics of
trill productions by groups of Spanish children. Clinical
Linguistics & Phonetics 14 (8), 587–601.
Crystal, Th.M., House, A.S., 1988a. The duration of American-
English vowels: an overview. Journal of Phonetics 16, 263–
284.
Crystal, Th.M., House, A.S., 1988b. The duration of American-
English stop consonants: an overview. Journal of Phonetics
16, 285–294.
Crystal, Th.M., House, A.S., 1988c. Segmental durations in
connected-speech signal. Journal of the Acoustical Society
of America 85, 1553–1573.
Cuenca, M.H., 1996–1997. Análisis instrumental de la duración
de las vocales en español. Philologia Hispalensis 11, 295–307.
Daniloff, R.G., Hammarberg, R.E., 1973. On defining coartic-
ulation. Journal of Phonetics 1, 239–248.
Daniloff, R., Schuckers, G., Feth, L., 1980. The Physiology of
Speech and Hearing. Prentice-Hall, Inc., New Jersey, pp.
219–366.
Del Barrio, L., Torner, S., 1999. La duración consonántica en
castellano. Lingüística Española Actual, XXI 1, 99–126.
Defior, S., 1996. Las Dificultades de Aprendizaje: Un Enfoque
Cognitivo. Málaga, Aljibe, pp. 63–107.
Delattre, P., 1965. Comparing the Phonetic Features of English,
German, Spanish and French. Julius Groos Verlag, Heidel-
berg.
Farnetani, E., Recasens, D., 1993. Anticipatory consonant-to-
vowel coarticulation in the production of VCV sequences in
Italian. Language and Speech 36, 279–302.
Fowler, C.A., Brancazio, L., 2000. Coarticulation resistance of
American English consonants and its effects on transconso-
nantal vowel-to-vowel coarticulation. Language and Speech
43, 1–41.
Guspí Saiz, M., 1993. Estudi de la duració de les consonants en
el context de final i principi de paraules en castellà i en
català. Estudios de Fonética Experimental, V (Barcelona),
189–221.
Hertrich, I., Ackermann, H., 1995. Coarticulation in slow
speech: durational and spectral analysis. Language and
Speech 38, 157–187.
Hertrich, I., Ackermann, H., 1999. Temporal and spectral
aspects of coarticulation in ataxic dysarthria: an acoustic
analysis. Journal of Speech, Language and Hearing Re-
search 42, 367–381.
Hoole, P., Nguyen-Trong, N., Hardcastle, W., 1993. A com-
parative investigation of coarticulation in fricatives: elec-
tropalatographic, electromagnetic and acoustic data.
Language and Speech 36, 235–260.
House, A.S., Crystal, Th.M., 1997. A note on the durations of
American English consonants. In: Kiritani, A., Hirose, H.,
Fujisaki, H. (Eds.), Speech Production and Language: In
Honor of Osamu Fujimura. Mouton de Gruyter, Berlin.
Johnson, Ch., Hollien, H., Hicks, J.W., 1984. Speaker identi-
fication utilizing selected temporal speech features. Journal
of Phonetics 12, 319–326.
Jongman, A.J., 1998. Effects of vowel length and syllable
structure on segmental duration in Dutch. Journal of
Phonetics 26, 207–222.
Kluender, K., Diehl, R., Wright, B., 1988. Vowel-length
differences before voiced and voiceless consonants: an
auditory explanation. Journal of Phonetics 16, 153–169.
Kohler, K.J., 1984. Phonetic explanation in phonology. The
feature fortis/lenis. Phonetica 41, 150–174.
Ladefoged, P., 1993. A Course in Phonetics. Harcourt Brace,
New York.
Laeufer, Ch., 1992. Patterns of voicing-conditioned vowel
duration in French and English. Journal of Phonetics 20,
411–440.
Lehiste, I., 1970. Suprasegmentals. MIT Press, Cambridge, MA.
Marín, R., 1994–1995. La duración vocálica en español.
Estudios de Lingüística 10, 213–226.
Martínez Celdrán, E., 1989. Cantidad e intensidad en los
sonidos obstruyentes del castellano: Hacia una caracterización
acústica de los sonidos aproximantes. Estudios de
Fonética Experimental, I (Barcelona), 73–129.
Navarro Tomás, T., 1918. Diferencias de duración entre las
consonantes españolas. Revista de Filología Española, V,
367–393.
Nooteboom, S.G., Slis, I.H., 1972. The phonetic feature of
vowel length in Dutch. Language and Speech 15, 301–316.
Öhman, S., 1966. Coarticulation in VCV utterances: spectrographic
measurements. Journal of the Acoustical Society of
America 39, 151–168.
O'Shaughnessy, D., 1981. A study of French vowel and
consonant durations. Journal of Phonetics 9, 385–406.
O'Shaughnessy, D., 1984. A multispeaker analysis of durations
in French paragraphs. Journal of the Acoustical Society of
America 76, 1664–1672.
O'Shaughnessy, D., 1987. Speech Communication. Human and
Machine. Addison-Wesley Publishing Company, pp. 39–127.
Pisoni, D., 1990. Effects of talker variability on speech
perception: implications for current research and theory.
Research on Speech Perception. Progress Report, 16.
Indiana University, pp. 169–191.
Port, R.F., Al-Ani, S., Maeda, S., 1980. Temporal compensation
and universal phonetics. Phonetica 37, 235–252.
Quilis, A., Esgueva, M., Gutiérrez, M.L., Cantarero, M., 1979.
Características acústicas de las consonantes laterales
españolas. Lingüística Española Actual 1, 233–343.
Umeda, N., 1977. Consonant duration in American English.
Journal of the Acoustical Society of America 61, 846–858.
van den Heuvel, H., Cranen, B., Rietveld, T., 1996. Speaker
variability in the coarticulation of /a, i, u/. Speech Commu-
nication 18, 113–130.
van den Heuvel, H., Rietveld, T., Cranen, B., 1994. Method-
ological aspects of segment- and speaker-related variability.
A study of segmental durations in Dutch. Journal of
Phonetics 22, 389–406.
van Santen, J.P.H., 1992. Contextual effects on vowel duration.
Speech Communication 11, 513–546.
van Santen, J.P.H., Coleman, J.S., Randolph, M.A., 1992.
Effects of postvocalic voicing on the time course of vowels
and diphthongs. Journal of the Acoustical Society of
America 2, 2444.
Whalen, D.H., 1990. Coarticulation is largely planned. Journal
of Phonetics 18, 3–35.
Walsh, T., Parker, F., 1981. Vowel length and ‘‘voicing’’ in a
following consonant. Journal of Phonetics 9, 305–308.
Zimmerman, S.A., Sapon, S.M., 1958. Note on vowel duration
seen cross-linguistically. Journal of the Acoustical Society of
America 30, 152–153.