Hiatus and diphthong: Acoustic cues and speech situation differences


Lourdes Aguilar 1

Departament de Filologia Espanyola, Universitat Autònoma de Barcelona, Bellaterra, 08193 Barcelona, Spain

Received 18 June 1997; received in revised form 8 August 1998; accepted 30 November 1998

Abstract

The aim of this study is to determine the acoustic properties of hiatuses (vowel-vowel sequences) and diphthongs (glide-vowel sequences) in Spanish and to observe how these properties are modified depending on communicative factors. To do this, two groups of data were used: speech samples gathered from conversations between two speakers performing a map task, in which the corpus items corresponded to toponyms, and the reading of the same sequences at a normal speaking rate. The comparison was made phonetically and phonologically: first, diphthongs and hiatuses were analyzed acoustically by studying their duration and spectral dynamics, and then an inventory of diphthongizations and monophthongizations was made. Results show that hiatuses and diphthongs differ in the temporal and frequency domains: hiatuses have a longer duration and a greater degree of curvature in the F2 trajectory than diphthongs. Differences between the two categories (hiatus and diphthong) exist in both communicative situations, although changes within each category due to the speech situation were also observed: sequences from the map task are shorter, and the degree of curvature of their formant trajectories is lower, than in the reading task. We have also found that vowel-vowel and glide-vowel sequences behave differently in the way they are phonetically reduced: a reduction axis can be drawn along which hiatuses become diphthongs, and diphthongs become vowels. It is concluded that hiatus and diphthong are two phonetic categories which can be described on the basis of their acoustic characteristics and which are subject, like any other phonetic category, to modifications due to a change in the communicative situation. © 1999 Elsevier Science B.V. All rights reserved.

Résumé

This study aims to determine the acoustic characteristics of vowel-vowel sequences (hiatuses) and semivowel-vowel sequences (diphthongs) in Spanish, and to observe how these characteristics are modified according to the properties of the communicative context. We used two sets of data: samples from dialogues between two speakers performing a map task, in which the corpus items corresponded to the names on the maps, and the reading of the same sequences. The comparison was made phonetically and phonologically: first, the durations and spectral dynamics of the diphthongs and hiatuses were analyzed, and then the phonetic reduction processes were inventoried. The results show that the vowel-vowel sequence differs from the semivowel-vowel sequence in the time and frequency domains: hiatuses are longer and have a more curved F2 trajectory than diphthongs. In addition, the reduction processes point to a weakening axis that accounts for hiatuses being pronounced as diphthongs, and diphthongs as vowels. It appears that hiatuses and diphthongs are two phonetic categories which can be described by their acoustic characteristics and which, like other phonetic categories, undergo reduction processes when they appear in relaxed speaking contexts. © 1999 Elsevier Science B.V. All rights reserved.

www.elsevier.nl/locate/specom Speech Communication 28 (1999) 57–74

1 E-mail: [email protected]

0167-6393/99/$ – see front matter © 1999 Elsevier Science B.V. All rights reserved.

PII: S0167-6393(99)00003-5

Keywords: Vowel-vowel sequences; Glide-vowel sequences; Hiatus; Diphthong; Speech situation; Formant modeling; Spanish; Phonetic reduction processes

1. Introduction

In the development of phonological theory, problems related to the phonological nature of diphthongs and, consequently, to the interpretation of glides arise repeatedly, as different approaches focus either on rules or on representations (Anderson, 1985). The questions are diverse. Are diphthongs a single unit or are they formed by two phonemes? (Navarro-Tomás, 1946; Alarcos, 1965). Is the glide a phoneme, or is there a phonological process which changes the vowel into a glide at the phonetic level? (Harris, 1971; Hualde, 1991). What is the role of the glide in the syllabification process? (Waksler, 1990; Morgan, 1984). Nevertheless, phonetic properties are usually absent from phonological explanations. The criteria determining the difference between glide and vowel, and consequently between hiatus and diphthong, are based on syllabicity, but the property of syllabicity is not phonetically defined in a precise way. In the case of Spanish, only Borzone de Manrique (1976) has tried to characterize glides on an acoustic basis. In addition, phonetic descriptions of the hiatus-diphthong distinction are rather vague, emphasizing the role of the formant transition rate: if the transition is long and takes place at a slow rate, the two vowels form a diphthong, whereas if the transition is short and quick, the vowels belong to different syllables (Borzone de Manrique, 1979; Quilis, 1981). Borzone de Manrique has also paid attention to onset duration, transition duration and offset duration, finding differences in the target vowel areas: the hiatus-diphthong distinction is indicated by the longer duration of the initial area of hiatuses.

In traditional Spanish linguistics, it is customary to describe diphthongs as the combination of two phonological vowels, one of which is /i/ or /u/, in a single syllable. Hiatuses, however, are the combination of two phonological vowels, one of which is /i/ or /u/, in two syllables. In the phonetic manifestation of diphthongs, it is generally acknowledged that a glide appears. From this point of view, since /i/ and /u/ alternate phonetically between a vowel and a glide, we cannot attribute the hiatus-diphthong difference to the type of vowel. Moreover, there are some examples of phonological contrast between vowel and glide, such as píe [ˈpi.e] (I/he/she/it cheeped) / pie [ˈp ] (foot) (Listen to Signal A and B). It could be argued in this case that stress on the vowel /i/ is the factor determining the presence of the hiatus or diphthong, but stress cannot explain the difference with pié [pi.ˈe] (I cheeped) (Listen to Signal C). Thus, although stress is strongly linked to the hiatus-diphthong distinction, it is not the factor that explains the contrast, as shown by the existence of diphthongs in both stressed and unstressed syllables, and of hiatuses with the stress on the vowel /i u/ or on the other vowel. In the first case the hiatus is called reverse, and in the second, normal (RAE, 1973).

If we cannot use the type of vowel or the stress to explain the hiatus-diphthong distinction, neither can we rely on the position of the segment /i/ or /u/, the only vowels that alternate with glides, since this segment can appear either in the first or in the second position of the group, in both hiatuses and diphthongs. Accordingly, rising and falling diphthongs result: if the nucleus appears before the glide, we are dealing with a falling diphthong and, inversely, with a rising diphthong when the nucleus follows the glide (Navarro-Tomás, 1918; RAE, 1973). Although it is not common to do so, we will apply the same terminology to hiatuses. In summary, Fig. 1 shows the possible vowel-vowel and glide-vowel sequences in Spanish.


Considering the above, we can infer that the difference between hiatuses (vowel-vowel sequences) and diphthongs (glide-vowel and vowel-glide sequences) is a genuine feature of Spanish. Whether a sequence can be pronounced as a hiatus, i.e. in two separate syllables, or must be pronounced as a diphthong, that is, in a single syllable, is a lexical property: acquiring a new word implies knowing its syllabification. Although in some cases both hiatus and diphthong are allowed, as shown by words with two pronunciations such as cardiaco / cardíaco (cardiac) and amoniaco / amoníaco (ammonia), speakers have strong intuitions concerning the correct articulation of vowel sequences. Relatedly, Hualde (1991) reports the shared intuitions of five Spanish speakers about the pronunciation, in hiatus or in diphthong, of a series of words such as du.e.to (short duet) and due.lo (grief) (Listen to Signal D and E). Nevertheless, a tendency to reduce hiatuses to diphthongs has also been described in studies of historical change (Menéndez-Pidal, 1940) and of processes due to speaking-rate changes (Harris, 1969). The problems associated with the hiatus-diphthong distinction are thus complex, involving diverse phonetic and phonological questions.

The purpose of the present study is to provide data for Spanish showing which acoustic cues distinguish hiatus from diphthong and which modifications are found when the speech situation changes. The comparison will be made first phonetically, by studying the duration and spectral configuration of hiatuses and diphthongs, and then phonologically, by observing the behavior of vowel sequences with respect to phonetic reductions.

2. Experimental design

An experimental analysis of hiatuses and diphthongs was undertaken in two communicative situations. It was hypothesized that differences between the categories could be described, and that these categorial differences would be maintained across changes in the speech situation. The analysis was confined to a comparison of rising diphthongs with rising hiatuses and, for the sake of clarity, groups formed with /i/ (palatal hiatus or diphthong) and groups formed with /u/ (velar hiatus or diphthong) were treated separately.

2.1. Corpus

To build up the corpus, the following variables were considered:

1. phonetic category: hiatus and diphthong;
2. stress: presence or absence of stress in diphthongs, but its position on the first or on the second element in hiatuses;
3. the vowel that follows the segment [i] or [u]: [a], [e] or [o].

According to these variables, 24 combinations were obtained, as shown in Table 1, where the sequence, the Spanish word, the phonetic transcription and the English translation are presented for each item of the corpus.
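The count of 24 follows from crossing the design factors: two categories, each with two stress conditions, two groups (palatal /i/ and velar /u/, as separated in Section 2) and three following vowels, i.e. 2 × 2 × 2 × 3 = 24. The Python sketch below enumerates these cells; the factor names and string labels are illustrative choices, not part of the original study.

```python
from itertools import product

# Illustrative (hypothetical) labels for the design factors.
GROUPS = {"palatal": "i", "velar": "u"}   # high vowel that may surface as a glide
FOLLOWING_VOWELS = ("a", "e", "o")
# Stress is coded differently per category:
# position of stress in hiatuses, presence/absence of stress in diphthongs.
STRESS_CONDITIONS = {
    "hiatus": ("reverse", "normal"),
    "diphthong": ("stressed", "unstressed"),
}

def design_cells():
    """List every (category, stress, group, sequence) cell of the corpus design."""
    cells = []
    for category, stresses in STRESS_CONDITIONS.items():
        for stress, (group, high), vowel in product(
            stresses, GROUPS.items(), FOLLOWING_VOWELS
        ):
            cells.append((category, stress, group, high + vowel))
    return cells

cells = design_cells()
print(len(cells))  # 24
```

Each cell is then filled with one real word (Table 1), which is why lexical gaps constrain the choice of items.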

The corpus consisted of 24 real words whose pronunciation in hiatus or in diphthong was agreed upon by all the speakers participating in the experiment. Since /i/ and /u/ are closed vowels, diffuse consonants (labial, dental and alveolar) were chosen as the consonants surrounding the vowel sequences. Due to lexical gaps, we could not impose a uniform constraint on the position of the sequence in the word or on the number of syllables.

2.2. Speech situations

In order to characterize the sample units acoustically and to observe their realization depending on communicative factors, two groups of data were gathered: speech samples excerpted from conversations between two speakers performing a map task, and the reading of the same sequences at a normal speaking rate.

Fig. 1. Spanish vowel-vowel and glide-vowel sequences.

We adapted for our purposes the model of the HCRC corpus (Anderson et al., 1991): one speaker has a map of an imaginary area with a printed route going from a starting point to a final site; the other speaker has a copy of the same map but without the route. The aim of the task is for one participant to complete the map by following the instructions given by the other participant. Gestures are not allowed, and a barrier prevents visual access to the other participant's map.

Eliciting the corpus by means of this strategy has several advantages (Anderson et al., 1991). It allows us to consider the degree to which a communicative act requiring cooperation affects language use, since the goal can only be achieved through verbal interaction between the speakers. Moreover, involvement in the task quickly shifts the speaker's attention away from his language, and unconstrained speech is obtained. With regard to experimental design, it allows the researcher to predict part of the speech material that will occur in the dialogues, since the toponyms correspond to the items of the corpus. In addition, although the sentences are not identical, coherence in the corpus is assured because the dialogues share an objective: to reproduce a route with a known shape and controlled complexity, with a comparable number of toponyms.

It may be argued against this procedure that speakers are in fact reading the names on the map and, as a consequence, are not speaking spontaneously. Nevertheless, our aim is not to discover the differences between scripted and unscripted speech or to identify the cues indicating spontaneity in discourse, but to compare a set of sequences in two communicative situations.

Table 1
Items of the corpus: sequence, Spanish word, phonetic transcription and English translation

Group   | Sequence (reverse / normal) | Hiatus, reverse                | Hiatus, normal                  | Diphthong, stressed   | Diphthong, unstressed
Palatal | ˈi.e / i.ˈe                 | ríete [ˈri.e.te] "laugh"       | diedro "dihedron"               | piedra "stone"        | piedad "devotion"
Palatal | ˈi.a / i.ˈa                 | ríade "nymph"                  | piara "herd"                    | piadoso "devout"      | piador "devout"
Palatal | ˈi.o / i.ˈo                 | ríos [ˈri.os] "rivers"         | diodo "diode"                   | diosa "goddess"       | violín "violin"
Velar   | ˈu.e / u.ˈe                 | bambúes [bam.ˈbu.es] "bamboos" | dueto [du.ˈe.to] "duet"         | duelo "grief"         | vuelillo "lace"
Velar   | ˈu.a / u.ˈa                 | falúas [fa.ˈlu.as] "launches"  | buhar "to snort"                | santuario "sanctuary" | muaré "moiré"
Velar   | ˈu.o / u.ˈo                 | búhos [ˈbu.os] "eagle owls"    | sinuoso [si.nu.ˈo.so] "winding" | muones "monkeys"      | monstruoso "freakish"

The words were selected from a larger corpus that covers several questions related to vowel-glide-consonant alternation in Spanish, such as the glide-vowel-consonant contrast or the semivowel-semiconsonant contrast, and the phonetic behavior of [l n s] before glides and consonants. In the present study, the corpus is limited to the sequences showing the hiatus-diphthong distinction.

Four maps were designed to reduce the difficulty of the task and the time needed to perform it. Not all the items of the corpus appeared on every map; instead, they were assigned by means of the Latin square procedure. Each map contained at least one example of the contrast [i]V / [ ]V and of the contrast [u]V / [ ]V for each vowel environment. There were two versions of each map: one with the route printed (version A), whose coordinate points were obtained from a list of randomly generated numbers in order to guarantee a similar degree of difficulty across maps, and one without it (version B). Fig. 2 shows an example of a map and a piece of dialogue during the task; the toponyms not referred to in the piece of dialogue have been deleted from the graph for the sake of clarity.

For the reading task, the words of the corpus were inserted in carrier sentences. Different kinds of carrier sentences were used in order to avoid the 'list' effect, and the sentences were presented on separate sheets to regulate the speaker's speech rate.

2.3. Speakers

Sixteen male speakers aged between 20 and 40, all high school or university graduates, participated in the experiment. For each speaker, samples of both speech situations (reading task and dialogue task) were gathered.

2.4. Recording

The recording sessions took place in a sound-treated room, large enough to allow the speakers to sit comfortably facing each other while executing the map task, using a Tascam 112 cassette recorder and a Sennheiser MKH20 microphone. Half of the speakers started the session with the map task and the other half with the list reading, but both speech situations were recorded on the same day for each speaker. The sentences were read twice at a normal speech rate by each speaker individually; for the map task, speakers were organised in pairs of speakers who were familiar with each other, in order to increase the naturalness of the conversation. In the map task, each subject acted twice, once as instruction giver and once as instruction follower, with a different map in each dialogue so as to obtain examples of all the corpus items.

Fig. 2. Example of a map and piece of dialogue during the task.

Maps were assigned to each of the four pairs following the Latin square procedure, to avoid presentation-order effects. Pair A had map 1 in their first conversation, map 2 in their second, map 3 in their third and map 4 in their fourth; pair B followed the order 2, 3, 4, 1; pair C performed map 3 first, then map 4, map 1 and map 2; finally, the presentation order for pair D was 4, 1, 2, 3.
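The four presentation orders above form a cyclic Latin square: each map appears once for every pair and once in every conversation position. A short Python sketch that reproduces those orders and checks the Latin-square property (the function names are hypothetical):

```python
def cyclic_latin_square(n):
    """Row r lists items r+1, r+2, ..., n, 1, ..., r (1-based labels)."""
    return [[(r + c) % n + 1 for c in range(n)] for r in range(n)]

def is_latin_square(rows):
    """True if every row and every column is a permutation of 1..n."""
    n = len(rows)
    expected = set(range(1, n + 1))
    rows_ok = all(set(row) == expected for row in rows)
    cols_ok = all({row[c] for row in rows} == expected for c in range(n))
    return rows_ok and cols_ok

# Assign one row (a map order) to each pair.
orders = dict(zip("ABCD", cyclic_latin_square(4)))
for pair, order in orders.items():
    print(pair, order)
# A [1, 2, 3, 4]
# B [2, 3, 4, 1]
# C [3, 4, 1, 2]
# D [4, 1, 2, 3]
```

Because every column is also a permutation, any order effect (for example, growing familiarity with the task) is spread evenly over the four maps.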

3. Procedure of analysis

The vowel sequences were digitized at 16 kHz and excerpted from the recordings manually, relying on the waveform and spectrographic displays of the speech analysis software Waves+. In the temporal domain, the global duration of the group was measured, since we were dealing with the hiatus-diphthong distinction, not the vowel-glide contrast.

With regard to the frequency domain, we wanted to avoid the segmentation problems that arise, particularly in unscripted speech, with traditional procedures of diphthong analysis, which involve segmenting the sequence into several areas, usually three, corresponding to an initial segment, a transition and a final segment (Lehiste and Peterson, 1961; Gay, 1968; Burgess, 1969; Borzone de Manrique, 1979; Jha, 1985; Maddieson and Emmorey, 1985). Instead, a 14th-order LPC analysis was performed every 10 ms with a 20 ms window using Waves+. This modeling of formant trajectories respects the properties of formants in speech sequences, which are mainly transient and continuous, and is in line with calculations found in works such as Yang (1987), Carré and Mrayati (1991) and Clermont (1993).
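The study used the LPC tracker built into Waves+, whose internals are not described here. As a rough, hypothetical stand-in, the NumPy sketch below applies the stated settings (16 kHz sampling, 20 ms window, 10 ms step, order 14) with the autocorrelation method and Levinson-Durbin recursion, and reads formant candidates from the angles of the complex roots of the LPC polynomial. The Hamming window is an assumption; pre-emphasis and bandwidth-based pruning of spurious roots, which practical trackers usually add, are omitted.

```python
import numpy as np

def frame_signal(x, fs, win_ms=20.0, hop_ms=10.0):
    """Cut x into Hamming-windowed frames (20 ms windows every 10 ms by default)."""
    win = int(round(fs * win_ms / 1000.0))
    hop = int(round(fs * hop_ms / 1000.0))
    n_frames = 1 + max(0, (len(x) - win) // hop)
    frames = np.stack([x[i * hop:i * hop + win] for i in range(n_frames)])
    return frames * np.hamming(win)

def lpc(frame, order=14):
    """Autocorrelation-method LPC via Levinson-Durbin; returns [1, a1, ..., a_order]."""
    n = len(frame)
    r = np.array([np.dot(frame[:n - k], frame[k:]) for k in range(order + 1)])
    a = np.zeros(order + 1)
    a[0] = 1.0
    if r[0] <= 0.0:                      # silent frame: no spectral information
        return a
    err = r[0]
    for i in range(1, order + 1):
        acc = r[i] + np.dot(a[1:i], r[i - 1:0:-1])
        k = -acc / err                   # reflection coefficient
        a[1:i] = a[1:i] + k * a[i - 1:0:-1]
        a[i] = k
        err *= 1.0 - k * k               # remaining prediction-error power
    return a

def lowest_formants(a, fs, n_formants=2, min_hz=90.0):
    """Frequencies (Hz) of the lowest LPC resonances, e.g. F1 and F2."""
    roots = np.roots(a)
    roots = roots[np.imag(roots) > 0]    # keep one root of each conjugate pair
    freqs = np.sort(np.angle(roots) * fs / (2.0 * np.pi))
    return freqs[freqs > min_hz][:n_formants]

def formant_track(x, fs, order=14):
    """F1/F2 candidates for every 10 ms frame of signal x."""
    return [lowest_formants(lpc(frame, order), fs) for frame in frame_signal(x, fs)]
```

An order-14 model at 16 kHz allows up to seven resonances per frame, comfortably more than the two (F1, F2) used in this study.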

After the LPC analysis, the second-order polynomial equation which best fitted the F1 and F2 trajectories obtained from the LPC tracking was calculated for each item. The decision to use second-order polynomials was made after examining the spectrograms of all vowel sequences (hiatus and diphthong) in the corpus. The observations can be summarized as follows. Hiatuses can have two steady parts in their formant tracks, corresponding to each of the vowels in the group, but in general the first and second formants appear to move to the target of the first vowel and then towards the target of the following vowel without reaching it. Fig. 3 is a representative example of a hiatus, with parabolic formant shapes.

As for diphthongs, two steady-state parts are rarely present in their formant tracks; instead, a continuous transition from one frequency area to another is generally observed. Since this paper focuses on a comparison of hiatus and diphthong, we sought a procedure that would allow the two kinds of sequences to be compared, and second-order polynomials were therefore selected, even though a third-order model would have been more accurate for some hiatuses. To support this decision, the correlation index between the fitted curve and the LPC track was calculated, giving a mean correlation index of 0.91.

Fig. 3. Illustration of the procedure of analysis: LPC analysis and formant extraction of F1 and F2 of the sequence [ˈia] and the associated second-order equation.

In order to handle sequences of different duration, a temporal normalisation to the interval [-1, 1] was applied, so that for each formant of every sequence an equation of the type $F(x) = ax^2 + bx + c$, $-1 \le x \le 1$, was obtained, where a represents the degree and the direction of the curvature. Fig. 3 illustrates the procedure with the sequence [ˈia]: the LPC analysis and the formant values for F1 and F2 are depicted together with the associated second-order equation.
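A minimal NumPy sketch of the normalisation and fitting step. It assumes a formant track sampled at equally spaced frames and mapped linearly onto [-1, 1], and it takes the "index of correlation" to be Pearson's r between the LPC track and the fitted parabola; the exact index used in the study is not specified, so that choice is an assumption.

```python
import numpy as np

def fit_normalized_quadratic(track_hz):
    """Fit F(x) = a*x**2 + b*x + c to a formant track mapped onto x in [-1, 1].

    Returns the coefficients (a, b, c) and the Pearson correlation between
    the measured track and the fitted curve.
    """
    track = np.asarray(track_hz, dtype=float)
    x = np.linspace(-1.0, 1.0, len(track))   # temporal normalisation
    a, b, c = np.polyfit(x, track, deg=2)    # coefficients, highest power first
    fitted = np.polyval([a, b, c], x)
    r = np.corrcoef(track, fitted)[0, 1]
    return (a, b, c), r

# Usage with a synthetic, exactly parabolic F2 track (values in Hz are invented).
x = np.linspace(-1.0, 1.0, 15)
(a, b, c), r = fit_normalized_quadratic(-300 * x**2 - 500 * x + 1800)
print(round(a), round(b), round(c), round(r, 3))  # -300 -500 1800 1.0
```

Because x always spans [-1, 1], the coefficient a of a short diphthong and of a long hiatus can be compared directly, independently of their durations.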

The total number of sequences analyzed in the experiment was 1082. In the reading corpus, 768 cases (24 words × 16 speakers × 2 repetitions) were analyzed, while in the dialogue corpus the number of cases decreased to 638, because two samples per speaker were not always available.

4. Results

Results are organised according to the two objectives of the study: (a) the search for acoustic cues for hiatuses and diphthongs in the temporal and frequency domains, first separately for each speech situation and then comparing both speech situations; and (b) the observation of the behavior of vowel sequences with respect to phonetic reductions. In this regard, when vowel-vowel or glide-vowel sequences underwent reduction from their intended category, namely when hiatuses were diphthongized or diphthongs were monophthongized, they were excluded from the set of examples used to determine the acoustic cues of the categories.

4.1. Acoustic cues: differences between hiatuses and diphthongs

4.1.1. Temporal domain

4.1.1.1. Reading task. Temporal differences between hiatuses and diphthongs from the reading corpus were as follows: hiatuses were longer than diphthongs in both palatal and velar groups, as shown in Fig. 4, which compares mean duration values. If we pool the data of the palatal and velar groups, we obtain a mean duration of 193 ms (s.d. = 44) for a vowel-vowel sequence and 141 ms (s.d. = 37) for a glide-vowel sequence, and an analysis of variance (ANOVA) with the grouping factor 'category' (hiatus, diphthong) reveals significant differences (F = 457, p < 0.001). Taking the duration of the diphthong as reference, because of its higher rate of occurrence in Spanish, we can establish a percentage increase in hiatus length of 36%, and more specifically, of 26% for palatal hiatuses and 47% for velar hiatuses.
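The F values reported in this section come from analyses of variance. A minimal NumPy sketch of the one-way case follows (between-group mean square divided by within-group mean square). It computes only the statistic; the p-value would come from the F distribution with (k - 1, N - k) degrees of freedom, for example via scipy.stats.f_oneway, which returns both. The toy durations are invented; the study's raw data are not reproduced here.

```python
import numpy as np

def one_way_anova_f(*groups):
    """F = MS_between / MS_within for k independent groups."""
    groups = [np.asarray(g, dtype=float) for g in groups]
    pooled = np.concatenate(groups)
    grand_mean = pooled.mean()
    # Variability of group means around the grand mean, weighted by group size.
    ss_between = sum(len(g) * (g.mean() - grand_mean) ** 2 for g in groups)
    # Variability of observations around their own group mean.
    ss_within = sum(((g - g.mean()) ** 2).sum() for g in groups)
    df_between = len(groups) - 1
    df_within = len(pooled) - len(groups)
    return (ss_between / df_between) / (ss_within / df_within)

# Toy example: invented "hiatus" and "diphthong" durations (ms).
print(one_way_anova_f([190, 200, 210], [140, 150, 160]))  # 37.5
```

The two-way analyses reported below additionally partition the variance into a second factor ('vowel context' or stress) and the interaction term, which this one-way sketch does not cover.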

Besides the temporal differences between hiatus and diphthong, we also considered the effect of stress and vowel environment. Fig. 5 compares the mean duration of vowel-vowel and glide-vowel sequences separately for each vowel context. Irrespective of the differences between the groups, a hiatus is always longer than a diphthong. In order to determine whether these differences are significant, a two-way ANOVA with the grouping factors 'category' (hiatus, diphthong) and 'vowel context' (a, e, o) was run separately for the palatal and velar sets. In the palatal set, both 'category' and 'vowel context' were significant, with F = 159, p < 0.001 and F = 21, p < 0.001, respectively; the interaction between the two factors was not significant (F = 3, p > 0.05). In the velar set, both factors were again significant: 'category' (F = 183, p < 0.001) and 'vowel context' (F = 3, p < 0.05). In this case, however, an interaction appeared (F = 8, p < 0.05). This is because the gradation in duration is reversed for hiatuses and diphthongs: hiatuses with [a] are longer than hiatuses with [e] and [o], whereas for diphthongs the behavior is the opposite.

Fig. 4. Mean duration values (in ms) of hiatuses and diphthongs in the reading corpus.

As far as stress is concerned, comparisons were only made within categories, since stress is not manifested identically in each. A diphthong is either entirely stressed or unstressed, whereas a hiatus is always stressed, but only in its first part (reverse hiatus) or in its second part (normal hiatus). Nevertheless, it is worth noting that, despite these differences, stress determines the duration of the group in both kinds of sequences. Fig. 6 shows, first, that diphthongs are longer in stressed than in unstressed contexts and, second, that hiatuses are longer when the stress falls on the vowel [i u]. An ANOVA on hiatus duration with the grouping factor 'stress position' (stress on the first or on the second vowel) was significant (F = 172, p < 0.001), and an ANOVA on diphthong duration with the grouping factor 'stress' (presence or absence of stress) showed significant differences (F = 58, p < 0.001).

These results show that, although differences due to vowel environment and stress arise, the hiatus-diphthong distinction is maintained.

4.1.1.2. Dialogue task. Hiatuses gathered from the dialogue corpus are also longer than diphthongs. The mean duration of a palatal hiatus is 168 ms (s.d. = 39), whereas that of a palatal diphthong is 119 ms (s.d. = 23); the mean value for a velar hiatus is 159 ms (s.d. = 43) and for a velar diphthong 107 ms (s.d. = 27). Palatal hiatuses are therefore 41% longer than palatal diphthongs, and velar hiatuses are 49% longer than velar diphthongs. An ANOVA on sequence duration with the grouping factor 'category' (hiatus, diphthong) was significant both for palatal groups (F = 148, p < 0.001) and for velar groups (F = 143, p < 0.001). Pooling the data for palatal and velar groups, we obtain a mean duration of 163 ms (s.d. = 41) for vowel-vowel sequences and of 113 ms (s.d. = 26) for glide-vowel sequences.
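The percentages above take the diphthong mean as the baseline, i.e. 100 × (hiatus - diphthong) / diphthong. A small Python check using the dialogue means just given (the function name is illustrative):

```python
def percent_increase(hiatus_ms, diphthong_ms):
    """Lengthening of a hiatus relative to the diphthong, in percent."""
    return 100.0 * (hiatus_ms - diphthong_ms) / diphthong_ms

# Mean durations (ms) reported for the dialogue corpus: (hiatus, diphthong).
dialogue_means = {"palatal": (168, 119), "velar": (159, 107)}
for group, (hiatus, diphthong) in dialogue_means.items():
    print(group, round(percent_increase(hiatus, diphthong)))
# palatal 41
# velar 49
```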

Next, differences due to vowel environment were examined. Fig. 7 shows the mean duration values for each vowel combination in the dialogue corpus. Hiatuses are always longer than diphthongs, independently of the surrounding vowels. A two-way ANOVA with the grouping factors 'category' (hiatus, diphthong) and 'vowel context' (a, e, o) showed significant differences in palatal sequences due to both 'category' (F = 131, p < 0.001) and 'vowel context' (F = 3, p < 0.05); the interaction between the factors was not significant (F = 0.62, p > 0.05).

Fig. 5. Mean duration values (in ms) of hiatuses and diphthongs for each vowel environment in the reading corpus.

Fig. 6. Mean duration values (in ms) of normal and reverse hiatuses, and of diphthongs in stressed and unstressed contexts, in the reading corpus.

The same type of analysis was carried out on velar combinations: the factor 'category' was significant (F = 144, p < 0.001), as was the factor 'vowel context' (F = 8, p < 0.001); however, there was an interaction (F = 4, p < 0.05). This effect is explained by the fact that the gradation in sequence duration is not the same for hiatuses as for diphthongs, as can be inferred from the data in Fig. 7.

With respect to stress, comparisons were made for hiatuses and diphthongs separately. Fig. 8 gives the mean duration values for normal and reverse hiatuses, as well as for diphthongs in stressed and unstressed syllables, in the dialogue corpus. The results show that diphthongs are longer in a stressed syllable, and that hiatuses with the stress on /i u/ are longer than hiatuses with the stress on the second element of the group, namely on elements other than /i u/. An ANOVA on hiatus duration with the grouping factor 'stress position' (stress on the first or on the second vowel) was significant (F = 24, p < 0.001). For diphthong duration, an ANOVA with the factor 'stress' (presence or absence of stress) also showed significant differences between groups (F = 12, p < 0.001).

4.1.2. Frequency domain

The main hypothesis in the frequency domain was that the degree of curvature of the formant trajectories is a parameter which can differentiate hiatuses and diphthongs. Specifically, we wanted to know whether such differences exist and, if they do, whether the degree of curvature is greater or smaller for a hiatus than for a diphthong. Only diphthongs in a stressed environment and reverse hiatuses were studied. The analysis procedure specified in Section 3 was applied, yielding an equation of the type $F(x) = ax^2 + bx + c$, $-1 \le x \le 1$, for F1 and F2 of each unit of the corpus.
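One way to read the curvature coefficient follows directly from the polynomial form (this is an algebraic identity, not a result of the study): on the normalised interval, a equals the distance between the straight line joining the trajectory's endpoint values and the trajectory's value at the temporal midpoint.

$$
\frac{F(-1) + F(1)}{2} = \frac{(a - b + c) + (a + b + c)}{2} = a + c, \qquad F(0) = c
$$

$$
\Rightarrow \quad a = \frac{F(-1) + F(1)}{2} - F(0)
$$

So |a|, in Hz, measures how far the formant bows away from a straight transition between its endpoint values, and its sign gives the direction of the bow (positive when the midpoint lies below that straight line).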

4.1.2.1. Reading task. Table 2 presents the values of the coefficients a, b, c of the equation $ax^2 + bx + c$ when pooling all the data from the reading corpus. Thus, an idealized representation of the F1–F2 trajectories for the vowel-vowel sequences [ˈi.a ˈi.e ˈi.o ˈu.a ˈu.e ˈu.o] and the glide-vowel sequences [ ] was obtained.
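Because each trajectory is summarized by three coefficients, the idealized contours can be regenerated from the mean values in Table 2. The sketch below assumes that x is normalized time with the onset at -1 and the offset at +1, and that the coefficients are in Hz; the helper names `evaluate` and `trajectory_summary` are hypothetical. It evaluates the F2 polynomial for the ia pair. Averaging coefficients before evaluating them is what makes the result an idealization rather than any measured token.

```python
def evaluate(a, b, c, x):
    """Value of F(x) = a*x**2 + b*x + c."""
    return a * x * x + b * x + c


def trajectory_summary(a, b, c):
    """Onset (x = -1), midpoint (x = 0), offset (x = 1) and turning point -b/(2a)."""
    turning_x = -b / (2 * a) if a != 0 else None
    return evaluate(a, b, c, -1), evaluate(a, b, c, 0), evaluate(a, b, c, 1), turning_x


# Mean F2 coefficients for the ia pair, copied from Table 2 (reading corpus).
F2_IA = {"hiatus [ˈia]": (-498, -266, 2171), "diphthong (ia)": (-337, -293, 2070)}
for label, coeffs in F2_IA.items():
    onset, mid, offset, turn = trajectory_summary(*coeffs)
    print(f"{label}: {onset} -> {mid} -> {offset}, turning point at x = {turn:.3f}")
# hiatus [ˈia]: 1939 -> 2171 -> 1407, turning point at x = -0.267
# diphthong (ia): 2026 -> 2070 -> 1440, turning point at x = -0.435
```

The turning point is only informative when it falls inside (-1, 1); here both parabolas peak early in the sequence, and the hiatus covers a wider frequency range between peak and offset.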

If we compare the coefficients a of F1 for hiatuses and diphthongs in Table 2, we note that the degree of curvature is higher for the former. The same can be said for F2, except for the

Fig. 7. Mean duration values (in ms) for hiatuses and diphthongs for each vowel environment in the dialogue corpus.

Fig. 8. Mean duration values (in ms) for hiatuses and diphthongs in stressed and unstressed contexts in the dialogue corpus.

L. Aguilar / Speech Communication 28 (1999) 57±74 65

comparison [ˈu.o] versus the corresponding diphthong (uo). Fig. 9 offers a graphical representation, in the F1–F2 domain, of the mean values of the a coefficients of the second-order polynomials of F1 and F2. A displacement of diphthongs towards more central areas of the space can be observed.

In order to determine whether the differences in the coefficient a of the second-order polynomial for the F1 and F2 trajectories are statistically significant, an ANOVA with the grouping factor `category' (hiatus, diphthong) was done separately for each vowel combination, given the different end points of their formant tracks. The results of the tests, summarized in Table 3, show that in palatal groups the coefficient a differs significantly between hiatuses and diphthongs for both the F1 and F2 trajectories, with the exception of the coefficient a of the F1 trajectory for ie. For velar sequences, the combination ua shows differences in the a coefficients of both the F1 and F2 trajectories, while ue only shows a distinction in the F2 trajectory, and no difference is found for uo.

Hence, if we take into account the degree of curvature in formant tracks, it is possible to differentiate vowel-vowel sequences from glide-vowel sequences. In general, hiatuses present an a coefficient greater (in absolute value) than the corresponding diphthongs, indicating that in a hiatus the movement of F2 has to cover a greater difference in frequency between two points. This result is in line with the observation that hiatuses have a longer duration than diphthongs.
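The curvature comparison can be made explicit by contrasting the absolute values of a, since the sign only encodes whether the parabola opens upwards or downwards. The small sketch below uses the mean a coefficients copied from Table 2, and the function name `hiatus_more_curved` is hypothetical. It reproduces only the descriptive pattern; whether each difference is significant is a separate question, answered by Table 3.

```python
# (hiatus a, diphthong a) mean coefficients copied from Table 2 (reading corpus).
TABLE2_A = {
    "ia": {"F1": (108, 70), "F2": (-498, -337)},
    "ie": {"F1": (61, 49), "F2": (-371, -283)},
    "io": {"F1": (72, 32), "F2": (-588, -432)},
    "ua": {"F1": (110, -12), "F2": (352, 74)},
    "ue": {"F1": (24, 12), "F2": (524, 251)},
    "uo": {"F1": (38, 22), "F2": (138, 148)},
}


def hiatus_more_curved(table, formant):
    """Vowel combinations whose hiatus has a larger |a| than the diphthong."""
    return [combo for combo, pair in table.items()
            if abs(pair[formant][0]) > abs(pair[formant][1])]


print(hiatus_more_curved(TABLE2_A, "F1"))  # ['ia', 'ie', 'io', 'ua', 'ue', 'uo']
print(hiatus_more_curved(TABLE2_A, "F2"))  # ['ia', 'ie', 'io', 'ua', 'ue']
```

The single F2 exception, uo, is the comparison singled out in the text.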

4.1.2.2. Dialogue task. Table 4 presents the mean values of the coefficients of the polynomial equation $ax^2 + bx + c$ associated with the F1 and F2 trajectories of hiatuses and diphthongs extracted from the dialogue corpus. Tests similar to those applied to the reading data were performed.

If we compare the a coefficients of F1 of diphthongs and hiatuses, we notice a lack of consistency across vowel groups as far as the category

Table 2

Mean coefficient values of the polynomial equation $ax^2 + bx + c$ for F1 and F2 trajectories of hiatuses and diphthongs from the reading corpus

                        F1                      F2
                        a      b      c         a      b      c
[ˈia]                   108    180    390       −498   −266   2171
diphthong (ia)          70     213    442       −337   −293   2070
[ˈie]                   61     79     350       −371   −21    2204
diphthong (ie)          49     101    365       −283   −149   2183
[ˈio]                   72     100    365       −588   −257   2098
diphthong (io)          32     114    369       −432   −493   1990
[ˈua]                   110    186    404       352    318    886
diphthong (ua)          −12    206    511       74     380    975
[ˈue]                   24     101    380       524    677    906
diphthong (ue)          12     111    413       251    328    994
[ˈuo]                   38     100    359       138    228    791
diphthong (uo)          22     137    409       148    229    775

Fig. 9. Mean values of the coefficients a of the polynomial equation $ax^2 + bx + c$ of F1 and F2 for hiatuses and diphthongs from the reading corpus.

Table 3

ANOVA table for differences between hiatuses and diphthongs in the coefficients a of the polynomial equations $ax^2 + bx + c$ associated with F1 and F2 trajectories: reading corpus (* significant difference)

        a (F1)                  a (F2)
ia      * F = 4, p < 0.05       * F = 24, p < 0.001
ie        F = 0.49, p > 0.05    * F = 8, p < 0.001
io      * F = 6, p < 0.05       * F = 10, p < 0.05
ua      * F = 30, p < 0.001     * F = 74, p < 0.001
ue        F = 0.19, p > 0.05    * F = 42, p < 0.001
uo        F = 0.70, p > 0.05      F = 0.22, p > 0.05


(hiatus or diphthong) is concerned. This coefficient is greater for the diphthong in the ia and io combinations, but smaller in the ie, ue and uo groups. In addition, in the comparison between the diphthong and the hiatus ua, the curve has a different direction (the sign of a differs). By contrast, we found that for F2 a greater coefficient (in absolute value) always corresponds to the hiatuses.

The representation in the x-y domain of the mean values of the a coefficients of the second-order polynomials of F1 and F2 is given in Fig. 10, where the same tendency to occupy more central spaces found in the reading corpus is observed.

An ANOVA with the grouping factor `category' (hiatus, diphthong) was done for each vowel combination on the a coefficients of the polynomial equation $ax^2 + bx + c$ for the F1 and F2 trajectories. As summarized in Table 5, the displacement of F1 is only significant in ua, whereas the curve of the second formant track shows differences in all the groups except uo. This special behavior, also observed in the reading data, reveals a problem in the corpus, possibly due to the fact that the sequences made up of the vowels [u] and [o] do not belong to the common lexicon but only appear in restricted lexical domains. Consequently, the speaker tends to adopt a hyperarticulated pronunciation, thus minimizing the differences between hiatus and diphthong.

4.2. Acoustic cues: the effect of the communicative situation

When comparing the sequences obtained in the reading corpus with those in the dialogue corpus, we observe a drop in duration in the second communicative situation: hiatuses are reduced by 15%, and diphthongs by 20%. The number of cases, mean values and standard deviations are presented in Table 6, and a two-way ANOVA with the grouping factor `category' (hiatus, diphthong) and the grouping factor `speech situation' (reading task, dialogue task) showed a significant effect of both the factor `category' (F = 544, p < 0.001) and the factor `speech situation' (F = 173, p < 0.001)

Fig. 10. Mean values of the coefficients a of the polynomial equation $ax^2 + bx + c$ for F1 and F2 of hiatuses and diphthongs from the dialogue corpus.

Table 5

ANOVA table for differences between hiatuses and diphthongs in the coefficients a of the polynomial equations $ax^2 + bx + c$ associated with F1 and F2 trajectories: dialogue corpus (* significant difference)

        a (F1)                  a (F2)
ia        F = 1, p > 0.05       * F = 16, p < 0.001
ie        F = 1, p > 0.05       * F = 15, p < 0.001
io        F = 1, p > 0.05       * F = 16, p < 0.001
ua      * F = 31, p < 0.05      * F = 21, p < 0.001
ue        F = 1, p > 0.05       * F = 62, p < 0.001
uo        F = 2, p > 0.05         F = 2, p > 0.05

Table 4

Mean coefficient values of the polynomial equation $ax^2 + bx + c$ associated with the F1 and F2 trajectories in hiatuses and diphthongs from the dialogue corpus

                        F1                      F2
                        a      b      c         a      b      c
[ˈia]                   47     190    432       −419   −273   2134
diphthong (ia)          73     183    429       −255   −213   1991
[ˈie]                   86     90     357       −359   −13    2162
diphthong (ie)          57     116    369       −198   −149   2083
[ˈio]                   30     87     395       −507   −289   1998
diphthong (io)          46     95     389       −295   −268   1955
[ˈua]                   128    141    373       302    200    880
diphthong (ua)          −51    176    505       44     331    1045
[ˈue]                   35     120    382       465    559    891
diphthong (ue)          20     109    426       144    276    1059
[ˈuo]                   57     79     346       171    157    734
diphthong (uo)          18     103    410       120    198    818


on the duration of the group. There was no significant interaction (F = 0.2, p > 0.05).
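The 15% and 20% figures follow from the mean durations in Table 6. The short check below assumes that the percentages are computed relative to the reading-task mean; `percent_reduction` is a hypothetical helper name.

```python
# Mean durations (ms) copied from Table 6: (reading, dialogues).
TABLE6_MEAN_MS = {"hiatus": (193, 163), "diphthong": (141, 113)}


def percent_reduction(reading_ms, dialogue_ms):
    """Relative shortening of the dialogue mean with respect to the reading mean."""
    return 100 * (reading_ms - dialogue_ms) / reading_ms


for category, (reading_ms, dialogue_ms) in TABLE6_MEAN_MS.items():
    print(f"{category}: {percent_reduction(reading_ms, dialogue_ms):.1f}% shorter")
# hiatus: 15.5% shorter
# diphthong: 19.9% shorter
```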

This reduction in duration is often accompanied by changes in the formant trajectories. F2 has a more curved shape in read materials and a less pronounced degree of curvature in sequences obtained using the map task. Differences between the read sentences and the dialogues affecting vowel sequences can be observed by plotting the a coefficients of the equation $ax^2 + bx + c$ for F1 and F2, as in Figs. 11 and 12.

In order to determine whether differences exist due to a change in the speech situation, a two-way ANOVA with the grouping factor `speech situation' (reading task, dialogue task) and the grouping factor `category' (hiatus, diphthong) was applied for each vowel combination. Tables 7 and 8 summarize the results for the F1 and F2 trajectories, respectively. With respect to palatal groups, significant differences due to a change in speech situation are not found for the coefficient a of F1 (p > 0.05 in all comparisons), but they arise for the coefficient a of F2: ia (F = 10, p < 0.01), ie (F = 4, p < 0.05), io (F = 9, p < 0.05). As for velar groups, no significant difference has been found in the a coefficients of F1 when comparing speech situations (p > 0.05 in all comparisons), whereas for the a coefficients of F2, significant differences are only manifest for ue (F = 8, p < 0.001).

We therefore conclude that the main differences are manifest in the F2 trajectory, in line with other studies such as Lindblom (1963) or van Bergem (1993). However, despite the differences between read sequences and sequences obtained in a dialogue-oriented task, the temporal and frequency relations underlying the hiatus-diphthong distinction are maintained across changes in the speech situation. In both reading and dialogues, the hiatus is longer than the diphthong, independently of vowel context or stress; and in both corpora, the degree of curvature of the F2 trajectory is greater in hiatuses than in diphthongs.

4.3. Reduction processes

The main characteristic of a communicative situation such as participating in the map task, where subjects engaged in a common goal cease to be aware of their own speech, is the presence of segmental reductions. These reductions involve

Fig. 11. Mean values of the coefficients a of the equation $ax^2 + bx + c$ of F1 and F2 of hiatuses in the reading and dialogue corpora.

Fig. 12. Mean values of the coefficients a of the equation $ax^2 + bx + c$ associated with F1 and F2 of diphthongs in the reading and dialogue corpora.

Table 6

Number of cases (n), mean values (mean) and standard deviation (s.d.) of the duration (ms) of hiatuses and diphthongs in reading and dialogues

                Hiatus                  Diphthong
                n      mean   s.d.      n      mean   s.d.
Reading         372    193    44        367    141    37
Dialogues       231    163    42        288    113    25


consonants, vowels and even whole words; however, here we will only consider reduction processes affecting vowel groups. Regarding these groups, three phenomena can be observed:
1. diphthongization, where a hiatus becomes a diphthong;
2. deletion in a hiatus; and
3. vocalisation of a diphthong, manifested either as a fusion into an intermediate element, which shares the properties of the original segments of the group, or as a deletion of one of the segments.

It can be noted that deletion in a hiatus and vocalisation of a diphthong can be considered together as monophthongization processes.

The appearance of each phonological process has been determined by observing the acoustic behavior of the sequence in spectrograms and oscillograms. On the one hand, we say that a hiatus has become a diphthong when at least two of the following conditions are met: (a) there are no vowel targets but a continuous formant movement from one frequency area to another; (b) F1 and F2 are situated in lower frequency areas than in the intended hiatuses; (c) duration is reduced. Vowel targets are defined, following Lehiste and Peterson (1961), as the time interval in which the formants remain parallel to the time axis. The temporal and frequency patterns of the so-called `intended hiatuses' are taken from a set of samples extracted from the dialogue task (but not included in the analyzed corpus), which three phoneticians categorized by ear as hiatus or diphthong. They were asked to mark positively (Y: yes) those samples that they would unambiguously identify as a hiatus in any context, and negatively (N: no) those whose hiatus-diphthong nature they were unsure about.
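The vowel-target criterion (formants parallel to the time axis) needs a numerical tolerance before it can be applied automatically. The sketch below is one possible operationalization, not the procedure of the study, which relied on inspecting spectrograms. Here a target is a stretch in which the formant slope stays below `max_slope_hz_per_ms` for at least `min_duration_ms`; both thresholds, the function name and the example tracks are hypothetical.

```python
def steady_state_intervals(times_ms, freqs_hz, max_slope_hz_per_ms=2.0, min_duration_ms=20.0):
    """Return (start, end) spans, in ms, where the formant is roughly parallel to the time axis.

    Both thresholds are illustrative assumptions, not values from the study.
    """
    intervals = []
    start = end = None
    for i in range(1, len(times_ms)):
        slope = abs(freqs_hz[i] - freqs_hz[i - 1]) / (times_ms[i] - times_ms[i - 1])
        if slope <= max_slope_hz_per_ms:
            if start is None:
                start = times_ms[i - 1]  # a flat stretch begins
            end = times_ms[i]
        else:
            if start is not None and end - start >= min_duration_ms:
                intervals.append((start, end))
            start = None  # any flat stretch is over
    if start is not None and end - start >= min_duration_ms:
        intervals.append((start, end))
    return intervals


times = list(range(0, 110, 10))                   # 0, 10, ..., 100 ms
hiatus_like = [2200] * 4 + [2000, 1800, 1600] + [1400] * 4
glide_like = [2200 - 80 * i for i in range(11)]   # continuous movement, 8 Hz/ms
print(steady_state_intervals(times, hiatus_like))  # [(0, 30), (70, 100)]
print(steady_state_intervals(times, glide_like))   # []
```

An empty result corresponds to condition (a), a continuous formant movement with no vowel targets.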

Frequency and duration values of the selected hiatuses were obtained, and their means and deviations were used to determine whether a diphthongization had occurred. The allowed deviations are thus those falling within the range between the maximum and minimum values. The result of a diphthongization is not expected to equal the corresponding `intended' diphthong (that is, a vocalic sequence that is lexically a diphthong), but to differ markedly from what is expected for a hiatus.
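The "at least two of three conditions" rule can be written down compactly. In this sketch, each reference threshold stands for the lower end (minimum) of the range measured on the intended hiatuses, following the statement that allowed deviations lie between the maximum and minimum values. Two further points are assumptions: conditions (b) and (c) are read as "below that minimum", and each token is summarized by a single representative F1 and F2 value. The class and function names are hypothetical, and all numbers are invented.

```python
from dataclasses import dataclass


@dataclass
class SequenceMeasures:
    has_vowel_targets: bool  # e.g. from visual inspection or a steady-state detector
    f1_hz: float             # representative F1 value of the token (assumed measure)
    f2_hz: float             # representative F2 value of the token (assumed measure)
    duration_ms: float


@dataclass
class HiatusRange:
    """Minimum values observed on the listener-verified 'intended hiatuses' (hypothetical)."""
    f1_min_hz: float
    f2_min_hz: float
    duration_min_ms: float


def is_diphthongized(seq, ref):
    """Apply the 'at least two of the three conditions' rule described in the text."""
    conditions = (
        not seq.has_vowel_targets,                                # (a) no vowel targets
        seq.f1_hz < ref.f1_min_hz and seq.f2_hz < ref.f2_min_hz,  # (b) lower F1 and F2
        seq.duration_ms < ref.duration_min_ms,                    # (c) reduced duration
    )
    return sum(conditions) >= 2


ref = HiatusRange(f1_min_hz=300, f2_min_hz=1500, duration_min_ms=140)  # invented values
print(is_diphthongized(SequenceMeasures(False, 350, 1700, 110), ref))  # True: (a) and (c)
print(is_diphthongized(SequenceMeasures(True, 280, 1400, 160), ref))   # False: only (b)
```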

On the other hand, monophthongizations (either in a hiatus or in a diphthong) were identified using the

Table 8

ANOVA table for differences between hiatuses and diphthongs in the coefficients a of the polynomial equations $ax^2 + bx + c$ associated with the F2 trajectory (* significant difference)

        `speech situation'      `category'              interaction
ia      * F = 10, p < 0.001     * F = 39, p < 0.001       F = 0.04, p > 0.05
ie      * F = 4, p < 0.05       * F = 25, p < 0.001       F = 2, p > 0.05
io      * F = 9, p < 0.001      * F = 25, p < 0.001       F = 0.6, p > 0.05
ua        F = 2, p > 0.05       * F = 77, p < 0.001       F = 0.1, p > 0.05
ue      * F = 8, p < 0.001      * F = 101, p < 0.001      F = 0.8, p > 0.05
uo        F = 0.01, p > 0.05      F = 0.9, p > 0.05       F = 2, p > 0.05

Table 7

ANOVA table for differences between hiatuses and diphthongs in the coefficients a of the polynomial equations $ax^2 + bx + c$ associated with the F1 trajectory (* significant difference)

        `speech situation'      `category'              interaction
ia        F = 4, p > 0.05         F = 0.1, p > 0.05     * F = 4, p < 0.05
ie        F = 1, p > 0.05         F = 2, p > 0.05         F = 0.4, p > 0.05
io        F = 1, p > 0.05         F = 1, p > 0.05       * F = 5, p < 0.05
ua        F = 0.2, p > 0.05     * F = 59, p < 0.05        F = 2, p > 0.05
ue        F = 0.3, p > 0.05       F = 0.8, p > 0.05       F = 0.1, p > 0.05
uo        F = 0.2, p > 0.05       F = 3, p > 0.05         F = 0.5, p > 0.05


spectrographic displays, when only a single vowel could be observed. The values of these vowels lie within the range of duration and frequency values obtained in the analysis of a set of vowels in consonantal context extracted from the dialogue task.

Fig. 13 illustrates these processes with sequences taken from the corpus of dialogues. In the first example (the sequence [ˈio] pronounced as a diphthong; Listen to Signal D), it is not possible to determine vowel targets; instead, we observe a continuous transition from one frequency area to another. Moreover, the duration is shorter than the sum of two vowels, and the frequencies corresponding to [i] and [o] have been centralized. In the second example (the group [ˈie] reduced to [e]; Listen to Signal E), we consider the process a deletion, since it is not possible to segment two elements. Finally, the third representation corresponds to the diphthong [ ] manifested as [o]; as in the previous case, a single phonetic segment is observed (Listen to Signal F).

Although both hiatuses and diphthongs are subject to phonetic reductions in the corpus of dialogues, reduction is less frequent for the former. Overall, vowel-vowel sequences are reduced to a diphthong in 9% and to a vowel in 2% of cases, whereas diphthongs are reduced to a vowel in 22% of cases.

Nevertheless, as can be observed in Table 9, every hiatus in the corpus shows at least one case of diphthongization. Deletion is a less frequent process that has been observed only for palatal hiatuses (5%), and in general it affects the second element. All the palatal groups that were reduced were pronounced as the vowel [i], apart from [ˈie], which appeared as [e].

With respect to diphthongs, all of them showed at least one case of reduction, except the diphthong [ ] in a stressed syllable. Table 10 presents the number of diphthongs analyzed in the dialogue corpus, the number realized as a single vowel, and the vowel observed in each case. A higher degree of reduction was observed for palatal groups (23%) than for velar ones (20%). As for the effect of stress, diphthongs appearing in unstressed syllables have a higher percentage of reduction (23%) than those appearing in stressed environments (20%).
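The percentages quoted in this subsection can be recomputed from the counts in Tables 9 and 10. One assumption is needed because the diphthong symbols in Table 10 are not legible: the first five rows are taken to be the palatal diphthongs and the last six the velar ones, since this is the split that reproduces the reported 23% and 20%. Blank cells in Table 9 are treated as zero, and the helper names are hypothetical.

```python
# Counts copied from Table 9 (hiatuses), dialogue corpus; blank cells read as 0.
HIATUS_N_TOT = [28, 19, 34, 20, 20, 11, 31, 26, 19, 28, 17]
HIATUS_N_DIPH = [3, 2, 1, 1, 2, 2, 3, 3, 3, 2, 1]
HIATUS_N_VOWEL = [3, 1, 1, 0, 1, 0, 0, 0, 0, 0, 0]
# (n tot, n red) per row of Table 10; the palatal/velar split is inferred (see above).
PALATAL = [(39, 7), (31, 2), (29, 1), (34, 8), (33, 21)]
VELAR = [(41, 14), (30, 2), (30, 4), (36, 9), (28, 1), (21, 7)]


def percent(part, whole):
    return 100 * part / whole


def reduction_rate(rows):
    """Percentage of diphthongs realized as a single vowel across the given rows."""
    return percent(sum(red for _, red in rows), sum(tot for tot, _ in rows))


n_hiatus = sum(HIATUS_N_TOT)
print(f"hiatus -> diphthong: {percent(sum(HIATUS_N_DIPH), n_hiatus):.1f}%")  # 9.1%
print(f"hiatus -> vowel: {percent(sum(HIATUS_N_VOWEL), n_hiatus):.1f}%")     # 2.4%
print(f"diphthong -> vowel: {reduction_rate(PALATAL + VELAR):.1f}%")         # 21.6%
print(f"palatal: {reduction_rate(PALATAL):.1f}%")                            # 23.5%
print(f"velar: {reduction_rate(VELAR):.1f}%")                                # 19.9%
```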

Fig. 13. Examples of the reduction processes found in the dialogue corpus: [ˈio] reduced to [ ] in `diodo', [ˈie] reduced to [e] in `ríete', and [ ] reduced to [o] in `duelo'.


In relation to the phonetic result of the reduction, Table 11 presents, for each monophthongization found in the corpus, the number of cases of deletion of the first element, the number of cases of deletion of the second element, and the number of cases in which the result is a new element. A strong tendency to maintain the first element in the group can be observed: 60% of the reduced diphthongs show an elision of the final element, which is also the open element and the syllabic nucleus, compared with 13% in which deletion affects the initial element; even the percentage corresponding to the appearance of a new element is higher (26%).
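The 60%, 13% and 26% figures are shares of the 76 reduced diphthongs (the bottom row of Table 11), not of all diphthongs analyzed. The quick check below uses only those column totals; `outcome_shares` is a hypothetical helper name.

```python
def outcome_shares(n_first, n_second, n_new):
    """Percentage of reduced diphthongs falling into each outcome column of Table 11."""
    total = n_first + n_second + n_new  # each reduction falls into exactly one column
    return {
        "first element deleted": 100 * n_first / total,
        "second element deleted": 100 * n_second / total,
        "new element": 100 * n_new / total,
    }


# Column totals of Table 11: n first = 10, n second = 46, n new = 20 (n red = 76).
for label, share in outcome_shares(10, 46, 20).items():
    print(f"{label}: {share:.1f}%")
# first element deleted: 13.2%
# second element deleted: 60.5%
# new element: 26.3%
```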

Considering that in rising diphthongs the first element is the glide, which is the most closed element in the group, our phonological intuitions clash with the phonetic behavior observed. If phonetic reduction were related to a strength hierarchy of segments, a higher percentage of deletion of the glide would be expected. The same could be said if syllabic role were taken into account: given that the vowel occupies the nucleus position, the element most likely to undergo a phonetic change should be the glide. Contrary to these considerations, however, phonetic restructuring shows a preference for deleting the second element in the group; that is, the position of the element takes precedence, regardless of its vowel or semiconsonant nature. On this point, it could be hypothesized that the position in the syllabic group exerts a stronger influence than the phonetic category of the element.

5. Discussion and conclusions

From the results, two aspects can be highlighted: the modeling of hiatuses and diphthongs, and the appearance of phonetic reductions in a corpus

Table 10

Number of diphthongs analyzed in the dialogue corpus (n tot), number of cases appearing as a vowel (n red) and the vowel observed. When there are several results, the number of cases of each result is given (n)

        n tot   n red   vowel   n
[ ]     39      7       [i]     5
                        [o]     1
                        [e]     1
[ ]     31      2       [i]     1
                        [o]     1
[ ]     29      1       [i]
[ ]     34      8       [i]     6
                        [e]     2
[ ]     33      21      [i]     20
                        [e]     1
[ ]     41      14      [o]     6
                        [u]     8
[ ]     30      2       [o]
[ ]     30      4       [a]     1
                        [o]     3
[ ]     36      9       [o]     8
                        [u]     1
[ ]     28      1       [u]
[ ]     21      7       [o]     4
                        [u]     1

Table 11

Number of diphthongs reduced to a vowel in the dialogue corpus (n red), number of cases of deletion of the first element (n first), number of cases of deletion of the second element (n second), and number of cases in which the result is a new element (n new)

        n red   n first   n second   n new
[ ]     9       2         6          1
[ ]     1                 1
[ ]     29      3         26
[ ]     16                8          8
[ ]     13      1         1          11
[ ]     8       4         4
Total   76      10        46         20

Table 9

Number of cases analyzed in the dialogue corpus (n tot), number of sequences appearing as diphthongs (n diph), number of sequences appearing as a vowel (n vowel) and the resulting vowel

          n tot   n diph   n vowel   vowel
[ˈio]     28      3        3         [i]
[iˈo]     19      2        1         [i]
[ˈia]     34      1        1         [i]
[iˈa]     20      1
[ˈie]     20      2        1         [e]
[iˈe]     11      2
[ˈue]     31      3
[uˈe]     26      3
[ˈua]     19      3
[uˈa]     28      2
[ˈuo]     17      1


of dialogues. Regarding the former, the results showed that the hiatus-diphthong distinction is acoustically signaled by differences in duration and in formant trajectories, especially that of F2: hiatuses show a longer duration and a greater degree of curvature of the F2 trajectory than diphthongs. As far as duration is concerned, these results are in line with those reported by other authors such as Borzone de Manrique (1979) or Quilis (1981).

With respect to the frequency description of diphthongs, the main difference described has been the F2 formant rate of change (Lehiste and Peterson, 1961; Gay, 1968; Burgess, 1969; Borzone de Manrique, 1979; Jha, 1985; Maddieson and Emmorey, 1985). The analysis procedure, however, sets this study apart from those mentioned: in them, data are obtained by segmenting the sequence into the initial vowel target, the transition and the final vowel target, and then calculating the formant rate of change of the transition. This method, based on extracting formant values at the points closest to the ideal targets, hides the information concerning the dynamics of formant trajectories.
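The point about end-point measures can be illustrated with two hypothetical F2 tracks that share their start and end frequencies. An end-point formant rate of change cannot tell them apart, whereas the quadratic coefficient a, and the slope 2ax + b it implies, can. The 100 ms transition duration, the coefficients and the helper names are all invented for this illustration; the sketch does not reproduce any of the cited studies' procedures.

```python
def transition_rate(f_start_hz, f_end_hz, t_start_ms, t_end_ms):
    """End-point measure: formant change divided by transition duration (Hz/ms)."""
    return (f_end_hz - f_start_hz) / (t_end_ms - t_start_ms)


def evaluate(a, b, c, x):
    """Value of F(x) = a*x**2 + b*x + c at normalized time x."""
    return a * x * x + b * x + c


def slope(a, b, x):
    """Instantaneous slope dF/dx = 2ax + b (Hz per unit of normalized time)."""
    return 2 * a * x + b


# Two hypothetical F2 tracks with identical end points but different curvature.
linear = (0, -400, 1800)
curved = (-300, -400, 2100)
for a, b, c in (linear, curved):
    onset, offset = evaluate(a, b, c, -1), evaluate(a, b, c, 1)
    rate = transition_rate(onset, offset, 0, 100)  # assumed 100 ms transition
    print((onset, offset), rate, slope(a, b, -1), slope(a, b, 1))
# (2200, 1400) -8.0 -400 -400
# (2200, 1400) -8.0 200 -1000
```

Both tracks give the same end-point rate of -8 Hz/ms, but the curved one first rises and then falls steeply, which only the polynomial description retains.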

Moreover, the works mentioned above aimed to study the effect of certain variables, such as speaking rate, on the acoustic manifestation of diphthongs, not to differentiate hiatuses from diphthongs. Diphthong modifications depending on changes in speaking rate have been analyzed for Spanish (Borzone de Manrique, 1979), American English (Gay, 1968) and Maithili (Jha, 1985). Gay (1968) showed that the initial frequency and the F2 formant rate of change in American English diphthongs are independent of differences in speaking rate, and Borzone de Manrique (1979) and Jha (1985) have reported data in favour of this hypothesis: the formant rate of change in both Spanish and Maithili is maintained across speaking rate modifications. In contrast, Toledo and Antoñanzas-Barroso (1987) report differences in F2 formant rate due to speaking rate: the faster the speaking rate, the further the formant frequency is from its ideal value.

However, it should be pointed out that the definition of `speaking rate' is not shared by all the authors, which explains the divergences in their results. While Gay (1968) and Toledo and Antoñanzas-Barroso (1987) gathered their data from the reading of carrier sentences at three speaking rates, in other works such as Borzone de Manrique (1979) the variable `speaking rate' is related to `context': isolated sequences are considered to be slow, words moderate, and carrier sentences fast.

In the present study, speaking rate has not been examined; instead, we investigated the effect of a change in speech situation on vowel sequences. In this regard, all the sequences, as we have already explained, have a shorter duration in the dialogue corpus: hiatuses are reduced by 15% and diphthongs by 20%.

Besides the description of diphthongs, what we are mainly concerned with is the comparison of hiatus and diphthong, a problem that has been studied less. On this point, we share the hypothesis proposed by Ren (1986), according to which the hiatus-diphthong distinction should be acoustically reflected. The idea is that the syllable components are planned before the phonetic realization. The acoustic result of a hiatus (two vowels in two syllables) must therefore differ from that of a diphthong (two vowels in one syllable), because the diphthong requires a restructuring in time and frequency to fit both vocalic segments into the syllable frame. Related to this, Quilis (1981) observes that hiatuses have a quicker transition than diphthongs, but he does not support this with any quantitative data. Borzone de Manrique (1979), in turn, does not find relevant differences in the formant rate of change between diphthongs and hiatuses; only the duration of the target zones changes, as we have already mentioned.

The approach presented here is quite different, in line with other dynamic analysis procedures that try to respect the properties of sequences in speech (Yang, 1987; Carré and Mrayati, 1991; Clermont, 1993). To describe diphthongs versus hiatuses while avoiding segmentation problems, which are particularly important in unscripted speech, the formant trajectory in the vowel sequence has been converted into a polynomial equation $ax^2 + bx + c$. From the results, it is inferred that F2 curvature acts as an acoustic cue for discriminating between


the two kinds of vowel sequences: hiatuses present a greater degree of curvature than diphthongs. Moreover, the differences in the temporal domain and in formant trajectory shapes due to a change of speaking style do not overlap with the differences between the categories of hiatus and diphthong. In the sequences coming from the dialogues, the ideal values are not attained, so diphthong trajectories are flatter. This would confirm the hypothesis of formant reduction in hypoarticulated styles, as well as the inverse relation between duration and formant reduction: when duration decreases, the formant positions lie closer together in the vowel space, and the trajectory between the two vowel positions is therefore flatter.

On the other hand, the phonetic reduction processes highlight a continuum of articulatory reduction that relies on contextual knowledge. This leads us to the second question. A reduction axis consisting of three steps (hiatus, diphthong, vowel) can be established, a continuum revealed by the presence of hiatuses manifested either as diphthongs or as vowels. This is in agreement with results found in other studies. Aguilar and Machuca (1995) analyzed the occurrence of phonetic reduction processes in a semidirected interview and in a map task. Results showed that weakening processes appear in both speech situations, but to a different degree: in the semidirected interview, 27% of hiatuses are pronounced as a diphthong and 12% as a vowel, whereas in the map task corpus no case of diphthongization was found, but 33% were reduced. Concerning the acoustic manifestation of diphthongs, 59% were reduced to a vowel in the map task and 43% in the semidirected interviews. In line with these data, Moosmüller (1997) describes a change in progress of the diphthongs /ae a c/ into the monophthongs /e:/ and / c:/, respectively, in Austrian German, in both reading and spontaneous speech material.

As regards consonants, a reduction continuum has been shown for German and Spanish (Kohler, 1990, 1995; Aguilar et al., 1993): in unconstrained speech, any consonant can undergo phonetic reduction, passing through different stages. For instance, in Spanish, voiceless stops can be realized as voiced stops or as approximants, or can undergo deletion.

The existence of these reductions supports the theory of adaptive variation (Lindblom, 1990), according to which the production of speech depends on the principle of maximal perceptual contrast interacting with that of minimal articulatory effort. In the corpus of dialogues, the reduction processes occur because the speaker relies on contextual knowledge that allows the listener to restore missing acoustic information (namely, elided vowels) or degraded information (diphthongized sequences). In the reading of sentences, on the other hand, there is no audience and, consequently, the speaker has to adjust his style of pronunciation. Thus, a tendency to hypoarticulation is observed in the dialogues, as opposed to the hyperarticulation found in the read sentences. Even if it can be argued that some reading is involved in the execution of the map task, because the toponyms are written on the map, the generation of speech under unconstrained conditions is the differentiating factor.

We conclude from the results that hiatus and diphthong are two phonetic categories that can be described on the basis of their acoustic characteristics and are subject, like any other category (vowel, consonant), to changes due to the communicative situation. Despite the phonetic reductions, the hiatus-diphthong distinction is maintained in unscripted speech; that is, we cannot say that the hiatus disappears as a phonetic category. There is, on the contrary, an axis of reduction along which a hiatus becomes a diphthong and a diphthong becomes a vowel. These results argue in favour of the existence of a phonological structure shared by all speaking styles, but with different phonetic manifestations depending on extralinguistic factors, such as the speaker's attention to his speech.

Acknowledgements

Part of the work described here has appeared in the author's doctoral dissertation. The investigations were funded by a grant from the Universitat Autònoma de Barcelona. I am grateful to the members of the Departament de Filologia Espanyola for their help and valuable comments. Recordings were done at the Phonetics Laboratory of


the Universitat Autònoma de Barcelona, and the acoustic analyses were carried out at the Departament d'Acústica de l'Escola de Telecomunicacions de La Salle-Universitat Ramon Llull, Barcelona. Special thanks to all the people involved in these tasks for their assistance. I also wish to thank the anonymous reviewers for their outstanding critical review of earlier versions of the paper.

References

Aguilar, L., Machuca, M., 1995. Intentionality in the speech act and reduction phenomena. In: Elenius, K., Branderud, P. (Eds.), Proc. XIIIth Internat. Cong. Phon. Sc., Stockholm, 13–19 August 1995, Vol. 3, pp. 460–463.

Aguilar, L., Blecua, B., Machuca, M., Marín, R., 1993. Phonetic reduction processes in spontaneous speech. In: Proc. Eurospeech'93, Berlin, 21–23 September 1993, Vol. 1, pp. 433–436.

Alarcos, E., 1965. Fonología Española. Gredos, Madrid.

Anderson, A.H., Bader, M., Bard, E.G., Boyle, E., Doherty, G., Garrod, S., Isard, S., Kowtko, J., McAllister, J., Miller, J., Sotillo, C., Thompson, H., Weinert, R., 1991. The HCRC map task corpus. Language and Speech 34 (4), 351–366.

Anderson, S.R., 1985. Phonology in the Twentieth Century: Theories of Rules and Theories of Representations. The University of Chicago Press, Chicago.

Borzone de Manrique, A.M., 1976. Acoustic study of /i, u/ in the Spanish diphthong. Language and Speech 19, 121–128.

Borzone de Manrique, A.M., 1979. Acoustic analysis of the Spanish diphthongs. Phonetica 36, 194–206.

Burgess, N., 1969. A spectrographic investigation of some diphthongal phonemes in Australian English. Language and Speech 12, 238–246.

Carré, R., Mrayati, M., 1991. Vowel-vowel trajectories and region modeling. J. Phonetics 19, 433–443.

Clermont, F., 1993. Spectro-temporal description of diphthongs in F1–F2–F3 space. Speech Communication 13, 377–390.

Gay, T., 1968. Effect of speaking rate on diphthong formant movements. J. Acoust. Soc. Amer. 44 (6), 1570–1573.

Harris, J.W., 1969. Spanish Phonology. MIT Press, Cambridge.

Harris, J.W., 1971. Aspectos del consonantismo español. In: Contreras, H. (Ed.), Los Fundamentos de la Gramática Transformacional. Siglo XXI, México, pp. 164–185.

Hualde, J.I., 1991. On Spanish syllabification. In: Campos, H., Martínez Gil, F. (Eds.), Current Studies in Spanish Linguistics. Georgetown University Press, Washington, pp. 475–493.

Jha, S.K., 1985. Acoustic analysis of the Maithili diphthongs. J. Phonetics 13 (1), 107–115.

Kohler, K.J., 1990. Segmental reduction in connected speech in German: phonological facts and phonetic explanations. In: Hardcastle, W.J., Marchal, A. (Eds.), Speech Production and Speech Modelling. Kluwer Academic Publishers, Dordrecht.

Kohler, K.J., 1995. Articulatory reduction in different speaking styles. In: Elenius, K., Branderud, P. (Eds.), Proc. XIIIth Internat. Cong. Phon. Sc., Stockholm, 13–19 August 1995, Vol. 2, pp. 12–19.

Lehiste, I., Peterson, G., 1961. Transitions, glides and diphthongs. J. Acoust. Soc. Amer. 33 (3), 268–277.

Lindblom, B., 1963. Spectrographic study of vowel reduction. J. Acoust. Soc. Amer. 35, 1773–1781.

Lindblom, B., 1990. Explaining phonetic variation: a sketch of the H&H theory. In: Hardcastle, W.J., Marchal, A. (Eds.), Speech Production and Speech Modelling. Kluwer Academic Publishers, Dordrecht.

Maddieson, I., Emmorey, K., 1985. Relationship between semivowels and vowels: cross-linguistic investigations of acoustic difference and coarticulation. Phonetica 42, 163–174.

Menéndez-Pidal, R., 1940. Manual de Gramática Histórica. CSIC, Madrid.

Moosmüller, S., 1997. Diphthongs and the process of monophthongization in Austrian German: a first approach. In: Proc. Eurospeech'97, Rhodes, 22–25 September 1997, pp. 787–790.

Morgan, A.T., 1984. Consonant-glide-vowel alternations in Spanish: a case study in syllabic and lexical phonology. Ph.D. dissertation, University of Texas, Austin.

Navarro-Tomás, T., 1918. Manual de Pronunciación Española. CSIC, Madrid.

Navarro-Tomás, T., 1946. Estudios de Fonología Española. Las Américas Publishing Company, New York.

Quilis, A., 1981. Fonética Acústica de la Lengua Española. Gredos, Madrid.

RAE, 1973. Esbozo de una nueva gramática de la lengua española. Real Academia Española, Espasa-Calpe, Madrid.

Ren, H., 1986. On the Acoustic Structure of Diphthongal Syllables. UCLA Working Papers in Phonetics, University of California, Los Angeles.

Toledo, G.A., Antoñanzas-Barroso, N., 1987. Influence of speaking rate in Spanish diphthongs. In: Proc. XIth Internat. Cong. Phon. Sc., Tallinn, 1–7 August 1987, Vol. 3, pp. 125–138.

van Bergem, D.R., 1993. Acoustic vowel reduction as a function of sentence accent, word stress, and word class. Speech Communication 12, 1–23.

Waksler, R., 1990. A formal account of glide/vowel alternation in prosodic theory. Ph.D. dissertation, Harvard University, Cambridge, Mass.

Yang, S., 1987. An articulatory model for diphthongs and triphthongs in Chinese. In: Proc. XIth Internat. Cong. Phon. Sc., Tallinn, 1–7 August 1987, pp. 239–242.
