Which phonetics is phonological?

In: Szpyra-Kozłowska, Jolanta; Guz, Ewa; Steinbrich, Piotr; Święciński, Radosław (eds.) Recent developments in applied phonetics. Lublin: Wydawnictwo KUL, 345-371


Geoffrey Schwartz ([email protected])

Adam Mickiewicz University in Poznań

1 Introduction – communication problems between phoneticians and phonologists

Despite the fact that the fields of phonetics and phonology are both interested in the sounds of

language, there is still no consensus about the nature of the relationship between them. In

phonological theory there remains a deep divide between scholars who seek to explain

phonological patterns in terms of their phonetic motivation and those who claim that the

physical aspects of speech sounds, be they in the articulatory, acoustic, or auditory domain,

are by nature extra-linguistic. Part of the reason for this divide may be attributed to an

assumption, on both sides of the debate, that certain empirical observations are incompatible

with the opposing view. In other words, when a finding raises challenging issues with regard

to the phonetics-phonology relationship, phonologists on both sides of the divide have tended

to retreat to their own theoretical positions, instead of seeking out areas of compatibility

between opposing camps.

The debate over “incomplete neutralization” (Manaster-Ramer 1996; Port 1996) of

final voice contrasts may serve as a case in point. A series of experimental phonetic studies

(e.g. Slowiaczek & Dinnsen 1985; Dinnsen & Charles-Luce 1984) found small but systematic

acoustic differences in presumably neutralized word-final voicing contrasts in languages such

as German, Catalan, and Polish. As a consequence, the suggestion was advanced that contrast

neutralization as understood in the phonological sense is impossible. Instead of arguing that

phonological theories must be revised to account for such findings, phonetics researchers

apparently sought to challenge the very foundations of phonological study. At the same time,

many phonologists were slow to acknowledge the possibility of small but systematic phonetic

differences or of asymmetries between speech production and speech perception. In other

words, instead of finding ways to revise phonological theory to incorporate the new findings,

some scholars retreated to the competence side of the competence-performance border.

Another obstacle in establishing the role of phonetics in phonology may be observed

when phonological theorists take phonetic knowledge for granted. The assumption that the

phonetics has already been ‘done’ can result in a problematic promotion of a given phonetic

feature without the consideration of less widely known phonetic facts. The role of voice onset

time (VOT) in analyses of laryngeal contrasts is a good example. VOT has long been known

to be a primary cue to voice contrasts across languages (Lisker & Abramson 1964).

Typological studies observe two basic patterns in its implementation. Languages are

described either as voicing languages, with pre-voicing and short-lag VOT, or as aspiration

languages, with short-lag and long-lag VOT. Traditionally, this typology is seen as a case of

‘phonetic implementation’, a level of grammar below phonemic contrast (Keating 1990).

Under this view, VOT is seen as a non-distinctive property. An alternative view, known as

‘laryngeal realism’ (Honeybone 2005), promotes the status of the VOT difference, and claims

that the aspiration-voicing split is due to phonological, rather than implementational

differences. Cyran (2013) shows that ‘laryngeal realism’ cannot explain the complex sandhi

voicing processes observed in different dialects of Polish. Since laryngeal realism is based

primarily on VOT, traditionally interpreted as an aspect of phonetic implementation, Cyran

concludes that the phonetics-phonology link must be arbitrary, arguing against the direct role

of phonetics in phonology. The implication behind this conclusion is that VOT completely

defines the phonetics of laryngeal contrasts. However, it has long been known that in addition

to VOT, laryngeal contrasts are based on other cues, including fundamental frequency (e.g.

Lisker and Abramson 1985) and burst amplitude (Repp 1979). In Polish, Aperliński (2012)


has shown that when VOT cues are not available, listeners still hear the voice contrast. Thus,

while Cyran correctly identified problems with ‘laryngeal realism’, these problems do not

lead to the conclusion that phonetics does not have a direct role in phonology. Rather, Cyran’s

study simply shows that VOT is insufficient for describing the realization of laryngeal

contrasts.

The issues in the realization of laryngeal contrasts have shown that many

phonologists, in particular those who argue that phonetics is not phonological, may be overly

selective in attributing a given phonetic feature to the realization of a phonological

representation. In other cases, phonologists perhaps have not been selective enough. In the

1990s, a new ‘phonetically-based’ phonology was born (Hayes 1999, Steriade 1997), utilizing

the formal environment of Optimality Theory (Prince & Smolensky 1993) to incorporate

perceptual and articulatory constraints directly into grammar. This research has spawned

important new typological and experimental studies and re-analyses of old data. Thanks to

this work we now have a much better understanding of the phonetic underpinnings of a number

of phenomena, including place assimilation (Jun 1995), syllable weight (Gordon 1999), vowel

reduction (Crosswhite 2001), vowel harmony (Kaun 1995), and many others. Nevertheless,

most recent phonetic studies have generally not addressed a fundamental question: which

phonetics is phonological and why? For example, Flemming (2002) offers auditory

representations of vowel quality deriving from numerical scales of formant frequencies for

F1, F2, and F3. Flemming’s representations go a long way toward explaining the tendency of

vowel inventories to maximize auditory contrast. At the same time, however, the nature of the

representations suggests that each of the three formants should play an equal role in the

phonological patterning of vowels. No explanation is offered for the functional primacy of F1

over F3 in vowel systems across languages. In other words, we are left wondering why F1 is

more ‘phonological’ than F3. While the perceptual factors underlying this fact are known,

they are not incorporated into the phonological representation.

This paper will discuss various aspects of the physical realization of speech sounds

with an eye toward evaluating their phonological credentials. That is, we shall focus on the

question: “Which phonetics is phonological?”. I will argue for two basic claims. First, the

domain of speech perception, rather than articulation, must play a dominant role in the

formulation of phonological frameworks. In other words, for the most part, perception is

phonology (cf. Boersma & Hamann 2009). The second claim is that the phonetic properties

associated with manner of articulation are inherently more categorical, and thus phonological,

than the phonetic properties associated with place and laryngeal features. This claim has been

implemented within the Onset Prominence framework (OP; Schwartz 2013a), revealing

important insights on a number of problematic phenomena. Stated briefly, from the phonetics

of manner of articulation we may construct a new prosodic ‘skeleton’ that captures important

phonological generalizations.

The rest of this paper will proceed as follows. Section 2 will present arguments in

favor of the hypothesis that speech perception and phonology are closely linked. Section 3

compares the phonological implications of the acoustic and auditory properties of speech.

Section 4 examines the effects of phonetic variability in the realization of place, laryngeal,

and manner contrasts. Finally, Section 5 presents the OP perspective on a range of

phenomena, including sonority and strength, word-initial vowels, and the behavior of coda

stops. From the standpoint of the OP framework, we gain new insights into the question posed

in the title of this paper.

2 Phonology and perception



The field of phonetics encompasses two primary areas: speech production and speech

perception. With respect to the fundamental question asked in this paper, ‘which phonetics is

phonological?’, a comparison of the phonological aspects of articulation and perception is a

natural place to start. In what follows, we will argue for the primary role of speech perception

in phonology.

The first argument concerns the relative acoustic and perceptual constancy of speech

in the face of variability in production. Stevens (1989) describes some aspects of this

constancy in his Quantal Theory of speech. He notes that the acoustic-articulatory space is

made of a number of ‘quanta’, within which articulatory changes have only minimal acoustic

consequences. For example, /ɹ/ in American English may be produced in at least three

different ways (Hagiwara 1994), but still is characterized by a low third formant. Both

computers and ventriloquists (as mentioned in Harris 1994: 110) are capable of producing

comprehensible spoken language in ways that differ significantly from those described in

phonological theories based on articulation. Jakobson (1962) noted that being understood is

the primary goal of speech, and that the acoustic signal is the only shared experience of

speaker and hearer. These considerations strongly suggest that it is indeed perceptual targets

that are recorded in speakers' minds, and any available articulatory means to produce them

may be employed.

Empirical evidence for the primacy of perception may be found in the study of first

language phonological acquisition. Simply stated, congenitally deaf people generally do not

learn to speak. Hearing children typically acquire phonological categories in perception long

before they are able to produce them. Blevins (2004) cites studies of children's production of

American /ɹ/, in which the apparent [w] that is often substituted for the target rhotic is

systematically different from underlying /w/. This finding suggests that these children have

acquired the rhotic in perception but are unable to produce it in a way that allows adult

listeners to distinguish it from /w/. Similar perception-first effects may be found in second

language acquisition – learners typically can understand quite a bit of an L2 before they are

capable of speaking.

Sound change also suggests a primary role for speech perception. Near-merger (e.g.

Labov 1997), a phenomenon described in the sociolinguistic literature, provides evidence for

a perception-first view. Nearly merged categories are merged perceptually. However, small

systematic differences in production are still found, implying that production lags behind

perception in the process of language change. Ohala (1981) surveys a number of changes with

perceptual motivation, proposing a model of sound changes in which the listener plays a

primary role. The role of perception, however, goes beyond the fact that some changes may

occur for perceptual reasons. Any sound change, including those with articulatory motivation,

must be licensed by listeners in order to take hold in the grammar of a language. From this

perspective, all processes are listener-oriented, regardless of whether perceptual or

articulatory considerations provided their original spark.

Perhaps the most striking evidence for a direct link between phonology and speech

perception comes from cross-language studies. It is well established that one’s native

phonology has an impact on speech perception (e.g. Best 1995, Flege 1995). Briefly stated,

the same acoustic stimulus is often heard differently by speakers of different languages. A

particularly notable example is perceptual epenthesis (e.g. de Jong & Park 2012), by which

listeners report hearing vowels that are not produced, a process that is governed by native

phonotactic restrictions. For example, speakers of languages with restricted coda inventories

or without consonant clusters perceive a vowel in these positions in the speech of L2

speakers. If perception were not an intimate aspect of phonological competence, processes

such as perceptual epenthesis, which are quite widespread, would not be expected.



3 Auditory vs. acoustic considerations in listener-oriented phonology

Why should there be such a close relationship between phonology and speech perception?

Briefly stated, the auditory system acts in a “phonological” manner, reducing the amount of

acoustic information to be processed in both the spectral and temporal domains. The entire

frequency range of human audition (up to about 20 kHz) may be reduced to a relatively small

number of critical bands. In the temporal domain, onset boosts (see e.g. Wright 2004) serve to

focus perceptual attention on smaller portions of the signal, again reducing the amount of

acoustic input to be processed. In what follows, we shall briefly examine a selection of cases

in which the auditory system filters out acoustic variability in the formation of categories that

play important roles in phonological grammars. The phenomenon of categorical perception is

indeed a striking example of the phonological nature of the auditory system.

Acoustic variability found in speech often resides in the domain of contextually

induced spectral details. For instance, the high front vowel /i/ often shows a lower F2 in a

labial context than it does in a velar or alveolar context. For an adult male speaker, 2000 Hz

for F2 in a labial context and 2200 Hz in the alveolar context might be considered typical

measurements. While this acoustic difference is not insignificant, the auditory difference

between these two measurements is only about 0.6 Bark, less than one critical band.

Moreover, the general spectral shape of the resonance patterns is constant in the two contexts,

with a large dip (e.g. Harris 1994) in the spectrum between the first and second formants, and

a relatively close convergence of the second and third formants (Syrdal & Gopal 1986). This

auditory constancy is found in phonological feature systems, represented as either a feature

[-back] or an element {I}, the latter of which is an explicitly auditory representation.
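
The size of this auditory difference can be checked with a short calculation. The sketch below assumes Traunmüller's (1990) approximation of the Bark scale, which is only one of several published conversion formulas and is not named in the text above; the function and variable names are illustrative.

```python
def hz_to_bark(freq_hz: float) -> float:
    """Convert a frequency in Hz to Bark (Traunmüller 1990 approximation; assumed here)."""
    return 26.81 * freq_hz / (1960.0 + freq_hz) - 0.53


# F2 values for /i/ taken from the example in the text (adult male speaker).
f2_labial_hz = 2000.0
f2_alveolar_hz = 2200.0

bark_difference = hz_to_bark(f2_alveolar_hz) - hz_to_bark(f2_labial_hz)
print(f"F2 difference: {bark_difference:.2f} Bark")
```

Under this approximation the 200 Hz difference comes out at roughly 0.64 Bark, in line with the figure of about 0.6 Bark and well under one critical band.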

A more striking case in which the human auditory system acts as an information

reduction device may be found in asymmetries in auditory response to certain portions of the

signal. In particular, pre-vocalic consonants, which are acoustically lacking in robustness, have

been found to be associated with a period of heightened perceptual sensitivity, referred to as

an onset boost, followed by a period of decreasing auditory response known as adaptation

(e.g. Delgutte 1997). As a result, CV-type syllables produce the best match between

phonotactics and auditory response. Consonants tend to correspond with increased response,

while vowels align with decreased perceptual sensitivity. These considerations are reflected in

the functional primacy of consonants in the formation of lexical contrasts across languages;

consonants have a tendency to align with perceptually robust portions of the signal. The

widespread requirement across languages for a vocalic ‘nucleus’ to a syllable can also be

explained in terms of auditory sensitivity. Vowels are the only segment types that produce

enough acoustic energy to overcome the decrease in auditory response associated with

adaptation.

In sum, basic auditory mechanisms offer plausible motivation for the assumption that

speech perception and phonological systems must be intimately linked. Despite a high degree

of acoustic variability in speech, auditory processing categorizes selected aspects of the

speech signal. Many recurrent phonological patterns appear to be the natural consequence of

perceptual constraints.

4 Incorporating perception into phonology – where to start?

A fundamental problem for any model of the phonetics-phonology relationship is the range of

gradient and variable phenomena found in speech. On numerical acoustic and articulatory

scales, we rarely if ever find two identical realizations of a given category. While the

perceptual underpinnings of the processes underlying the formation of phonological

categories may be known, it is not obvious how these considerations might be incorporated



into a phonological theory. To fully address the question of “which phonetics is

phonological?”, we need to re-examine the phonological side of the equation and formulate a

new question. Which phonological phenomena are associated with phonetic realizations that

are most conducive to auditory categorization? In other words, we seek to compare the

phonetics of various phonological categories to assess which categories are more

‘phonological’ in their phonetic realization. Such an examination leads to the conclusion that

the phonetics of manner of articulation is inherently more categorical than the phonetics of

other phonological classes. As a result, it may be suggested that manner contrasts serve as a

fundamental building block of phonological structure, which allows us to restrict the domain

of the phonetics-phonology interface to place and laryngeal contrasts.

The phonological nature of manner may be explained by the type and consequences of

phonetic variability. While much has been made of the gradient and variable properties of

speech, not all gradient phenomena are equivalent in their perceptual and phonological

consequences. Gradient spectral information may be found in the resonance properties of the

vocal tract as well as voice source characteristics. For instance, two instances of a vowel such

as /u/ may show a difference in F2 frequency resulting from contextual variability, speech

rate, prosodic position, or other factors. Temporal gradience may be observed in laryngeal

contrasts and quantity distinctions; voice onset time (VOT) in stops shows differences on the

basis of consonant place of articulation and vocalic context, among countless other factors.

Gradience in both the spectral and temporal domains is readily quantifiable and transfers

easily to descriptions of both place and laryngeal features.

In the case of manner contrasts however, the acoustic signal appears to be somewhat

more categorical in nature, characterized by acoustic ‘landmarks’ (Stevens 2002). Consider

stops, which are perceived on the basis of a rapid drop in amplitude and a (near) silent closure

period. Although stops may be realized with incomplete closure (Crystal & House 1988) that

is quantifiable on a gradient continuum, this does not necessarily entail sufficient frication

noise for the perception of another manner category (fricatives). That is, despite gradient

realization, stop closure is an inherently privative property. It is either perceived or it is not

perceived, but the failure to hear it does not inevitably imply the perception of another

category. By contrast, changes in formant frequencies associated with vowel quality result in

the perception of a different vowel, but the newly perceived sound is still a vowel.

One acoustic aspect in the realization of manner contrasts sets manner apart from

laryngeal and (especially) place specifications. Place of articulation is reflected in terms of

spectral properties and is largely independent of duration and overall amplitude. Laryngeal

contrasts are based primarily on the relative timing of glottal and supra-glottal events; the

spectral domain is secondary1. In other words, place and voicing may be reliably cued by a

single acoustic measure. By contrast, manner contrasts are reflected primarily in the

interaction of two physical scales. A primary cue for a number of manner contrasts is rise time

in the amplitude envelope (Shinn & Blumstein 1984; Johnson 1997). Rise time is not

quantifiable in absolute terms on a single physical scale, but must be defined in terms of a

ratio between two scales: amplitude and time. In essence, rise time requires the listener to

compute this ratio, which may be expected to be more taxing on cognitive resources than

identifying categories on a single physical dimension. Consequently, we might expect

listeners to be less sensitive to gradience in the realization of manner contrasts than in the

realization of place and laryngeal contrasts.
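
The idea that rise time relates two scales can be made concrete with a small sketch. It assumes a sampled amplitude envelope, a 10%-90% criterion for the rise, and two synthetic linear onsets; none of these choices comes from the studies cited above, and all names are hypothetical.

```python
def rise_time_ms(envelope, sample_rate_hz, low=0.1, high=0.9):
    """Time in ms for the envelope to climb from low*peak to high*peak."""
    peak = max(envelope)
    start = next(i for i, amp in enumerate(envelope) if amp >= low * peak)
    end = next(i for i, amp in enumerate(envelope) if amp >= high * peak)
    return (end - start) * 1000.0 / sample_rate_hz


def rise_slope(envelope, sample_rate_hz, low=0.1, high=0.9):
    """Amplitude gained per ms during the rise: a ratio of amplitude to time."""
    duration_ms = rise_time_ms(envelope, sample_rate_hz, low, high)
    if duration_ms == 0:
        return float("inf")  # onset is instantaneous at this sampling resolution
    return (high - low) * max(envelope) / duration_ms


def linear_onset(n_samples, plateau=20):
    """Synthetic envelope (assumption): linear ramp from 0 to 1, then a plateau."""
    return [i / n_samples for i in range(n_samples + 1)] + [1.0] * plateau


SAMPLE_RATE_HZ = 1000  # one envelope sample per millisecond (assumed)
abrupt = linear_onset(10)   # fast amplitude rise
gradual = linear_onset(80)  # slow amplitude rise

for label, env in (("abrupt", abrupt), ("gradual", gradual)):
    print(label, rise_time_ms(env, SAMPLE_RATE_HZ), rise_slope(env, SAMPLE_RATE_HZ))
```

Both envelopes reach the same peak amplitude, so neither amplitude nor duration alone separates them; only their relation does, which is the sense in which rise time depends on two physical scales.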

The relative robustness of the gradient properties associated with place, manner, and

laryngeal features may be seen very easily in the inventory of diacritics in the International

Phonetic Alphabet. The greatest number of diacritics specify adjustments to vowel quality,

which are reflected in spectral properties that have been more robustly identifiable for

phoneticians. There are also a large number of diacritics related to laryngeal features. By

contrast, the IPA has only a small number of manner diacritics. We may conclude that

gradience in the place and laryngeal features is simply easier to identify than gradience in

manner.

1 Place and laryngeal features have secondary cues that manipulate additional physical scales. Voicing in

particular is reflected in both amplitude and pitch: voiceless consonants tend to produce higher-amplitude bursts

and aspiration, and raise the pitch of neighboring sounds.

In sum, to answer the question posed earlier, “which phonetics is phonological?”, we

suggest that manner-related phonetic features are more categorical in nature, and should be

separated from place and laryngeal features in the construction of phonological

representations.

5 Implementation of a manner-based representational theory

With regard to the phonetics-phonology relationship, manner of articulation appears to be the

most ‘phonological’ of the three major classes of features. Prosodic structure has also been

claimed to be ‘phonological’ rather than phonetic (e.g. Steriade 1997). Consequently, it may

be suggested that manner of articulation is an inherently prosodic or structural specification,

while place and laryngeal specifications are better described as melodic. This idea has

occasionally been proposed in recent years (Steriade 1993, Golston & van der Hulst 1999,

Pöchtrager 2006), but remains outside the mainstream.

5.1 The Onset Prominence representational environment

A structural view of manner is implemented in the Onset Prominence representational

environment (OP; Schwartz 2013a), in which prosodic constituents and segmental

representations are constructed from the same representational materials that define manner

of articulation. Equating manner with prosodic structure thus restricts the primary domain of

the phonetics-phonology interface to place and (some languages’) laryngeal contrasts.

The representational insights of the OP framework rest on two assumptions, the first of

which concerns the choice of phonological primitives. Since stop-initial CV syllables are the

most common syllable type across languages, they are, in slightly modified form, assumed to

be the primitive building block for all segmental and prosodic representations. This building

block is shown in the tree structure in (1), and is constructed from the series of phonetic

events associated with a stop-vowel sequence in initial position.

(1) The Onset Prominence representational hierarchy

The highest layer of the hierarchy is the closure (Closure) of the stop. The release of the stop

produces a portion of aperiodic noise (Noise). Noise is followed by a significant rise in

amplitude as the vowel begins (Vocalic Onset; VO), after which we may observe a portion of

relatively stable formant frequencies (Vocalic Target; VT). This is followed by a decrease in



amplitude (Offset). A stop-vowel sequence will contain all four layers of the structure in (1).

Place and laryngeal specifications attach to the terminal nodes.

The other assumption crucial to the OP model is related to the lowest two layers of the

hierarchy, the VO and VT nodes. Vowels, which are traditionally described as single

segments, are split into two representational layers. The VO layer represents the initial portion

of a vowel in the CV context, which typically contains acoustic cues to the identity of the

consonant (e.g. Wright 2004). As a result, this node may be built into the representation of

consonants. The VT layer encodes the later portion of vowels. In Section 5.4, we will return

to the ambiguity inherent in this representational split.

5.2 Manner and constituency

From the hierarchy in (1) we derive manner of articulation in terms of the layers of structure

contained by a given segmental tree. This is shown in (2), which provides structures for a

labial stop, nasal, fricative, approximant, and vowel. The binary nodes are active elements in

the individual representations, while the unary nodes, which represent properties that are

absent from the speech signal, serve as place holders to indicate the relative hierarchical

position occupied by a given segmental structure. The segmental symbols may be interpreted

as shorthand for place and laryngeal specifications. Note that these representations encode

both sonority and consonantal strength. More ‘sonorous’ segments are housed lower in the

representational hierarchy, while stronger consonants are higher. We will return to the OP

view of these phenomena shortly.

(2) Manner distinctions in the OP environment

The relationship between prosodic constituents and segmental representations in the OP

environment is illustrated in (3). On the left we see individual segmental structures for

English quick, while on the right we see the word as a single prosodic constituent. The unary

nodes from (2) have been removed in (3) for the sake of visual clarity. Prosodic constituents

are formed when lower level sonorant and vowel structures are absorbed into higher level

obstruent structures. In quick, the glide and vowel are absorbed into the structure of the initial

stop. Stray structures, such as the final /k/ in quick, may be submerged under the preceding

constituent. We will return to submersion in Section 5.5.

(3) Segmental and constituent (right) structures for English quick



5.3 Sonority and strength

The manner contrasts in (2) allow us to capture generalizations that may be attributed to the

role of sonority in describing the ways in which phonological segments may be arranged in a

linear string. Sonority is generally assumed to be a scalar feature, by which segments may be

ranked according to the following hierarchy:

stops < fricatives < nasals < liquids < glides < vowels.

Sonority is frequently invoked to account for restrictions on sound sequences within and

between syllables. The most sonorous part of a syllable is the nucleus, and according to the

Sonority Sequencing Generalization (SSG; Selkirk 1984), segments decrease in sonority the

further they get from the nucleus. In the case of onset clusters, the SSG mandates rising

sonority (e.g. pr, kl). This restriction is captured by absorption in OP structures – higher level

obstruents absorb liquids and glides, which themselves absorb vowels, as shown in (3) for the

word quick.
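
The rising-sonority requirement on onsets can be stated as a simple check over the hierarchy given above. The following is a minimal sketch: the numeric ranks merely encode the ordering of the hierarchy, and the small segment-to-class table is an illustrative assumption rather than an analysis of any particular language.

```python
# Ranks encode the ordering stops < fricatives < nasals < liquids < glides < vowels.
SONORITY_RANK = {
    "stop": 0, "fricative": 1, "nasal": 2,
    "liquid": 3, "glide": 4, "vowel": 5,
}

# Hypothetical toy inventory mapping segments to sonority classes.
SEGMENT_CLASS = {
    "p": "stop", "t": "stop", "k": "stop",
    "s": "fricative", "f": "fricative",
    "m": "nasal", "n": "nasal",
    "r": "liquid", "l": "liquid",
    "w": "glide", "j": "glide",
    "a": "vowel", "i": "vowel",
}


def obeys_ssg_onset(onset):
    """True if sonority rises strictly from segment to segment in the onset."""
    ranks = [SONORITY_RANK[SEGMENT_CLASS[seg]] for seg in onset]
    return all(left < right for left, right in zip(ranks, ranks[1:]))


for cluster in ("pr", "kl", "kn", "rp"):
    print(cluster, obeys_ssg_onset(cluster))
```

On this scalar view a stop-nasal onset such as kn counts as well-formed in the same way as pr and kl, since only the direction of the sonority slope is checked.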

Harris (2006) notes certain problems with the sonority proposal, despite its heuristic

usefulness. First of all, a clear phonetic correlate of sonority has been difficult to identify.

Acoustic intensity or degree of articulatory opening are not satisfactory, given the intensity

associated with sibilant fricatives and the full oral closure associated with nasals. Other

proposals based on a conglomerate of features associated with perceptual salience (Clements

1990) fail to capture place-based restrictions on consonant clusters. Nevertheless, the

empirical generalizations of the sonority hierarchy are quite robust, and it is desirable to be

able to express them in a phonological theory, as the OP framework manages to accomplish.

A comparison of the predictions of the OP framework and the sonority hierarchy reveals

differences with respect to the status of nasals in onset clusters. According to the traditional

hierarchy, stop-nasal clusters show a rising sonority slope and should be tolerated in a fair

number of languages. By contrast, in the OP environment both stops and nasals are specified

for the Closure node and should not be subject to absorption. In other words, OP predicts that

stop-nasal clusters should be quite rare, while the traditional hierarchy predicts they should

occur regularly, albeit not as frequently as stop-liquid clusters.

Sonority may be seen as the opposite of phonological strength, another frequently

invoked hierarchy that is claimed to be relevant to phonological theory. Strength is frequently

invoked to describe weakening processes such as spirantization of stops and the vocalization of

fricatives (LaVoie 2001). These processes are easily represented in the OP framework: each

represents the loss of the highest active node in the structure. However, as Harris (2009)

points out, as with sonority the status of nasals is problematic for the strength as sonority



view: weakening of fricatives never produces nasals. This restriction is predicted with OP

representations, in which stops and nasals share a high-level closure node.
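
The predictions discussed in this section can be sketched by treating each manner class as a set of active layers in the hierarchy in (1). The layer assignments below are an assumption inferred from the prose (stops and nasals share Closure, weakening removes the highest active node, VO alone represents approximants); the trees in (2) remain the authoritative statement, and all names are hypothetical.

```python
# Layers of the hierarchy in (1), ordered from highest to lowest.
LAYERS = ("Closure", "Noise", "VO", "VT")

# Assumed active layers per manner class (inferred from the text, not from (2)).
ACTIVE_LAYERS = {
    "stop": {"Closure", "Noise", "VO"},
    "nasal": {"Closure", "VO"},
    "fricative": {"Noise", "VO"},
    "approximant": {"VO"},
    "vowel": {"VT"},
}


def top_layer_index(manner):
    """Position of the highest active layer (0 = Closure)."""
    return min(LAYERS.index(layer) for layer in ACTIVE_LAYERS[manner])


def can_absorb(left, right):
    """Absorption requires the leftmost structure to be strictly higher."""
    return top_layer_index(left) < top_layer_index(right)


def lenite(manner):
    """Remove the highest active node; return the resulting class, if any."""
    top = LAYERS[top_layer_index(manner)]
    remaining = ACTIVE_LAYERS[manner] - {top}
    for candidate, layers in ACTIVE_LAYERS.items():
        if layers == remaining:
            return candidate
    return None  # no structure left, or no matching class in this toy table


print(can_absorb("stop", "approximant"), can_absorb("stop", "nasal"))
print(lenite("stop"), lenite("fricative"))
```

Under these assumptions a stop absorbs an approximant but not a nasal, and no application of lenite yields a nasal, because removing a top node can never add a Closure layer.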

5.4 Phonetically-derived phonological ambiguity

Among the more important aspects of the Onset Prominence representational environment is

its incorporation of an important phonetic ambiguity with respect to the initial portion of vocalic

segments. In the hierarchy in (1), the segmental affiliation of the VO node is ambiguous. This

single layer of structure may be claimed by multiple segment types. On its own, VO

represents the class of approximants and glides as we see in (2) and (3). In (2), we also see

that VO may be active in the representation of obstruents and nasals, representing formant

transitions that cue consonant place of articulation (e.g. Wright 2004). At the same time, the

VO node is derived from a portion of the signal that is, strictly speaking, vocalic. As a result,

we might expect to find VO built into the representation of vowels as well as consonants. The

trees in (4) present two types of vowel representation, with or without an active VO node2.

(4) Vowels with or without VO specification

The presence or absence of VO in vowel representations offers a useful perspective on the

prosodically ambiguous behavior of onsetless syllables across languages (see Schwartz

2013a). Briefly stated, VO-specification allows initial vowels to satisfy prosodic constraints

requiring onsets. This may be manifested as an apparent ‘empty consonant’ (e.g. Marlett &

Stemberger 1983), or simply as prosodic well-formedness for processes such as stress

assignment or reduplication (cf. Downing 1998 for discussion of prosodically ill-formed

onsetless syllables). Alternatively, VO specification may be associated with glottal marking

on word-initial vowels, as is frequently observed in languages such as German (Wiese 1996),

Czech (Bissiri & Volín 2010), and Polish (Schwartz 2013b). In the case of Polish, VO

specification facilitates the formulation of predictions with regard to the formation of prosodic

boundaries that correspond with orthographic boundaries. VO specification in Polish

strengthens boundaries, preventing processes occurring in English such as sandhi

palatalization (got you ~ gotcha) and resyllabification (find out ~ fine doubt).

While glottal marking has traditionally been seen as a phonetic detail, OP

representations suggest that this phonetic strengthening process has phonological origins in

some languages. We now turn to the representation of post-vocalic consonants, in which the

OP perspective suggests that the release of coda stops, traditionally seen as a phonetic process

affecting a non-contrastive feature, may indeed have significant phonological implications.

2 The structures in (4) may raise questions concerning glide-vowel sequences of the type /wu/

and /ji/, which at first glance would appear to be structurally identical to VO-specified /u/ and /i/. Levi (2008)

identified differences between ‘underlying’ and ‘derived’ glides. For the former we posit that the glide-initial

sequence contains an additional specification. For details, see Schwartz (2012).



5.5 Submersion and the representation of prosodic weakness

In the basic constituent forming mechanism of the OP environment, lower level vowel and

approximant structures are absorbed into higher-level obstruents. Absorption is possible with

when the leftmost structure is higher in the OP hierarchy. In some cases, absorption may not take

place since this condition is not met. In such cases, a different mechanism, submersion, may

be motivated. One example of submersion is shown in (5), in which the second of two /a/

structures is submerged under the first, resulting in a long vowel.

(5) Submersion resulting in a long vowel

Beyond the relatively simple vowel lengthening mechanism shown in (5), submersion

is a process with far-reaching prosodic implications. Before discussing specific structures, it is

important to note that submersion is a form of phonological recursion of the type proposed by

van der Hulst (2010). In his view, codas may be seen as ‘syllables inside syllables’. OP

submersion unifies this view with the representation of long vowels. Additionally, submersion

offers insight into the behavior of consonants in both VC and VCV contexts, with deeper

predictions for the form and behavior of larger prosodic constituents.

First, we turn to the behavior of coda consonants. In (6), we see a string of segmental

structures for the English word click. On the basis of the discussion thus far, we should

expect the liquid and the vowel to be absorbed into the structure of the initial /k/. As yet,

however, we have not addressed the fate of the final /k/. It may not be absorbed into the

preceding constituent, since its Closure and Noise nodes are higher in the OP hierarchy. In

English, a stop such as /k/ cannot stand on its own as a prosodic constituent, and traditional

representations would assign this segment to the ‘coda’ position.

(6) String of segmental trees for click
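
As an informal illustration, the absorption step described above can be sketched in Python. The numeric ranking of the nodes, the segment-type labels (including the treatment of the liquid as a VO-level structure), and the function name are assumptions made for this sketch; they are not part of the OP formalism itself. A segment is absorbed when the structure to its left begins higher in the hierarchy; otherwise it is left as a stray structure.

```python
# Assumed ranking of OP nodes (higher number = higher in the hierarchy).
OP_RANK = {"Closure": 3, "Noise": 2, "VO": 1, "VT": 0}

# Assumed highest active node for each segment type used in this sketch.
TOP_NODE = {"stop": "Closure", "approximant": "VO", "vowel": "VT"}


def build_constituents(segments):
    """Group (symbol, type) pairs into constituents by absorption.

    A segment is absorbed into the current constituent when the segment
    immediately to its left starts higher in the hierarchy. Otherwise it
    opens a new, stray structure.
    """
    constituents = []
    prev_rank = None
    for symbol, seg_type in segments:
        rank = OP_RANK[TOP_NODE[seg_type]]
        if prev_rank is not None and prev_rank > rank:
            constituents[-1].append(symbol)  # absorption
        else:
            constituents.append([symbol])    # no absorption: stray structure
        prev_rank = rank
    return constituents


click = [("k", "stop"), ("l", "approximant"), ("ɪ", "vowel"), ("k", "stop")]
print(build_constituents(click))  # [['k', 'l', 'ɪ'], ['k']]
```

Under these assumptions the final /k/ is left outside the first constituent, and a sequence of two identical vowels likewise yields two separate structures, which is the configuration in which submersion may apply.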

Polish has borrowed the word click, reproducing it as klik. In Polish and English, however,

there is a systematic difference in the behavior of the final /k/ with regard to the release of the



stop. In Polish the release is obligatory (e.g. Dukiewicz & Sawicka 1995), while in English it

is frequently suppressed. Such behavior suggests two different representations of the ‘coda’

/k/, shown in (7).

(7) Constituent structures for klik (left) and click (right)

On the right, the final stop in English click is submerged under the preceding constituent. On

the left the final /k/ in Polish klik is not submerged. The result is that the English ‘coda’ is in a

lower prosodic position in the OP hierarchy. Associating lower hierarchical levels with

prosodic weakness, we should expect the English coda to be subject to lenition processes such

as the suppression of release.

Submersion in English is not limited to individual segments – entire ‘syllables’ may

also be lowered under the preceding constituent. In (8) we offer a representation of the

English word pity. This results in a larger ‘foot’ structure in which the onset to the second

syllable is underneath the first syllable. Such a configuration offers an insightful

representation of ‘ambisyllabicity’ (Kahn 1976) and prosodic weakness in languages in which

the second consonant in a trochaic CVCV foot is subject to lenition (see Jensen 2000; Harris

2004). The /t/ in this word is of course realized as a tap or a glottal stop in various

vernacular dialects of English. A segmentally ‘similar’ word in Polish, PIT-y, which refers to

tax return forms, would not contain submerged structure; the two constituents would be

adjoined at a higher level of structure.

(8) The English word pity
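
The recursive character of submersion may be made concrete with a small data structure. The sketch below is only an illustration: the class name, the field names, the transcriptions, and the idea of measuring weakness as a depth count are all assumptions introduced here. Each constituent may carry one submerged constituent beneath it, so that a submerged 'syllable' sits one level lower than the structure that hosts it.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Constituent:
    segments: str
    # Structure lowered under this constituent by submersion, if any.
    submerged: Optional["Constituent"] = None


def depth_of(constituent, target, depth=0):
    """Return how many levels of submersion deep `target` sits, or None."""
    if constituent is target:
        return depth
    if constituent.submerged is None:
        return None
    return depth_of(constituent.submerged, target, depth + 1)


# English pity: the second 'syllable' is submerged under the first.
second = Constituent("ti")
english_pity = Constituent("pɪ", submerged=second)

# Polish PIT-y: two constituents adjoined side by side, nothing submerged.
polish_pity = [Constituent("pi"), Constituent("tɨ")]

print(depth_of(english_pity, second))                   # 1
print([depth_of(c, c) for c in polish_pity])            # [0, 0]
```

If lower levels are associated with prosodic weakness, the English /t/ (depth 1) is the expected target of lenition, while both Polish constituents remain at the top level.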



In sum, our discussion points to an opposition between English and Polish with regard to

relations between individual segmental structures and the primitive CV tree in (1). English

offers the possibility of submersion, associated with long vowels, weak coda consonants, and

weak consonants in VCV contexts. Submersion is absent in Polish. Codas and intervocalic

consonants are generally not subject to weakening, and vowel quantity distinctions are absent.

Our perspective suggests that phonetic properties associated with manner of

articulation may be claimed to be phonological. This view restricts the domain of the

phonetics-phonology interface to the realization of place and laryngeal contrasts. At the same

time, certain phenomena that have traditionally been seen as non-phonological may be subject

to re-evaluation. For example, production of glottal marking on word-initial vowels may be

attributable to a phonetically-derived phonological parameter. In traditional accounts, vowel

glottalization entails the insertion of a non-contrastive segment [ʔ]. Our perspective suggests

that in languages in which initial vowels are specified for the VO node such as Polish and

German, vowel glottalization is better analyzed as the strengthening of a phonological

property that is already present in the vowel’s representation.

5.6 Coda stop release is phonological

In what follows, we will return to the question of coda stop release, another process that is

traditionally viewed as non-contrastive. With regard to stop release, languages adopt one of a

limited number of strategies, suggesting that the phenomenon is in fact phonological. Our

discussion will focus on a comparison of English and Polish with Korean, a language in

which coda stop release is obligatorily suppressed. Korean has a particularly restrictive

inventory of coda consonants. While stops are allowed in coda position, they are always

unreleased. In addition, the laryngeal contrast, a three-way opposition among plain voiceless,

voiceless aspirated, and tense or stiff-voiced stops, is neutralized.

To provide some perspective on Korean phonotactic constraints, OP representations

for the three Korean labial stops are proposed in (9). Crucially, the framework allows for the

possibility that different melodic specifications may be housed at different levels of the OP

hierarchy in accordance with their phonetic realization. Place is specified as a [labial]

annotation on the Closure node. This is to be expected since it is the location of the closure

that defines place of articulation. By contrast, laryngeal features, whose phonetic realization

may be impeded by stop closure, may be assigned at lower levels. This is seen in (9).

Aspiration, which is of course associated with aperiodic noise, is shown as a [spread glottis]

specification on the Noise node. Tenseness is represented as a [constricted glottis] annotation

on the VO node. This feature is associated with a stiffer voice quality on the onset of the

following vowel (e.g. Ladefoged & Maddieson 1996), so the VO node is a natural structural

position for [cg] specifications. With these representations in mind, we may turn to the

representation of codas in Korean, which are characterized by the presence of place contrasts,

but neutralized manner and laryngeal contrasts.

(9) Korean labial stops; plain (left), aspirated, and tense (right)



In the OP environment, codas may be attributed to the process of submersion, by which a

stray structure is moved to the bottom of the preceding constituent. Alternatively, the post-

vocalic consonant may resist submersion as we saw in Polish klik. Submerged stops are in a

prosodically weaker position, and are predicted to be subject to lenition processes such as the

suppression of release. Since in Korean, coda stops are always unreleased, we posit that

submersion is the process that forms codas in this language. The submersion of a Korean stiff-

voiced stop would produce the structure in (10). Note the structural nodes associated with the

coda consonant are located at the bottom of the constituent structure.

(10) Submerged stiff-voiced stop in Korean

To explain the fact that Korean maintains place contrasts in coda stops, but neutralizes

laryngeal contrasts, we need only propose a constraint against multiple layers of submerged

structure. In other words, Korean, like many other languages, appears to restrict the size of

syllable rimes. The claim here would be that Korean only allows a single node under the VT

level. The laryngeal specification, which as we saw in (9) is housed on lower-level nodes,

loses its structural housing and is not realized in this position. This is shown in (10) as the

crossed-out labels of the lower Noise and VO nodes. Thus, the apparent mismatch in Korean

between a licensed place contrast and neutralized laryngeal contrast may be explained as a

single constraint on the size of prosodic constituents. With the lower Noise and VO nodes

eliminated in (10), it falls out naturally that coda stops in Korean are always produced without

an audible release. The only remaining part of the stop is Closure. The Korean requirement

that coda stops are unreleased is a systematic element of the language’s prosodic

representation.
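
The reasoning in this paragraph can be traced step by step in a short Python sketch. The dictionary encoding of the stops in (9), the function names, and the assumption that an audible release requires a surviving Noise node are ours, introduced only for illustration. The size constraint is modelled as a limit on how many nodes of a submerged stop survive under the VT level.

```python
# Assumed encoding of the three Korean labial stops in (9): each node of
# the stop carries at most one melodic annotation.
KOREAN_LABIAL_STOPS = {
    "plain":     {"Closure": "labial", "Noise": None, "VO": None},
    "aspirated": {"Closure": "labial", "Noise": "spread glottis", "VO": None},
    "tense":     {"Closure": "labial", "Noise": None, "VO": "constricted glottis"},
}

NODE_ORDER = ["Closure", "Noise", "VO"]  # highest to lowest


def submerge(stop, max_nodes=1):
    """Keep only the highest `max_nodes` nodes of a submerged stop."""
    return {node: stop[node] for node in NODE_ORDER[:max_nodes]}


def is_released(coda):
    """Assumption: an audible release requires a surviving Noise node."""
    return "Noise" in coda


codas = {name: submerge(stop) for name, stop in KOREAN_LABIAL_STOPS.items()}
for name, coda in codas.items():
    print(name, coda, "released" if is_released(coda) else "unreleased")
```

With a limit of one node, all three stops reduce to the same structure: a Closure node annotated [labial]. Place survives, the laryngeal annotations are lost together with their nodes, and no release is predicted.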

Kang (2003) studied epenthesis in post-vocalic stops in English loanwords in Korean.

She provides evidence that the presence of epenthesis in English loanwords in Korean is

closely related to the probability that target language coda stops are produced with an audible

release. Epenthesis should be expected when the target language coda contains a release burst,

which of course in Korean only occurs in syllable onsets. Thus, released stops are adapted

with an additional prosodic constituent whose prominence is enhanced by the epenthetic

vowel.

While in English coda stops are optionally released and in Korean coda stops are

always unreleased, in Polish coda stops are always released, except in homorganic

clusters. Thus, for Polish we must claim that coda stops are not submerged under the

preceding rhyme. Rather, they remain at their underlying level of the OP hierarchy as seen in

(7). The absence of submersion in Polish offers a unified explanation of the obligatory release



of coda stops and the lack of vowel length distinctions, two seemingly unrelated phenomena.

It is interesting to note that epenthesis in the speech of Korean learners of Polish is obligatory

(Dziubalska-Kołaczyk, p.c.). It is not subject to the variability found in English loanwords in

Korean.

Kang attributes the link between epenthesis in Korean English and target language

stop release to ‘perceptual similarity’, arguing for the importance of a non-contrastive

phonetic detail, stop release, in the mechanism of loanword adaptation. From the OP

perspective, stop release, though non-contrastive, is not merely a phonetic detail. It is

predictable on the basis of phonological parameters and constraints. The lack of release in

Korean and the obligatory release in Polish are systemic elements of the phonologies of the

two languages. The OP framework offers representational devices in which these

generalizations may be captured.
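
The three-way comparison developed in this section may be summarized as a small parameter table. The sketch below is a simplification under stated assumptions: the two boolean settings, their names, and the value assigned to English for the size limit are our own shorthand for the discussion above, not a formal part of the framework.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CodaSettings:
    submersion: bool         # are post-vocalic stops submerged?
    single_node_limit: bool  # is only one node allowed under the VT level?


def predicted_release(settings):
    """Map the two assumed settings to a coda stop release pattern."""
    if not settings.submersion:
        return "obligatory"   # stop stays at its underlying level
    if settings.single_node_limit:
        return "suppressed"   # only Closure survives submersion
    return "optional"         # submerged, hence subject to lenition


LANGUAGES = {
    "Polish": CodaSettings(submersion=False, single_node_limit=False),
    "English": CodaSettings(submersion=True, single_node_limit=False),
    "Korean": CodaSettings(submersion=True, single_node_limit=True),
}

for name, settings in LANGUAGES.items():
    print(name, predicted_release(settings))
```

The point of the sketch is that the release pattern is not listed separately for each language. It follows from settings that are needed independently, for example to describe vowel length and the neutralization of laryngeal contrasts.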

Final remarks

This paper has examined certain unresolved issues concerning the role of phonetics in

phonological theory. In particular, we addressed the following question: ‘Which phonetics is

phonological?’. Auditory considerations point to the conclusion that those phonetic

properties associated with manner of articulation are purely phonological, while place and

laryngeal-based features constitute the locus of the phonetics-phonology interface. This view

of the phonetics-phonology relationship is implemented in the Onset Prominence framework,

in which prosodic structure is built from manner-based features.

The theoretical considerations outlined in this paper suggest a number of practical

applications. Phonological theory is relevant in a wide range of fields, including language

learning, speech technology, and speech therapy. The OP framework is an explicit model that

allows for the formulation of concrete hypotheses for experimental study into the relation

between competence and performance in speech. As such, the model has clear benefits in any

practical area in which phonology is applied. For instance, in the area of L2 speech

acquisition, new phonological parameters bring new awareness of non-contrastive phonetic

aspects of the target language. Successful acquisition of features such as unreleased stops

(5.6), or the suppression of initial vowel glottalization (5.4) is associated with more target-like

production and higher listener ratings on scales of foreign accentedness. The contribution of

the OP framework lies in the fact that it provides new devices for the formulation of

experimental hypotheses for cross-linguistic comparisons.

References

Abramson A. S. & L. Lisker. 1985. Relative power of cues: F0 shift versus voice timing, in:

Fromkin V. (ed.). Phonetic Linguistics. 25-33. New York: Academic Press.

Aperliński, G. (2012). Is VOT enough? Paper presented at the 6th International Conference on

Accents of English. Łódź.

Best, C. T. (1995). A direct realist view of cross-language speech perception. In W. Strange

(Ed.), Speech Perception and Linguistic Experience. Baltimore, MD: York Press.

Blevins, J. (2004). Evolutionary phonology. Cambridge: Cambridge University Press.



Bissiri, M.P. & Volín, J. (2010). Prosodic structure as a predictor of glottal stops before word-

initial vowels in Czech English. In R. Vích (Ed.), 20th Czech-German Workshop – Speech

Processing, Prague, 23-28, 2010.

Boersma, P. & S. Hamann. 2009. Loanword adaptation as first-language phonological

perception. In Andrea Calabrese & W. Leo Wetzels (eds.), Loanword phonology. Amsterdam:

John Benjamins. 11-58.

Chomsky, N. & M. Halle. 1968. The Sound Pattern of English. New York: Harper & Row.

Clements, G.N., 1990. The role of the sonority cycle in core syllabification. In: Kingston, J.,

Beckman, M.E. (Eds.), Papers in Laboratory Phonology I; Between the Grammar and Physics

of Speech. Cambridge University Press, Cambridge, pp. 283–333.

Crosswhite, K. (2001). Vowel reduction in Optimality Theory. New York: Routledge.

Crystal, T. & A.S. House. (1988). The duration of American English stop consonants: an

overview. Journal of Phonetics 16. 285-294.

Cyran, E. (2013). Polish voicing – between phonology and phonetics. Lublin: Wydawnictwo

KUL.

De Jong, K. & H. Park. (2012). Vowel epenthesis and segmental identity in Korean learners

of English. Studies in Second Language Acquisition 34. 127-155.

Delgutte, B. 1997. Auditory neural processing of speech. In Hardcastle, W. and J.

Laver (eds.), The handbook of phonetic sciences, pp. 507-38. Oxford: Blackwell.

Downing, Laura. 1998. On the prosodic misalignment of onsetless syllables. Natural

Language & Linguistic Theory 16. 1–52.

Dinnsen, D. & J. Charles-Luce. 1984. Phonological neutralization, phonetic implementation

and individual differences. Journal of Phonetics 12. 49–60.

Dukiewicz, L. & I. Sawicka. (1995). Gramatyka współczesnego języka polskiego – fonetyka i

fonologia [Grammar of modern Polish – phonetics and phonology]. Krakow:

Wydawnictwo Instytutu Języka Polskiego PAN.

Flege, J.E.(1995). Second language speech learning: Theory, findings, and problems. In

W.Strange (Ed.), Speech perception and linguistic experience: Issues in cross-language

research (pp. 233–277). Baltimore: York Press.

Flemming, E. 2002. Auditory representations in phonology. New York: Routledge.

Golston, C. and H. van der Hulst. 1999. Stricture is structure. In Hermans, B. and M. van

Oostendorp, eds. The Derivational Residue in Phonological Optimality Theory. Amsterdam:

John Benjamins. 153-173.

Gordon, M. 1990. Syllable weight – phonetics, phonology and typology. Ph.D. dissertation,

UCLA.



Hagiwara, R. 1994. Three types of American /r/. UCLA Working Papers in Phonetics 88. 55-

62.

Harris, J. 1994. English Sound Structure. Oxford: Blackwell.

Harris, J. 2004. Release the captive coda: the foot as a domain of phonetic interpretation. In J.

Local, R. Ogden & R. Temple (eds.), Phonetic interpretation: Papers in Laboratory Phonology

6, 103-129. Cambridge: Cambridge University Press.

Harris, J. 2006. On the phonology of being understood: further arguments against sonority.

Lingua 116. 1483-1494.

Harris, J. 2009. “Why final obstruent devoicing is weakening”. In: Nasukawa, N. and P.

Backley (eds.), Strength relations in phonology. Berlin: Mouton de Gruyter. 9–46.

Hayes, B. (1999). Phonetically-based phonology – the role of Optimality Theory and

inductive grounding. In Michael Darnell, Edith Moravcsik, Michael Noonan, Frederick

Newmeyer, and Kathleen Wheatley, eds., Functionalism and Formalism in Linguistics,

Volume I: General Papers, John Benjamins, Amsterdam, pp. 243-285.

Honeybone, P. 2008. Lenition, weakening and consonantal strength: tracing concepts through

the history of phonology. In Brandão de Carvalho, J., Scheer, T. & Ségéral, P. (eds). Lenition

and Fortition. Berlin: Mouton de Gruyter, 9-93.

Honeybone, P. (2005). Diachronic evidence in segmental phonology: the case of obstruent

laryngeal specifications. In M. van Oostendorp & J. van de Weijer (eds.), The internal

organization of phonological segments, 319-354. Berlin: Mouton de Gruyter.

van der Hulst, H. 2010. A note on recursion in phonology. In van der Hulst, Harry, (ed.).

Recursion and Human Language, 301-342. Berlin: Mouton de Gruyter.

Jakobson, R. (1962). Phonological studies. The Hague: Mouton.

Jensen, J. T. 2000. Against ambisyllabicity. Phonology 17: 187-235.

Johnson, K. 1997. Acoustic and auditory phonetics. Oxford: Blackwell.

Jun, J. (1995) Perceptual and Articulatory Factors in Place Assimilation: An Optimality

Theoretic Approach, Ph.D. dissertation, UCLA.

Kahn, D. (1976). Syllable-based Generalizations in English Phonology. Ph.D. dissertation.

MIT. Bloomington, IN: Indiana University Linguistics Club.

Kang, Y. (2003). Perceptual similarity in loanword adaptation: English post-vocalic word-

final stops to Korean. Phonology 20,2: 219-273.

Kaun, A. (1995). An Optimality-Theoretic Typology of Rounding Harmony, Ph.D.

dissertation, UCLA.



Keating, P. (1990). Phonetic representations in a generative grammar. Journal of Phonetics

18. 321-334.

Labov, W. 1997. Principles of linguistic change: Internal factors. Oxford: Blackwell.

Ladefoged, P. & I. Maddieson. (1996). The sounds of the world’s languages. Oxford:

Blackwell.

Lavoie, L. 2001. Consonant strength: phonological patterns and phonetic manifestations. New

York: Garland.

Levi, S. 2008. Phonemic vs. derived glides. Lingua 118, 1956-1978.

Lisker, L. & Abramson, A.S (1964). A cross-language study of voicing in initial stops:

Acoustical measurements. Word, 20, 384-422.

Manaster Ramer, A. 1996. A letter from an incompletely neutral phonologist. Journal of

Phonetics, 24 (4), 477-489.

Marlett, S. & J. Stemberger. (1983). Empty consonants in Seri. Linguistic Inquiry, 14, 617-

639.

Ohala, J. J. 1981. The listener as a source of sound change. In: C. S. Masek, R. A. Hendrick,

& M. F. Miller (eds.), Papers from the Parasession on Language and Behavior. Chicago:

Chicago Linguistic Society. 178-203.

Pöchtrager, M. 2006. The structure of length. Ph.D. dissertation. University of Vienna.

Port, R.F. 1996. The discreteness of phonetic elements and formal linguistics: response to A.

Manaster Ramer. Journal of Phonetics, 24(4), 491-511.

Prince, Alan and Paul Smolensky. 1993. Optimality Theory: Constraint Interaction in

Generative Grammar. Ms. Technical Reports of the Rutgers University, Center for Cognitive

Science.

Repp, B. H. 1979. Relative amplitude of aspiration noise as a voicing cue for syllable-initial

stop consonants. Language and Speech 22: 173–189.

Schwartz, G. (2012). Glides and initial vowels in the Onset Prominence representational

environment. Poznań Studies in Contemporary Linguistics 48 (4). 661-685.

DOI: 10.1515/psicl-2012-0029

Schwartz, G. (2013a). A representational parameter for onsetless syllables. Journal of

Linguistics 49 (3), 613-646. DOI: http://dx.doi.org/10.1017/S0022226712000436.

Schwartz, G. (2013b). Vowel hiatus at Polish word boundaries – phonetic realization and

phonological implications. Poznań Studies in Contemporary Linguistics 49 (4). 557-585.

Selkirk, E. (1984). Phonology and syntax – the relation between sound and structure.

Cambridge, MA: MIT Press.




Shinn, P. & S. Blumstein. (1984). On the role of amplitude for the perception of [b] and [w].

Journal of the Acoustical Society of America 75(4). 1243-1252.

Slowiaczek, L.M. and Dinnsen, D.A. 1985. On the neutralizing status of Polish word-final

devoicing. Journal of Phonetics, 13(3). 325-341.

Steriade, D. (1993). Closure, release, and nasal contours. In M. K. Huffman and R. A.

Krakow (Eds.). Nasals, nasalization, and the velum, 401−470. San Diego: Academic Press.

Steriade, D. 1997. Phonetics in phonology: the case of laryngeal neutralization. Ms. UCLA.

Stevens, K. N. 1989. On the quantal nature of speech. Journal of Phonetics, 17(1), pp. 3-45.

Stevens, K. (2002). Toward a model for lexical access based on acoustic landmarks and

distinctive features. Journal of the Acoustical Society of America 111 (4). 1872-1891.

Syrdal, A. and H. Gopal. 1986. A perceptual model of vowel recognition based on the

auditory representation of American English vowels. JASA 79 (4), 1086-1100.

Wiese, Richard (1996), The Phonology of German, Oxford: Oxford University Press.

Wright, R. 2004. Perceptual cue robustness and phonotactic constraints. In Hayes, B., R.

Kirchner and D. Steriade (eds). Phonetically Based Phonology. Cambridge: Cambridge

University Press, 34-57.
