In: Szpyra-Kozłowska, Jolanta; Guz, Ewa; Steinbrich, Piotr; Święciński, Radosław (eds.) Recent developments in applied phonetics. Lublin: Wydawnictwo KUL, 345-371
Which phonetics is phonological?
Geoffrey Schwartz
Adam Mickiewicz University in Poznań
1 Introduction – communication problems between phoneticians and phonologists
Despite the fact that the fields of phonetics and phonology are both interested in the sounds of
language, there is still no consensus about the nature of the relationship between them. In
phonological theory there remains a deep divide between scholars who seek to explain
phonological patterns in terms of their phonetic motivation and those who claim that the
physical aspects of speech sounds, be they in the articulatory, acoustic, or auditory domain,
are by nature extra-linguistic. Part of the reason for this divide may be attributed to an
assumption, on both sides of the debate, that certain empirical observations are incompatible
with the opposing view. In other words, when a finding raises challenging issues with regard
to the phonetics-phonology relationship, phonologists on both sides of the divide have tended
to retreat to their own theoretical positions, instead of seeking out areas of compatibility
between opposing camps.
The debate over “incomplete neutralization” (Manaster-Ramer 1996; Port 1996) of
final voice contrasts may serve as a case in point. A series of experimental phonetic studies
(e.g. Slowiaczek & Dinnsen 1985; Dinnsen & Charles-Luce 1984) found small but systematic
acoustic differences in presumably neutralized word-final voicing contrasts in languages such
as German, Catalan, and Polish. As a consequence, the suggestion was advanced that contrast
neutralization as understood in the phonological sense is impossible. Instead of arguing that
phonological theories must be revised to account for such findings, phonetics researchers
apparently sought to challenge the very foundations of phonological study. At the same time,
many phonologists were slow to acknowledge the possibility of small but systematic phonetic
differences or of asymmetries between speech production and speech perception. In other
words, instead of finding ways to revise phonological theory to incorporate the new findings,
some scholars retreated to the competence side of the competence-performance border.
Another obstacle in establishing the role of phonetics in phonology may be observed
when phonological theorists take phonetic knowledge for granted. The assumption that the
phonetics has already been ‘done’ can result in a problematic promotion of a given phonetic
feature without the consideration of less widely known phonetic facts. The role of voice onset
time (VOT) in analyses of laryngeal contrasts is a good example. VOT has long been known
to be a primary cue to voice contrasts across languages (Lisker & Abramson 1964).
Typological studies observe two basic patterns in its implementation. Languages are
described either as voicing languages, with pre-voicing and short-lag VOT, or as aspiration
languages, with short-lag and long-lag VOT. Traditionally, this typology is seen as a case of
‘phonetic implementation’, a level of grammar below phonemic contrast (Keating 1990).
Under this view, VOT is seen as a non-distinctive property. An alternative view, known as
‘laryngeal realism’ (Honeybone 2005), promotes the status of the VOT difference, and claims
that the aspiration-voicing split is due to phonological, rather than implementational
differences. Cyran (2013) shows that ‘laryngeal realism’ cannot explain the complex sandhi
voicing processes observed in different dialects of Polish. Since laryngeal realism is based
primarily on VOT, traditionally interpreted as an aspect of phonetic implementation, Cyran
concludes that the phonetics-phonology link must be arbitrary, arguing against the direct role
of phonetics in phonology. The implication behind this conclusion is that VOT completely
defines the phonetics of laryngeal contrasts. However, it has long been known that in addition
to VOT, laryngeal contrasts are based on other cues, including fundamental frequency (e.g.
Lisker & Abramson 1985) and burst amplitude (Repp 1979). In Polish, Aperliński (2012)
has shown that when VOT cues are not available, listeners still hear the voice contrast. Thus,
while Cyran correctly identified problems with ‘laryngeal realism’, these problems do not
lead to the conclusion that phonetics does not have a direct role in phonology. Rather, Cyran’s
study simply shows that VOT is insufficient for describing the realization of laryngeal
contrasts.
The issues in the realization of laryngeal contrasts have shown that many
phonologists, in particular those who argue that phonetics is not phonological, may be overly
selective in attributing a given phonetic feature to the realization of a phonological
representation. In other cases, phonologists perhaps have not been selective enough. In the
1990s, a new ‘phonetically-based’ phonology was born (Hayes 1999, Steriade 1997), utilizing
the formal environment of Optimality Theory (Prince & Smolensky 1993) to incorporate
perceptual and articulatory constraints directly into grammar. This research has spawned
important new typological and experimental studies and re-analyses of old data. Thanks to
this work we now have a much better understanding of the phonetic underpinnings of a number
of phenomena, including place assimilation (Jun 1995), syllable weight (Gordon 1999), vowel
reduction (Crosswhite 2001), vowel harmony (Kaun 1995), and many others. Nevertheless,
most recent phonetic studies have generally not addressed a fundamental question: which
phonetics is phonological and why? For example, Flemming (2002) offers auditory
representations of vowel quality deriving from numerical scales of formant frequencies for
F1, F2, and F3. Flemming’s representations go a long way toward explaining the tendency of
vowel inventories to maximize auditory contrast. At the same time, however, the nature of the
representations suggests that each of the three formants should play an equal role in the
phonological patterning of vowels. No explanation is offered for the functional primacy of F1
over F3 in vowel systems across languages. In other words, we are left wondering why F1 is
more ‘phonological’ than F3. While the perceptual factors underlying this fact are known,
they are not incorporated into the phonological representation.
This paper will discuss various aspects of the physical realization of speech sounds
with an eye toward evaluating their phonological credentials. That is, we shall focus on the
question: “Which phonetics is phonological?”. I will argue for two basic claims. First, the
domain of speech perception, rather than articulation, must play a dominant role in the
formulation of phonological frameworks. In other words, for the most part, perception is
phonology (cf. Boersma & Hamann 2009). The second claim is that the phonetic properties
associated with manner of articulation are inherently more categorical, and thus phonological,
than the phonetic properties associated with place and laryngeal features. This claim has been
implemented within the Onset Prominence framework (OP; Schwartz 2013a), revealing
important insights on a number of problematic phenomena. Stated briefly, from the phonetics
of manner of articulation we may construct a new prosodic ‘skeleton’ that captures important
phonological generalizations.
The rest of this paper will proceed as follows. Section 2 will present arguments in
favor of the hypothesis that speech perception and phonology are closely linked. Section 3
compares the phonological implications of the acoustic and auditory properties of speech.
Section 4 examines the effects of phonetic variability in the realization of place, laryngeal,
and manner contrasts. Finally, Section 5 presents the OP perspective on a range of
phenomena, including sonority and strength, word-initial vowels, and the behavior of coda
stops. From the standpoint of the OP framework, we gain new insights into the question posed
in the title of this paper.
2 Phonology and perception
The field of phonetics encompasses two primary areas: speech production and speech
perception. With respect to the fundamental question asked in this paper, ‘which phonetics is
phonological?’, a comparison of the phonological aspects of articulation and perception is a
natural place to start. In what follows, we will argue for the primary role of speech perception
in phonology.
The first argument concerns the relative acoustic and perceptual constancy of speech
in the face of variability in production. Stevens (1989) describes some aspects of this
constancy in his Quantal Theory of speech. He notes that the acoustic-articulatory space is
made of a number of ‘quanta’, within which articulatory changes have only minimal acoustic
consequences. For example, /ɹ/ in American English may be produced in at least three
different ways (Hagiwara 1994), but still is characterized by a low third formant. Both
computers and ventriloquists (as mentioned in Harris 1994: 110) are capable of producing
comprehensible spoken language in ways that differ significantly from those described in
phonological theories based on articulation. Jakobson (1962) noted that being understood is
the primary goal of speech, and that the acoustic signal is the only shared experience of
speaker and hearer. These considerations strongly suggest that it is indeed perceptual targets
that are recorded in speakers' minds, and any available articulatory means to produce them
may be employed.
Empirical evidence for the primacy of perception may be found in the study of first
language phonological acquisition. Simply stated, congenitally deaf people generally do not
learn to speak. Hearing children typically acquire phonological categories in perception long
before they are able to produce them. Blevins (2004) cites studies of children's production of
American /ɹ/, in which the apparent [w] that is often substituted for the target rhotic is
systematically different from underlying /w/. This finding suggests that these children have
acquired the rhotic in perception but are unable to produce it in a way that allows adult
listeners to distinguish it from /w/. Similar perception-first effects may be found in second
language acquisition – learners typically can understand quite a bit of an L2 before they are
capable of speaking.
Sound change also suggests a primary role for speech perception. Near-merger (e.g.
Labov 1997), a phenomenon described in the sociolinguistic literature, provides evidence for
a perception-first view. Nearly merged categories are merged perceptually. However, small
systematic differences in production are still found, implying that production lags behind
perception in the process of language change. Ohala (1981) surveys a number of changes with
perceptual motivation, proposing a model of sound changes in which the listener plays a
primary role. The role of perception, however, goes beyond the fact that some changes may
occur for perceptual reasons. Any sound change, including those with articulatory motivation,
must be licensed by listeners in order to take hold in the grammar of a language. From this
perspective, all processes are listener-oriented, regardless of whether perceptual or
articulatory considerations provided their original spark.
Perhaps the most striking evidence for a direct link between phonology and speech
perception comes from cross-language studies. It is well established that one’s native
phonology has an impact on speech perception (e.g. Best 1995, Flege 1995). Briefly stated,
the same acoustic stimulus is often heard differently by speakers of different languages. A
particularly notable example is perceptual epenthesis (e.g. de Jong & Park 2012), by which
listeners report hearing vowels that are not produced, a process that is governed by native
phonotactic restrictions. For example, speakers of languages with restricted coda inventories
or without consonant clusters perceive a vowel in these positions in the speech of L2
speakers. If perception were not an intimate aspect of phonological competence, processes
such as perceptual epenthesis, which are quite widespread, would not be expected.
3 Auditory vs. acoustic considerations in listener-oriented phonology
Why should there be such a close relationship between phonology and speech perception?
Briefly stated, the auditory system acts in a “phonological” manner, reducing the amount of
acoustic information to be processed in both the spectral and temporal domains. The entire
frequency range of human audition (up to about 20 kHz) may be reduced to a relatively small
number of critical bands. In the temporal domain, onset boosts (see e.g. Wright 2004) serve to
focus perceptual attention on smaller portions of the signal, again reducing the amount of
acoustic input to be processed. In what follows, we shall briefly examine a selection of cases
in which the auditory system filters out acoustic variability in the formation of categories that
play important roles in phonological grammars. The phenomenon of categorical perception is
indeed a striking example of the phonological nature of the auditory system.
Acoustic variability found in speech often resides in the domain of contextually
induced spectral details. For instance, the high front vowel /i/ often shows a lower F2 in a
labial context than it does in a velar or alveolar context. For an adult male speaker, 2000 Hz
for F2 in a labial context and 2200 Hz in the alveolar context might be considered typical
measurements. While this acoustic difference is not insignificant, the auditory difference
between these two measurements is only about 0.6 Bark, less than one critical band.
Moreover, the general spectral shape of the resonance patterns is constant in the two contexts,
with a large dip (e.g. Harris 1994) in the spectrum between the first and second formants, and
a relatively close convergence of the second and third formants (Syrdal & Gopal 1986). This
auditory constancy is found in phonological feature systems, represented as either a feature
[-back] or an element {I}, the latter of which is an explicitly auditory representation.
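
The Hz-to-Bark arithmetic behind this comparison can be checked with a short sketch. It assumes Traunmüller's (1990) approximation of the Bark scale, which is not named in the text above; other conversion formulas yield slightly different values. The function name hz_to_bark is illustrative only.

```python
def hz_to_bark(freq_hz: float) -> float:
    """Convert Hz to Bark.

    ASSUMPTION: Traunmueller's (1990) approximation is used here;
    the paper does not say which conversion its figure is based on.
    """
    return 26.81 * freq_hz / (1960.0 + freq_hz) - 0.53


# F2 values for /i/ taken from the example in the text (adult male speaker).
f2_labial = 2000.0
f2_alveolar = 2200.0

bark_difference = hz_to_bark(f2_alveolar) - hz_to_bark(f2_labial)
print(f"F2 difference: {f2_alveolar - f2_labial:.0f} Hz = {bark_difference:.2f} Bark")
```

Under this approximation the 200 Hz difference comes out at roughly 0.6 Bark, in line with the figure cited above and well under one critical band.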
A more striking case in which the human auditory system acts as an information
reduction device may be found in asymmetries in auditory response to certain portions of the
signal. In particular, pre-vocalic consonants, which are acoustically lacking in robustness, have
been found to be associated with a period of heightened perceptual sensitivity, referred to as
an onset boost, followed by a period of decreasing auditory response known as adaptation
(e.g. Delgutte 1997). As a result, CV-type syllables produce the best match between
phonotactics and auditory response. Consonants tend to correspond with increased response,
while vowels align with decreased perceptual sensitivity. These considerations are reflected in
the functional primacy of consonants in the formation of lexical contrasts across languages;
consonants have a tendency to align with perceptually robust portions of the signal. The
widespread requirement across languages for a vocalic ‘nucleus’ to a syllable can also be
explained in terms of auditory sensitivity. Vowels are the only segment types that produce
enough acoustic energy to overcome the decrease in auditory response associated with
adaptation.
In sum, basic auditory mechanisms offer plausible motivation for the assumption that
speech perception and phonological systems must be intimately linked. Despite a high degree
of acoustic variability in speech, auditory processing categorizes selected aspects of the
speech signal. Many recurrent phonological patterns appear to be the natural consequence of
perceptual constraints.
4 Incorporating perception into phonology – where to start?
A fundamental problem for any model of the phonetics-phonology relationship is the range of
gradient and variable phenomena found in speech. On numerical acoustic and articulatory
scales, we rarely if ever find two identical realizations of a given category. While the
perceptual underpinnings of the processes underlying the formation of phonological
categories may be known, it is not obvious how these considerations might be incorporated
into a phonological theory. To fully address the question of “which phonetics is
phonological?”, we need to re-examine the phonological side of the equation and formulate a
new question. Which phonological phenomena are associated with phonetic realizations that
are most conducive to auditory categorization? In other words, we seek to compare the
phonetics of various phonological categories to assess which categories are more
‘phonological’ in their phonetic realization. Such an examination leads to the conclusion that
the phonetics of manner of articulation is inherently more categorical than the phonetics of
other phonological classes. As a result, it may be suggested that manner contrasts serve as a
fundamental building block of phonological structure, which allows us to restrict the domain
of the phonetics-phonology interface to place and laryngeal contrasts.
The phonological nature of manner may be explained by the type and consequences of
phonetic variability. While much has been made of the gradient and variable properties of
speech, not all gradient phenomena are equivalent in their perceptual and phonological
consequences. Gradient spectral information may be found in the resonance properties of the
vocal tract as well as voice source characteristics. For instance, two instances of a vowel such
as /u/ may show a difference in F2 frequency resulting from contextual variability, speech
rate, prosodic position, or other factors. Temporal gradience may be observed in laryngeal
contrasts and quantity distinctions; voice onset time (VOT) in stops shows differences on the
basis of consonant place of articulation and vocalic context, among countless other factors.
Gradience in both the spectral and temporal domains is readily quantifiable and transfers
easily to descriptions of both place and laryngeal features.
In the case of manner contrasts, however, the acoustic signal appears to be somewhat
more categorical in nature, characterized by acoustic ‘landmarks’ (Stevens 2002). Consider
stops, which are perceived on the basis of a rapid drop in amplitude and a (near) silent closure
period. Although stops may be realized with incomplete closure (Crystal & House 1988) that
is quantifiable on a gradient continuum, this does not necessarily entail sufficient frication
noise for the perception of another manner category (fricatives). That is, despite gradient
realization, stop closure is an inherently privative property. It is either perceived or it is not
perceived, but the failure to hear it does not inevitably imply the perception of another
category. By contrast, changes in formant frequencies associated with vowel quality result in
the perception of a different vowel, but the newly perceived sound is still a vowel.
One acoustic aspect in the realization of manner contrasts sets manner apart from
laryngeal and (especially) place specifications. Place of articulation is reflected in terms of
spectral properties and is largely independent of duration and overall amplitude. Laryngeal
contrasts are based primarily on the relative timing of glottal and supra-glottal events; the
spectral domain is secondary1. In other words, place and voicing may be reliably cued by a
single acoustic measure. By contrast, manner contrasts are reflected primarily in the
interaction of two physical scales. A primary cue for a number of manner contrasts is rise time
in the amplitude envelope (Shinn & Blumstein 1984; Johnson 1997). Rise time is not
quantifiable in absolute terms on a single physical scale, but must be defined in terms of a
ratio between two scales: amplitude and time. In essence, rise time requires the listener to
compute this ratio, which may be expected to be more taxing on cognitive resources than
identifying categories on a single physical dimension. Consequently, we might expect
listeners to be less sensitive to gradience in the realization of manner contrasts than in the
realization of place and laryngeal contrasts.
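
The idea that rise time relates two physical scales can be made concrete with a minimal sketch. Everything in it is an assumption made for illustration: the toy envelopes, the 1000 Hz envelope sampling rate, the 10% and 90% thresholds, and the function name rise_slope. It is not a description of the measurement procedures used in the studies cited above.

```python
def rise_slope(envelope, sample_rate_hz, low=0.1, high=0.9):
    """Return (rise_time_s, slope) for the rise from low*peak to high*peak.

    ASSUMPTION: the 10%-90% thresholds are a common engineering
    convention, not a value taken from the paper.
    """
    peak = max(envelope)
    # First samples at which the envelope reaches each threshold.
    start = next(i for i, a in enumerate(envelope) if a >= low * peak)
    end = next(i for i, a in enumerate(envelope) if a >= high * peak)
    rise_time_s = (end - start) / sample_rate_hz
    amplitude_change = envelope[end] - envelope[start]
    # The slope is the ratio of the two scales: amplitude over time.
    slope = amplitude_change / rise_time_s if rise_time_s > 0 else float("inf")
    return rise_time_s, slope


# Hypothetical envelopes sampled at 1000 Hz (one value per millisecond).
abrupt = [0.0, 0.1, 0.5, 0.9, 1.0, 1.0]
gradual = [i / 10 for i in range(11)]

abrupt_time, abrupt_slope = rise_slope(abrupt, 1000)
gradual_time, gradual_slope = rise_slope(gradual, 1000)
print(f"abrupt:  {abrupt_time * 1000:.0f} ms, slope {abrupt_slope:.0f} per s")
print(f"gradual: {gradual_time * 1000:.0f} ms, slope {gradual_slope:.0f} per s")
```

Both toy envelopes cover the same amplitude range, so amplitude alone cannot separate them. Only the ratio of amplitude change to elapsed time distinguishes the abrupt onset from the gradual one.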
The relative robustness of the gradient properties associated with place, manner, and
laryngeal features may be seen very easily in the inventory of diacritics in the International
1 Place and laryngeal features have secondary cues that manipulate additional physical scales. Voicing in
particular is reflected in both the amplitude and pitch domains: voiceless consonants tend to produce higher-amplitude bursts
and aspiration, and raise the pitch on neighboring sounds.
Phonetic Alphabet. The greatest number of diacritics specify adjustments to vowel quality,
which are reflected in spectral properties that have been more robustly identifiable for
phoneticians. There are also a large number of diacritics related to laryngeal features. By
contrast, the IPA has only a small number of manner diacritics. We may conclude that
gradience in the place and laryngeal features is simply easier to identify than gradience in
manner.
In sum, to answer the question posed earlier, “which phonetics is phonological?”, we
suggest that manner-related phonetic features are more categorical in nature, and should be
separated from place and laryngeal features in the construction of phonological
representations.
5 Implementation of a manner-based representational theory
With regard to the phonetics-phonology relationship, manner of articulation appears to be the
most ‘phonological’ of the three major classes of features. Prosodic structure has also been
claimed to be ‘phonological’ rather than phonetic (e.g. Steriade 1997). Consequently, it may
be suggested that manner of articulation is an inherently prosodic or structural specification,
while place and laryngeal specifications are better described as melodic. This idea has
occasionally been proposed in recent years (Steriade 1993, Golston & van der Hulst 1999,
Pöchtrager 2006), but remains outside the mainstream.
5.1 The Onset Prominence representational environment
A structural view of manner is implemented in the Onset Prominence representational
environment (OP; Schwartz 2013a), in which prosodic constituents and segmental
representations are constructed from the same representational materials that define manner
of articulation. Equating manner with prosodic structure thus restricts the primary domain of
the phonetics-phonology interface to place and (some languages’) laryngeal contrasts.
The representational insights of the OP framework rest on two assumptions, the first of
which concerns the choice of phonological primitives. Since stop-initial CV syllables are the
most common syllable type across languages, they are, in slightly modified form, assumed to
be the primitive building block for all segmental and prosodic representations. This building
block is shown in the tree structure in (1), and is constructed from the series of phonetic
events associated with a stop-vowel sequence in initial position.
(1) The Onset Prominence representational hierarchy
The highest layer of the hierarchy is the closure (Closure) of the stop. The release of the stop
produces a portion of aperiodic noise (Noise). Noise is followed by a significant rise in
amplitude as the vowel begins (Vocalic Onset; VO), after which we may observe a portion of
relatively stable formant frequencies (Vocalic Target; VT). This is followed by a decrease in
amplitude (Offset). A stop-vowel sequence will contain all four layers of the structure in (1).
Place and laryngeal specifications attach to the terminal nodes.
The other assumption crucial to the OP model is related to the lowest two layers of the
hierarchy, the VO and VT nodes. Vowels, which are traditionally described as single
segments, are split into two representational layers. The VO layer represents the initial portion
of a vowel in the CV context, which typically contains acoustic cues to the identity of the
consonant (e.g. Wright 2004). As a result, this node may be built into the representation of
consonants. The VT layer encodes the later portion of vowels. In Section 5.4, we will return
to the ambiguity inherent in this representational split.
5.2 Manner and constituency
From the hierarchy in (1) we derive manner of articulation in terms of the layers of structure
contained by a given segmental tree. This is shown in (2), which provides structures for a
labial stop, nasal, fricative, approximant, and vowel. The binary nodes are active elements in
the individual representations, while the unary nodes, which represent properties that are
absent from the speech signal, serve as place holders to indicate the relative hierarchical
position occupied by a given segmental structure. The segmental symbols may be interpreted
as shorthand for place and laryngeal specifications. Note that these representations encode
both sonority and consonantal strength. More ‘sonorous’ segments are housed lower in the
representational hierarchy, while stronger consonants are higher. We will return to the OP
view of these phenomena shortly.
(2) Manner distinctions in the OP environment
The relationship between prosodic constituents and segmental representations in the OP
environment is illustrated in (3). On the left we see individual segmental structures for
English quick, while on the right we see the word as a single prosodic constituent. The unary
nodes from (2) have been removed in (3) for the sake of visual clarity. Prosodic constituents
are formed when lower level sonorant and vowel structures are absorbed into higher level
obstruent structures. In quick, the glide and vowel are absorbed into the structure of the initial
stop. Stray structures, such as the final /k/ in quick, may be submerged under the preceding
constituent. We will return to submersion in Section 5.5.
(3) Segmental and constituent (right) structures for English quick
5.3 Sonority and strength
The manner contrasts in (2) allow us to capture generalizations that may be attributed to the
role of sonority in describing the ways in which phonological segments may be arranged in a
linear string. Sonority is generally assumed to be a scalar feature, by which segments may be
ranked according to the following hierarchy:
stops < fricatives < nasals < liquids < glides < vowels.
Sonority is frequently invoked to account for restrictions on sound sequences within and
between syllables. The most sonorous part of a syllable is the nucleus, and according to the
Sonority Sequencing Generalization (SSG; Selkirk 1984), segments decrease in sonority the
further they get from the nucleus. In the case of onset clusters, the SSG mandates rising
sonority (e.g. pr, kl). This restriction is captured by absorption in OP structures – higher level
obstruents absorb liquids and glides, which themselves absorb vowels, as shown in (3) for the
word quick.
Harris (2006) notes certain problems with the sonority proposal, despite its heuristic
usefulness. First of all, a clear phonetic correlate of sonority has been difficult to identify.
Acoustic intensity or degree of articulatory opening are not satisfactory, given the intensity
associated with sibilant fricatives and the full oral closure associated with nasals. Other
proposals based on a conglomerate of features associated with perceptual salience (Clements
1990) fail to capture place-based restrictions on consonant clusters. Nevertheless, the
empirical generalizations of the sonority hierarchy are quite robust, and it is desirable to be
able to express them in a phonological theory, as the OP framework manages to accomplish.
A comparison of the predictions of the OP framework and the sonority hierarchy reveals
differences with respect to the status of nasals in onset clusters. According to the traditional
hierarchy, stop-nasal clusters show a rising sonority slope and should be tolerated in a fair
number of languages. By contrast, in the OP environment both stops and nasals are specified
for the Closure node and should not be subject to absorption. In other words, OP predicts that
stop-nasal clusters should be quite rare, while the traditional hierarchy predicts they should
occur regularly, albeit not as frequently as stop-liquid clusters.
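
The diverging predictions for stop-nasal onsets can be sketched as follows. The sonority ranks follow the hierarchy given above. The assignment of a highest active layer to each manner class is our reading of the prose, since the trees in (2) are not reproduced here, and should be treated as an assumption. The names TOP_LAYER, rising_sonority and op_absorbs are hypothetical.

```python
# Traditional sonority ranks, following the hierarchy given in the text.
SONORITY = {"stop": 0, "fricative": 1, "nasal": 2,
            "liquid": 3, "glide": 4, "vowel": 5}

# OP layers from highest to lowest, as described for (1).
LAYERS = ("Closure", "Noise", "VO", "VT")

# ASSUMPTION: highest active layer per manner class, inferred from the
# prose rather than copied from the (missing) trees in (2).
TOP_LAYER = {
    "stop": "Closure", "nasal": "Closure", "fricative": "Noise",
    "liquid": "VO", "glide": "VO", "vowel": "VT",
}


def rising_sonority(first, second):
    """Traditional view: an onset cluster is licensed if sonority rises."""
    return SONORITY[first] < SONORITY[second]


def op_absorbs(first, second):
    """Simplified OP view: `first` absorbs `second` only if its highest
    active layer is strictly higher in the hierarchy."""
    return LAYERS.index(TOP_LAYER[first]) < LAYERS.index(TOP_LAYER[second])


for cluster in [("stop", "liquid"), ("stop", "nasal")]:
    print(cluster, "sonority:", rising_sonority(*cluster),
          "absorption:", op_absorbs(*cluster))
```

In this simplified rendering the two accounts agree on stop-liquid clusters and diverge on stop-nasal clusters, which pass the sonority test but fail the absorption test because both segments begin at the Closure layer.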
Sonority may be seen as the opposite of phonological strength, another frequently
invoked hierarchy that is claimed to be relevant to phonological theory. Strength is frequently
invoked to describe weakening processes such as spirantization of stops and the vocalization of
fricatives (LaVoie 2001). These processes are easily represented in the OP framework: each
represents the loss of the highest active node in the structure. However, as Harris (2009)
points out, as with sonority, the status of nasals is problematic for the strength-as-sonority
view: weakening of fricatives never produces nasals. This restriction is predicted with OP
representations, in which stops and nasals share a high-level closure node.
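
Weakening as loss of the highest active node can be illustrated with a small sketch. The sets of active layers assigned to each manner class are assumptions based on the prose, as the trees in (2) are not reproduced here. In particular, treating nasals as Closure plus VO without an active Noise layer is our guess. The function names are hypothetical.

```python
# OP layers from highest to lowest, as described for (1).
LAYERS = ("Closure", "Noise", "VO", "VT")

# ASSUMPTION: active layers per manner class, inferred from the prose.
SEGMENTS = {
    "stop": frozenset({"Closure", "Noise", "VO"}),
    "nasal": frozenset({"Closure", "VO"}),
    "fricative": frozenset({"Noise", "VO"}),
    "approximant": frozenset({"VO"}),
}


def weaken(active):
    """Remove the highest active layer, if there is one."""
    for layer in LAYERS:
        if layer in active:
            return active - {layer}
    return active


def weakening_chain(name):
    """List the manner classes reached by repeatedly weakening a segment."""
    labels = {nodes: label for label, nodes in SEGMENTS.items()}
    chain, current = [], SEGMENTS[name]
    while current:
        current = weaken(current)
        if current in labels:
            chain.append(labels[current])
    return chain


for name in ("stop", "fricative", "nasal"):
    print(name, "->", weakening_chain(name))
```

Because weakening only removes layers, no chain starting from a fricative can acquire the Closure layer that a nasal requires. Under these assumptions the absence of fricative-to-nasal weakening follows from the representations themselves.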
5.4 Phonetically-derived phonological ambiguity
Among the more important aspects of the Onset Prominence representational environment is
its incorporation of an important phonetic ambiguity with respect to the initial portion of vocalic
segments. In the hierarchy in (1), the segmental affiliation of the VO node is ambiguous. This
single layer of structure may be claimed by multiple segment types. On its own, VO
represents the class of approximants and glides as we see in (2) and (3). In (2), we also see
that VO may be active in the representation of obstruents and nasals, representing formant
transitions that cue consonant place of articulation (e.g. Wright 2004). At the same time, the
VO node is derived from a portion of the signal that is, strictly speaking, vocalic. As a result,
we might expect to find VO built into the representation of vowels as well as consonants. The
trees in (4) present two types of vowel representation, with or without an active VO node2.
(4) Vowels with or without VO specification
The presence or absence of VO in vowel representations offers a useful perspective on the
prosodically ambiguous behavior of onsetless syllables across languages (see Schwartz
2013a). Briefly stated, VO-specification allows initial vowels to satisfy prosodic constraints
requiring onsets. This may be manifested as an apparent ‘empty consonant’ (e.g. Marlett &
Stemberger 1983), or simply as prosodic well-formedness for processes such as stress
assignment or reduplication (cf. Downing 1998 for discussion of prosodically ill-formed
onsetless syllables). Alternatively, VO specification may be associated with glottal marking
on word-initial vowels, as is frequently observed in languages such as German (Wiese 1996),
Czech (Bissiri & Volín 2010), and Polish (Schwartz 2013b). In the case of Polish, VO
specification facilitates the formulation of predictions with regard to the formation of prosodic
boundaries that correspond with orthographic boundaries. VO specification in Polish
strengthens boundaries, preventing processes occurring in English such as sandhi
palatalization (got you ~ gotcha) and resyllabification (find out ~ fine doubt).
While glottal marking has traditionally been seen as a phonetic detail, OP
representations suggest that this phonetic strengthening process has phonological origins in
some languages. We now turn to the representation of post-vocalic consonants, in which the
OP perspective suggests that the release of coda stops, traditionally seen as a phonetic process
affecting a non-contrastive feature, may indeed have significant phonological implications.
2 The structures in (4) may raise questions concerning glide-vowel sequences of the type /wu/
and /ji/, which at first glance would appear to be structurally identical to VO-specified /u/ and /i/. Levi (2008)
identified differences between ‘underlying’ and ‘derived’ glides. For the former we posit that the glide-initial
sequence contains an additional specification. For details, see Schwartz (2012).
5.5 Submersion and the representation of prosodic weakness
In the basic constituent forming mechanism of the OP environment, lower level vowel and
approximant structures are absorbed into higher-level obstruents. Absorption is possible when
the leftmost structure is higher in the OP hierarchy. In some cases, absorption may not take
place since this condition is not met. In such cases, a different mechanism, submersion, may
be motivated. One example of submersion is shown in (5), in which the second of two /a/
structures is submerged under the first, resulting in a long vowel.
(5) Submersion resulting in a long vowel
Beyond the relatively simple vowel lengthening mechanism shown in (5), submersion
is a process with far-reaching prosodic implications. Before discussing specific structures, it is
important to note that submersion is a form of phonological recursion of the type proposed by
van der Hulst (2010). In his view, codas may be seen as ‘syllables inside syllables’. OP
submersion unifies this view with the representation of long vowels. Additionally, submersion
offers insight into the behavior of consonants in both VC and VCV contexts, with deeper
predictions for the form and behavior of larger prosodic constituents.
First, we turn to the behavior of coda consonants. In (6), we see a string of segmental
structures for the English word click. On the basis of the discussion thus far, we should
expect the liquid and the vowel to be absorbed into the structure of the initial /k/. As yet,
however, we have not addressed the fate of the final /k/. It may not be absorbed into the
preceding constituent, since its Closure and Noise nodes are higher in the OP hierarchy. In
English, a stop such as /k/ cannot stand on its own as a prosodic constituent, and traditional
representations would assign this segment to the ‘coda’ position.
(6) String of segmental trees for click
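The absorption condition described above can be sketched as a left-to-right grouping procedure. The sketch assumes a top-down ordering Closure > Noise > VO > VT and assigns each segment a single 'highest active node'; both are simplifications introduced here for illustration, and the names `RANK`, `TOP_NODE`, `can_absorb`, and `parse` are hypothetical.

```python
# Assumed top-down order of the OP hierarchy; a lower rank means a higher node.
RANK = {"Closure": 0, "Noise": 1, "VO": 2, "VT": 3}

# Assumed highest active node for each segment used in the examples.
TOP_NODE = {"k": "Closure", "l": "VO", "i": "VT", "a": "VT"}


def can_absorb(left, right):
    """True if the left structure is strictly higher than the right one."""
    return RANK[TOP_NODE[left]] < RANK[TOP_NODE[right]]


def parse(segments):
    """Group segments into constituents by left-to-right absorption.

    A segment joins the current constituent only if that constituent's
    leftmost segment is higher in the hierarchy; otherwise it is left
    as a separate structure.
    """
    constituents = []
    for seg in segments:
        if constituents and can_absorb(constituents[-1][0], seg):
            constituents[-1].append(seg)
        else:
            constituents.append([seg])
    return constituents


print(parse(list("klik")))  # [['k', 'l', 'i'], ['k']]
print(parse(["a", "a"]))    # [['a'], ['a']]
```

In both cases the second structure is left unabsorbed, which is the configuration in which submersion may apply.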
Polish has borrowed the word click, reproducing it as klik. In Polish and English, however,
there is a systematic difference in the behavior of the final /k/ with regard to the release of the
stop. In Polish the release is obligatory (e.g. Dukiewicz & Sawicka 1995), while in English it
is frequently suppressed. Such behavior suggests two different representations of the ‘coda’
/k/, shown in (7).
(7) Constituent structures for klik (left) and click
On the right, the final stop in English click is submerged under the preceding constituent. On
the left the final /k/ in Polish klik is not submerged. The result is that the English ‘coda’ is in a
lower prosodic position in the OP hierarchy. Associating lower hierarchical levels with
prosodic weakness, we should expect the English coda to be subject to lenition processes such
as the suppression of release.
Submersion in English is not limited to individual segments – entire ‘syllables’ may
also be lowered under the preceding constituent. In (8) we offer a representation of the
English word pity. This results in a larger ‘foot’ structure in which the onset to the second
syllable is underneath the first syllable. Such a configuration offers an insightful
representation of ‘ambisyllabicity’ (Kahn 1976) and prosodic weakness in languages in which
the second consonant in a trochaic CVCV foot is subject to lenition (see Jensen 2000; Harris
2004). The /t/ in this word is of course realized as a tap or a glottal stop in various
vernacular dialects of English. A segmentally ‘similar’ word in Polish, PIT-y, which refers to
tax return forms, would not contain submerged structure; the two constituents would be
adjoined at a higher level of structure.
(8) The English word pity
In sum, our discussion points to an opposition between English and Polish with regard to
relations between individual segmental structures and the primitive CV tree in (1). English
offers the possibility of submersion, associated with long vowels, weak coda consonants, and
weak consonants in VCV contexts. Submersion is absent in Polish. Codas and intervocalic
consonants are generally not subject to weakening, and vowel quantity distinctions are absent.
Our perspective suggests that phonetic properties associated with manner of
articulation may be claimed to be phonological. This view restricts the domain of the
phonetics-phonology interface to the realization of place and laryngeal contrasts. At the same
time, certain phenomena that have traditionally been seen as non-phonological may be subject
to re-evaluation. For example, production of glottal marking on word-initial vowels may be
attributable to a phonetically-derived phonological parameter. In traditional accounts, vowel
glottalization entails the insertion of a non-contrastive segment [ʔ]. Our perspective suggests
that in languages in which initial vowels are specified for the VO node such as Polish and
German, vowel glottalization is better analyzed as the strengthening of a phonological
property that is already present in the vowel’s representation.
5.6 Coda stop release is phonological
In what follows, we will return to the question of coda stop release, another process that is
traditionally viewed as non-contrastive. With regard to stop release, languages adopt one of a
limited number of strategies, suggesting that the phenomenon is in fact phonological. Our
discussion will focus on a comparison of English and Polish with Korean, a language in
which coda stop release is obligatorily suppressed. Korean has a particularly restrictive
inventory of coda consonants. While stops are allowed in coda position, they are always
unreleased. In addition, the laryngeal contrast, a three-way opposition among plain voiceless,
voiceless aspirated, and tense or stiff-voiced stops, is neutralized.
To provide some perspective on Korean phonotactic constraints, OP representations
for the three Korean labial stops are proposed in (9). Crucially, the framework allows for the
possibility that different melodic specifications may be housed at different levels of the OP
hierarchy in accordance with their phonetic realization. Place is specified as a [labial]
annotation on the Closure node. This is to be expected since it is the location of the closure
that defines place of articulation. By contrast, laryngeal features, whose phonetic realization
may be impeded by stop closure, may be assigned at lower levels. This is seen in (9).
Aspiration, which is of course associated with aperiodic noise, is shown as a [spread glottis]
specification on the Noise node. Tenseness is represented as a [constricted glottis] annotation
on the VO node. This feature is associated with a stiffer voice quality on the onset of the
following vowel (e.g. Ladefoged & Maddieson 1996), so the VO node is a natural structural
position for [cg] specifications. With these representations in mind, we may turn to the
representation of codas in Korean, which are characterized by the presence of place contrasts,
but neutralized manner and laryngeal contrasts.
(9) Korean labial stops; plain (left), aspirated, and tense (right)
In the OP environment, codas may be attributed to the process of submersion, by which a
stray structure is moved to the bottom of the preceding constituent. Alternatively, the post-
vocalic consonant may resist submersion as we saw in Polish klik. Submerged stops are in a
prosodically weaker position, and are predicted to be subject to lenition processes such as the
suppression of release. Since in Korean, coda stops are always unreleased, we posit that
submersion is the process that forms codas in this language. The submersion of a Korean stiff-
voiced stop would produce the structure in (10). Note the structural nodes associated with the
coda consonant are located at the bottom of the constituent structure.
(10) Submerged stiff-voiced stop in Korean
To explain the fact that Korean maintains place contrasts in coda stops, but neutralizes
laryngeal contrasts, we need only propose a constraint against multiple layers of submerged
structure. In other words, Korean, like many other languages, appears to restrict the size of
syllable rimes. The claim here would be that Korean only allows a single node under the VT
level. The laryngeal specification, which as we saw in (9) is housed on lower-level nodes,
loses its structural housing and is not realized in this position. This is shown in (10) as the
crossed-out labels of the lower Noise and VO nodes. Thus, the apparent mismatch in Korean
between a licensed place contrast and neutralized laryngeal contrast may be explained as a
single constraint on the size of prosodic constituents. With the lower Noise and VO nodes
eliminated in (10), it falls out naturally that coda stops in Korean are always produced without
an audible release. The only remaining part of the stop is Closure. The Korean requirement
that coda stops are unreleased is a systematic element of the language’s prosodic
representation.
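The effect of the proposed size constraint can be sketched as follows. The node-by-node encoding of the three stops in (9), the reading of the constraint as 'keep only the topmost node', and the assumption that an audible release requires a surviving Noise node are simplifications made here for illustration; all names in the code are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Node:
    name: str
    feature: Optional[str] = None  # melodic annotation housed on this node


# Assumed encodings of the Korean labial stops in (9), listed top-down.
KOREAN_LABIALS = {
    "plain": (Node("Closure", "labial"), Node("Noise"), Node("VO")),
    "aspirated": (Node("Closure", "labial"),
                  Node("Noise", "spread glottis"), Node("VO")),
    "tense": (Node("Closure", "labial"),
              Node("Noise"), Node("VO", "constricted glottis")),
}

# Assumed reading of the constraint: one node of submerged structure under VT.
MAX_SUBMERGED_NODES = 1


def submerge(stop, limit=MAX_SUBMERGED_NODES):
    """Keep only the topmost `limit` nodes of a stop submerged as a coda."""
    return stop[:limit]


def is_released(nodes):
    """Assume an audible release requires a surviving Noise node."""
    return any(node.name == "Noise" for node in nodes)


codas = {label: submerge(stop) for label, stop in KOREAN_LABIALS.items()}
for label, coda in codas.items():
    print(label, coda, "released" if is_released(coda) else "unreleased")
```

All three stops reduce to the same structure, a Closure node annotated [labial]: place survives, the laryngeal annotations lose their housing, and no release is predicted.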
Kang (2003) studied epenthesis in post-vocalic stops in English loanwords in Korean.
She provides evidence that the presence of epenthesis in English loanwords in Korean is
closely related to the probability that target language coda stops are produced with an audible
release. Epenthesis should be expected when the target language coda contains a release burst,
which of course in Korean only occurs in syllable onsets. Thus, released stops are adapted
with an additional prosodic constituent whose prominence is enhanced by the epenthetic
vowel.
While in English coda stops are optionally released and in Korean coda stops are
always unreleased, in Polish coda stops are always released, except in homorganic
clusters. Thus, for Polish we must claim that coda stops are not submerged under the
preceding rhyme. Rather, they remain at their underlying level of the OP hierarchy as seen in
(7). The absence of submersion in Polish offers a unified explanation of the obligatory release
of coda stops and the lack of vowel length distinctions, two seemingly unrelated phenomena.
It is interesting to note that epenthesis in the speech of Korean learners of Polish is obligatory
(Dziubalska-Kołaczyk, p.c.). It is not subject to the variability found in English loanwords in
Korean.
Kang attributes the link between epenthesis in Korean English and target language
stop release to ‘perceptual similarity’, arguing for the importance of a non-contrastive
phonetic detail, stop release, in the mechanism of loanword adaptation. From the OP
perspective, stop release, though non-contrastive, is not merely a phonetic detail. It is
predictable on the basis of phonological parameters and constraints. The lack of release in
Korean and the obligatory release in Polish are systemic elements of the phonologies of the
two languages. The OP framework offers representational devices in which these
generalizations may be captured.
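The three-way typology discussed in this section can be summarized as a lookup from a submersion setting to predicted coda release and, in turn, to the epenthesis expected when Korean speakers adapt such codas. Treating submersion as a three-valued parameter and labelling English as 'optional' are readings adopted here for illustration only; the names in the code are hypothetical, and the Polish exception for homorganic clusters is not modelled.

```python
from enum import Enum


class Submersion(Enum):
    OBLIGATORY = "obligatory"
    OPTIONAL = "optional"
    ABSENT = "absent"


# Assumed parameter settings, following the discussion of (7) and (10).
SUBMERSION = {
    "Korean": Submersion.OBLIGATORY,
    "English": Submersion.OPTIONAL,
    "Polish": Submersion.ABSENT,
}

# Predicted release of coda stops for each setting.
RELEASE = {
    Submersion.OBLIGATORY: "never",
    Submersion.OPTIONAL: "variable",
    Submersion.ABSENT: "always",
}

# Epenthesis expected in Korean adaptation, given source-language release.
EPENTHESIS = {"never": "not expected", "variable": "variable", "always": "obligatory"}


def coda_release(language):
    """Predicted release behavior of coda stops in `language`."""
    return RELEASE[SUBMERSION[language]]


def korean_adaptation(source_language):
    """Predicted epenthesis when Korean speakers adapt codas of the source."""
    return EPENTHESIS[coda_release(source_language)]


for lang in SUBMERSION:
    print(lang, coda_release(lang), korean_adaptation(lang))
```

On this encoding, variable epenthesis for English codas and obligatory epenthesis for Polish codas follow from a single parameter rather than from separate statements about phonetic detail.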
Final remarks
This paper has examined certain unresolved issues concerning the role of phonetics in
phonological theory. In particular, we addressed the following question: ‘Which phonetics is
phonological?’. Auditory considerations point to the conclusion that those phonetic
properties associated with manner of articulation are purely phonological, while place and
laryngeal-based features constitute the locus of the phonetics-phonology interface. This view
of the phonetics-phonology relationship is implemented in the Onset Prominence framework,
in which prosodic structure is built from manner-based features.
The theoretical considerations outlined in this paper suggest a number of practical
applications. Phonological theory is relevant in a wide range of fields, including language
learning, speech technology, and speech therapy. The OP framework is an explicit model that
allows for the formulation of concrete hypotheses for experimental study into the relation
between competence and performance in speech. As such, the model has clear benefits in any
practical area in which phonology is applied. For instance, in the area of L2 speech
acquisition, new phonological parameters bring new awareness of non-contrastive phonetic
aspects of the target language. Successful acquisition of features such as unreleased stops
(5.6), or the suppression of initial vowel glottalization (5.4) is associated with more target-like
production and higher listener ratings on scales of foreign accentedness. The contribution of
the OP framework lies in the fact that it provides new devices for the formulation of
experimental hypotheses for cross-linguistic comparisons.
References
Abramson A. S. & L. Lisker. 1985. Relative power of cues: F0 shift versus voice timing, in:
Fromkin V. (ed.). Phonetic Linguistics. 25-33. New York: Academic Press.
Aperliński, G. (2012). Is VOT enough? Paper presented at the 6th International Conference on
Accents of English. Łódź.
Best, C. T. (1995). A direct realist view of cross-language speech perception. In W. Strange
(Ed.), Speech Perception and Linguistic Experience. Baltimore, MD: York Press.
Blevins, J. (2004). Evolutionary phonology. Cambridge: Cambridge University Press.
Bissiri, M.P. & Volín, J. (2010). Prosodic structure as a predictor of glottal stops before word-
initial vowels in Czech English. In R. Vích (Ed.), 20th Czech-German Workshop – Speech
Processing, Prague, 23-28, 2010.
Boersma, P. & S. Hamann. 2009. Loanword adaptation as first-language phonological
perception. In Andrea Calabrese & W. Leo Wetzels (eds.), Loanword phonology. Amsterdam:
John Benjamins. 11-58.
Chomsky, N. & M. Halle. 1968. The Sound Pattern of English. New York: Harper & Row.
Clements, G.N., 1990. The role of the sonority cycle in core syllabification. In: Kingston, J.,
Beckman, M.E. (Eds.), Papers in Laboratory Phonology I; Between the Grammar and Physics
of Speech. Cambridge University Press, Cambridge, pp. 283–333.
Crosswhite, K. (2001). Vowel reduction in Optimality Theory. New York: Routledge.
Crystal, T. & A.S. House. (1988). The duration of American English stop consonants: an
overview. Journal of Phonetics 16. 285-294.
Cyran, E. (2013). Polish voicing – between phonology and phonetics. Lublin: Wydawnictwo
KUL.
De Jong, K. & H. Park. (2012). Vowel epenthesis and segmental identity in Korean learners
of English. Studies in Second Language Acquisition 34. 127-155.
Delgutte, B. 1997. Auditory neural processing of speech. In Hardcastle, W. and J.
Laver (eds.), The handbook of phonetic sciences, pp. 507-38. Oxford: Blackwell.
Downing, Laura. 1998. On the prosodic misalignment of onsetless syllables. Natural
Language & Linguistic Theory 16. 1–52.
Dinnsen, D. & J. Charles-Luce. 1984. Phonological neutralization, phonetic implementation
and individual differences. Journal of Phonetics 12. 49–60.
Dukiewicz, L. & I. Sawicka. (1995). Gramatyka współczesnego języka polskiego – fonetyka i
fonologia [Grammar of modern Polish – phonetics and phonology]. Krakow:
Wydawnictwo Instytutu Języka Polskiego PAN.
Flege, J.E.(1995). Second language speech learning: Theory, findings, and problems. In
W.Strange (Ed.), Speech perception and linguistic experience: Issues in cross-language
research (pp. 233–277). Baltimore: York Press.
Flemming, E. 2002. Auditory representations in phonology. New York: Routledge.
Golston, C. and H. van der Hulst. 1999. Stricture is structure. In Hermans, B. and M. van
Oostendorp, eds. The Derivational Residue in Phonological Optimality Theory. Amsterdam:
John Benjamins. 153-173.
Gordon, M. 1990. Syllable weight – phonetics, phonology and typology. Ph.D. dissertation,
UCLA.
Hagiwara, R. 1994. Three types of American /r/. UCLA Working Papers in Phonetics 88. 55-
62.
Harris, J. 1994. English Sound Structure. Oxford: Blackwell.
Harris, J. 2004. Release the captive coda: the foot as a domain of phonetic interpretation. In J.
Local, R. Ogden & R. Temple (eds.), Phonetic interpretation: Papers in Laboratory Phonology
6, 103-129. Cambridge: Cambridge University Press.
Harris, J. 2006. On the phonology of being understood: further arguments against sonority.
Lingua 116. 1483-1494.
Harris, J. 2009. “Why final obstruent devoicing is weakening”. In: Nasukawa, N. and P.
Backley (eds.), Strength relations in phonology. Berlin: Mouton de Gruyter. 9–46.
Hayes, B. (1999). Phonetically-based phonology – the role of Optimality Theory and
inductive grounding. In Michael Darnell, Edith Moravscik, Michael Noonan, Frederick
Newmeyer, and Kathleen Wheatly, eds., Functionalism and Formalism in Linguistics,
Volume I: General Papers, John Benjamins, Amsterdam, pp. 243-285.
Honeybone, P. 2008. Lenition, weakening and consonantal strength: tracing concepts through
the history of phonology. In Brandão de Carvalho, J., Scheer, T. & Ségéral, P. (eds). Lenition
and Fortition. Berlin: Mouton de Gruyter, 9-93.
Honeybone, P. (2005). Diachronic evidence in segmental phonology: the case of obstruent
laryngeal specifications. In M. van Oostendorp & J. van de Weijer (eds.), The internal
organization of phonological segments, 319-354. Berlin: Mouton de Gruyter.
van der Hulst, H. 2010. A note on recursion in phonology. In van der Hulst, Harry, (ed.).
Recursion and Human Language, 301-342. Berlin: Mouton de Gruyter.
Jakobson, R. (1962). Phonological studies. The Hague: Mouton.
Jensen, J. T. 2000. Against ambisyllabicity. Phonology 17: 187-235.
Johnson, K. 1997. Acoustic and auditory phonetics. Oxford: Blackwell.
Jun, J. (1995) Perceptual and Articulatory Factors in Place Assimilation: An Optimality
Theoretic Approach, Ph.D. dissertation, UCLA.
Kahn, D. (1976). Syllable-based Generalizations in English Phonology. Ph.D. dissertation.
MIT. Bloomington, IN: Indiana University Linguistics Club.
Kang, Y. (2003). Perceptual similarity in loanword adaptation: English post-vocalic word-
final stops to Korean. Phonology 20,2: 219-273.
Kaun, A. (1995). An Optimality-Theoretic Typology of Rounding Harmony, Ph.D.
dissertation, UCLA.
Keating, P. (1990). Phonetic representations in a generative grammar. Journal of Phonetics
18. 321-334.
Labov, W. 1997. Principles of linguistic change: Internal factors. Oxford: Blackwell.
Ladefoged, P. & I. Maddieson. (1996). The sounds of the world’s languages. Oxford:
Blackwell.
Lavoie, L. 2001. Consonant strength: phonological patterns and phonetic manifestations. New
York: Garland.
Levi, S. 2008. Phonemic vs. derived glides. Lingua 118. 1956-1978.
Lisker, L. & Abramson, A.S (1964). A cross-language study of voicing in initial stops:
Acoustical measurements. Word, 20, 384-422.
Manaster Ramer, A. 1996. A letter from an incompletely neutral phonologist. Journal of
Phonetics, 24 (4), 477-489.
Marlett, S. & J. Stemberger. (1983). Empty consonants in Seri. Linguistic Inquiry, 14, 617-
639.
Ohala, J. J. 1981. The listener as a source of sound change. In: C. S. Masek, R. A. Hendrick,
& M. F. Miller (eds.), Papers from the Parasession on Language and Behavior. Chicago:
Chicago Ling. Soc. 178 - 203.
Pöchtrager, M. 2006. The structure of length. Ph.D. dissertation. University of Vienna.
Port, R.F. 1996. The discreteness of phonetic elements and formal linguistics: response to A.
Manaster Ramer. Journal of Phonetics, 24(4), 491-511.
Prince, Alan and Paul Smolensky. 1993. Optimality Theory: Constraint Interaction in
Generative Grammar. Ms. Technical Reports of the Rutgers University, Center for Cognitive
Science.
Repp, B. H. 1979. Relative amplitude of aspiration noise as a voicing cue for syllable-initial
stop consonants. Language and Speech 22: 173–189.
Schwartz, G. (2012). Glides and initial vowels in the Onset Prominence representational
environment. Poznań Studies in Contemporary Linguistics 48 (4). 661-685.
DOI: 10.1515/psicl-2012-0029
Schwartz, G. (2013a). A representational parameter for onsetless syllables. Journal of
Linguistics 49 (3), 613-646. DOI: http://dx.doi.org/10.1017/S0022226712000436.
Schwartz, G. (2013b). Vowel hiatus at Polish word boundaries – phonetic realization and
phonological implications. Poznań Studies in Contemporary Linguistics 49 (4). 557-585.
Selkirk, E. (1984). Phonology and syntax – the relation between sound and structure.
Cambridge, MA: MIT Press.
Shinn, P. & S. Blumstein. (1984). On the role of amplitude for the perception of [b] and [w].
Journal of the Acoustical Society of America 75(4). 1243-1252.
Slowiaczek, L.M. and Dinnsen, D.A. 1985. On the neutralizing status of Polish word-final
devoicing. Journal of Phonetics, 13(3). 325-341.
Steriade, D. (1993). Closure, release, and nasal contours. In M. K. Huffman and R. A.
Krakow (Eds.). Nasals, nasalization, and the velum, 401−470. San Diego: Academic Press.
Steriade, D. 1997. Phonetics in phonology: the case of laryngeal neutralization. Ms. UCLA.
Stevens, K. N. 1989. On the quantal nature of speech. Journal of Phonetics, 17(1), pp. 3-45.
Stevens, K. (2002). Toward a model for lexical access based on acoustic landmarks and
distinctive features. Journal of the Acoustical Society of America 111 (4). 1872-1891.
Syrdal, A. and H. Gopal. 1986. A perceptual model of vowel recognition based on the
auditory representation of American English vowels. JASA 79 (4), 1086-1100.
Wiese, Richard (1996), The Phonology of German, Oxford: Oxford University Press.
Wright, R. 2004. Perceptual cue robustness and phonotactic constraints. In Hayes, B., R.
Kirchner and D. Steriade (eds). Phonetically Based Phonology. Cambridge: Cambridge
University Press, 34-57.