Boundary effects in Polish English - implications for similarity in L2 phonology (with Grzegorz...
Transcript of Boundary effects in Polish English - implications for similarity in L2 phonology (with Grzegorz...
1
Boundary effects in Polish English – implications for ‘similarity’ in L2 phonology1
Geoffrey Schwartz1, Grzegorz Aperliński, Anna Balas
1, Arkadiusz Rojczyk
2
1-Faculty of English, Adam Mickiewicz University in Poznań; 2- Institute of English, University of Silesia
0 – Abstract
Experimental phonetic studies examine the extent to which Polish learners of English acquire
phonological processes occurring at word boundaries in the target language. In particular, we
look at linking of vowel-initial words and the suppression of stop release in word-final stops.
Our results suggest that glottalization of vowels, which blocks linking of initial vowels, and
coda stop release production, both constitute L1 phonological interference that learners must
overcome. The phonology of these seemingly unrelated effects may be unified in the Onset
Prominence framework, in which parametric representational mechanisms govern the extent
of phonetic cohesion between segments and the formation of prosodic boundaries.
Implications for the oft-invoked notion of cross-language phonological similarity are also
discussed.
1 - Introduction
For a significant percentage, if not a majority, of studies on second language (L2) phonology,
it is segmental contrasts that constitute the linguistic focus of experimental work. Researchers
have investigated the effects of cross-language differences both in segmental inventories, as
well as the realization of phonological categories. For example, many studies are devoted to
the acquisition of new L2 vowel contrasts by speakers whose L1 lacks the opposition in
question; Escudero & Boersma’s (2004) oft-cited study examines Spanish L1 speakers’
perception of the English /iː/-/ɪ/ contrast. Another area that has received a lot of attention is
the implementation of laryngeal contrasts, measured in terms of the phonetic parameter of
voice onset time (VOT). Studies typically compare VOTs of stops in the speech of bilinguals
whose two languages differ in this parameter, such as French and English (e.g. Flege 1987) or
Greek and English (e.g. Antoniou et al. 2010). Research into the perception and production of
L2 segments has been instrumental in the formation of two of the more influential theoretical
models of L2 speech acquisition. Flege’s Speech Learning Model (SLM; Flege 1995) and
Best’s Perceptual Assimilation Model (PAM; Best 1995, Best & Tyler 2007) facilitate the
formulation of important predictions concerning learners’ success or lack of success in the
acquisition of second language segments and contrasts.
When examining the research spawned by current models of L2 speech, one
encounters a crucial concept that constitutes the foundation of both the SLM and the PAM:
the notion of similarity. Difficulties in acquisition of L2 segments and contrasts are predicted
on the basis of how similar they are to those found in L1. Interestingly, similarity has been
seen as both a help (Best 1995) and a hindrance (Flege 1995) in L2 phonological acquisition.
A question that is not so frequently asked, however, is how similarity is to be defined. Flege
(1995: 264) notes this problem, stating that “an obstacle to testing hypotheses . . . is the lack
of an objective means for gauging degree of perceived cross-language phonetic distance”.
Nevertheless, while there is awareness of the issue, there remain unresolved issues with
regard to assumptions of cross-language phonetic distance (see e.g. Major 2008: 71). In
essence, the question of similarity is one of phonological representation. In what follows, we
1 This research was supported by a grant from the Polish National Science Centre (Narodowe Centrum Nauki).
Project number 2012/05/B/HS2/04036.
2
will offer enhanced representational tools that allow for a new perspective on the question of
cross-language phonetic distance. In particular, we consider the relationship between the
representation of segmental units and that of prosodic constituents and their boundaries.
Although traditional phonetic and phonological descriptions provide largely viable
options for defining segmental similarity between L1 and L2, the effects of external sandhi,
phonological processes observable at word boundaries, have been largely neglected in L2
speech research. For example, at the two previous New Sounds conferences, in Poznań,
Poland and Montreal, fewer than five out of well over 100 presentations could be claimed to
be related to boundary effects. Most of these studies addressed the well-known processes of
liaison and enchaînement in L2 French (e.g. Howard 2006, 2008; Shoemaker 2010; Sturm
2013), beyond which the acquisition of L2 sandhi remains almost entirely unexplored. Zsiga
(2011) provides a literature review that is helpful in assessing the current state of research.
She discusses just a few studies dealing with a small number of L1-L2 pairings (Catalan-
English, Cebrian 2000; English-Russian, Zsiga 2003; Spanish-German, Lleo&Vogel 2004;
Korean-English, Zsiga 2011). In sum, there is a clear need for more research on cross-
language boundary effects, which have clear implications for communicative success in the
target language.
The placement of prosodic boundaries may at first glance appear to be an
unpredictable aspect of the phonology of a given language. However, cross-language
comparisons suggest that languages may differ systematically in their boundary formation
mechanisms. Consider the sequence of /t#j/ in English got you as opposed to Polish kot jest
‘the cat is’. The English example is frequently produced without any boundary at all – the
stop and the glide are joined into an affricate and the form is pronounced gotcha. The /t/
undergoes a sandhi process of palatalization. Conversely, in Polish, such stop-glide
sequences are produced asynchronously (Święciński 2012) – the /t/ is not subject to the same
type of mutating processes that may within words. The /t/ appears to be ‘final’ in the Polish
example, while in English it is resyllabified and assimilates to the /j/. Although /t#j/ is a
‘similar’ sequence in segmental terms, there is a seemingly systematic difference in its
realization in Polish and English. In Polish /t#j/ there is a boundary, while in English there is
not. The question that remains is if there is any way to predict this state of affairs.
We suggest that the appearance of prosodic boundaries between segments has
phonological origins that may be derived within the Onset Prominence representational
environment (OP; Schwartz 2010a, 2010b, 2012, 2013a). The OP framework is a new theory
of phonological representation that encodes manner of articulation as a structural property
(cf. Aperture Theory, Steriade 1993), yielding representations in which segments and
syllables are constructed from the same materials, rather than being linked to a skeleton by
means of association lines. Prosodic boundaries arise when two adjacent segmental structures
may not be joined according to the basic constituent-forming mechanisms of the framework.
With regard to the /t#j/ example discussed above, sandhi palatalization in Polish is blocked
by a process of promotion, a kind of fortition that raises the glide to the highest level of the
OP hierarchy normally occupied by stop closure. A similar mechanism governs the
realization of word-initial vowels in Polish, which show a tendency for glottal marking that
blocks sandhi processes linking C#V and V#V sequences. By contrast, English is
characterized by a recursive submersion process, which is responsible for linking processes
that wipe out boundaries before vowel-initial words (find out ~ fine doubt; see [ʲ]everything).
The process of submersion in English also governs the realization of word-final stop
consonants, which are frequently produced without an audible release.
This paper will present phonetic data from Polish learners of English investigating
two types of boundary phenomena, which at first glance appear to be unrelated. In Section 2,
we will examine the relations between glottalization and linking of word-initial vowels in
3
both C#V and V#V contexts. Our results suggest that vowel glottalization is a form of L2
interference that hinders the acquisition of sandhi linking processes in English. The C#V
data are supplemented by additional measures of consonant voicing to explore the connection
between sandhi linking and the acquisition of ‘final’ voiced obstruents, a well-known
problem for L1 Polish learners. The V#V data include the results of an accent rating test in
which L1 English listeners judged glottalized and non-glottalized hiatus tokens on a scale of
foreign accentedness. In Section 3 we look at the production of stop release of the word-final
consonant in VC#C sequences. Our data indicate that the acquisition of target language
unreleased stops is connected with the acquisition of robust VC formant transitions
suggestive of greater articulatory coordination between the vowel and consonant. Thus, in
both of these empirical areas, native-like production in English entails a tendency for greater
phonetic cohesion between segments. These patterns are predicted by the representational
mechanisms of the OP framework, which we review in Section 4. Finally, in Section 5 we
discuss implications for the oft-invoked notion of cross-language phonological similarity.
2 Vowel-initial words in Polish English
This section will summarize data related to Polish learners’ acquisition of linking processes
involving vowel-initial words in English. In 2.1 we review previous work to provide a
comparison of the patterns of initial vowel realization in L1 English and Polish. An acoustic
study of C#V sequences is described in 2.2. In 2.3 we present a related study of V#V
sequences. Section 2.4 describes an accent rating study in which L1 English listeners judged
linked and glottalized V#V sequences.
2.1 Previous work on vowel-initial words in English and Polish
Our study is based on a claim that Polish and English are characterized by differences in the
representation of word-initial vowels. We return to the phonological underpinnings of this
claim in Section 4. For now, we suggest that word-initial vowels in Polish are inherently
more prominent prosodic entities than their counterparts in English. This claim is compatible
with previous findings that word-initial syllables in Polish are marked by some degree of
phonetic prominence (Dogil 1999; Crosswhite 2003; Newlin-Łukowicz 2012). The
consequence for Polish word boundaries is the blockage of the type of sandhi linking
processes that are common in English. A corollary of our claim is a hypothesis that Polish
initial vowels, as more prominent entities, should be more likely to be produced with
glottalization. That is, glottalization in Polish may be the fortification of a prominent prosodic
position, whereas in English it serves merely to mark the boundaries of larger prosodic
constituents. Consequently, we should expect glottalization in Polish to be more prevalent.
The difference between Polish and English with regard to linking processes involving
word-initial vowels is reflected in traditional textbook descriptions, as well as phonological
studies. With regard to English, process linking initial vowels with the preceding segment are
described. For example, McCarthy (1993) offers a phonological analysis of a process of
hiatus glide insertion in American English (see Ed [sijed], new image [nuwɪmədʒ];
McCarthy, 1993), while Uffmann (2007) discusses linking or intrusive /r/ in non-rhotic
dialects (Uffmann, 2007). For C#V contexts, Cook (2000) as well as Cruttenden describe
processes of liaison or linking (find out ~ fine doubt).2 Flapping of /t/ at C#V word
boundaries is also frequently described (e.g. Harris 1994). In the presence of such linking
2 The terms liaison and linking are used interchangeably in this paper.
4
processes, phrase-medial vowel glottalization is uncommon in English (Dilley et al. 1996).
By contrast, descriptions of Polish phonology make no mention of any linking processes.
Rubach & Booij (1990) note that C#V sequences in Polish are not resyllabified. Meanwhile,
Dukiewicz & Sawicka (1995), as well as Gussmann (2007), suggest that glottal stops are the
norm for word-initial vowels, and may even appear word-internally.
Our claim is also reflected in English pronunciation textbooks with regard to
glottalization in English. Cruttenden (2001) states that the use of glottal stops in English is
not as common as in some other languages, e.g. German, and is limited to emphatic
utterances. Lecumberri & Maidment (2000: 64) see utterance-initial glottal stop insertion in
emphatic speech as a universal phenomenon. Wells (2008: 345) claims that glottalization is
optionally used to add emphasis to syllable-initial vowels or to avoid hiatus in neighboring
syllables (V#V).
In recent years, some experimental studies have suggested that traditional English
linking processes may be in decline. Britain & Fox found that traditional hiatus fillers [j w r]
have been replaced by glottalization in culturally and linguistically diverse areas in the South
of England (cf. Britain & Fox 2008). Meanwhile, Davidson & Erker (2014) present evidence
against ‘glide insertion’ by English speakers in New York City (see 4.2 for additional
discussion of the Davidson & Erker study). Glottalization of hiatus was often observed when
the word-initial vowel is stressed. In other cases, they found that VV and V#V sequences
differ systematically from VjV and V#jV sequences.
Experimental phonetic studies also suggest that glottalization is more prevalent in
Polish than in English. Dilley et al. (1996) found a phrase-medial glottalization rate of around
17% in a corpus American English radio speech. By contrast, Malisz et al. (2013) found a
phrase–medial glottalization rate of just under 35%, about twice that of the rate found by
Dilley et al. (1996) in American English. Schwartz (2013b) found an even higher rate for
phrase-medial V#V tokens in a sentence reading task. The Schwartz (2013b) study also
examined spectral balance of hiatus in V#V contexts. Hiatus tokens without visible
glottalization were compared with initial-syllable CV tokens. The results revealed that the
hiatus vowels were produced with a higher spectral balance. This was true both overall and
across the V2 contexts. Raised spectral balance has been identified as a cue to word stress in
a number of languages (e.g. Sluijter & van Heuven 1996; Crosswhite 2003; Plag et al. 2011),
so this finding points to the prosodic prominence of Polish word-initial vowels even when no
glottalization is visible on waveform and spectrogram displays. In American English,
Garellek (2012) has found no spectral balance effect in phrase-medial postion; it was limited
to phrase-initial position. In sum, the common thread running through textbook descriptions
and phonetic studies is that Polish initial vowels are susceptible to glottalization, which
blocks the application of other linking processes characterized by modal voice quality.
Transferring these patterns to the domain of SLA allows for the formulation of a
hypothesis that glottalization of initial vowels in the speech of Polish learners of English may
be a manifestation of L1 phonological interference that hinders the acquisition of target
language sandhi processes. In what follows we shall present acoustic phonetic studies
investigating this possibility.
2.2 Production of C#V sequences by Polish learners of English
This section presents an acoustic study of C#V sequences produced by Polish learners of
English (for a related study, see Schwartz et al. 2014). The aim is to measure the degree of
acquisition of sandhi linking processes. In addition, we present acoustic measures of voicing
of the word-final obstruent in these sequences. Since word-final devoicing is a typical feature
5
of Polish-accented English, and linking results in a loss of ‘final’ status, we should expect to
find a relationship between the acquisition of final voicing and sandhi linking.
2.2.1 Participants
Our analysis is based on recordings of twelve Polish students majoring in English studies at
the University of Silesia (Uniwersytet Śląski; UŚ), Sosnowiec and Adam Mickiewicz
University (Uniwersytet im. Adama Mickiewicza; UAM), Poznań. Six of the twelve subjects
were first year students, while the other eight were in higher years. In the first year of English
studies in Polish universities, students receive intensive pronunciation instruction (2*90
minutes/week). In the second year, this instruction is reduced to one meeting a week, while
students in the third year and above no longer receive explicit instruction in pronunciation.
The division into first/higher years was intended to group the students according to whether
or not they had completed the intensive first year instruction. The first year group had
completed one semester of instruction at the time of the recordings. The higher years group
had completed 3 or 4 semesters.3 None of the participants had spent significant time in an
English-speaking country. All of the participants had received passing grades in their English
pronunciation courses.
2.2.2 Materials and procedure
Data were elicited from a sentence list containing 35 English C#V sequences (see Appendix
1) in which the consonant was a voiced obstruent. The list also contained fillers, as well as
tokens used in a separate experiment. The recordings were made in soundproof recording
studios in the English departments at each of the two universities.4 The studios were
equipped with a computer monitor upon which the stimulus data were presented. First,
subjects were instructed to read the sentence presented on a slide in a Power Point
presentation. Afterwards, the presentation of the slide was accompanied by an audio
recording of one of five native speakers of English producing the stimulus with a linked C#V
production. Students were instructed to imitate the model voice. The acoustic analysis was
made on the tokens produced in the imitation task. A total of 420 tokens were collected for
analysis, of which 3 were excluded for technical reasons.
The imitation task had two primary goals. The first goal was to exert control over the
analyzed data. Glottal marking in L1 English is frequently dependent on prosodic position
(Dilley et al. 1996), which is difficult to control for in the elicitation of L2 speech. By
repeating after L1 speakers, the participants’ task was to embed the vowel-initial word in the
same prosodic context. By the same token, the task also allows us to factor out other possible
influences on glottalization, such as word frequency or the function/content word distinction.
Secondly, imitation tasks have been found to elicit phonetic convergence (Goldinger 1997;
Shockley et al. 2004; Pardo et al. 2012; for L2 studies see Rojczyk 2012, 2013; and Rojczyk
et al. 2013), by which speakers approximate non-contrastive phonetic details. As such, we
might expect the task to neutralize proficiency-based differences in non-contrastive features
3 Polish dialect background was not taken into account in our analysis. Eight of the twelve students were raised
in either the Upper Silesia or Great Poland provinces, which have been described as regions that feature a sandhi
voicing process before word-initial vowels and sonorants (see e.g. Dejna 1973). According to our auditory
impressions of these subjects’ Polish, sandhi voicing was largely absent. If, for our subjects, sandhi voicing was
indeed a factor, the effect would be to produce more voicing in final obstruents. The results to be presented
presently suggest that this is unlikely. 4 The recording studio at UAM is equipped with an Edirol UA-25 USB audio interface, while the UŚ studio
features a Sound Devices USBPre2 interface. Both devices allow for high-quality recordings directly onto a
laptop computer. At both univsities, recordings were made at a sampling rate of 48 kHz, 24-bit quantization.
6
such as vowel glottalization. That is, we might expect both groups of students to produce
similar non-glottalized tokens after the model voice. By contrast, if advanced learners
produce less glottalization in the imitation task, we might safely attribute glottalization to L1
phonological interference. In other words, proficiency-based differences that survive the
imitation task may be claimed to reflect the level of phonological acquisition.
2.2.3 Acoustic analysis
Acoustic analysis was performed manually with the help of the Praat program (Boersma &
Weenink 2011). The analysis focused on voicing parameters associated with the final
consonant, as well as the realization of the vowel. With respect to voicing, the following
measurements were made.
Duration of the preceding vowel (in milliseconds). The vowel duration measurements
included all post-vocalic sonorant consonants in which robust formant structure could
be identified.
Duration of final obstruent (in milliseconds); These measurements included both stop
closure and burst noise in the case of plosives and affricates, and frication noise in the
case of fricatives and affricates. In other words, a single measure of consonant
duration combined all aspects of obstruent articulation.
Duration of periodicity (vocal fold vibration) during the obstruent articulation (in
milliseconds).
From these measurements, voicing of the consonant was quantified according to two
calculated variables.
V/C ratio: duration of the preceding vowel divided by duration of the consonant. V/C
ratios are higher when the consonant is voiced. This effect is greater in English than in
other languages (Chen 1970), and is crucial for L1 perception. (Port & Dalby 1982)
%Voiced: the duration of the voiced period of the consonant divided by the total
consonant duration multiplied by 100. This measure allows us to describe how much
of a given consonant was in fact voiced.
With regard to the realization of the vowel in the C#V sequences, the goal of the
acoustic examination was to determine the presence or absence of glottalization. When no
glottalization or pause was visible on the vowel, the tokens were coded as Linked or Liaised.
An example of an L-annotated token is given in Figure 1, which contains a token of the
sequence showed everyone. The closure period of the stop, which is fully voiced, is selected
in the display.
7
Figure 1 – Example of token coded as Liaison; showed everyone
With regard to the glottalized tokens, we observed variable realization of glottal events (cf.
Redi & Shattuck-Hufnagel 2001). In Figure 2 we see a full glottal stop on the word-initial
vowel in showed everyone. Figure 3 shows non-modal phonation over the first 40-50 ms
after the release of the stop on the initial vowel of the name Alice, taken from the sentence,
I’m afraid Alice will be late.
Figure 2 – Glottalized with full glottal stop; showed everyone
8
Figure 3 – Glottalized token without full glottal stop; afraid Alice
Three of the four authors of the present paper were involved in annotating the acoustic
data. With regard to the calculated voicing parameters, a one way ANOVA revealed no
significant differences in either V/C ratio (F[2,315] = 0.98; p=.379) or %Voiced (F[2, 315]
=2.17; p=.120) in the measures of the three annotators. Some inter-annotator differences were
found in the durational measures from which the voicing parameters were calculated,
including vowel duration (F[2, 315]=4.93; p=.009) and voicing duration (F[2,315] = 5.72;
p=.005), but not consonant duration (F[2,315] = 0.81; p=.449). In particular, one annotator’s
vowel duration and voicing duration measures were shorter than those of the other annotators.
Post-hoc Bonferroni tests revealed that one of three pairwise differences was significant for
vowel duration, and two of three were significant for voicing duration. Nevertheless, these
differences were not reflected in the calculated voicing parameters.
With regard to the coding of the initial vowels, the calibration analyses revealed a
high level of agreement in the case of tokens coded as liaised (unanimous agreement in
88.5% of cases; Fleiss’ Kappa = 0.72 for two categories, liaised vs. glottalized). In some
cases glottalization was visible on the vowel, but not immediately. These tokens appeared to
contain an intrusive vowel, in which a short vocoid is visible between the final consonant and
the glottal event. We interpret intrusive vowels as an attempt to produce a voiced obstruent
before a glottalized vowel (see Schwartz et al 2014). For the present analysis, all of these
were coded simply as ‘glottalized’ to avoid annotation discrepancies.
2.2.4 Results
The first set of results we will present shows glottalization/linking rates as a function of
learner group. The first year group produced glottalized vowels in just under 60% of the
tokens, and linked C#V sequences in just over 40% of the tokens. The advanced group
showed the opposite pattern, producing less glottalization (37.7%) and more linked C#V
sequences (62.3%). A binary logistic regression analysis with linking as the dependent
variable and learner group (Advanced) as a predictor variable showed a significant effect of
learner group (B=.889; p<.001). The linking/glottalization results are summarized graphically
in Figure 4.
9
Figure 4 – C#V context glottalization rates by learner group
The results for the voicing parameters as a function of learner group are shown in Figures 5
and 6. Figure 5 shows the mean values for %Voiced (47.7 for First Year, 81.0 for Advnced).
A one-way ANOVA revealed that this difference was significant (F[1,415]=149, p<.001).
Figure 5 - %voiced measure by learner group. Error bars denote 95% confidence interval
In Figure 6 we see the mean values for V/C duration as a function of learner group. The First
Year group had an average V/C ratio of 1.96, compared to 3.28 for the Advanced group
(F[1,415]=72.6, p<.001)
0.00%
20.00%
40.00%
60.00%
80.00%
100.00%
First Year Advanced
Linked C#V
Glottalized
10
Figure 6 – V/C ratio by learner group. Error bars denote 95% confidence interval
The next set of results concerns the link between the realization of the word-initial vowel and
the voicing of the preceding consonant. For both the %Voiced and V/C ratio measures, one-
way ANOVAs revealed that linked tokens showed a higher degree of voicing than glottalized
tokens. The mean results for %Voiced are summarized in Figure 7. The results for V/C ratio
are shown in Figure 8. Liaised tokens were voiced on average through 75.2% of their
duration, while voicing in the glottalized tokens averaged 52.3% (F[1,415]=56.7, p<.001).
The results for V/C ratio as a function of glottalization/linking are given in Figure 8. Linked
tokens showed a V/C ratio of 3.01, while the preceding vowel in glottalized tokens was on
average 2.2 times longer than the consonant (F[1,415]=24.6, p<.001).
Figure 7 - %voiced measures by vowel realization. Error bars denote 95% confidence interval
11
Figure 8 – V/C ration as a function of linking. Error bars denote 95% confidence interval.
Finally, we checked the extent to which Linking was correlated with voicing in the
productions of the individual participants. Spearman’s rho was calculated between the
percentage of linked items produced by each individual and each speaker’s mean values for
the voicing parameters. In both cases, the correlation coefficient was significant (ρ=.627,
p=.029 for V/C ratio; ρ=.666, p=.018 for %voiced).
2.2.5. Discussion
Our findings may be summarized as follows. In the production of both linking and final
obstruent voicing, Advanced learners produced more native-like tokens than the First Year
group. In an imitation task in which some degree of phonetic convergence is to be expected,
remaining differences between learner groups may be attributed to phonological factors.
Since both linking and final voicing revealed differences between groups, we suggest that
glottalization, which blocks linking, must be a form of phonological interference. We will
return to the phonological discussion in Sections 4 and 5.
A phonological interpretation is also suggested by the robust relationship between
linking and final voicing. The devoicing of final obstruents is a phonetically motivated
process. Linking in L1 English serves to alter the prosodic position of the voiced obstruent to
facilitate voicing. Since final devoicing is a notorious problem for Polish learners, we suggest
that learners who receive explicit instruction in C#V linking will be better equipped to
overcome problems with final voiced obstruents. A classroom-based study is underway to
test this prediction.
2.3. V#V production study
A related study was carried out with learners productions of V#V sequence. The participants
and data-gathering procedure were the same as in the C#V study described above, while the
stimulus materials are given in the appendix. The stimuli included 26 V#V sequences (see
Appendix 2). A total of 312 tokens were collected for analysis. As in the C#V study, an
imitation task was employed with an eye to controlling for effects of prosodic position and
other factors.
2.3.1. Acoustic analysis
12
In the vowel hiatus study, tokens were toked as either Modally voiced, suggesting linking of
the two vowels, or Glottalized. Again, we did not distinguish between full glottal stops and
glottalized tokens (see Schwartz et al. 2013 for a more detailed look at the realization of
glottalized tokens). Again, 3 of 4 authors were involved in the annotation of glottalization.
Calibration analyses revealed a high level of inter-annotator agreement (90.5%, Fleiss’ Kappa
= 0.78 for two categories, Modal vs. Glottalized).
2.3.2. Results
As with the C#V study, the first year group showed more glottalization than the advanced
group. A binary logistic regression analysis with Modal voice as the dependent variable and
Learner Group as the predictor variable revealed a significant effect of Learner Group on the
likelihood of linking; the Advanced group was more likely to produce linked tokens with a
modal voice quality (B=.784, p=.003). These results are shown graphically in Figure 9. The
advanced group produced linked vowels in 81% of the tokens, while the first year group
linked only 66% of the V#V tokens.
Figure 9 – V#V glottalization rates
2.3.3. Discussion
As in the case of the C#V study, the imitation task performed by the students did not
neutralize learner group differences in hiatus linking. More advanced students produced
more linking and less glottalization. One potentially significant observation, however, is that
the hiatus context appeared to be more conducive to linking than the C#V context. Overall,
74% of the hiatus tokens were produced without glottalization, while 51% of the C#V tokens
were linked. This may be attributable to the presence of word-internal vowel hiatus in Polish,
which is typically produced without glottalization. In addition, it is worth noting one aspect
of the differences between the C#V and V#V studies. In particular, the C#V data revealed a
close link between linking and the suppression of L1 final devoicing, which is an
‘established’ feature of a Polish accent in English. In the case of vowel hiatus, there is no
such connection with other segmental phenomena. Thus, the hiatus data raise questions about
the degree to which the failure to suppress glottalization in V#V sequences might contribute
to the perception of a foreign accent by L1 English listeners. At this point we take up this
question.
2.4. Hiatus linking and glottalization in foreign accent ratings
20
30
40
50
60
70
80
90
first year advanced
%Modal
13
The aim of this experiment was to assess the effect of vowel hiatus realization on the
perceived accentedness of non-native speech (cf. Derwing & Munro, 1997; Munro, 2008).
To explore the relationship between these variables, an on-line listening test was carried out.
2.4.1 Participants and stimuli
Forty-two native speakers of English participated in the foreign accent rating study. The
experimental stimuli were taken from recordings obtained in the production study. Each
stimulus consisted of utterances of between three and seven words. The total number of
tokens used was 36. Out of these, 12 pairs (i.e. 24 tokens) contained non-native utterances
with vowel hiatuses. That is, each member of the pair was the same utterance, but one was
produced with a modal realization of the hiatus, while the other had a glottalized realization. The tokens were checked by two professional pronunciation teachers for additional
segmental errors. Four glottalized tokens were judged to contain additional pronunciation
errors, while three tokens with modal realization were found to exhibit additional errors.
Appendix 3 provides a summary of the utterance pairs and errors identified. Aside from the
12 pairs, the stimuli contained six additional tokens of native English utterances taken from
target recordings used for the repetition task in the production study. Finally, six other non-
native tokens without vowel hiatus were used as distractors.
2.4.2 Procedure
The survey was built using Google Forms software and were additionally customized by
hand to include audio files and additional data validation. The surveys were made available
online and distributed using their URL addresses. Cooke et al. (2013) discuss the merits and
drawbacks associated with web-based speech perception studies. We feel that the merits
outweigh the drawbacks. One advantage is the highly voluntary nature of the participants.
Another advantage is the feasibility of recruiting a large number of L1 English listeners in a
non-English speaking country, particularly those who do not have everyday experience with
Polish-accented English. Nevertheless, we are in the process of attempting to replicate this
study in an on-site experiment. The survey consisted of two parts. The first part was comprised of a short linguistic
background questionnaire that was followed by the second part that included perception
tasks. Each perception task contained a recording and a question to answer, i.e. ‘Which of the
descriptions below best describes the speaker's pronunciation?’. The task was to answer the
question by choosing one of the five available options. Listeners rated the degree of non-
native accent in the utterances (5 – clearly native; 1 – clearly foreign). All descriptions and
instructions in both surveys were delivered in English. While completing the surveys listeners could not compare two recordings to each
other. Only one question could be viewed at a time and the participants could not continue to
the next question without first answering the current one. It was also impossible to return to a
previously answered question. The participants could only rely on their own impressions
when making perception judgments. In addition, each survey attempt was unique in the sense
that the order of all perception tasks was randomized when the survey was accessed. This
helped limit the effect of question order on the responses.
As mentioned above, the experiment was conducted via the internet. Consequently,
the participants completed the tasks remotely on their own computers. The survey was
distributed among the participants using the following URL address: http://goo.gl/Hsypq7
14
Participants were instructed to complete the survey only once and to use headphones to listen
to the stimuli in order to ensure sound fidelity. All participants were unaware of the purpose
of the experiment. When participants accessed the surveys they were first asked to provide information
about their linguistic background and then they moved on to the experiment proper that
consisted of 36 perception tasks. No indication of this division was given to the participants. When completing the surveys, the participants were presented with only one question at a
time. In each question, the participants were instructed to do the following: Click on a play button and listen to a short recording of an utterance. The recordings
could be replayed if needed.
Answer the question by choosing one of the five available options numbered 1–5. The
participants could not proceed without first answering the current question. No
feedback was presented for answering a question. Click on ‘next’ to continue to the next question.
The participants continued until they answered all the questions. At this point, the ‘submit’
button became active and the listeners could click on it to send their responses. A
confirmation screen concluded the experiment.
2.4.3 Data Analysis
A total of 1512 responses were obtained in the experiment (36 tokens * 42 listeners = 1512).
The responses to filler tokens without vowel hiatuses were excluded from the analysis. The
remaining responses were used to calculate the mean values of comprehensibility and
accentedness ratings for each participant. The means were then analyzed by taking into
account the following factors: Realization (whether the hiatuses in the utterance tokens were produced with a non-
native glottal or modal realization or were produced by a native speaker, i.e. had a
native modal realization)
Other errors (whether the stimuli contained other errors identified by pronunciation
teachers that could have influences the rating) The mean comprehensibility and accentedness rating values for all stimuli types (glottal vs.
modal vs. native) were calculated and compared using a repeated measures ANOVA to
determine the effect vowel of hiatus realization (glottalized vs. modal vs. native) on
perceived accentedness of L2 English speech. In addition, the same means and statistical
analyses were obtained for tokens with and without other pronunciation errors in order to
evaluate the effects of these errors on the listeners’ judgments.
2.4.4 Results
The mean values of the accentedness ratings for all three stimuli types are presented in Figure
10. For the mean values of the accentedness rating, the lowest score was found for
glottalized tokens (M = 2.23, SD = .47), followed by modal tokens (M = 2.96, SD = .52).
Tokens produced by native speakers obtained the highest scores for accentedness (M = 4.51,
SD = .56).
15
Figure 10. Mean Accentedness ratings of glottalized, modal and native tokens.
A repeated measures ANOVA was conducted with the Accentedness scores as the dependent
variables and the hiatus realization (glottalized vs. modal vs. native) as the independent
variable. The results of the analysis are discussed below.
The analysis of the means of accentedness ratings (N = 42) showed violation of
sphericity in Mauchly’s test, χ2(2) = 31.36, p < .001. Degrees of freedom were corrected
using Greenhouse-Geisser estimates of sphericity (ε = .65). A significant effect of hiatus
realization on perceived accentedness of an utterance was found, F(1.30, 53.13) = 257.25, p
< .001. Significant differences between all pairs of token types were found in post-hoc
Bonferroni tests, p < .001. The results indicate that when L1 English listeners were asked to
assess the degree of non-native accentedness of utterances, judgments were affected by the
way vowel hiatus was realized. Specifically, the results suggest that non-native utterances
with glottalized vowel hiatuses are perceived as the most foreign by native speakers of
English, followed by non-native utterances with modal realizations. Utterances produced by
native speakers were perceived as the least foreign. Since some of the non-native tokens were judged to contain additional pronunciation
errors, we carried out an additional analysis to investigate whether the results reported above
were indeed due to vowel glottalization rather than other factors. Specifically, mean values
were obtained for non-native glottalized and modal tokens with and without other errors to
determine whether the same trends would be identified for utterances without errors and to
explore the effect of other errors on listeners’ judgments. The mean values of the perceived accentedness ratings for glottalized and modal
tokens with and without other errors are shown in Figure 11. Glottalized tokens without other
errors were given lower scores (M = 2.29, SD = .53) than modal tokens without pronunciation
errors (M = 3.13, SD = .51). When both token types contained other pronunciation errors, the
difference was smaller but still noticeable (glottalized: M = 1.92, SD = .57; modal: M = 2.35,
SD = .66).
0
1
2
3
4
5
Glottalized Modal Native
Mean AccentednessRating
16
Figure 11.Mean ratings of accentedness for tokens with and without other segmental errors
The statistical analysis involved a paired-samples T-Test carried out for accentedness
with a number of within-subject conditions. The values for tokens without other
pronunciation errors were compared in order to confirm whether the same tendencies could
be attested as for all analyzed tokens. Significant differences between glottalized (M = 2.29,
SD = .53) and modal tokens (M = 3.13, SD = .50) were found, t(41) = 13.07, p < .001. These
results match the ones reported above, confirming that glottalized utterances were perceived
as more foreign accented by native speakers even when there were no other pronunciation
errors involved.The same analyses were conducted for tokens with other pronunciation
errors. Significant differences were found between glottalized (M = 1.92, SD = .57) and
modal tokens (M =2.35, SD = .66), t(41) = 4.00, p < .001. This implies that vowel
glottalization continued to play a crucial part in in increasing the foreign accentedness of
even in comparison to modal tokens containing other foreign accent features.
The effect of the glottalization of vowel hiatus on foreign accent perception is further
evidenced when the scores for tokens without other errors and for tokens with other
pronunciation errors are compared. A paired samples T-Test demonstrated that the difference
between mean accentedness scores for glottalized tokens without other errors (M = 2.29, SD
= .53) and modal tokens with errors (M =2.35, SD = .66) was non-significant, t(41) = -.62, p
> .05. These results indicate that the glottalization of vowel hiatus, even alone, was just as
indicative of a foreign accent as the segmental errors in modal tokens.
2.4.5 Discussion
With regard to accentedness ratings, our results point to a direct effect of hiatus glottalization.
Overall glottalized tokens were rated lower than modal tokens (2.23 vs 2.96). These
differences held in tokens both with and without other segmental errors. Interestingly, the
magnitude of this difference was greater in the case of the tokens without additional errors.
Such tokens may be claimed to have isolated the effects of the glottalized hiatus realization.
Finally, in the accentedness ratings, modal tokens with other errors were judged the same or
even slightly more native-like (2.35) than glottalized tokens without other errors (2.29). This
finding suggests that L1 English listeners are particularly sensitive to target-like boundary
realization in making accentedness judgments, and that boundary effects such as hiatus
linking are indeed an aspect of L1 English phonology.
2.5. Final remarks on word-initial vowels
0
0.5
1
1.5
2
2.5
3
3.5
with other errors without other errors
Glottalized
Modal
17
The results of the production studies and the accent rating test suggest that for Polish learners
of English, the suppression of glottalization in word-initial vowels represents an important
stage of phonological acquisition, facilitating the learning of target language sandhi linking
processes. In Section 4, we shall show how these facts fall out from aspects of phonological
representations within the Onset Prominence framework. Before proceeding to the
phonological discussion, however, we now turn to data from a seemingly unrelated
phenomenon: the realization of word-final stop consonants in C#C contexts.
3 Stop release and vowel quality in VC#C sequences
This section will summarize data from VC#C sequences in the speech of Polish learners of
English (for a related study, see Schwartz et al. 2014). In particular, we look at the link
between the production of plosive release and the production of the preceding vowel. The
basic hypothesis is that Polish learners learn to suppress release bursts in L2 English, a
process that goes hand in hand with the acquisition of robust VC formant transitions
suggestive phonetic cohesion.
3.1. Stop release in cross-language phonology
Although coda stop release is not a contrastive property, certain cross-language patterns
suggest that it is more than just a phonetic detail. Stated briefly, languages seem to adopt one
of three systems. Stop release may be (1) obligatory, (2) optional, or (3) always absent. Polish
exemplifies the first pattern (with the exception of homorganic CC sequences), English
follows the second pattern, while in Korean coda stops are always unreleased. Such
systematic differences suggest that coda stop release is phonological and subject to
parametric variation. Due to its non-contrastive status however, there has been only a small
amount of work on stop release in the literature on cross-language phonological interaction.
One important study is by Kang (2003), who looked at English loanword into Korean.
She showed that the probability of epenthetic vowels following English coda stops in
loanwords into Korean is strongly correlated with the probability of stop release in the speech
of L1 speakers. When the source language stops are more likely to be released, such as when
the stop is voiced, the Korean adaptations tended to appear with an epenthetic vowel and an
additional syllable (pad /pæ.tɨ/). By contrast, the adaptations of English words with a
voiceless stops, which are more likely to be unreleased, matched with the native Korean
pattern of suppressed stop release (pack /p k /). Clearly Korean listeners hear released
stops as containing a full ‘syllable’. In the context of L1 Polish speakers learning English,
previous studies by Bergier (2010) and Rojczyk et al. (2013) have studied the production of
L2 English unreleased stops. To our knowledge, only the Schwartz et al. (2014) study has
investigated the link between the suppression of stop release and the quality of the preceding
vowel.
3.2 Participants, procedure, and materials
Data were collected from 12 Polish learners of English, divided into two groups of six along
similar criteria to the initial vowel studies described in Section 2. Data were elicited by
presenting stimulus tokens on a Power Point slide. However, this study did not involve an
imitation task – participants read what was on the slide without audio input. The students had
one ‘practice’ round before they produced the tokens used in the analysis. The linguistic
material was made up of 18 VC#C sequences with equal distribution of the primary place of
18
articulation (labial, coronal, dorsal) in the first consonant. Care was taken to eliminate
homorganic consonant sequences, which constitute the only context in which stop release in
Polish may be suppressed (Dukiewicz & Sawicka 1995). The vowel in 12 of the tokens was
/æ/, while it was /ɪ/ in the other 6 tokens. A total of 216 tokens (12 speakers * 18 examples)
was collected for analysis.
3.3. Acoustic analysis
Acoustic analysis was performed by hand with the help of the Praat program (Boersma &
Weenink 2011) and focused on both the release of the post-vocalic stop consonant and the
VC formant transitions. With regard to stop release, tokens were tagged as either Yes or No,
on the basis of whether there was both a visual spike in the waveform as well as an audible
burst. With regard to the vowels in the VC sequences, the following acoustic measurements
were made.
Overall vowel duration (including pre-vocalic /r/ and /l/) in milliseconds
Duration of VC transition in milliseconds, from the end of the steady-portion of the vowel
to the onset of stop closure.
From these measurements, one additional measure of VC formant transitions was calculated,
%VC, defined as the transition duration divided by the vowel duration multiplied by 100.
That is, the measure defines the percentage of the vowel occupied by the VC transition.
3.4. Results
The first set of results shows the rate of stop release as a function of learner group. First year
students produced an audible release burst in 90.7% of the tokens, while the advanced
produced release bursts at a rate of 63.4%.
The next set of results looked at %VC as a function of stop release across learner
groups. A one way ANOVA revealed that the VC transition occupied on average 34.5% of
the vowel duration in unreleased tokens, while in the released tokens the mean %VC was
27.3% (p<.001). This is shown in Figure 12. Another one way ANOVA revealed that the
advanced group produced a slightly higher mean value for %VC (30.2%) than the first year
group (27.8%). This difference was significant (p=.027), and is illustrated in Figure 13.
19
Figure 12 - %VC as a function of stop release. Error bars denote 95% confidence interval
Figure 13 - %VC by learner group. Error bars denote 95% confidence interval.
To evaluate the role of learner group and %VC in determining the likelihood of stop release
production, a binary logistic regression analysis was performed with release as the dependent
variable and %VC and Learner Group as predictor variables. Both predictor variables turned
out to be significant (p<.001).
3.5 Discussion
The results of the stop release study may be summed up as follows. The likelihood of stop
release in VC#C sequences is linked to the %VC parameter, the percentage of the preceding
vowel occupied by the VC transitions. If more of the vowel taken up by the transition, the
the chances that the stop may be produced without a release burst. This pattern may be
understood in the context of Lindblom’s (1990) H&H theory of speech production, which
20
posits that speaker behavior is governed by a principle of ‘sufficient discriminability’. In
other words, speakers spend the minimal effort possible to produce a comprehensible
utterance. With more more robust VC transitions, listeners may identify the final stop without
a release burst, so speakers can spare the effort required to produce the burst. The acoustic
patterns associated with the %VC parameter suggest that in unreleased tokens, there is greater
phonetic coordination between the word-final consonant and the preceding vowel. Thus, the
fact that the Advanced group produced more unreleased stops, as well as more higher %VC
measures, suggests that the acquisition process involves the learning of this phonetic
coordination. In what follows, we shall investigate the phonological considerations that
underlie the empirical patterns we have observed.
4. Boundary formation in the Onset Prominence representational environment
In this section we will review the representations and mechanisms of the OP framework, in
order to derive the boundary effects we have observed in the speech of Polish learners of
English. We shall see that OP representations offer a perspective from which the formation of
prosodic boundaries may be linked to other segmental features such as vowel glottalization
and the behavior of coda stops. A mechanism of submersion in English allows for a unified
account of the seemingly unrelated phonetic properties we have examined, as well as the
acquisition patterns on the part of Polish learners.
4.1. Preliminaries
The Onset Prominence framework is a theory of phonological representation in which
segments and prosodic constituents (such as ‘syllables’) are constructed from the same
building block, a hierarchy of phonetic events shown in (1).
(1) The Onset Prominence representational hierarchy
Each node of the structure is derived from a phonetic event associated with a stop-vowel
sequence. The highest level node is associated with stop closure (Closure), this is followed by
burst/frication/aspiration noise after the stop release (Noise). The Noise node is followed by
the Vocalic Onset (VO) node, which derives from the initial portion of vowels in a CV
sequence. VO is followed by the Vocalic Target (VT) level. The tree in (1) represents a
unversal CV ‘syllable’ in which the C is a stop. The representational split between VO and
VT encodes the built-in ambiguity of CV transitions with regard to the consonant-vowel
distinction. While phonetically vocalic, the initial portion of vowels typically carries acoustic
information crucial for the identification of the preceding consonant. Thus, the VO node may
be contained in either consonant or vowel representations (see 4.2., and Schwartz 2013a).
21
The representations of individual segments are extracted from the hierarchy in (1). A
sample of these is given in (2). The segmental symbols act as shorthand for place and
laryngeal specifications, which are for the most part beyond the scope of this paper. Manner
of articulation is represented in the structural levels contained in a given segmental tree,
creating a partial scale of sonority and consonantal strength. Binary nodes are active in a
given segment’s representation, while unary nodes act as placeholders that define hierarchical
levels. Stops contain Closure, Noise, and parametrically VO. Nasals lack an aperiodic noise
element, so the Noise node is not active in /m/. Fricatives lack complete closure, so the
Closure node is not active in /f/. Approximants such as /w/ contain the Vocalic Onset node
(VO), but are missing the top two nodes associated with obstruents. Vowels are specified
under the Vocalic Target (VT) node. One item that should be noted at this time is that
terminal nodes of the structures that are not shown with melodic specifications are not
necessarily ‘empty’. Thus, in the structure of /p/, a labial specification that is assigned at the
Closure level, ‘trickles’ down the structure to occupy the Noise and VO nodes (see Schwartz
2014).
(2) Manner distinctions in the OP environment
By considering the relationship between the building block in (1) and the individual
segmental trees in (2), we can establish motivation for representational adjustments that
produce well-formed prosodic constituents. This may be thought of as follows. The tree in (1)
is a well-formed CV ‘syllable’ against which the well-formedness of individual segmental
structures in (2) is measured. In (3) we see a string of segmental structures /t/ and /a/. Neither
of these segments contain the entire four layer hierarchy, motivating a process of absorption,
by which the lower-level /a/ is merged with the higher level /t/ to its left, resulting in a well-
formed constituent /ta/.5 The /t/ lacks an active VT node, while the /a/ lacks the higher level
nodes. By absorbing the /a/ into the /t/, well-formed CV is produced.
(3) Absorption in a /ta/ sequence
5 Reversing the order of these segments would require either the submersion or promotion mechanisms that will
be discussed in due course.
22
The structures in (3) suggest a minimality requirement for well-formed structures, given in
(4).
(4) MINIMAL CONSTITUENT (MC) – A minimal prosodic constituent contains active (binary)
nodes both above and below the VT level
The MC requirement eliminates the need for constraints such as ONSET, PEAK, and *CODA
that impose requirements or restrictions on the segmental content of ‘syllables’. The
empirical generalizations that such constraints are intended to describe may be read directly
from OP representations. Additionally, the sharing of representational materials provides a
perspective from which a theory of boundary formation and representation may be
formulated. Prosodic domains are built from combinations of segmental trees, and boundaries
fall out naturally from the mechanisms involved. Parameters associated with these
mechanisms allow us to make predictions about where boundaries will arise.
4.2 Submersion in English yields sandhi linking and optional stop release
A single parametric mechanism, submersion, active in English and absent in Polish, may
account for the apparently diverse data observed in this paper. We shall start by illustrating
the most basic form of submersion involving ‘coda’ consonants, after which we shall
motivate submersion in vowel-initial words.
In (5), we see a string of segmental structures for the English word click. By
absorption, the vowel and the lateral /l/ are joined with the initial /k/ into a single structure.
Due to its high position in the representational hierarchy, the final /k/ may not undergo
absorption into the preceding constituent. In words of this shape, English licenses a process
of submersion, which places the final /k/ underneath the constituent built down from the
initial /k/, as is shown in the right-most structure.6
(5) Submersion of final /k/ in English click
Submersion of post-vocalic consonants is parametric. It is absent in Polish. In (14) we see
Polish klik ‘click’ on the left, alongside the English click with a submerged coda.
(6) Polish klik (left) and English click
6 For an analogous proposal by which coda consonants may reflect syllabic recursion, see van der Hulst (2010).
23
On the right, the final stop in English click is submerged under the preceding constituent. On
the left the final /k/ in Polish klik is not submerged, but is joined with preceding segments at a
higher level of structure that is not shown. From the representations in (6), it is predicted that
coda stops in Polish and English should behave differently, since they are structurally
distinct. The English coda is in a lower prosodic position in the OP hierarchy. Associating
lower hierarchical levels with prosodic weakness, we should expect the English coda to be
subject to lenition processes such as the repression of stop release. In addition, since the
submerged coda stops are joined into the same constituent structure as the preceding vowel in
English, the representations capture phonetic interaction between the vowel and consonant in
VC sequences. This was seen in our phonetic data in the %VC measure. The submersion
parameter has other implications for English and Polish phonology that are largely beyond
the scope of this paper. Submerged structures are associated with prosodic weight (absent in
Polish) and lenition in VCV contexts (absent in Polish). For discussion see Schwartz
(submitted). Now, we turn to the issue of linking of vowel-initial words.
Submersion is a mechanism that repairs structures that do not satisfy the MC
requirement by joining them with the preceding structure. In this connection, a key question
in determining sandhi application in English as opposed to Polish concerns the prosodic
status of word-initial vowels. That is, do word-initial vowels satisfy MC? We claim that this
another parametric decision. Word-initial vowels in Polish contain the VO node and satisfy
MC, word-initial vowels in English do not (see Schwartz 2013a). These parameters are
shown in (7). The tree on the left representing the Polish vowel, is a well-formed structure.
The tree on the right, representing the English vowel, is not. The result of these parameters is
that vowel-initial words in English are subject to sandhi linking processes while in Polish
they are not.
(7) VO parameters for vowels
Sandhi linking with initial vowels in English is due to submersion. Consider the English
phrase keep out. The /p/ is typically linked to the following vowel. This is represented by the
24
submersion of the structure for out, shown in (8). Since out does not contain active nodes
above the VT level, it does not meet the MC requirement. The sub-minimal out is therefore
submerged below the structure of keep, producing a well-formed constituent that we see on
the right. Analogous structures in Polish contain VO and are not subject to submersion.
(8) Submersion in keep out
Notice that the /p/ in keep has already undergone submersion, and is housed under the VT
level. If it had not, the /p/ would be built down from the Closure level, and the context for
absorption would be created. Absorption would mean that the final /p/ would become initial.
This, however, cannot be the case for English, since the /p/ in keep out is not aspirated – keep
out is clearly distinct from key pout. The linking that occurs in English is not the same
process as enchaînement or liaison that is found in French, which we would represent with
absorption. Thus, the /p/ in keep must already be submerged.
We have seen that submersion, a process absent in Polish, is responsible for sandhi-
linking in vowel-initial words in English. At this point we must consider more closely the
links between the OP representations and the phonetic data that we and others have observed.
The first issue to be discussed concerns the status and frequency of vowel glottalization, after
which we shall turn to the issue of linking in V#V sequences.
Although we have observed a tendency for more frequent and robust vowel
glottalization in Polish than in English, we should not expect an absolute opposition between
the two languages. That is, not all Polish initial vowels are glottalized, and some English
initial vowels are. In OP structures, glottalization may be represented as the activation of the
closure node, and the addition of some kind of specification denoting properties of non-modal
phonation. This is shown in (9). The tree on the left is a glottalized vowel of the type in
Polish containing VO, while on the right we see the a glottalized vowel without VO. Note
that both of these structures satisfy MC – they are both well-formed constituents. The
difference is that in Polish, glottalization fortifies a structure that is already well-formed. In
English, Closure activation repairs an ill-formed structure to denote higher-level constituent
edges. Thus, glottalization in the two languages has a different status that contributes to the
observed tendencies. In English it is a phrase-level phenomenon, while in Polish it may reach
further down into the realm of segmental representation. However, these differences do not
imply that the opposition between the two languages is absolute.
(9) Glottalization in VO-specified Polish vowel (left) and English vowel (right)
25
In the case of word-boundary vowel hiatus in English, OP representations offer a
useful perspective from which linking glides are distinct from lexical glides (cf. Levi 2008).
This is shown in (10). The two left-most structures show a sequence of two vowels in which
the second is subject to submersion. The second structure from the right contains the vowel
sequence, which may have produced the percept of a glide in previous descriptions. The
right-most structure contains a ‘lexical’ glide. English pronunciation textbooks have noted a
distinction between lexical and linking glides. Here that distinction is captured.
(10) Submersion in /ua/ sequence, as well as lexical /w/
Capturing the contrast between linking and lexical glides can aid phonological interpretation
of boundary phenomena. Consider Davidson & Erker (2014), who provide evidence against
‘glide insertion’ at hiatus sequences, showing that hiatus glides are distinct from lexical
glides. As an alternative, they suggest that glottalization is the preferred hiatus filler in
American English, as suggested by a relatively high glottalization rate (45%) in their data.
However, their glottalization rate may have been skewed by the fact that 18 of 24 of their
stimulus items contained stressed word-initial vowels. With the representations advocated
here, we do not need glottalization as an alternative to ‘glide insertion’. Non-glottalized
hiatus is captured, and is distinct from lexical glides. That is, we can argue against glide
insertion, as Davidson & Erker successfully do, without making the controversial claim that
glottalization is the preferred hiatus filler.
5 OP and similarity in L2 phonology
In our discussion so far, we have seen how the representations of the OP environment may
offer tools for predicting cross-language differences in boundary effects. At this point we turn
to the wider implications of the OP representational environment for the study of second
26
language speech. Our ultimate goal is to offer new perspectives on the oft-invoked principle
of similarity in L2 speech learning. As an introduction to this discussion, it is worth
considering the OP perspective on what is to our knowledge the only serious proposal
regarding interlanguage boundary effects: Cebrian’s (2000) Word Intergrity constraint.
5.1 Word Integrity vs. L1 interference.
Our studies have noted a significant degree of glottalization of word-initial vowels in the
speech Polish learners of English. We suggested that OP representational parameters may
account for these patterns. As an alternative to our representational hypothesis, the observed
glottalization in the speech of Polish learners of English may be claimed as evidence for a
more general Word Integrity constraint in the speech of L2 learners (Cebrian 2000). To
address this question, we will look at two previous studies of L2 learners by Cebrian (2000)
and Lleo & Vogel (2004). In both cases, there is evidence that learners suppress L1 processes
that weaken prosodic boundaries. However, it is not obvious that these results support the
proposed WI constraint. In the first case there is a simpler alternative explanation. In the
second case, learners’ failure to acquire a target language boundary strengthening process
casts doubt on the WI hypothesis (see Zsiga 2011 for additional doubts about the WI
proposal).
Cebrian’s proposal of Word Integrity was borne out of a discrepancy in the transfer of
L1 Catalan phonological processes into L2 English. Catalan is characterized by two processes
that neutralize laryngeal contrasts: final obstruent devoicing and voicing assimilation at word
boundaries. Examples are given in (11).
(11) Word-final devoicing and voicing assimilation in Catalan (Cebrian 2000)
vas ‘glass’
vazos ‘glasses’
vas petit ‘small glass’
vaz gran ‘big glass’
Final devoicing is a domain-internal process. Voicing assimilation is a sandhi process that
spans prosodic constituents. In the speech of Catalan learners of English, Cebrian found that
devoicing is widespread, but the assimilation process is not. It thus appears as though the
sandhi process is not a candidate for L1 interference, leading Cebrian (2000: 19) to claim the
existence of “an interlanguage prosodic constraint that treats every word as a separate unit
and prevents the synchronization of sounds belonging to different words”. On the face of it,
this would appear to be quite a strong claim. However, there is an alternative explanation for
Cebrian’s data.
The relatively low rates of transfer for voicing assimilation might reflect learners’
successful acquisition of short-lag VOT in the target language /b d g/, and the suppression of
the pre-voicing that is found in the L1 realizations of these stops. In other words, pre-voicing
facilitates regressive voicing assimilation, so the absence of this process suggests that
learners may have acquired the laryngeal qualities of the L2 stops, instead of transferring
their pre-voiced L1 stops. The final devoicing process remains however, since the acquisition
of VOT associated with the initial stops does not necessarily imply the acquisition of distinct
cues relevant for final stops. This is particularly true for English, in which final voiced
obstruents are often partially devoiced, yet remain distinct from the underlying voiceless
obstruents due to their duration relative to the preceding vowel (e.g. Port & Dalby 1982). In
short, the apparent discrepancy in transfer between L1 final devoicing and L1 voicing
27
assimilation may be attributable to the realization of laryngeal contrasts, and does not
necessitate any universal claims of Word Integrity.
In a study with more direct bearing on our work on initial glottalization, LLeo &
Vogel (2004) investigated the acquisition of L2 German by Spanish L1 speakers. Their
results were somewhat ambiguous with regard to the Word Integrity proposal. The focus of
their analysis included two phonological processes with implications for WI. One of these
was a sandhi spirantization process in L1 Spanish, by which word-initial voiced stops are
realized as fricatives between vowels. Another process they looked at was harter Einsatz or
‘glottal stop insertion’ on initial vowels in L2 German (e.g. Wiese 1996). Suppression of
spirantization accompanied by the acquisition of glottalization, both resulting in stronger
word-initial segments, would provide strong support for the WI hypothesis. Lleo & Vogel’s
analysis revealed conflicting results. The learners in their study managed for the most part to
suppress the spirantization process (around 80%), yet their acquisition of harter Einsatz was
less successful (under 50%). A universal WI constraint would predict that Spanish learners
should have little trouble acquiring harter Einsatz, which serves to enhance Word Integrity.
In accordance with the representational perspective advocated here, we postulate that
Word Integrity cannot be a uniform constraint that exerts equal force across all L1
backgrounds. Rather, we should expect differing degrees of Word Integrity as a function of
L1. The relative frequency of vowel glottalization in various L1s may therefore act as a
predictor of greater Word Integrity. To investigate this, we are engaged in a comparison of
L1 German and English speakers learning Polish as an L2. If English and German speakers
produce L2 Polish initial vowels with comparable rates of glottalization, we may conclude
that an L2 WI constraint may override the L1 English tendency for linking at word
boundaries.
In the present study, we looked at the vowel glottalization by Polish learners in L2
English, exploring the hypothesis that it is a form of L1 interference. Since the glottalization
of vowels has received only limited attention in the literature on L2 speech, it is worth
considering the phenomenon from the perspective of current models of second language
phonology. In this connection, we must consider the concept of cross-language similarity.
5.2 Vowel glottalization and cross-linguistic similarity
A comparison of the acquisition of harter Einsatz and the suppression of spirantization in
Lleo & Vogel’s study of Spanish learners of German may serve as an illustration of the
problematic concept of ‘similarity’ that is employed by current models of L2 speech. With
regard to both phonological and phonetic parameters, it is not obvious if either of these L2
phenomena is more similar to the corresponding structure in L1. Characterizing harter
Einsatz as ‘glottal stop insertion’, the appearance of a new segment, implies that this process
in L2 should be less similar to L1 than the suppression of spirantization. Under this view, the
former process produces a new segment [ʔ], while the latter simply adjusts the manner
specification of an existing segment. If [ʔu] is taken as a postional allophone of /u/, then the
two processes would have similar phonological status - both the glottalized vowel and the
unspirantized stop are variants of L1 phonemes appearing in a new prosodic context.
Alternatively, the relative infrequency of phonemic glottal stops across languages suggests
that the glottalized vowel should be more similar, since it involves the addition of a feature
that is non-contrastive in most languages. Thus, phonologically, one might argue for any of
three different interpretations with regard to the cross-language similarity of the two
processes.
In phonetic terms, it is equally difficult to gauge the relative similarity of the two
processes. Assuming equivalence in F1-F2 vowel space, a glottalized and non-glottalized
28
vowel should have similar formant frequencies, so they might be classified as similar. Yet the
voice source characteristics that distinguish them contribute to differences in perceived
loudness and have been identified as a cue to prominence in many languages (Sluijter & van
Heuven 1996; Kreimann & Gerratt 2010) and may be assumed to be perceptually robust.
Spirantization eliminates stop closure, an important acoustic landmark in the speech signal
(see Stevens 2002), yet stops produced with incomplete closure (e.g. Crystal & House 1988)
are typically perceived as stops, suggesting that listeners reconstruct the closure (cf. Ohala
1981). Thus, from the phonetic point of view, one could argue that either of the two target
language phenomena are more similar. In sum, defining similarity often requires weighing
the effects of conflicting phonetic and phonological parameters.
From the perspective afforded by the Onset Prominence framework, we gain new
tools for comparing segmental and prosodic representations across languages, and explicit
new parameters for the definition of similarity. Consider the trees in (12), which shows OP
representations for the German stop and glottalized vowel (first and third trees from the left),
alongside L1 Spanish representations of the spirantized stop (2nd
from the left) and the
unglottalized vowel (far right). The glottalized vowel (2nd
tree from the right) is represented
as a VO-specified vocalic structure with additional glottal specification an active Closure
node. The spirantized stop (2nd
tree from the left) is characterized by a deactivated (unary)
Closure node.
(12) German stop (left) vs Spanish spiranatized stop; German initial vowel vs. Spanish vowel
The structures in (12) suggest that harter Einsatz is less similar to the corresponding
L1 structure than the supression of spirantization. The dissimilarity is based on structural
differences among the pairs of corresponding sounds. For L1 speakers of Spanish, vowel
glottalization represents a more dramatic prosodic change, with two additional active layers
of structure, than the suppression of spirantization. Moreover, the structural change is
enhanced by an additional glottal specification on the VO node. By contrast, the structural
change involved in the repression of spirantization is less dramatic, involving only the
activation of the Closure node without any alteration to the underlying levels of the hierarchy.
Thus, the glottalized vowel is further from the L1 structure since its representational changes
entails both greater structural displacement and an additional melodic specification.
One of the hypotheses of Flege’s Speech Learning Model predicts that sounds that are
similar across languages may be subject to equivalence classification that prevents the
formation of a new perceptual category by L2 learners. Flege (1987) observed compromise
values in the speech of English-French bilinguals both in both vowel quality and VOT
parameters, which he attributed to equivalence classification. If harter Einsatz produces a
sound that is less similar to the corresponding L1 structure than the suppression of
29
spirantization, Lleo & Vogel’s results for Spanish learners of German would appear to be at
odds with the SLM. Our representations suggest that equivalence classification should be
more likely in the case of the non-spirantized stop, which is more similar to the L1 structure.
Consequently, the SLM might predict that L1 Spanish learners of German should have less
success in the suppression of spirantization. This is, of course, in opposition to Lleo &
Vogel’s findings.
An examination of the features that have been found to undergo equivalence
classification may point to a possible reconciliation of these apparently conflicting
predictions. Vowel quality, VOT, and the contrast between /l/ and /r/ constitute the phonetic
focus of a majority of the documented cases of equivalence classification. In the OP
environment, these properties differ crucially from spirantization and vowel glottalization
shown in (11) in that they are based solely on melodic specifications rather than structural
properties. Thus, we propose a modification to the SLM hypothesis by which equivalence
classification should impede the acquisition of melodically similar, rather than structurally
similar sounds. We suggest that parameters such as formant frequencies and VOT are by
nature more gradient than those associated with manner, and should be more likely to show
compromise values predicted by equivalence classification.7
5.3 Coda behavior and interlanguage epenthesis
In Section 3 we presented data on stop release in Polish English to suggest that the
acquisition of unreleased coda stops, as represented by submersion, is an integral part of the
learning process. The representations discussed in Section 4 provide a perspective from
which ‘coda’ consonants come in two varieties: submerged and non-submerged. Unreleased
coda stops are associated with submerged structures. Unsubmerged structures are always
released. These differences are represented prosodically in the OP environment, yet concern
the behavior of individual segments. That is, OP representations can describe the relationship
between segmental and prosodic representation in a way that traditional structures cannot. In
what follows, we shall examine the implications of this perspective, with particular attention
to Korean speakers’ treatment of coda consonants both in their L1 and in L2 English. First,
we shall consider perceptual epenthesis, by which Korean listeners hear vowels after English
coda stops (De Jong & Park 2012). We shall then consider Korean phonotactic constraints
with respect to contrast neutralization.
De Jong & Park (2012) studied perceptual epenthesis by Korean learners of English.
Their stated goal was to test the predictions of two explanations for epenthesis. According to
the first, which attributes epenthesis to ‘functional reparsing’, the process is motivated by the
need to license consonant manner and laryngeal contrasts in a new prosodic position.8 That is,
it serves to ease the task of identifying the consonant in question. In the other model, under
the heading of ‘perceptual misanalysis’, epenthesis is the reinterpretation of consonant release
as the ‘onset’ to an additional syllable, whose ‘nucleus’ is filled by an epenthetic vowel. De
7 Recent research has provided evidence that bidirectional phonetic drift attributable to equivalence
classification may be due to language context (Grosjean 1998). Antoniou et al. conducted two studies in which
they examined VOT values for early bilingual speakers of Greek and Australian English. The first experiment
(Antoniou et al 2010) was conducted in monolingual mode in the two languages, and monolingual-like
realization was observed in both. In a second experiment (Antoniou et al 2011), phonetic code switching was
induced by manipulating the alphabet used in the eliciting the stimulus, and L1 interference on L2 was observed,
but L1 was largely unaffected.
8 Since Korean features unreleased stops in which place is encoded in coda positions, it might be expected that
Korean listeners should be quite accurate at identifying the place of articulation of coda consonants.
30
Jong and Park note that authors who have argued for these models often claim that segment-
based and syllable-based explanations for epenthesis are incompatible. To test the two models De Jong & Park (2012) carried out a perceptual experiment in
which L1 Korean listeners engaged in two tasks with stimuli from American English: a
syllable counting task and a segment identification task. Their predictions with regard to the
aforementioned models were as follows. A negative relationship between segment
identification and syllable counting would provide support for the contrast-based model. That
is, more accurate segment identification comes at the cost of less accurate prosodic parsing,
since the additional syllable is claimed to aid in identification of the newly reparsed
consonant. No correlation between the two tasks would support the syllable-based model.
Finally, they posit that a positive relationship between the two tasks would support an
alternative model in which segmental identification and prosodic parsing are two aspects of
the same perceptual process. Their findings showed a positive correlation between syllable
counting and segmental features across individual subjects. Learners who were more accurate
in the syllable counting task (i.e. less likely to hear epenthetic vowels) tended to be more
accurate in identifying segmental features. Thus, it appears that neither of the two tested
models was supported. Rather, as they point out, ‘the present results suggest an overall model
. . . wherein listeners jointly interpret the details of segments and the syllabic position from an
integrated percept (De Jong & Park, 2012: 150)’.
The integrated percept proposed by De Jong & Park is captured in the Onset
Prominence representational environment, in which segmental and syllabic structures are
constructed from the same hierarchy. Crucially, manner of articulation is a prosodic
specification as well as a segmental feature. Instead of a segmental specification attaching to
prosodic structure by means of association lines, manner is prosodic structure. Thus, any
percept of manner is inevitably integrated with prosody, as De Jong & Park propose. It is also
worth noting that manner accuracy showed the most robust correlation with syllable counting
accuracy in De Jong & Park’s experiment (2012: 145, Figure 4).
Epenthesis in the adaptation of English codas into Korean has been classified as a case
of ‘unnecessary repair’ (Kang 2011), since L1 Korean does allow coda stops. However,
Korean codas are restricted in two important ways. First, the suppression of stop release is
obligatory. In addition, manner and laryngeal contrasts are neutralized. To provide some
perspective on Korean coda restrictions constraints, OP representations for the three Korean
labial stops are proposed in (9). Crucially, the framework allows for the possibility that
different melodic specifications may be housed at different levels of the OP hierarchy in
accordance with their phonetic realization. Place is specified as a [labial] annotation on the
Closure node. This is to be expected since it is the location of the closure that defines stop
place of articulation. By contrast, laryngeal features, whose phonetic realization may be
impeded by stop closure, may be assigned at lower levels. This is shown in (13). Aspiration,
which is of course associated with aperiodic noise, is shown as a [spread glottis] specification
on the Noise node. Tenseness is represented as a [constricted glottis] ([cg]) annotation on the
VO node. This feature is associated with a stiffer voice quality on the onset of the following
vowel (e.g. Ladefoged & Maddieson 1996), so the VO node is a natural structural position for
[cg] specifications. With these representations in mind, we may turn to the representation of
codas in Korean, which are characterized by the presence of place contrasts, but neutralized
manner and laryngeal contrasts.
(13) Proposed representations of Korean /p/, /pʰ/, and /p*/
31
In the OP environment, Korean codas may be attributed to submersion, producing the
structure in (14).
(14) Korean CVC sequence
To explain the fact that Korean maintains place contrasts in coda stops, but
neutralizes laryngeal constrasts, we need only propose a constraint against multiple layers of
submerged structure. That is, Korean, like many other languages, appears to restrict the size
of syllable rimes. The claim would be that Korean only allows a single node under the VT
level. The laryngeal specification loses its structural housing and is not realized in this
position. This is shown in (14) as the crossed-out labels of the lower Noise and VO nodes.
Thus, the apparent mismatch in Korean between a licensed place contrast and neutralized
laryngeal contrast may be explained as a single constraint on the size of prosodic
constituents.9 With the submerged Noise and VO nodes eliminated in (14), it also falls out
naturally that coda stops in Korean are always produced without an audible release. The only
remaining part of the stop is Closure; the lower nodes associated with stop release have been
eliminated.
Kang (2003) provides evidence that the presence of epenthesis in English loanwords
in Korean is closely related to the probability that the English coda stops are produced with
an audible release. Epenthesis is expected when the target language coda is more likely to
contain a release burst, which in Korean only occurs in syllable onsets. Thus, released stops
are adapted with an additional prosodic constituent whose prominence is enhanced by the
epenthetic vowel. In other words, when Korean listeners hear a release burst, the attribute it
to a non-submerged structure. Thus, from this perspective epenthesis is not at all an
‘unnecessary’, but is required to repair released consonants that are not followed by a vowel.
9 This restriction may capture the fact that coda fricatives are also realized as unreleased stops. Space
restrictions prevent us from explaining this in detail.
32
Polish requires stop release in codas, and as such does not submerge its codas. Thus,
epenthesis after coda stops is standard in the speech of Korean learners of Polish
(Dziubalska-Kołaczyk, p.c.). It is not subject to the same variability found in Korean English.
Kang attributes the link between epenthesis in Korean English and target language
stop release to ‘perceptual similarity’, arguing for the importance of a non-contrastive
phonetic detail, stop release, in the mechanism of loanword adaptation. From the OP
perspective, stop release, though non-contrastive, is not merely a phonetic detail. It is
predictable on the basis of phonological parameters. The lack of release is surely a systemic
element of Korean phonology; it should be representable in phonological terms. The OP
environment offers phonological tools for the representation of such non-contrastive
phenomena.
5 Conclusion
This paper has presented data on vowel-initial words and final stop consonants in the speech
of Polish learners of English. Although at first glance these phenomena seem to be unrelated,
the representations of the OP framework offer a unified explanation for the observed patterns.
In both areas, L2 acquisition involves learning submersion, a mechanism that facilitates the
phonetic coordination of neighboring segments, and allows for predictions with regard to
boundary formation in Polish and English.
33
References
Antoniou, M.,Best,C.T.,Tyler,M.D.,&Kroos,C. (2010). Language context elicits native-like
stop voicing in early bilinguals’ productions in both L1 and L2. Journal of Phonetics,
38, 640-653.
Antoniou, M.,Best,C.T.,Tyler,M.D.,&Kroos,C. (2011). Inter-language interference in VOT
production by L2-dominant bilinguals. Journal of Phonetics, 39, 558-570.
Best, C. T. (1995). A direct realist perspective on cross-language speech perception. In W.
Strange (Ed.), Speech perception and linguistic experience: Issues in cross-language
research (pp. 171-204). Timonium, MD: York Press.
Best, C. T., & Tyler, M. D. (2007). Nonnative and second-language speech perception:
Commonalities and complementarities. In M. J. Munro & O.-S. Bohn (Eds), Second
language speech learning – the role of language experience in speech perception and
production (pp. 13-34). Amsterdam: John Benjamins.
Bissiri, M. P., Lecumberri, M. L., Cooke, M., & Volín, J. (2011). The role of word-initial
glottal stops in recognizing English words. Proceedings of Interspeech 2011. Florence,
Italy.
Boersma, P. & Weenink, D. (2011). Praat: doing phonetics by computer. [Computer
program].Version 5.2.18.
Britain, D. & S. Fox. (2008). Vernacular universals and the regularisation of hiatus
resolution. Essex Research Reports in Linguistics, 57, 1-42.
Cebrian, J. (2000). Transferability and productivity of L1 rules in Catalan-English
interlanguage. Studies in Second Language Acquisition, 22, 1-26.
Chen, M. (1970). Vowel length variation as a function of the voicing of the consonant
environment. Phonetica, 22, 129-159.
Cook, A. (2000). American Accent Training. Hauppauge, NY: Barron’s.
Crosswhite, K. (2003). Spectral tilt as a cue to word stress in Macedonian, Polish, and
Bulgarian. In Sole, M.J., D. Recasens & J. Romero (eds.). Proceedings of XVth
International Congress of Phonetic Sciences, Barcelona, 767-770.
Cruttenden, A. 2001. Gimson’s Pronunciation of English (6th
ed.). London: Arnold.
Crystal, T. & A.S. House. (1988). The duration of American English stop consonants: an
overview. Journal of Phonetics, 16, 285-294.
Davidson, Lisa & Daniel Erker (2014). Hiatus resolution in American English: the case
against glide insertion. Language 90 (2). 482-514.
Dejna, K. (1973). Dialekty Polskie [Polish dialects]. Krakow: Ossolineum.
Dogil, G. (1999). The phonetic manifestation of word stress in Polish, Lithuanian, Spanish,
and German. In H. van der Hulst (ed.), Word Prosodic Systems in the Languages of
Europe (pp. 273-311). New York: Mouton de Gruyter.
Downing, Laura. 1998. On the prosodic misalignment of onsetless syllables. Natural
Language & Linguistic Theory, 16, 1-52.
De Jong, K. & H. Park. (2012). Vowel epenthesis and segmental identity in Korean learners
of English. Studies in Second Language Acquisition, 34, 127-155.
Dilley, L., S. Shattuck-Hufnagel & M. Ostendorf. (1996). Glottalization of word-initial
vowels as a function of prosodic structure. Journal of Phonetics, 24, 423-444.
Dukiewicz, L. & I. Sawicka. (1995). Gramatyka współczesnego języka polskiego – fonetyka i
fonologia [Grammar of modern Polish – phonetics and phonology]. Krakow:
Wydawnictwo Instytutu Języka Polskiego PAN.
Escudero, P. & Boersma, P. (2004). Bridging the gap between L2 speech perception research
and phonological theory. Studies in Second Language Acquisition, 26, 551-585.
34
Flege, J.E. (1987). The production of ‘new’ and ‘similar’ phones in a foreign language:
evidence for the effect of equivalence classification. Journal of Phonetics, 15, 47-65.
Flege, J.E.(1995). Second language speech learning: Theory, findings, and problems. In
W.Strange (Ed.), Speech perception and linguistic experience: Issues in cross-language
research (pp. 233–277). Baltimore: York Press.
Garellek, M. (2012). Word-initial glottalization and voice quality strengthening.
UCLA Working Papers in Phonetics, 111, 92-122.
Goldinger, S. (1997). Perception and production in an episodic lexicon. In Johnson, K.
Mullennix, J. W. (eds.). Talker Varioability in Speech Processing (pp. 33-66). San
Diego: Academic Press.
Goldinger, S. (1998). Echoes of echoes? An episodic theory of lexical access. Psychological
Review, 105 (2), 251-279.
Gussmann, E. (2007). The phonology of Polish. Oxford: Oxford University Press.
Hall, N. (2006). Cross-linguistic patterns of vowel intrusion. Phonology, 23, 387-429.
Harris, J. (1994). English Sound Structure. Oxford: Blackwell.
Howard, M. (2006). Variation in advanced French interlanguage: A comparison of three
(socio) linguistic variables. The Canadian Modern Language Review/La revue
canadienne des langues vivantes, 62 (3), 379-400.
Howard, M. (2008). 'On the role of naturalistic and classroom exposure in the acquisition of
socio-phonological variation: A longitudinal study of French liaison'. Journal of
Applied Linguistics, 5 (2), 159-179.
Johnson, K. (1997). Speech perception without speaker normalization: an exemplar model. In
K. Johnson & J. Mullennix, (Eds.). Talker variability in speech processing (pp. 145-
166). San Diego: Academic Press.
Kang, Y. (2003). Perceptual similarity in loanword adaptation: English post-vocalic word-
final stops to Korean. Phonology 20 (2): 219-273.
Kang, Y. (2011) Loanword phonology. In van Oostendorp, Marc, Colin Ewen, Elizabeth
Hume, and Keren Rice, eds., Companion to Phonology. Wiley-Blackwell.
Kreimann, J. & B. Gerratt. (2010). Perceptual sensitivity to first harmonic amplitude in the
voice source. Journal of the Acoustical Society of America, 128 (4), 2085-2089.
Lecumberri, M. & J. Maidment (2000). English transcription course. Oxford: Routledge.
Levi, S. (2008). Phonemic vs. derived glides. Lingua, 118,1956-1978.
Lindblom, B. 1990. Explaining phonetic variation: a sketch of the H&H theory. In W. J.
Hardcastle and A. Marchal (eds.) Speech Production and Speech Modelling. The
Netherlands: Kluwer Academic. 403-439.
LLeo, C. & I. Vogel. (2004). Learning new segments and reducing domains in German
L2 phonology: The role of the Prosodic Hierarchy. International Journal of
Bilingualism, 8, 79-104.
McCarthy, J. 1993. A case of surface constraint violation. Canadian Journal of Linguistics,
38.169-95.
Malisz, Z, M. Żygis & B. Pompino-Marschall. (2013). Rhythmic structure effects on
glottalisation: a study of different speech styles in Polish and German. Laboratory
Phonology, 4, 119-158.
Major, R. (2008). Transfer in second language phonology: a review. In: Hansen Edwards, J.
& M. Zampini. Phonology and Second Language Acquisition (pp. 63-94). Amsterdam:
John Benjamins.
Marlett, S. & J. Stemberger. (1983). Empty consonants in Seri. Linguistic Inquiry, 14, 617-
639.
Nespor, M. and I. Vogel (1986). Prosodic Phonology. Dordrecht: Foris.
35
Newlin-Łukowicz, L. (2012). Polish Stress: looking for phonetic evidence of a bidirectional
system. Phonology, 29(2), 271-329.
Ohala, J. (1981). The listener as a source of sound change. In: C. S. Masek, R. A. Hendrick,
& M. F. Miller (eds.), Papers from the Parasession on Language and Behavior
(pp.178-203). Chicago: Chicago Linguistic Society.
Pardo, J. S., Gibbons, R., Suppes, A. & Krauss, R. M. (2012). Phonetic convergence in
college roommates. Journal of Phonetics, 40, 190-197.
Plag, I., G. Kunter, & M. Schramm. (2011). Acoustic correlates of primary and secondary
stress in North American English. Journal of Phonetics,39, 362-374.
Port, R. & J. Dalby. (1982). C/V ratio as a cue for voicing in English. Journal of the
Acoustical Society of America, 69, 262-274.
Redi, L. & S. Shattuck-Hufnagel. (2001). Variation in the realization of glottalization in
normal speakers. Journal of Phonetics, 29, 407-429.
Rojczyk, A. (2012). Phonetic and phonological mode in second-language speech. VOT
imitation. Paper presented at EUROSLA 2012, Poznań, Poland, September 5-8 2012.
Rojczyk, A. (2013). Phonetic imitation of L2 vowels in a rapid shadowing task. In Lewis, J &
LeVelle, K. (eds). Proceedings of the 4th Pronunciation in Second Language Learning
and Teaching Conference (pp. 66-76). Ames, IA: Iowa State University.
Rojczyk, A., Porzuczek, A. & Bergier, M. (2013). Immediate and distracted imitation in
second-language speech: Unreleased plosives in English. Research in Language, 11, 3-
18.
Rubach, J. & G. Booij (1990b). Syllable structure assignment in Polish. Phonology 7. 121-
158.
Scheer, T. (2008). Why the Prosodic Hierarchy is a diacritic and why the Interface must be
Direct. In Hartmann, Jutta, Veronika Hegedus & Henk van Riemsdijk, (eds.). Sounds of
Silence: Empty Elements in Syntax and Phonology (pp. 145-192). Amsterdam: Elsevier.
Schwartz, G. (2010a). Auditory representations and the structures of GP 2.0. Acta Linguistica
Hungarica, 57, 381-397.
Schwartz, G. (2010b). Phonology in the signal – unifying cue and prosodic licensing. Poznań
Studies in Contemporary Linguistics, 46(4), 499-518.
Schwartz, G. (2012a). Glides and initial vowels within the Onset Prominence representational
environment. Poznań Studies in Contemporary Linguistics, 48 (4), 661-685.
Schwartz, G. (2012b). Initial glottalization and final devoicing in Polish English. Research in
Language, 10, 159-171.
Schwartz, G. (2013a). A representational parameter for onsetless syllables. Journal of
Linguistics, 49 (3), 613-646. DOI: http://dx.doi.org/10.1017/S0022226712000436.
Schwartz, G. (2013b). Vowel hiatus at Polish word boundaries – phonetic realization and
phonological implications. Poznań Studies in Contemporary Linguistics, 49 (4), 557-
585.
Schwartz, Geoffrey & Grzegorz Aperliński (2014). The phonology of CV transitions. In
Eugeniusz Cyran & Jolanta Szpyra-Kozłowska (eds.) Crossing phonetics-phonology
lines. Newcastle: Cambridge Scholars Publishing. 277-298.
Schwartz, Geoffrey, Anna Balas & Arkadiusz Rojczyk (2014a). Stop release in Polish
English – implications for prosodic constituency. Research in Language 12(2). 131-
144.
Schwartz, Geoffrey, Anna Balas & Arkadiusz Rojczyk (2014b). External sandhi in L2
segmental phonetics – final (de)voicing in Polish English. Concordia Working Papers
in Applied Linguistics 5. 637-649.
Shockley, K., Sabadini, L. & Fowler, C. A. (2004). Imitation in shadowing words. Perception
and Psychophysics, 66, 422-429.
36
Shoemaker, E. (2010). Nativelike Attainment in L2 Listening: The segmentation of spoken
French. In Dziubalska-Kołaczyk, K., M. Wrember & M. Kul (eds.) Proceedings of the
6th international symposium on second language speech, New Sounds 2010, Poznań,
Poland 1-3 May 2010. 433-438.
Sluijter, Agaath M.C. & Vincent J. van Heuven. 1996. Spectral balance as an acoustic
correlate of linguistic stress. Journal of the Acoustical Society of America, 100, 2471-
2485.
Steriade, Donca. 1993. Closure, release, and nasal contours. In M. K. Huffman and R. A.
Krakow (Eds.). Nasals, nasalization, and the velum (pp. 401−470). San Diego:
Academic Press.
Strycharczuk, P. (2012). Phonetics-phonology interactions in pre-sonorant voicing. Ph.D.
dissertation. University of Manchester.
Sturm, J. (2013). Liaison in L2 French: The effects of instruction. In. J. Levis & K. LeVelle
(Eds.). Proceedings of the 4th
Pronunciation in Second Language Learning and
Teaching Conference, Aug. 2013 (pp. 157-166). Ames, IA: Iowa State University.
Święciński, R. (2012). Acoustic aspects of palatalization in Polish and English – a study in
laboratory phonology. Ph.D. dissertation. Marie Curie-Skłodowska University, Lublin.
Uffmann, C. 2007. Intrusive [r] and optimal epenthetic consonants. Language Sciences, 29.
451-76.
Umeda, N. (1978). Occurrence of glottal stops in fluent speech. Journal of the Acoustical
Society of America, 64, 81-94.
Wells, J.C. (2008). Longman Pronunciation Dictionary (3rd
ed.). Harlow: Pearson Education
Limited Wiese, R. (1996). The phonology of German. Oxford: Clarendon Press.
Wright, R., S. Frisch & D. Pisoni. (1997). Speech Perception. Research on Spoken Language
Processing, Progress Report No. 21. Ms., Indiana University.
Wright, R. (2004). Perceptual cue robustness and phonotactic constraints. In Hayes, B., R.
Kirchner and D. Steriade (eds). Phonetically Based Phonology. Cambridge: Cambridge
University Press, 34-57.
Zsiga, E. (2003). Relearning consonant timing. Studies in Second Language Acquisition, 25,
399-432.
Zsiga, E. (2011). External sandhi in a second language: The phonetics and phonology of
obstruent nasalization in Korean and Korean-accented English. Language 87. 289 -
345.
37
Appendix 1 – Stimuli for C#V production experiment
They had evenings free
The child had red ears
They made everyone stay quiet.
There is a big dark cloud overhead.
I found out too late about the party.
The band bowed after playing the song.
Frank showed everyone his new pad.
Hard apples are my favorite.
The kids made excellent cookies.
Bill stayed after class to talk to the teacher.
I’m afraid Alice will be late.
The band played easy songs to dance to.
We paid everyone about two pounds.
I tried everything but I couldn’t make it work.
Brad even forgot the car keys
They should arrive around eight.
The judge ordered us to pay the fine
I’ve had easier tests than this one.
We stayed out all night
I tried out the new computer
She was all tired out after work
They earned equal amounts of money
Her friend Eve is very nice
I tried eel for the first time in a Japanese restaurant
Ted's apples are hard and sour
Rob avoids Alice’s uncle
Mary's earrings are made of aluminium
I bought five extra pounds of apricots
Peg's other sister likes to ride every day
George often sings after school
Fred's aunt is 80 years old
Jazz always was Adam’s favorite music
Today's express train was over three hours late
Fred always fills up his tank
Appendix 2 – stimuli for V#V production experiment
1-stay out 2-try out
3-see all 4-way out 5-go out 6-know everything
7-saw everything 8-be unable
9-know Alex 10-how interesting
38
11-know after 12-grow excellent 13-try each 14-they always 15-go every day
16-they actually 17-car always 18-your idea 19-the other 20-the end
21-they asked 22-be afraid
23-tell me all 24-I often 25-sure I 26-holiday activities
Appendix 3 – Stimuli for V#V listening test, with additional segmental errors identified by
pronunciation teachers
Utterance Errors in glottal token Errors in modal token
and they asked me – –
I go every day – –
his car always breaks
down – short VOT in /k/,
devoiced /z/ in his
and always
I’ll know after the exam vowel quality in
know –
I’m sure I heard a
telephone ringing – ng in ringing
I often imagine things vowel quality, ng –
I saw everything – –
see all the pictures unaspirated /p/ vowel quality in all
why did you stay out all
night – –
they actually like quality of /æ/ in
actually
–
they always have
something prepared – –
we’ll be unable to send – –