
Brain & Language 138 (2014) 61–70

Contents lists available at ScienceDirect

Brain & Language

journal homepage: www.elsevier.com/locate/b&l

Neural correlates of acoustic cues of English lexical stress in Cantonese-speaking children

http://dx.doi.org/10.1016/j.bandl.2014.09.004
0093-934X/© 2014 Elsevier Inc. All rights reserved.

⇑ Corresponding author. E-mail address: [email protected] (C. McBride).

1 The capitalized letters represent stressed syllables.

Xiuhong Tong a, Catherine McBride a,⇑, Juan Zhang b, Kevin K.H. Chung c, Chia-Ying Lee d, Lan Shuai e, Xiuli Tong f

a Psychology Department, The Chinese University of Hong Kong, Shatin, Hong Kong
b Faculty of Education, University of Macau, Macao
c Department of Special Education and Counselling, The Hong Kong Institute of Education, Hong Kong
d The Institute of Linguistics, Academia Sinica, Taiwan
e Department of Electrical and Computer Engineering, Johns Hopkins University, United States
f Division of Speech and Hearing Sciences, The University of Hong Kong, Pokfulam, Hong Kong

Article info

Article history: Accepted 2 September 2014; Available online xxxx

Keywords: English lexical stress processing; Prosody development; MMN; p-MMR

Abstract

The present study investigated the time course of the neural discrimination of acoustic cues to English lexical stress (i.e., pitch, intensity, and duration) in Cantonese-speaking children. We used event-related potentials (ERPs) with a multiple-deviant oddball paradigm to record auditory mismatch responses to four deviants of English lexical stress in familiar words: a change in pitch, intensity, or duration alone, or a change in all three acoustic dimensions. In the 170–270 ms time window, the pitch deviant elicited a significant positive mismatch response (p-MMR) and the duration deviant elicited a mismatch negativity (MMN) relative to the standard. In the 270–400 ms time window, the intensity deviant elicited a significant p-MMR, whereas both the duration deviant and the three-dimension-changed deviant elicited significant MMNs. These results suggest that Cantonese-speaking children are sensitive to both single and convergent acoustic cues of English words, and that the relative weighting of pitch, intensity, and duration in stress processing may be associated with different ERP components at different time windows in Cantonese-speaking second graders.

© 2014 Elsevier Inc. All rights reserved.

1. Introduction

Research on speech has typically focused on how phonetic segments such as vowels and consonants are encoded during speech perception (e.g., Mesgarani, Cheung, Johnson, & Chang, 2014). There has been little work on the discrimination of suprasegmental features of speech, such as lexical stress in English. Lexical stress refers to the relative emphasis or prominence of syllables within words or of words in sentences, as in PREsent1 (/ˈpre-zənt/; gift) and preSENT (/pri-ˈzent/) (Fry, 1955, 1958; Selkirk, 1980). Although behavioral research on native English-speaking adults' perception and production of English lexical stress has suggested that stress is acoustically related to pitch (i.e., fundamental frequency [F0]), duration, and intensity (e.g., Crystal, 1969; Kehoe, Stoel-Gammon, & Buder, 1995), the neural correlates of encoding pitch, intensity, and duration during English lexical stress processing in children remain poorly understood. In particular, no study has yet examined the neural discrimination of acoustic cues to English lexical stress in children whose first language is a tonal language, such as Cantonese speakers learning English as a second language. In this study, we therefore used event-related potentials (ERPs) to explore the neural discrimination of English lexical stress cues (i.e., pitch, intensity, and duration) in Cantonese-speaking children. We focused on three questions: whether Cantonese-speaking second graders acquiring English as a second language can use these three acoustic cues in English stress perception; to what extent the weight of each cue varies as stress perception unfolds in these children; and which neural markers are associated with each acoustic cue.

Over the last few decades, researchers have become increasingly interested, both theoretically and empirically, in stress perception and production in native and non-native speakers. Empirical evidence on lexical stress perception in adult native speakers of English suggests that F0, duration, and intensity are the main acoustic correlates of English stress perception (e.g., Fry, 1958; Kehoe et al., 1995; Mol & Uhlenbeck, 1955; Morton & Jassem, 1965). For example, Fry (1958) found that among the three cues, F0 is the most important for English stress perception, followed by duration and then intensity. Bolinger (1965) also argued that F0 is the strongest cue in English stress perception, and that duration and intensity are only secondary. In addition, stressed syllables are characterized by higher F0, longer duration, and greater intensity than unstressed syllables (e.g., Klatt, 1976; Lieberman, 1960).
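The cue ranking above can be made concrete with a toy decision rule. This is a minimal sketch, not a model from the literature: each cue votes for the syllable with the larger value, and the weights 3/2/1 are hypothetical numbers chosen only to encode Fry's ordering F0 > duration > intensity. The syllable values are the approximate measurements of the recorded word MOther given in Section 2.2. The dictionary keys (`f0_hz`, `duration_ms`, `intensity_db`) are illustrative names.

```python
# Hypothetical cue weights that encode only Fry's (1958) ranking F0 > duration > intensity;
# the numbers themselves are illustrative, not estimates from any study.
CUE_WEIGHTS = {"f0_hz": 3.0, "duration_ms": 2.0, "intensity_db": 1.0}


def stressed_syllable(first, second, weights=CUE_WEIGHTS):
    """Return 0 if `first` is judged stressed, 1 if `second` is, by a weighted cue vote.

    Each cue votes for the syllable with the larger value (higher F0, longer
    duration, greater intensity); equal values cast no vote, and a tied total
    defaults to the first syllable.
    """
    score = 0.0  # > 0 favours the first syllable, < 0 favours the second
    for cue, weight in weights.items():
        if first[cue] > second[cue]:
            score += weight
        elif second[cue] > first[cue]:
            score -= weight
    return 0 if score >= 0 else 1


# Approximate syllable measurements of the recorded word MOther (Section 2.2).
mother = (
    {"f0_hz": 216.70, "duration_ms": 148, "intensity_db": 76.78},  # first syllable
    {"f0_hz": 180.20, "duration_ms": 152, "intensity_db": 74.48},  # second syllable
)
print(stressed_syllable(*mother))                        # -> 0 (all three cues)
print(stressed_syllable(*mother, {"duration_ms": 1.0}))  # -> 1 (duration alone)
```

In this recording, duration alone slightly favours the second syllable, but F0 and intensity outweigh it. That is the kind of cue conflict in which the relative weighting of cues matters.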

There is increasing interest in adult listeners' perception of non-native lexical stress contrasts (e.g., Frost, 2011; Peperkamp & Dupoux, 2002; Wang, 2008). For example, Peperkamp and Dupoux (2002) reported that adult French speakers showed stress "deafness" in English lexical stress discrimination, because French has predictable stress whereas English has an unpredictable stress pattern. In related work, Peperkamp, Vendelin, and Dupoux (2010) showed that adult speakers of Standard French, Southeastern French, Finnish, and Hungarian, all of which have fixed stress patterns, had difficulty perceiving stress contrasts. In contrast, no such "stress deafness" was found in adult Spanish speakers, whose native language has unpredictable stress. Frost (2011) argued that French and English native speakers may not process stress in the same way. These studies of "stress deafness" have focused on adult speakers of languages with predictable stress, such as French, versus adult speakers of languages with unpredictable stress, such as Spanish. Little is known about whether speakers of tone languages whose L1 does not use lexical stress, such as Cantonese, are sensitive to English lexical stress, and in particular to its different acoustic dimensions of pitch, intensity, and duration. We therefore go one step further by investigating whether Cantonese-speaking children are sensitive to three different acoustic correlates of English lexical stress (i.e., pitch, intensity, and duration), and whether the same order of perceived relative importance (F0, then duration, then intensity) is observed in young Cantonese-speaking children.

There have been only a few empirical studies of English lexical stress perception in Chinese2 learners of English (e.g., Chan, 2007; Wang, 2008). Wang (2008) evaluated the effects of F0, duration, and intensity on English stress perception in Mandarin Chinese learners of English and in native English speakers. All three cues significantly influenced English stress perception for native English speakers, but only F0 was important for the Chinese learners of English. Chan (2007) obtained similar findings in adult Cantonese learners of English. Cantonese speakers used F0 as the primary cue in English stress perception, whereas native English speakers used spectral balance (i.e., the distribution of intensity over the frequency spectrum) as the most important cue.

The finding that Chinese learners of English rely more on F0 than on other acoustic cues indicates some transfer of reliance on F0 from the L1 tonal language to L2 stress (e.g., Nguyen & Ingram, 2005; Pennington & Ellis, 2000). More specifically, perceptual studies of Chinese lexical tone suggest that F0 is the primary acoustic cue for Chinese tone perception (e.g., Khouw & Ciocca, 2007; Vance, 1976). Chinese speakers may therefore transfer their strategy for perceiving lexical tone to English stress perception (e.g., Wang, 2008). This possibility of transfer is also supported by studies of English stress production in Chinese speakers, which show that Chinese speakers may adopt the strategies they use in native tone production to produce English stress (Zhang, Nissen, & Francis, 2008). For example, Zhang et al. (2008) provided extensive acoustic analyses of English stress production by Mandarin Chinese speakers and English speakers. They demonstrated that Mandarin Chinese speakers used the acoustic cues of F0, duration, and intensity in stress production in a similar manner to native English speakers. That is, both Chinese speakers and native English speakers produced stressed syllables with a higher F0, longer duration, and greater intensity than unstressed syllables. These findings suggest that F0, duration, and intensity are all implicated in English stress perception in L2 English learners. Moreover, the relative importance of these acoustic cues in stress perception in L2 learners may be influenced by their native tone language. Studies of child learners' perception of English lexical stress, particularly their more subtle perception of pitch, intensity, and duration, are needed to explore these possibilities further.

2 In the present study the word "Chinese" is used as a blanket term referring to the distinct languages of Mandarin and Cantonese.

Another important issue yet to be examined is which neural markers index the acoustic cues of stress during stress perception. In particular, we know of no research that has systematically manipulated the three acoustic correlates of English lexical stress (F0, intensity, and duration) and evaluated their effects on the perception of English lexical stress in Cantonese-speaking children who are learning English. Therefore, in this study, we used ERPs to explore the neural discrimination of English stress and to evaluate the relative importance of three acoustic correlates of English stress (i.e., F0, duration, and intensity) in stress perception in Cantonese-speaking second graders acquiring English as a second language.

ERPs are widely known to have very fine temporal resolution, and they can capture the brain's response both to passively presented input and to input in an active task. In ERP studies of speech perception, the passive auditory oddball paradigm with either single or multiple deviants is often used to examine participants' discrimination ability in speech perception and production (e.g., for reviews, see Cheour, Leppänen, & Kraus, 2000; Näätänen, 2001; Näätänen, Paavilainen, Alho, Reinikainen, & Sams, 1989; Näätänen, Pakarinen, Rinne, & Takegata, 2004). In the passive oddball paradigm, participants are usually presented with a stream of frequent stimuli (standards) interspersed with infrequent stimuli that differ from them in some discriminable way (deviants) (for reviews, see Cheour et al., 2000; Näätänen, Paavilainen, Rinne, & Alho, 2007). A specific ERP component, the mismatch negativity (MMN), is often observed in this paradigm by subtracting the ERP responses to the frequent stimuli (standards) from those to the infrequent stimuli (deviants) (e.g., Chandrasekaran, Gandour, & Krishnan, 2007; Cheour et al., 1997; Näätänen et al., 1989). In adults, the MMN is distributed over fronto-central electrodes and peaks between 150 and 250 ms after the onset of the stimulus change. It reflects automatic, pre-attentive cortical processing. The MMN is regarded as an indicator of the participant's ability to discriminate between the standard and the deviant: it becomes smaller or disappears as the difference between the standard and the deviant is reduced (for a review, see Näätänen et al., 2007). Because the MMN can be obtained irrespective of participants' attention or the behavioral task, it is a useful tool for examining auditory and speech perception in infants and children, whose attention and motivation are limited (e.g., Cheour et al., 2000; Kuhl, 1998; Lee et al., 2012; Morr, Shafer, Kreuzer, & Kurtzberg, 2002). However, previous research has predominantly focused on the segmental level of speech, such as vowels and consonants (for a review, see Näätänen et al., 2007). Less is therefore known about brain responses to suprasegmental features, such as English lexical stress, in Cantonese-speaking children who learn English as a second language.
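The subtraction that defines the MMN can be sketched in a few lines. This is a minimal illustration, not the authors' pipeline. It assumes that single-trial epochs are already baseline-corrected lists of microvolt values sampled at 1000 Hz (the rate given in Section 2.4), with a hypothetical epoch start of −100 ms. The helper names `average_epochs`, `difference_wave`, and `peak_latency_ms` are our own.

```python
def average_epochs(epochs):
    """Average equal-length single-trial epochs (microvolts) sample by sample."""
    n = len(epochs)
    return [sum(samples) / n for samples in zip(*epochs)]


def difference_wave(deviant_epochs, standard_epochs):
    """Deviant-minus-standard ERP: the subtraction from which the MMN is read."""
    deviant = average_epochs(deviant_epochs)
    standard = average_epochs(standard_epochs)
    return [d - s for d, s in zip(deviant, standard)]


def peak_latency_ms(wave, fs_hz, epoch_start_ms, window_ms, polarity=-1):
    """Latency (ms) of the most negative (polarity=-1) or most positive (+1) sample in a window."""
    def to_index(t_ms):
        return round((t_ms - epoch_start_ms) * fs_hz / 1000)

    lo, hi = to_index(window_ms[0]), to_index(window_ms[1])
    peak = max(range(lo, hi + 1), key=lambda i: polarity * wave[i])
    return epoch_start_ms + peak * 1000 / fs_hz


# Toy data: 1000 Hz epochs from -100 to 500 ms; the deviant dips at 200 ms (index 300).
n = 601
standard_trials = [[0.0] * n, [0.0] * n]
deviant_trials = [[-4.0 if i == 300 else 0.0 for i in range(n)],
                  [-6.0 if i == 300 else 0.0 for i in range(n)]]
mmn = difference_wave(deviant_trials, standard_trials)
print(peak_latency_ms(mmn, 1000, -100, (150, 250)))  # -> 200.0
```

The 150–250 ms search window mirrors the adult MMN peak range described above. For children, the window and the expected polarity may differ.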

A few ERP studies have examined the neural discrimination of German stress in German monolinguals (e.g., Weber, Hahne, Friedrich, & Friederici, 2004). For example, Weber et al. (2004) used an MMN paradigm to investigate German-speaking adults' and 4- and 5-month-old infants' ERP responses to trochaic (stress on the first syllable) and iambic (stress on the second syllable) stress patterns in two-syllable German pseudowords. In the trochaic condition, an iambic CVCV item /baba:/ was frequently presented and occasionally replaced by the trochaic deviant CVCV item /ba:ba/. In the iambic condition, the trochaic /ba:ba/ served as the standard and the iambic CVCV item as the deviant. The authors reported that a typical MMN was observed for both the trochaic and the iambic items. They also found that 4-month-old infants did not show reliable responses in either condition, whereas 5-month-old infants showed a significant positive mismatch response (p-MMR) to the trochaic item.

The p-MMR is usually observed in oddball studies of infants' and children's speech perception in the time window between 150 and 450 ms, with a topographic distribution similar to that of the typical MMN (e.g., Cheng et al., 2013; Friederici, Friedrich, & Weber, 2002; Jing & Benasich, 2006; Lee et al., 2012; Maurer, Bucher, Brem, & Brandeis, 2003). However, what the p-MMR reflects, and when it is present or absent, is still debated (e.g., Cheng et al., 2013; Lee et al., 2012; Maurer et al., 2003; Shafer, Yan, & Datta, 2010). Several accounts have been proposed to explain the mechanism underlying this component. For example, the p-MMR may be analogous to the adult P3a, reflecting distractibility, an involuntary attention shift, or the automatic categorization of stimuli (e.g., Alho, Sainio, Sajaniemi, Reinikainen, & Näätänen, 1990; He, Hotson, & Trainor, 2009; Shestakova, Huotilainen, & Cheour, 2003). Other researchers propose that the p-MMR may reflect recovery from refractoriness, indexing the detection and encoding of the acoustic properties of a stimulus in primary auditory cortex (e.g., Escera, Alho, Winkler, & Näätänen, 1998). Still others suggest that the p-MMR found in children may have the same functional nature as the typical MMN found in adults, reflecting additional or increased neural activation to deviants relative to standards (e.g., Maurer et al., 2003). Recently, several studies have demonstrated that the presence or absence of the p-MMR is related to features of the deviants, such as deviance size (e.g., Cheng et al., 2013; Lee et al., 2012; Maurer et al., 2003). Whatever their exact mechanisms, the MMN and the p-MMR may both serve as indicators of speech perception at the segmental and suprasegmental levels.
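Because the MMN and the p-MMR are distinguished by polarity within a latency window, a common summary measure is the mean amplitude of the deviant-minus-standard wave in that window. The sketch below assumes a difference wave sampled at 1000 Hz with a hypothetical epoch start of −100 ms. It uses the two windows reported in the abstract and treats their boundaries as inclusive, which is our assumption. The polarity label is purely descriptive: whether a response is reliable requires a statistical test that this sketch does not perform.

```python
# Latency windows (ms) taken from the abstract; boundaries are treated as inclusive here.
WINDOWS_MS = {"170-270": (170, 270), "270-400": (270, 400)}


def mean_amplitude(wave, fs_hz, epoch_start_ms, window_ms):
    """Mean of a deviant-minus-standard wave over an inclusive latency window."""
    def to_index(t_ms):
        return round((t_ms - epoch_start_ms) * fs_hz / 1000)

    segment = wave[to_index(window_ms[0]):to_index(window_ms[1]) + 1]
    return sum(segment) / len(segment)


def polarity_label(mean_uv):
    """Descriptive label only; deciding significance needs a statistical test."""
    if mean_uv < 0:
        return "MMN (negative)"
    if mean_uv > 0:
        return "p-MMR (positive)"
    return "none"


# Toy difference wave at 1000 Hz from -100 ms: positive early, negative late.
wave = [0.0] * 601
for i in range(270, 371):   # 170-270 ms
    wave[i] = 2.0
for i in range(371, 501):   # 271-400 ms
    wave[i] = -3.0
for name, window in WINDOWS_MS.items():
    amp = mean_amplitude(wave, 1000, -100, window)
    print(name, round(amp, 2), polarity_label(amp))
```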

Taken together, the present study extends previous research to Cantonese-speaking second graders, who acquire two distinct suprasegmental features, Cantonese lexical tone and English lexical stress, in parallel. We systematically examined the neural discrimination of changes in the acoustic features of English lexical stress using ERPs. Such an investigation may help to clarify the impact of first-language experience on the neural mechanisms underlying English lexical stress processing. Cantonese and English represent two extremes of the world's languages in terms of suprasegmental phonology (tone versus non-tone; stressed versus non-stressed). Cantonese is a tone language with six distinctive lexical tones (up to nine, depending on how one counts them): tone 1, high level (55); tone 2, high rising (25); tone 3, mid level (33); tone 4, low falling (21); tone 5, low rising (23); and tone 6, low level (22).3 Lexical tone can minimally contrast words. For example, the monosyllable /fu/ can represent six words: /fu55/ 膚 (skin), /fu25/ 虎 (tiger), /fu33/ 褲 (trousers), /fu21/ 符 (symbol), /fu23/ 婦 (woman), and /fu22/ 父 (father) (e.g., Bauer & Benedict, 1997; Tong, McBride, & Burnham, in press). Despite the differences between Cantonese and English, Cantonese lexical tones share certain acoustic and functional similarities with English lexical stress. Acoustically, although the primary acoustic correlate of lexical tone is fundamental frequency (F0) (e.g., Bauer & Benedict, 1997; Tong et al., in press), duration and intensity are also related to Cantonese tone (e.g., Ng, Gilbert, & Lerman, 2000; Wu & Xu, 2010). Functionally, variation in pitch within single syllables (i.e., Cantonese tone) can distinguish word meanings, just as variation in pitch across contiguous syllables (i.e., English stress) can change meaning. There is clear evidence that in L2 speech perception, L2 learners tend to rely on acoustic cues that are active in both their L1 and their L2 (Nguyen & Ingram, 2005). It is therefore theoretically interesting to investigate how Cantonese-speaking children neurally encode different acoustic cues, including F0, duration, and intensity, during English lexical stress processing.

3 Chao (1947) first transcribed lexical tone in a numerical notational system, using five levels (from lowest, 1, to highest, 5) to describe the relative height, shape, and duration of the pitch contour.
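Chao's tone letters, as described in footnote 3, are compact enough to process mechanically. This is a minimal sketch that stores the six Cantonese tones and the /fu/ minimal set listed above. It classifies each contour by comparing only its start and end digits, which is a deliberate simplification: it ignores overall pitch height and any contour detail beyond the two endpoints. The names `CANTONESE_TONES`, `FU_SET`, and `contour_shape` are illustrative.

```python
# Cantonese tones and their Chao (1947) tone letters, as listed in the text.
CANTONESE_TONES = {
    1: ("high level", "55"),
    2: ("high rising", "25"),
    3: ("mid level", "33"),
    4: ("low falling", "21"),
    5: ("low rising", "23"),
    6: ("low level", "22"),
}

# The /fu/ minimal set from the text: tone number -> (character, gloss).
FU_SET = {1: ("膚", "skin"), 2: ("虎", "tiger"), 3: ("褲", "trousers"),
          4: ("符", "symbol"), 5: ("婦", "woman"), 6: ("父", "father")}


def contour_shape(chao):
    """Classify a Chao contour (digits 1 = lowest ... 5 = highest) by its start and end points."""
    start, end = int(chao[0]), int(chao[-1])
    if end > start:
        return "rising"
    if end < start:
        return "falling"
    return "level"


for tone, (name, chao) in CANTONESE_TONES.items():
    char, gloss = FU_SET[tone]
    # The derived shape should agree with the last word of the tone's descriptive name.
    assert contour_shape(chao) == name.split()[-1]
    print(f"/fu{chao}/ {char} '{gloss}': tone {tone}, {name}")
```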

Thus, in the present study, we tested two questions. First, would the MMN, which is usually elicited in the oddball paradigm in segmental-level speech and auditory research, mark the neural discrimination of English lexical stress in Cantonese-speaking children? Second, would Cantonese-speaking children show different brain responses to the F0, duration, and intensity cues of English lexical stress? We expected that Cantonese-speaking second graders might show neural sensitivity to all three acoustic cues of English lexical stress (F0, intensity, and duration), given that these cues are accessible in both their L1 and their L2 suprasegmental phonology.

In addition, we were interested in the relative weights of the three acoustic cues in English lexical stress perception for Cantonese-speaking second graders. We expected that the importance of the three cues might vary as stress processing unfolds. Given that the MMN reflects automatic, pre-attentive cortical processing of auditory or speech signals, the present study might identify neural patterns that correspond to changes in different acoustic cues (F0, intensity, and duration) of English lexical stress. The present study thus sought to provide evidence on how different acoustic cues are encoded and used in English lexical stress, highlighting the neural markers of stress perception in Cantonese-speaking children. This would help us understand the neural correlates of English lexical stress perception in young L2 learners whose native language is tonal. Such an investigation might also have practical implications. The differences between English and Cantonese, in particular in suprasegmental phonology, pose a challenge for second language (L2) learners. They also highlight the task that Cantonese-speaking children face in shifting their linguistic attention to phonetic features of the second language that are not distinctive in their native language. These difficulties partly motivated this study's focus on identifying the neural markers that best correspond to the acoustic cues used in English lexical stress processing. With this knowledge, investigators might then consider how best to help Cantonese-speaking children acquire English lexical stress. The findings might also be clinically informative for identifying English L2 learners who have difficulty perceiving English lexical stress. That is, mismatch responses to acoustic cues of English stress might serve as neural indicators of English stress perception difficulty in Cantonese learners (Tong, Tong, & McBride-Chang, 2013).

2. Methods

2.1. Participants

Participants were 18 Hong Kong second-grade children (9 girls and 9 boys) aged 7 years 4 months to 8 years 3 months (M = 7;10, SD = 3 months). According to parents' reports, all children were typically developing, with no history of neurological or psychiatric disorders, brain injury, hearing problems, or learning difficulties.


This age range was selected because there is clear evidence that children's ability to use pitch cues to make lexical distinctions in English words grows at this age (Quam & Swingley, 2014). Furthermore, 7- to 8-year-olds in Hong Kong have had more than 3 years' experience of learning English as a second language, which enabled us to explore how Cantonese-speaking children use different acoustic cues to perceive stress. Hong Kong Chinese children typically begin to learn English at the age of 3 years or even earlier (McBride-Chang et al., 2008). They are formally taught English in kindergarten, either by local English teachers whose native language is Chinese or by native English speakers, who are commonly recruited to teach English in kindergartens (Leung, Lim, & Li, 2013). Moreover, some Hong Kong families employ Filipina domestic helpers to look after their children, and these helpers often speak English to the children. Hong Kong parents are also highly motivated to speak English with their children. Thus, Hong Kong Chinese children in this age range have typically had relatively extensive exposure to English.

2.2. Stimuli and design

A multiple-feature oddball paradigm was adopted in the present study. The multiple-feature paradigm has been widely used to assess auditory and speech processing in both children and adults (e.g., Cheng et al., 2013; Lee et al., 2012; Maurer et al., 2003; Näätänen et al., 2004), and it has been identified as stable for examining auditory and speech perception. The basic assumption of this paradigm is that each deviant, which differs from the standard in only one respect, strengthens the memory trace for the standard with regard to the attributes they share (e.g., Näätänen et al., 2004).

In the present study, we manipulated the stress of a disyllabic English word so that stimuli differed from the standard in a single dimension, (1) pitch (i.e., fundamental frequency [F0]), (2) intensity, or (3) duration, or in all three dimensions at once (pitch, intensity, and duration). There were therefore four deviants: a change in pitch only, in intensity only, or in duration only, and a change in all three acoustic cues (see Fig. 1).

Experimental stimuli were generated from the word pair MOther and toDAY. This word pair was selected for two primary reasons. First, the two words differ in stress pattern. MOther is a disyllabic word with a trochaic stress pattern (a stressed syllable followed by an unstressed one), whereas toDAY is a disyllabic word with an iambic stress pattern (an unstressed syllable followed by a stressed one). This follows the rationale for designing stress stimuli for young children used in a very recent study of English-speaking children's stress perception (Quam & Swingley, 2014). Second, both words are familiar to 7- to 8-year-old Hong Kong Cantonese-speaking children, and there is empirical evidence that young children encode the phonetic detail of familiar words for lexical distinction (Swingley & Aslin, 2000).

Fig. 1. Waveforms (upper row) and spectrograms (lower row) of the stress stimuli used in the present study: (a) Standard, (b) Pitch-changed, (c) Duration-changed, (d) Intensity-changed, (e) Three-dimension-changed. Axes: amplitude and frequency (up to 5000 Hz) against time (ms, 0–300). In the spectrograms, the yellow line represents intensity and the blue line represents the F0 features of stress; the dark areas indicate the time and frequency points where the acoustic energy is highest. (For interpretation of the references to color in this figure legend, the reader is referred to the web version of this article.)

The words mother and today were produced by a female native English speaker in a soundproof room. The original recording of MOther served as the standard (see Fig. 1a). The four deviants (pitch, intensity, duration, and three-dimension) were constructed from the original recording of MOther with reference to the acoustic parameters of toDAY. We extracted the prosodic information from toDAY and used it as a template to synthesize the deviant stimuli. All deviant stimuli were constructed using the software Praat (Boersma & Weenink, 2004).

The pitch deviant of the target word MOther was generated from the pitch contour of toDAY while keeping the intensity of the whole word constant (i.e., 75 dB). The original mean pitch values were 216.70 and 180.20 Hz for the first and second syllables, respectively. After adjustment, the mean pitch values were 165.61 and 2020.78 Hz for the first and second syllables, respectively (see Fig. 1b).
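Pitch changes are often compared on a logarithmic scale, because equal frequency ratios are heard as roughly equal pitch intervals. The text reports the pitch manipulation only in hertz. The sketch below, which assumes the semitone scale is a useful way to express it, converts the first-syllable change (216.70 Hz to 165.61 Hz) into semitones. The same `semitones` helper, which is our own illustrative function, applies to any pair of F0 values.

```python
import math


def semitones(f_from_hz, f_to_hz):
    """Size of an F0 change in semitones (12 per octave); negative values mean a lowering."""
    return 12 * math.log2(f_to_hz / f_from_hz)


# Mean F0 of the first syllable of MOther before and after the pitch manipulation (from the text).
first_syllable_shift = semitones(216.70, 165.61)
print(f"{first_syllable_shift:.2f} semitones")
```

On this scale, the first syllable is lowered by roughly 4.65 semitones, a bit more than a major third.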

Duration was also adjusted according to the word toDAY. The original durations of the first syllable /ˈmʌ/ and the second syllable /ðə/ were approximately 148 ms and 152 ms, respectively. To match the template word toDAY, these durations were adjusted to 98 ms and 228 ms for the first syllable /ˈmʌ/ and the second syllable /ðə/, respectively (see Fig. 1c).
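Duration manipulations of this kind are usually specified as a per-syllable time-scaling factor. The text does not state how Praat was configured, so this is only a sketch of the arithmetic: it derives each factor from the original and target durations given above. The dictionary keys and the `stretch_factors` helper are illustrative names.

```python
# Syllable durations (ms) of MOther before and after the duration manipulation, from the text.
ORIGINAL_MS = {"first": 148, "second": 152}
TARGET_MS = {"first": 98, "second": 228}


def stretch_factors(original_ms, target_ms):
    """Per-syllable time-scaling factor: < 1 shortens, > 1 lengthens."""
    return {syl: target_ms[syl] / original_ms[syl] for syl in original_ms}


factors = stretch_factors(ORIGINAL_MS, TARGET_MS)
for syl, factor in factors.items():
    print(f"{syl} syllable: x{factor:.3f}")
```

The first syllable is compressed to about two thirds of its length, and the second is stretched by exactly half again, reversing which syllable is longer.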

For the word toDAY, the intensity was around 76 dB for the first syllable /tə/ and approximately 80–81 dB for the second syllable /ˈdeɪ/. For the word MOther, the intensity was 76.78 dB SPL and 74.48 dB SPL for the first syllable /ˈmʌ/ and the second syllable /ðə/, respectively. We then averaged the intensity of each syllable, resulting in adjusted intensities of 69.35 dB SPL and 78.38 dB SPL for the two syllables of the deviants (see Fig. 1d).
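The text does not say how Praat applied the intensity change. This minimal sketch assumes a constant gain per syllable. It uses the standard relation that a level change of ΔL dB corresponds to multiplying the waveform amplitude by 10^(ΔL/20), applied to the MOther levels before and after adjustment. `db_to_amplitude_gain` and `scale` are hypothetical helpers operating on plain lists of samples.

```python
def db_to_amplitude_gain(level_from_db, level_to_db):
    """Linear amplitude multiplier that moves a signal from one dB level to another (20*log10 rule)."""
    return 10 ** ((level_to_db - level_from_db) / 20)


def scale(samples, gain):
    """Apply a constant gain to a list of waveform samples (hypothetical helper)."""
    return [gain * s for s in samples]


# MOther syllable levels before and after the intensity manipulation (dB SPL, from the text).
gain_first = db_to_amplitude_gain(76.78, 69.35)   # attenuates the first syllable
gain_second = db_to_amplitude_gain(74.48, 78.38)  # boosts the second syllable
print(f"first: x{gain_first:.3f}, second: x{gain_second:.3f}")
```

The resulting factors (roughly 0.43 and 1.57) make the second syllable louder than the first, as in the iambic template.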

For the three-dimension-changed deviant, the intensity values were set to 67.57 dB SPL and 76.65 dB SPL for the first and second syllables, respectively (see Fig. 1e). All stimuli were lengthened to 300 ms and presented with a stimulus onset asynchrony (SOA) of 600 ms. The waveforms and spectrograms of the standard and deviant tokens are shown in Fig. 1.

2.3. Procedure

Participants were tested individually in an electrically shielded ERP lab by a trained Cantonese-speaking experimenter and the first author, who has experience in children’s ERP testing. Prior to testing, the caregiver was asked to complete a consent form and a parental questionnaire. The parental questionnaire collected information about the child’s language experience and any history of language and learning difficulties. Following completion of these forms, preparation for the EEG recording began.

After preparation, participants were given instructions about the experimental procedure. Participants were seated in a comfortable chair at a distance of 80 cm in front of the computer monitor. They were asked to watch a movie entitled “The Mole” in silence while listening to the experimental stimuli.

The stimuli were delivered binaurally via headphones in three blocks, each of which consisted of 515 trials, starting with 15 standard trials. The standard and the four types of deviants (single-dimension changes of pitch, intensity, or duration, and the three-dimension change) were mixed in each block, with 40% deviants (10% for each deviant) and 60% standards. Thus, there were 150 trials for each type of deviant in total (i.e., 50 per block). The order of the trials within each block was pseudo-randomized. Each block lasted approximately 8 min, and participants were given a 2-min break between blocks. During the experiment, participants were asked to refrain from moving in order to minimize EEG artifacts. The whole experiment, including preparation and breaks, lasted approximately 1.5 h.
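The trial structure described above (15 leading standards, then 300 standards and 50 of each deviant per block) can be sketched as follows. The paper does not state which constraints the pseudo-randomization used; this Python sketch assumes, hypothetically, that every deviant is preceded by at least one standard, a constraint often used in oddball designs. The function name and deviant labels are illustrative only.

```python
import random
from collections import Counter

DEVIANTS = ("pitch", "intensity", "duration", "three_dim")  # hypothetical labels


def make_block(n_lead=15, n_std=300, n_per_deviant=50, seed=None):
    """Return one pseudo-random block as a list of trial labels.

    Assumed constraint (not stated in the paper): no two deviants in a row.
    """
    rng = random.Random(seed)
    deviants = [d for d in DEVIANTS for _ in range(n_per_deviant)]
    rng.shuffle(deviants)
    n_dev = len(deviants)
    # Gaps before, between, and after the deviants; inner gaps need >= 1 standard.
    gaps = [0] + [1] * (n_dev - 1) + [0]
    spare = n_std - (n_dev - 1)
    if spare < 0:
        raise ValueError("too few standards to separate all deviants")
    for _ in range(spare):
        gaps[rng.randrange(len(gaps))] += 1  # scatter the remaining standards
    trials = ["standard"] * n_lead
    for gap, deviant in zip(gaps, deviants):
        trials.extend(["standard"] * gap)
        trials.append(deviant)
    trials.extend(["standard"] * gaps[-1])
    return trials


blocks = [make_block(seed=s) for s in range(3)]
totals = Counter(label for block in blocks for label in block)
print(len(blocks[0]), dict(totals))
```

Across three blocks this yields 515 trials per block and 150 trials per deviant, matching the totals given above; distributing spare standards into gaps guarantees the separation constraint without rejection sampling.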

2.4. EEG recordings

Electroencephalographic (EEG) signals were recorded from 64 Ag/AgCl electrodes fitted on an elasticized cap. The electrodes were arranged according to the international 10–20 system and referenced online to an electrode located between Cz and CPz. The activity at the left and right mastoids was also recorded. The vertical electrooculogram (EOG) was recorded from electrodes below versus above the left eye, and the horizontal EOG from electrodes at the left versus right lateral orbital rims. During recording, electrode impedance was kept below 15 kΩ. The EEG and EOG signals were amplified with a band-pass of 0.05–70 Hz and digitized online at a sampling rate of 1000 Hz.

2.5. Data analysis

The EEG data were analyzed off-line using Scan 4.5 software. The data were re-referenced to the average of both mastoids, a reference commonly used in the MMN literature (e.g., Nenonen, Shestakova, Huotilainen, & Näätänen, 2003; Näätänen et al., 2004; Paavilainen, Simola, Jaramillo, Näätänen, & Winkler, 2001; Pakarinen, Huotilainen, & Näätänen, 2010). The continuous data were filtered with a 0.3–30 Hz band-pass. Epochs extending 600 ms from stimulus onset were averaged separately for each condition, with the 100 ms pre-stimulus interval as the baseline. The first 15 standard trials and epochs with artifacts exceeding ±100 µV were rejected automatically. In addition, standard trials that immediately followed deviant trials were excluded from averaging. Individual averages included at least 97 accepted trials.
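The off-line steps above were carried out in Scan 4.5; the following Python sketch (NumPy and SciPy) only illustrates the same sequence of operations on one block of continuous data and is not the authors’ pipeline. The mastoid channel names "M1"/"M2", the second-order Butterworth design applied forward and backward (zero-phase), and the function name are assumptions; the passband, epoch limits, ±100 µV criterion, and trial-exclusion rules follow the text.

```python
import numpy as np
from scipy.signal import butter, sosfiltfilt


def average_erps(eeg, ch_names, onsets, labels, fs=1000, pre_ms=100,
                 post_ms=600, reject_uv=100.0, n_lead=15):
    """Average epochs per condition for one block (hypothetical helper).

    eeg: (n_channels, n_samples) continuous data in microvolts.
    onsets: stimulus-onset sample indices; labels: matching condition names.
    Returns (averages, counts), both keyed by condition label.
    """
    # Re-reference to the mean of the two mastoids (channel names assumed).
    m1, m2 = ch_names.index("M1"), ch_names.index("M2")
    eeg = eeg - 0.5 * (eeg[m1] + eeg[m2])
    # 0.3-30 Hz band-pass on the continuous data; the filter design is assumed.
    sos = butter(2, [0.3, 30.0], btype="bandpass", fs=fs, output="sos")
    eeg = sosfiltfilt(sos, eeg, axis=-1)

    pre, post = pre_ms * fs // 1000, post_ms * fs // 1000
    kept = {}
    for i, (onset, label) in enumerate(zip(onsets, labels)):
        if i < n_lead:
            continue  # leading standards are discarded
        if label == "standard" and labels[i - 1] != "standard":
            continue  # standards right after a deviant are not averaged
        if onset - pre < 0 or onset + post > eeg.shape[1]:
            continue  # epoch would extend past the recording
        epoch = eeg[:, onset - pre:onset + post]
        epoch = epoch - epoch[:, :pre].mean(axis=1, keepdims=True)  # baseline
        if np.abs(epoch).max() > reject_uv:
            continue  # artifact rejection at +/-100 microvolts by default
        kept.setdefault(label, []).append(epoch)

    averages = {label: np.mean(eps, axis=0) for label, eps in kept.items()}
    counts = {label: len(eps) for label, eps in kept.items()}
    return averages, counts
```

The returned counts make it easy to check a criterion such as the minimum of 97 accepted trials per individual average.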

The mean amplitudes were computed separately for each participant and each condition in two time windows, 170–270 ms and 270–400 ms, at electrodes F3, Fz, F4, FC3, FCz, and FC4. Repeated measures ANOVAs were performed with experimental condition (standard and four deviants), site (frontal, frontocentral), and hemisphere (left, midline, right) as within-subject factors. Significant interactions were followed up with one-way ANOVAs. To examine whether each deviant elicited a significant mismatch response, planned comparisons were performed between the standard and each of the pitch-changed, intensity-changed, duration-changed, and three-dimension-changed deviants. Moreover, planned comparisons were performed on the difference values obtained by subtracting the standard from each deviant (pitch-changed, intensity-changed, duration-changed, and three-dimension-changed), in order to examine whether the mismatch responses elicited by the different deviants differed in mean amplitude in each time window. For each ANOVA, the Greenhouse–Geisser adjustment to the degrees of freedom was used to correct for violations of sphericity.
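The Greenhouse–Geisser correction mentioned above multiplies both degrees of freedom by an estimate ε of how far the condition covariance departs from sphericity. A minimal NumPy sketch of the standard estimator is shown below, assuming one value per participant and condition (e.g., amplitudes already averaged over site and hemisphere); the function names are illustrative. ε is computed from the double-centred covariance matrix, which is equivalent to using orthonormal contrasts. With k = 5 conditions, the error df of 68 reported below implies 18 participants (68 = 4 × 17).

```python
import numpy as np


def gg_epsilon_from_cov(cov):
    """Greenhouse-Geisser epsilon from a k x k condition covariance matrix."""
    cov = np.asarray(cov, dtype=float)
    k = cov.shape[0]
    centre = np.eye(k) - np.ones((k, k)) / k   # centring projector H
    s_c = centre @ cov @ centre                # double-centred covariance
    return np.trace(s_c) ** 2 / ((k - 1) * np.trace(s_c @ s_c))


def gg_epsilon(data):
    """data: (n_participants, k_conditions) array of mean amplitudes."""
    return gg_epsilon_from_cov(np.cov(np.asarray(data, dtype=float), rowvar=False))


def corrected_dfs(eps, k, n):
    """Corrected (effect, error) dfs for a one-factor repeated-measures effect."""
    return eps * (k - 1), eps * (k - 1) * (n - 1)


# Compound symmetry satisfies sphericity, so epsilon is (numerically) 1.
cs = 2.0 * np.eye(5) + 0.5 * np.ones((5, 5))
print(gg_epsilon_from_cov(cs))
print(corrected_dfs(1.0, k=5, n=18))  # (4.0, 68.0): the uncorrected dfs
```

ε ranges from 1/(k − 1) (maximal violation) to 1 (sphericity holds), so for the five-level condition factor the corrected effect df lies between 1 and 4.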

3. Results

Fig. 2 shows the grand average ERPs at electrode F3 for all types of stimuli. Fig. 3 shows the difference waveform of each deviant minus the standard at electrode Fz. Fig. 4 shows the topographic maps of each deviant minus the standard; these voltage maps were obtained by subtracting the standard from each of the four types of deviants.
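The difference waves and window means described above are linear operations, so the mean of a difference wave equals the difference of the condition means. A NumPy sketch follows, assuming ERPs sampled at 1 ms from −100 to 599 ms and a half-open window [start, end) (the paper does not state how window edges were handled); the function names are hypothetical.

```python
import numpy as np

ROI = ("F3", "Fz", "F4", "FC3", "FCz", "FC4")  # electrodes analysed in the text
TIMES_MS = np.arange(-100, 600)                # 1 ms samples at 1000 Hz


def difference_wave(deviant_erp, standard_erp):
    """Deviant minus standard, sample by sample."""
    return np.asarray(deviant_erp) - np.asarray(standard_erp)


def window_mean(erp, start_ms, end_ms, times_ms=TIMES_MS):
    """Mean amplitude over [start_ms, end_ms) along the last (time) axis."""
    mask = (times_ms >= start_ms) & (times_ms < end_ms)
    return np.asarray(erp)[..., mask].mean(axis=-1)


def roi_window_mean(erp, ch_names, start_ms, end_ms):
    """Window mean averaged over the ROI electrodes; erp is (channels, times)."""
    idx = [ch_names.index(ch) for ch in ROI]
    return window_mean(np.asarray(erp)[idx], start_ms, end_ms).mean()


# Toy check: a +2 µV plateau from 170 to 270 ms on every channel.
names = list(ROI)
standard = np.zeros((len(names), TIMES_MS.size))
deviant = standard.copy()
deviant[:, (TIMES_MS >= 170) & (TIMES_MS < 270)] = 2.0
diff = difference_wave(deviant, standard)
print(roi_window_mean(diff, names, 170, 270))  # 2.0
print(roi_window_mean(diff, names, 270, 400))  # 0.0
```

Because averaging is linear, the 170–270 ms means reported below give 9.59 − 7.61 = 1.98 µV for the pitch deviant, which matches the reported difference of 1.99 µV up to rounding.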

[Figure 2: four panels, (a)–(d); x-axis: time (ms), −100 to 600; y-axis: amplitude (µV), −5 to 5; traces: Standard, Deviant, Difference.]

Fig. 2. ERP waveforms to the standard and deviant stimuli for (a) the F0-changed deviant, (b) the duration-changed deviant, (c) the intensity-changed deviant, and (d) the three-dimension-changed deviant at electrode F3.

[Figure 3: x-axis: time (ms), −100 to 600; y-axis: amplitude (µV), −5 to 5; traces: F0 minus Standard, Intensity minus Standard, Duration minus Standard, Three-dimension minus Standard.]

Fig. 3. ERP difference waveforms for the F0 deviant minus standard, intensity deviant minus standard, duration deviant minus standard, and three-dimension-changed deviant minus standard at electrode Fz.

66 X. Tong et al. / Brain & Language 138 (2014) 61–70

Visual inspection of the grand average ERPs suggested a prominent positive mismatch response (p-MMR) to the pitch-changed deviant relative to the standard in the time window from 170 to 270 ms. The duration-changed deviant elicited an obvious mismatch negativity (MMN) in both the 170–270 ms and the 270–400 ms windows. A p-MMR appeared to occur for the intensity-changed deviant relative to the standard in the 270–400 ms window, and an MMN seemed prominent in the same window for the three-dimension-changed deviant relative to the standard. On the basis of this visual inspection and the typical latencies of the p-MMR and MMN in children, statistical analyses were performed on the data in these two time windows.

3.1. Analyses of 170–270 ms time window

There was a significant main effect of experimental condition in the time window from 170 to 270 ms, F(4,68) = 10.52, p < .001, ηp² = .38. Planned comparisons further revealed that the mean amplitude of the pitch-changed deviant (M = 9.59 µV) was more positive than that of the standard (M = 7.61 µV) (p < .01), and that the mean amplitude of the duration-changed deviant (M = 5.93 µV) was more negative than that of the standard (p < .01). There was also a significant effect of site in this time window, F(1,17) = 14.74, p < .01, ηp² = .46. Planned comparisons further showed that the mean amplitude at the frontocentral site (M = 8.17 µV) was more negative than at the frontal site (M = 7.55 µV) (p < .01). Moreover, a significant interaction between hemisphere and site was found, F(2,34) = 14.93, p < .001, ηp² = .47. Simple main effect analyses further revealed that at the frontal sites the mean amplitude for the left hemisphere was more negative than for the right hemisphere (p < .05). At the frontocentral sites, the mean amplitude for the right hemisphere was more negative than those for the left hemisphere (p < .05) and the midline (p < .05).

In addition, the analysis of the difference values in the 170–270 ms window showed that the difference obtained for the pitch-changed deviant was significantly more positive than those for the other three deviants (ps < .05). Also, the difference elicited by the duration-changed deviant (M = −1.68 µV) was significantly more negative than the differences obtained for the pitch-changed deviant (M = 1.99 µV) (p < .05), the intensity-changed deviant (M = .58 µV) (p < .05), and the three-dimension-changed deviant (M = .38 µV) (p < .05).

3.2. Analyses of 270–400 ms time window

In this time window, the main effect of experimental condition was significant, F(4,68) = 12.86, p < .001, ηp² = .43. Follow-up pairwise comparisons indicated that the mean amplitude of the intensity-changed deviant (M = 4.41 µV) was more positive than that of the standard (M = 2.27 µV) (p < .01), and that the mean amplitude of the duration-changed deviant (M = .36 µV) was more negative than that of the standard (p < .01). The mean amplitude of the three-dimension-changed deviant (M = 1.08 µV) was also more negative than that of the standard (p < .01). There was also a significant effect of site in this time window, F(1,17) = 7.91, p < .05, ηp² = .32. Planned comparisons showed that the mean amplitude at the frontocentral site (M = 2.49 µV) was more negative than at the frontal site (M = 1.98 µV) (p < .05). No significant interactions were found in this time window (ps > .05).
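The reported effect sizes can be cross-checked against the F statistics, because partial eta squared for an ANOVA effect satisfies ηp² = F·df_effect / (F·df_effect + df_error). A short Python sketch (the function name is illustrative) using the values reported in Sections 3.1 and 3.2:

```python
def partial_eta_squared(f_value: float, df_effect: int, df_error: int) -> float:
    """Partial eta squared recovered from an F ratio and its degrees of freedom."""
    return f_value * df_effect / (f_value * df_effect + df_error)


# (effect, F, df_effect, df_error, reported partial eta squared)
REPORTED = [
    ("condition, 170-270 ms", 10.52, 4, 68, 0.38),
    ("site, 170-270 ms", 14.74, 1, 17, 0.46),
    ("hemisphere x site, 170-270 ms", 14.93, 2, 34, 0.47),
    ("condition, 270-400 ms", 12.86, 4, 68, 0.43),
    ("site, 270-400 ms", 7.91, 1, 17, 0.32),
]

for label, f, d1, d2, reported in REPORTED:
    print(f"{label}: {partial_eta_squared(f, d1, d2):.2f} (reported {reported:.2f})")
```

All five recomputed values round to the reported ones. Because the Greenhouse–Geisser correction multiplies both degrees of freedom by the same ε, it leaves this ratio unchanged.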

In addition, the difference elicited by the intensity-changed deviant (M = 2.18 µV) was more positive than the differences elicited by the other deviants, and the differences elicited by the duration-changed deviant (M = −1.87 µV) and the three-dimension-changed deviant (M = −1.14 µV) were more negative than those elicited by the pitch-changed deviant (M = .86 µV) and the intensity-changed deviant (ps < .05).

[Figure 4: topographic maps in four rows, (a)–(d), for successive intervals of 170–194, 195–219, 220–244, 245–269, 270–309, 310–349, and 350–389 ms; color scale from −3 to +3.]

Fig. 4. Maps display the topographic distribution of the mean amplitude of the deviant minus standard difference in the two analysis time windows from 170 to 400 ms for (a) the F0-changed deviant, (b) the duration-changed deviant, (c) the intensity-changed deviant, and (d) the three-dimension-changed deviant.

4. Discussion

In the present study, we investigated the neural correlates of Cantonese-speaking second graders’ automatic detection of violations in English lexical stress, using an ERP measure with a multiple-deviant oddball paradigm. We manipulated three acoustic correlates of stress, namely F0, duration, and intensity, in a real English disyllabic word. We aimed to understand whether Cantonese-speaking second graders were able to use these three acoustic correlates in stress perception; if so, which neural markers were associated with each correlate; and whether these markers differed from one another as English lexical stress perception unfolds over time. Our ERP results showed that, in the time window from 170 to 270 ms, a violation in F0 elicited a significant positive mismatch response (p-MMR), and a violation in duration elicited a typical mismatch negativity (MMN). In the time window of 270–400 ms, the violation in intensity elicited a significant p-MMR, whereas the violation in duration and the violation in all three dimensions elicited significant MMNs. These results indicate that Cantonese-speaking second graders are sensitive to changes in F0, duration, and intensity during English lexical stress perception. Moreover, changes in the different acoustic correlates of English lexical stress may be associated with different ERP components, which may reflect differences in how discriminable each cue change is from the standard at different stages of English lexical stress processing.

The present findings have important implications for understanding English lexical stress perception in Cantonese-speaking children. First, these results suggest that Cantonese-speaking children are sensitive to all three acoustic cues, i.e., F0, duration, and intensity, in English lexical stress perception. This finding is in accordance with previous results for L1 adults’ stress perception, in which a rise in F0, longer duration, higher intensity, and fuller vowel quality were observed to correlate with stressed syllables in speakers whose L1 was English (e.g., Lieberman, 1967; Medress, Skinner, & Anderson, 1971). However, Cantonese and English differ clearly in phonology at both the segmental and suprasegmental levels (e.g., Chan & Li, 2000; So & Dodd, 1995). For example, English is a stress-timed language in which the position of lexical stress varies from word to word, whereas Cantonese is a tonal language. Why, then, do Cantonese-speaking second graders show sensitivity to F0, duration, and intensity in a manner similar to that observed for native speakers’ English stress perception in previous research? Prior studies on L2 acquisition have suggested that there is transfer from L1 to L2 (e.g., Cisero & Royer, 1995; Durgunoglu, Nagy, & Hancin-Bhatt, 1993; Gottardo, Yan, Siegel, & Wade-Woolley, 2001). Such transfer can occur at all levels, including phonology, syntax, semantics, and pragmatics (e.g., Brenders, van Hell, & Dijkstra, 2011; Durgunoglu et al., 1993; McBride-Chang, Bialystok, Chong, & Li, 2004; McBride-Chang, Cheung, Chow, Chow, & Choi, 2006; Meisel, 1997). In particular, transfer at the phonological level, at both the segmental and suprasegmental levels, is much more pronounced than transfer at other linguistic levels (e.g., Ellis, 1994). Thus, Cantonese-speaking children may transfer their tone perception abilities to stress perception. That is, Cantonese-speaking children may perceive English stress with reference to the familiar acoustic correlates of lexical tones in their L1.

Additionally, our results suggest that Hong Kong Cantonese-speaking children show different sensitivity to the three acoustic cues in English lexical stress perception, as reflected by different ERP components at different processing stages. In other words, the weights of the three acoustic cues may not be equal and may vary as stress perception unfolds in Cantonese-speaking children. In the present study, in the 170–270 ms time window, we found more positive ERPs for the F0 deviant relative to the standard, and more negative ERPs for the duration-changed deviant relative to the standard. However, no significant ERP effects were found for the intensity-changed or the three-dimension-changed deviant in this window. In contrast, in the 270–400 ms interval, a robust p-MMR was elicited by the intensity-changed deviant, and the duration-changed and three-dimension-changed deviants elicited significant MMNs, but neither a p-MMR nor an MMN was observed for the F0-changed deviant in this time window.

The p-MMR and MMN components have been observed in several previous auditory and speech perception studies (e.g., Cheng et al., 2013; Dehaene-Lambertz, 2000; Lee et al., 2012; Maurer et al., 2003) and have been successfully used to examine a variety of acoustic differences, such as changes in the frequency, intensity, duration, location, or rhythm of a sound or speech signal; they have therefore been suggested to be valuable tools in speech perception research (e.g., Cheour et al., 2000; Maurer et al., 2003; for a review, see Näätänen et al., 2007). The MMN may be the outcome of a comparison between a new deviant stimulus and a memory trace formed by the standard stimulus in the auditory system. As discrimination becomes more difficult, the MMN becomes smaller or disappears (for a review, see Näätänen et al., 2007). For example, when Gomes et al. (1999) examined children’s and adults’ brain responses to difficult, medium, and easy deviants (1050, 1200, and 1500 Hz deviants versus a 1000 Hz standard), they found that an MMN was observed only for the medium and easy deviants, not for the difficult deviant, when children were ignoring the stimuli.

Although the function of the p-MMR is still debated, its presence or absence is associated with stimulus features such as the size of the deviance and the phonological salience of speech (e.g., Cheng et al., 2013; Lee et al., 2012; Maurer et al., 2003). For example, Lee et al. (2012) reported p-MMRs for the small vowel contrast of /di/ with /da/ in 5-year-old children, with a distribution at midline and right-hemisphere sites. Thus, researchers have proposed that the MMN may reflect enhanced and more mature discrimination ability, whereas the p-MMR is associated with more difficult discrimination in children. In the present study, a p-MMR was elicited by the F0 deviant in the time window from 170 to 270 ms and by the intensity deviant in the time window from 270 to 400 ms. An MMN was elicited by the duration deviant in both time windows, as well as by the three-dimension-changed deviant in the time window from 270 to 400 ms. Although our findings cannot establish which cue is the most prominent, it is important to note that the F0 and intensity deviants seem to differ less from the standard and to be more difficult to discriminate; in contrast, the duration deviant appears to be easy for Cantonese-speaking second graders to detect. In other words, Cantonese-speaking children show different sensitivity to the three acoustic cues in stress perception at the automatic, pre-attentive stage.

Our results differ somewhat from those of Wang (2008), who found that F0 was the most prominent cue for Chinese-speaking adults in English lexical stress perception. Two potential explanations may account for this discrepancy.

First, the difference in L1 background may be the most important reason. More specifically, the L1 of the participants in Wang’s study was Mandarin, whereas Cantonese was the L1 of the participants in our study. Although Mandarin and Cantonese are both tonal languages, their phonological systems differ in several ways. For example, Mandarin has only four tones, whereas Cantonese has six (or up to nine, depending on how they are counted), which makes Cantonese tones more difficult than Mandarin tones for non-native listeners to distinguish, even for speakers of other tonal languages. Thus, in addition to F0, Cantonese-speaking children may also use other acoustic cues, such as duration and intensity, in tone perception, and they may transfer these skills to English stress perception.

Second, the participants in Wang’s (2008) study were adults, whereas the participants in our study were second graders with a mean age of around 7 years. Children may adopt different strategies from adults in perceiving English stress. That is, children may use different cues to perceive English stress because their representations of both L1 and L2 speech acoustic features are relatively unstable compared with those of adults. In fact, studies on stress in native English-speaking children suggest that even these children do not fully master the complexities of word stress until about 12 years of age (Kehoe, Stoel-Gammon, & Buder, 1995). Similarly, a prior developmental study of tone perception suggests that Cantonese-speaking children achieve adult-like performance in lexical tone perception by the age of 10 (Ciocca & Lui, 2003). Thus, our findings suggest that second-grade Cantonese speakers may not perceive stress in the same manner as adults.

It is also essential to highlight that our results bear particularly on the status of the three acoustic cues in stress perception in Cantonese-speaking children. The influence of the three acoustic cues seems to vary as stress perception unfolds. For example, in this study the duration cue seems to affect stress perception as early as 170 ms after stimulus onset, lasting until 400 ms, and is associated with a typical MMN component; by contrast, the intensity cue seems to affect stress perception in the time window from 270 to 400 ms after stimulus onset, as indexed by a p-MMR component. This suggests that Cantonese-speaking children may depend more upon the duration cue at the early stage of stress perception. Interestingly, children seem less dependent upon the F0 cue at the early stage of stress perception (the 170–270 ms time window in the present study), as reflected by the p-MMR component, but they seem to react more to the F0 cue at a later stage, namely after 400 ms. Although no main effect of experimental condition was found in the time window from 400 to 600 ms, an independent t-test comparing the pitch-changed deviant with the standard indicated a significant MMN effect (p < .05). However, we are aware that although our study is among the first to investigate the neural markers of acoustic cues of English stress in Cantonese-speaking second graders, it may not be conclusive regarding the status of acoustic cues in stress perception in young L2 learners. Other factors may also lead to the different brain responses to the different acoustic cues in English stress perception. For example, the development of language skills in both L1 and L2 is highly likely to affect the discriminability of the acoustic cues of stress.
The influence of language skills in both L1 and L2 needs to be tested further in future research by directly comparing the neural discrimination of English stress across age groups and with native English-speaking groups.

In summary, our ERP results demonstrate that Cantonese-speaking second graders are sensitive to F0, duration, and intensity in stress perception, as reflected by the MMN or p-MMR. On the one hand, this finding shows that the MMN and p-MMR could be valuable tools for investigating the neural discrimination of suprasegmental features of speech in children. On the other hand, this finding suggests that Cantonese second graders are able to detect acoustic violations of English stress in spoken words, which may indicate that they have a long-term memory representation of the acoustic cues of English stress. In addition, our results show that the influence of the three acoustic cues on English stress perception may vary as stress processing unfolds, as reflected by either the MMN or the p-MMR. Practically, the MMN or p-MMR elicited by violations of the acoustic cues of English stress may serve as an indicator for diagnosing difficulties in L2 English learning among Hong Kong Chinese–English bilingual readers. However, in this study we manipulated the stress of a real English disyllabic word, and we did not control for lexical properties of the word such as word frequency. It remains unclear whether lexical properties would influence the temporal course of stress perception in Chinese-speaking children. Also, we used words with an iambic stress pattern, and brain activity may differ for different stress patterns. Future work might attempt to extend our findings to words with trochaic stress patterns. It may also be valuable to use pseudowords as stimuli to exclude the influence of lexical properties.

Acknowledgments

This research was supported by the General Research Fund of the Hong Kong Special Administrative Region Research Grants Council (CUHK: 451811) and the Collaborative Research Fund of the Hong Kong Special Administrative Region Research Grants Council (CUHK: 2300035) to Catherine McBride. We thank the research assistants for help with data collection, and the children and parents for their participation.

References

Alho, K., Sainio, K., Sajaniemi, N., Reinikainen, K., & Näätänen, R. (1990). Event-related brain potential of human newborns to pitch change of an acoustic stimulus. Electroencephalography and Clinical Neurophysiology/Evoked Potentials Section, 77(2), 151–155. http://dx.doi.org/10.1016/0168-5597(90)90031-8.

Bauer, R. S., & Benedict, P. K. (1997). Modern Cantonese phonology (Vol. 102). Walter de Gruyter.

Boersma, P., & Weenink, D. (2004). Praat: Doing phonetics by computer (Version 4.2) [Computer program]. Retrieved 04.03.04.

Bolinger, D. (1965). In D. Bollinger (Ed.), Pitch accent and sentence rhythm in forms of English: Accent, morpheme, order. Cambridge, MA: Harvard U.P.

Brenders, P., van Hell, J. G., & Dijkstra, T. (2011). Word recognition in child second language learners: Evidence from cognates and false friends. Journal of Experimental Child Psychology, 109(4), 383–396. http://dx.doi.org/10.1016/j.jecp.2011.03.012.

Chan, M. K. (2007). The perception and production of lexical stress by Cantonese speakers of English. Unpublished Master’s thesis, University of Hong Kong.

Chan, A. Y., & Li, D. C. (2000). English and Cantonese phonology in contrast: Explaining Cantonese ESL learners’ English pronunciation problems. Language, Culture and Curriculum, 13(1), 67–85. http://dx.doi.org/10.1080/07908310008666590.

Chandrasekaran, B., Gandour, J. T., & Krishnan, A. (2007). Neuroplasticity in the processing of pitch dimensions: A multidimensional scaling analysis of the mismatch negativity. Restorative Neurology & Neuroscience, 25(3/4), 195–210.

Chao, Y. R. (1947). Cantonese primer. Cambridge, MA: Harvard University Press.

Cheng, Y. Y., Wu, H. C., Tzeng, Y. L., Yang, M. T., Zhao, L. L., & Lee, C. Y. (2013). The development of mismatch responses to Mandarin lexical tones in early infancy. Developmental Neuropsychology, 38(5), 281–300. http://dx.doi.org/10.1080/87565641.2013.799672.

Cheour, M., Alho, K., Sainio, K., Reinikainen, K., Renlund, M., Aaltonen, O., et al. (1997). The mismatch negativity to changes in speech sounds at the age of three months. Developmental Neuropsychology, 13(2), 167–174. http://dx.doi.org/10.1080/87565649709540676.

Cheour, M., Leppänen, P. H. T., & Kraus, N. (2000). Mismatch negativity (MMN) as a tool for investigating auditory discrimination and sensory memory in infants and children. Clinical Neurophysiology, 111(1), 4–16. http://dx.doi.org/10.1016/S1388-2457(99)00191-1.

Ciocca, V., & Lui, J. (2003). The development of the perception of Cantonese lexical tones. Journal of Multilingual Communication Disorders, 1(2), 141–147.

Cisero, C. A., & Royer, J. M. (1995). The development and cross-language transfer of phonological awareness. Contemporary Educational Psychology, 20(3), 275–303.

Crystal, D. (1969). Prosodic systems and intonation in English. Cambridge: Cambridge University Press.

Dehaene-Lambertz, G. (2000). Cerebral specialization for speech and non-speech stimuli in infants. Journal of Cognitive Neuroscience, 12(3), 449–460. http://dx.doi.org/10.1162/089892900562264.

Durgunoglu, A. Y., Nagy, W. E., & Hancin-Bhatt, B. J. (1993). Cross-language transfer of phonological awareness. Journal of Educational Psychology, 85(3), 453. http://dx.doi.org/10.5353/th_b3688930.

Ellis, R. (1994). The study of second language acquisition. Oxford University Press.

Escera, C., Alho, K., Winkler, I., & Näätänen, R. (1998). Neural mechanisms of involuntary attention to acoustic novelty and change. Journal of Cognitive Neuroscience, 10(5), 590–604. http://dx.doi.org/10.1162/089892998562997.

Friederici, A. D., Friedrich, M., & Weber, C. (2002). Neural manifestation of cognitiveand precognitive mismatch detection in early infancy. NeuroReport, 13(10),1251–1254. http://dx.doi.org/10.1097/00001756-200207190-00006.

Frost, D. (2011). Stress-cues to relative prominence in English and French: Aperceptual study. Journal of the International Phonetic Association, 41(1), 67–84.http://dx.doi.org/10.1017/s0025100310000253.

Fry, D. B. (1955). Duration and intensity as physical correlates of linguistic stress.The Journal of the Acoustical Society of America, 27(4), 765–768. http://dx.doi.org/10.1121/1.1917773.

Fry, D. B. (1958). Experiments in the perception of stress. Language & Speech, 1(2),126–152. http://dx.doi.org/10.1177/002383095800100207.

Gomes, H., Sussman, E., Ritter, W., Kurtzberg, D., Cowan, N., & Vaughan, H. G. Jr.,(1999). Electrophysiological evidence of developmental changes in the durationof auditory sensory memory. Developmental Psychology, 35(1), 294. http://dx.doi.org/10.1037/0012-1649.35.1.294.

Gottardo, A., Yan, B., Siegel, L. S., & Wade-Woolley, L. (2001). Factors related toEnglish reading performance in children with Chinese as a first language: Moreevidence of cross-language transfer of phonological processing. Journal ofEducational Psychology, 93(3), 530. http://dx.doi.org/10.1037/0022-0663.93.3.530.

He, C., Hotson, L., & Trainor, L. J. (2009). Maturation of cortical mismatch responsesto occasional pitch change in early infancy: Effects of presentation rate andmagnitude of change. Neuropsychologia, 47(1), 218–229. http://dx.doi.org/10.1016/j.neuropsychologia.2008.07.019.

Jing, H., & Benasich, A. A. (2006). Brain responses to tonal changes in the first twoyears of life. Brain and Development, 28(4), 247–256. http://dx.doi.org/10.1016/j.braindev.2005.09.002.

Kehoe, M., Stoel-Gammon, C., & Buder, E. H. (1995). Acoustic correlates of stress inyoung children’s speech. Journal of Speech Hear Research, 38(2), 338–350.

Khouw, E., & Ciocca, V. (2007). Perceptual correlates of Cantonese tones. Journal ofPhonetics, 35(1), 104–117. http://dx.doi.org/10.1016/j.wocn.2005.10.003.

Klatt, D. H. (1976). Linguistic uses of segmental duration in English: Acoustic andperceptual evidence. The Journal of the Acoustical Society of America, 59, 1208.http://dx.doi.org/10.1121/1.380986.

Kuhl, P. K. (1998). Effects of language experience on speech perception. The Journalof the Acoustical Society of America, 103(5), 2931. http://dx.doi.org/10.1121/1.422159.

Lee, C. Y., Yen, H. L., Yeh, P. W., Lin, W. H., Cheng, Y. Y., Tzeng, Y. L., et al. (2012).Mismatch responses to lexical tone, initial consonant, and vowel in Mandarin-speaking preschoolers. Neuropsychologia, 50(14), 3228–3239. http://dx.doi.org/10.1016/j.neuropsychologia.2012.08.025.

Leung, C. S. S., Lim, S. E. A., & Li, Y. L. (2013). Implementation of the Hong Konglanguage policy in pre-school settings. Early Child Development and Care, 183(10), 1381–1396.

Lieberman, P. (1960). Some acoustic correlates of word stress in American English.The Journal of the Acoustical Society of America, 32(4), 451–454. http://dx.doi.org/10.1121/1.1936148.

Lieberman, P. (1967). Intonation, Perception and Language (Research Monograph, 38).Cambridge, MA: M.I.T Press.

Maurer, U., Bucher, K., Brem, S., & Brandeis, D. (2003). Altered responses to tone andphoneme mismatch in kindergartners at familial dyslexia risk. NeuroReport, 14(17), 2245–2250. http://dx.doi.org/10.1097/00001756-200312020-00022.

McBride-Chang, C., Bialystok, E., Chong, K. K. Y., & Li, Y. (2004). Levels ofphonological awareness in three cultures. Journal of Experimental ChildPsychology, 89(2), 93–111. http://dx.doi.org/10.1016/j.jecp.2004.05.001.

McBride-Chang, C., Cheung, H., Chow, B. W.-Y., Chow, C. S.-L., & Choi, L. (2006).Metalinguistic skills and vocabulary knowledge in Chinese (L1) and English(L2). Reading and Writing, 19, 695–716. http://dx.doi.org/10.1007/s11145-005-5742-x.

McBride-Chang, C., Tong, X., Shu, H., Wong, A. M. Y., Leung, K. W., & Tardif, T. (2008).Syllable, phoneme, and tone: Psycholinguistic units in early Chinese and Englishword recognition. Scientific Studies of Reading, 12(2), 171–194.

Meisel, J. M. (1997). The acquisition of the syntax of negation in French andGerman: Contrasting first and second language development. Second LanguageResearch, 13(3), 227–263. http://dx.doi.org/10.1191/026765897666180760.

Mesgarani, N., Cheung, C., Johnson, K., & Chang, E. F. (2014). Phonetic featureencoding in human superior temporal gyrus. Science, 343(6174), 1006–1010.

Medress, Skinner, T. E., & Anderson, D. E. (1971). Acoustic correlates of wordstress. 82nd Meeting of Acoustical Society of America, paper k3, DenverColorado, U.S.A.

Mol, H., & Uhlenbeck, E. M. (1955). The linguistic relevance of intensity in stress.Lingua, 5, 205–213. http://dx.doi.org/10.1016/0024-3841(55)90010-3.

Morr, M. L., Shafer, V. L., Kreuzer, J. A., & Kurtzberg, D. (2002). Maturation of mismatch negativity in typically developing infants and preschool children. Ear and Hearing, 23(2), 118–136. http://dx.doi.org/10.1097/00003446-200204000-00005.

Morton, J., & Jassem, W. (1965). Acoustic correlates of stress. Language and Speech, 8(3), 159–181.

Näätänen, R. (2001). The perception of speech sounds by the human brain as reflected by the mismatch negativity (MMN) and its magnetic equivalent (MMNm). Psychophysiology, 38(1), 1–21. http://dx.doi.org/10.1111/1469-8986.3810001.

Näätänen, R., Paavilainen, P., Alho, K., Reinikainen, K., & Sams, M. (1989). Do event-related potentials reveal the mechanism of the auditory sensory memory in the human brain? Neuroscience Letters, 98(2), 217–221. http://dx.doi.org/10.1016/0304-3940.

Näätänen, R., Paavilainen, P., Rinne, T., & Alho, K. (2007). The mismatch negativity (MMN) in basic research of central auditory processing: A review. Clinical Neurophysiology, 118(12), 2544–2590. http://dx.doi.org/10.1016/j.clinph.2007.04.026.

Näätänen, R., Pakarinen, S., Rinne, T., & Takegata, R. (2004). The mismatch negativity (MMN): Towards the optimal paradigm. Clinical Neurophysiology, 115(1), 140–144. http://dx.doi.org/10.1016/j.clinph.2003.04.001.

Nenonen, S., Shestakova, A., Huotilainen, M., & Näätänen, R. (2003). Linguistic relevance of duration within the native language determines the accuracy of speech-sound duration processing. Cognitive Brain Research, 16(3), 492–495. http://dx.doi.org/10.1016/s0926-6410(03)00055-7.

Ng, M. L., Gilbert, H. R., & Lerman, J. W. (2000). Fundamental frequency, intensity, and vowel duration characteristics related to perception of Cantonese alaryngeal speech. Folia Phoniatrica et Logopaedica, 53(1), 36–47. http://dx.doi.org/10.1159/000052652.

Nguyen, T. T. A., & Ingram, J. (2005). Vietnamese acquisition of English word stress.TESOL Quarterly, 39(2), 309–319. http://dx.doi.org/10.2307/3588314.

Paavilainen, P., Simola, J., Jaramillo, M., Näätänen, R., & Winkler, I. (2001). Preattentive extraction of abstract feature conjunctions from auditory stimulation as reflected by the mismatch negativity (MMN). Psychophysiology, 38(2), 359–365. http://dx.doi.org/10.1111/1469-8986.3820359.

Pakarinen, S., Huotilainen, M., & Näätänen, R. (2010). The mismatch negativity (MMN) with no standard stimulus. Clinical Neurophysiology, 121(7), 1043–1050. http://dx.doi.org/10.1016/j.clinph.2010.02.009.

Pennington, M. C., & Ellis, N. C. (2000). Cantonese speakers’ memory for English sentences with prosodic cues. The Modern Language Journal, 84(3), 372–389. http://dx.doi.org/10.1111/0026-7902.00075.

Peperkamp, S., & Dupoux, E. (2002). A typological study of stress ‘deafness’. In C. Gussenhoven & N. Warner (Eds.), Laboratory phonology (Vol. 7, pp. 203–240). Berlin: Mouton de Gruyter.

Peperkamp, S., Vendelin, I., & Dupoux, E. (2010). Perception of predictable stress: A cross-linguistic investigation. Journal of Phonetics, 38(3), 422–430.

Quam, C., & Swingley, D. (2014). Processing of lexical stress cues by young children. Journal of Experimental Child Psychology, 123, 73–89.

Selkirk, E. O. (1980). The role of prosodic categories in English word stress. Linguistic Inquiry, 11(3), 563–605. http://dx.doi.org/10.2307/4178179.

Shafer, V. L., Yan, H. Y., & Datta, H. (2010). Maturation of speech discrimination in 4- to 7-yr-old children as indexed by event-related potential mismatch responses. Ear and Hearing, 31(6), 735–745. http://dx.doi.org/10.1097/aud.0b013e3181e5d1a7.

Shestakova, A., Huotilainen, M., & Cheour, M. (2003). Event-related potentials associated with second language learning in children. Clinical Neurophysiology, 114(8), 1507–1512.

So, L. K., & Dodd, B. J. (1995). The acquisition of phonology by Cantonese-speaking children. Journal of Child Language, 22, 473–496. http://dx.doi.org/10.1111/j.1365-2788.1994.tb00439.x.

Swingley, D., & Aslin, R. N. (2000). Spoken word recognition and lexical representation in very young children. Cognition, 76(2), 147–166.

Tong, X., McBride, C., & Burnham, D. (in press). Cues for lexical tone perception in children: Acoustic correlates and phonetic context effects. Journal of Speech, Language, and Hearing Research. http://dx.doi.org/10.1044/2014_jslhr-s-13-0145.

Tong, X., Tong, X., & McBride-Chang, C. (2013). A tale of two writing systems: Double dissociation and metalinguistic transfer between Chinese and English word reading among Hong Kong children. Journal of Learning Disabilities. http://dx.doi.org/10.1177/0022219413492854.

Vance, T. J. (1976). An experimental investigation of tone and intonation in Cantonese. Phonetica, 33(5), 368–392. http://dx.doi.org/10.1159/000259793.

Wang, Q. (2008). Perception of English stress by Mandarin Chinese learners of English: An acoustic study. Unpublished doctoral dissertation, University of Victoria.

Weber, C., Hahne, A., Friedrich, M., & Friederici, A. D. (2004). Discrimination of word stress in early infant perception: Electrophysiological evidence. Cognitive Brain Research, 18(2), 149–161. http://dx.doi.org/10.1016/j.cogbrainres.2003.10.001.

Wu, W. L., & Xu, Y. (2010). Prosodic focus in Hong Kong Cantonese without post-focus compression. In Speech Prosody 2010. http://dx.doi.org/10.1515/tlr-2012-0006.

Zhang, Y., Nissen, S. L., & Francis, A. L. (2008). Acoustic characteristics of English lexical stress produced by native Mandarin speakers. The Journal of the Acoustical Society of America, 123(6), 4498–4513. http://dx.doi.org/10.1121/1.2902165.