Segmental and suprasegmental features in speech perception in Cantonese-speaking second graders: An...

11
Segmental and suprasegmental features in speech perception in Cantonese-speaking second graders: An ERP study XIUHONG TONG, a CATHERINE McBRIDE, a CHIA-YING LEE, b JUAN ZHANG, c LAN SHUAI, d URS MAURER, e and KEVIN K. H. CHUNG f a Psychology Department, The Chinese University of Hong Kong, Hong Kong, China b The Institute of Linguistics, Academia Sinica, Taipei, Taiwan c Faculty of Education, University of Macau, Macau, China d Department of Electrical and Computer Engineering, Johns Hopkins University, Baltimore, Maryland, USA e Department of Psychology, University of Zurich, Zurich, Switzerland f Department of Special Education and Counselling, Hong Kong Institute of Education, Hong Kong, China Abstract Using a multiple-deviant oddball paradigm, this study examined second graders’ brain responses to Cantonese speech. We aimed to address the question of whether a change in a consonant or lexical tone could be automatically detected by children. We measured auditory mismatch responses to place of articulation and voice onset time (VOT), reflecting segmental perception, as well as Cantonese lexical tones including level tone and contour tone, reflecting suprasegmental perception. The data showed that robust mismatch negativities (MMNs) were elicited by all deviants in the time window of 300–500 ms in second graders. Moreover, relative to the standard stimuli, the VOT deviant elicited a robust positive mismatch response, and the level tone deviant elicited a significant MMN in the time window of 150–300 ms. The findings suggest that Hong Kong second graders were sensitive to neural discriminations of speech sounds both at the segmental and suprasegmental levels. Descriptors: Speech perception, Cantonese, Segmental, Suprasegmental, MMN, p-MMR The speech signal consists of segmental and suprasegmental infor- mation (Miller, 1978). In linguistics, the segmental features of speech are defined as “any discrete unit that can be identified, either physically or auditorily, in the stream of speech” (Crystal, 2003, pp. 408–409), such as consonants and vowels, which occur in a distinct temporal order. The suprasegmental features are concerned with units such as tone, stress, pitch, and length intonation, which always accompany the production of segmental features (Fox, 2000). Recently, there has been increasing neurophysiological research examining the neural discrimination of speech sounds at both segmental and suprasegmental levels in both adults and children (e.g., Chandrasekaran, Gandour, & Krishnan, 2007; Chandrasekaran, Krishnan, & Gandour, 2007; Gomes et al., 1999; Korpilahti, Krause, Holopainen, & Lang, 2001; Maurer, Bucher, Brem, & Brandeis, 2003; Näätänen, Jacobsen, & Winkler, 2005; Näätänen et al., 1997; Näätänen, Paavilainen, Rinne, & Alho, 2007; Shafer, Morr, Kreuzer, & Kurtzberg, 2000; Shafer, Schwartz, & Kurtzberg, 2004). In adults, converging evidence shows that perception of both segmental and suprasegmental information is highly automatic, which can be reflected by the mismatch negativ- ity (MMN), a component of the event-related potential (ERP). Specifically, the MMN is believed to be a marker of neural dis- crimination of speech perception at both the segmental level and the suprasegmental level (for a review, see Näätänen et al., 2007). However, the nature of the neurophysiological mechanisms of speech perception at both segmental and suprasegmental levels remains unclear in children thus far (Lee et al., 2012). Notably, few studies have investigated the neural discrimination of different speech contrasts at both the segmental and suprasegmental levels in the same group of school-aged children. In particular, little is known about the neural correlates of different speech information in Chinese children, whose native language is a tonal language, which differs from English, a nontonal language (e.g., Lee et al., 2012; Meng et al., 2005). In the present study, we were interested in Hong Kong Chinese second graders’ brain responses to different speech contrasts at both the segmental and suprasegmental levels. Specifically, we recorded children’s brain activity in response to place of articulation and voice onset time (VOT) of an initial consonant in a consonant-vowel (CV) Chinese syllable, reflecting segmental perception, as well as Cantonese lexical tones including level tone and contour tone, reflecting suprasegmental perception. A multiple-deviant oddball paradigm was adopted in the present study. We aimed at investigating the question of whether Chinese second graders were sensitive to neural discrimination of changes in consonant and lexical tones of This research was supported by the General Research Fund of the Hong Kong Special Administrative Region Research Grants Council (CUHK: 451811) to Catherine McBride. We would like to thank the three anony- mous reviewers for their helpful comments on earlier version of this article. We also thank all helpers on the data collection, and children and parents for their participation. Address correspondence to: Catherine McBride, Psychology Depart- ment, The Chinese University of Hong Kong, Shatin, Hong Kong, China. E-mail: [email protected] Psychophysiology, •• (2014), ••–••. Wiley Periodicals, Inc. Printed in the USA. Copyright © 2014 Society for Psychophysiological Research DOI: 10.1111/psyp.12257 1

Transcript of Segmental and suprasegmental features in speech perception in Cantonese-speaking second graders: An...

Segmental and suprasegmental features in speech perception inCantonese-speaking second graders: An ERP study

XIUHONG TONG,a CATHERINE McBRIDE,a CHIA-YING LEE,b JUAN ZHANG,c LAN SHUAI,d URS MAURER,e

and KEVIN K. H. CHUNGf

aPsychology Department, The Chinese University of Hong Kong, Hong Kong, ChinabThe Institute of Linguistics, Academia Sinica, Taipei, TaiwancFaculty of Education, University of Macau, Macau, ChinadDepartment of Electrical and Computer Engineering, Johns Hopkins University, Baltimore, Maryland, USAeDepartment of Psychology, University of Zurich, Zurich, SwitzerlandfDepartment of Special Education and Counselling, Hong Kong Institute of Education, Hong Kong, China

Abstract

Using a multiple-deviant oddball paradigm, this study examined second graders’ brain responses to Cantonese speech.We aimed to address the question of whether a change in a consonant or lexical tone could be automatically detected bychildren. We measured auditory mismatch responses to place of articulation and voice onset time (VOT), reflectingsegmental perception, as well as Cantonese lexical tones including level tone and contour tone, reflecting suprasegmentalperception. The data showed that robust mismatch negativities (MMNs) were elicited by all deviants in the time windowof 300–500 ms in second graders. Moreover, relative to the standard stimuli, the VOT deviant elicited a robust positivemismatch response, and the level tone deviant elicited a significant MMN in the time window of 150–300 ms. Thefindings suggest that Hong Kong second graders were sensitive to neural discriminations of speech sounds both at thesegmental and suprasegmental levels.

Descriptors: Speech perception, Cantonese, Segmental, Suprasegmental, MMN, p-MMR

The speech signal consists of segmental and suprasegmental infor-mation (Miller, 1978). In linguistics, the segmental features ofspeech are defined as “any discrete unit that can be identified, eitherphysically or auditorily, in the stream of speech” (Crystal, 2003,pp. 408–409), such as consonants and vowels, which occur in adistinct temporal order. The suprasegmental features are concernedwith units such as tone, stress, pitch, and length intonation, whichalways accompany the production of segmental features (Fox,2000). Recently, there has been increasing neurophysiologicalresearch examining the neural discrimination of speech soundsat both segmental and suprasegmental levels in both adults andchildren (e.g., Chandrasekaran, Gandour, & Krishnan, 2007;Chandrasekaran, Krishnan, & Gandour, 2007; Gomes et al., 1999;Korpilahti, Krause, Holopainen, & Lang, 2001; Maurer, Bucher,Brem, & Brandeis, 2003; Näätänen, Jacobsen, & Winkler, 2005;Näätänen et al., 1997; Näätänen, Paavilainen, Rinne, & Alho,2007; Shafer, Morr, Kreuzer, & Kurtzberg, 2000; Shafer, Schwartz,& Kurtzberg, 2004). In adults, converging evidence shows that

perception of both segmental and suprasegmental information ishighly automatic, which can be reflected by the mismatch negativ-ity (MMN), a component of the event-related potential (ERP).Specifically, the MMN is believed to be a marker of neural dis-crimination of speech perception at both the segmental level andthe suprasegmental level (for a review, see Näätänen et al., 2007).However, the nature of the neurophysiological mechanisms ofspeech perception at both segmental and suprasegmental levelsremains unclear in children thus far (Lee et al., 2012).

Notably, few studies have investigated the neural discriminationof different speech contrasts at both the segmental andsuprasegmental levels in the same group of school-aged children.In particular, little is known about the neural correlates of differentspeech information in Chinese children, whose native language is atonal language, which differs from English, a nontonal language(e.g., Lee et al., 2012; Meng et al., 2005). In the present study, wewere interested in Hong Kong Chinese second graders’ brainresponses to different speech contrasts at both the segmental andsuprasegmental levels. Specifically, we recorded children’s brainactivity in response to place of articulation and voice onset time(VOT) of an initial consonant in a consonant-vowel (CV) Chinesesyllable, reflecting segmental perception, as well as Cantoneselexical tones including level tone and contour tone, reflectingsuprasegmental perception. A multiple-deviant oddball paradigmwas adopted in the present study. We aimed at investigating thequestion of whether Chinese second graders were sensitive toneural discrimination of changes in consonant and lexical tones of

This research was supported by the General Research Fund of the HongKong Special Administrative Region Research Grants Council (CUHK:451811) to Catherine McBride. We would like to thank the three anony-mous reviewers for their helpful comments on earlier version of this article.We also thank all helpers on the data collection, and children and parents fortheir participation.

Address correspondence to: Catherine McBride, Psychology Depart-ment, The Chinese University of Hong Kong, Shatin, Hong Kong, China.E-mail: [email protected]

bs_b

s_ba

nner

Psychophysiology, •• (2014), ••–••. Wiley Periodicals, Inc. Printed in the USA.Copyright © 2014 Society for Psychophysiological ResearchDOI: 10.1111/psyp.12257

1

a CV Chinese syllable in speech perception. That is, we testedwhether the MMN, which is usually elicited in the oddball para-digm in adult ERP speech research, would be a marker of neuraldiscrimination of speech perception at both the segmental andsuprasegmental levels in Chinese second graders and whether itwould show peak latencies and polarity directions as typicallyobserved in adult speech perception.

MMN in Adults

The MMN is a negative-going and attention-independent compo-nent of the ERPs to any sufficiently discriminable change in asequence of repetitive auditory stimuli, and it peaks between 150 to200 ms with a distribution over the frontocentral sites but withinversion in polarity at inferior-posterior sites in adults (e.g.,Chandrasekaran, Gandour, & Krishnan, 2007; Cheour, Leppänen,& Kraus, 2000; Näätänen, Pakarinen, Rinne, & Takegata, 2004; fora review, see Näätänen et al., 2007). Adult MMN studies show thatMMN can be elicited by changes in acoustic features of an auditorystimulus that the participant can also discriminate behaviorallysuch as deviation in frequency (e.g., Jacobsen & Schröger, 2001;Paavilainen, Simola, Jaramillo, Näätänen, & Winkler, 2001), inten-sity (e.g., Kisley, Noecker, & Guinther, 2004; Näätänen et al.,2004), or duration (e.g., Grimm, Widmann, & Schröger, 2004;Jaramillo, Paavilainen, & Näätänen, 2000; Roeber, Widmann, &Schröger, 2003). The MMN can also be elicited by changes in morecomplex speech stimuli (e.g., changes in VOT, place of articula-tion, and lexical tone; e.g., Näätänen, 2001; Shafer et al., 2004;Sharma & Dorman, 1999; Tsang, Jia, Huang, & Chen, 2011). TheMMN is believed to reflect automatic detection of auditory changeat the preattentive stage and to originate from the auditory cortex(for a review, see Näätänen, Jacobsen, & Winkler, 2005). Theincrease of the MMN amplitude with a decrease of its peak latencyreflects participants’ discrimination of auditory elements (for areview, see Näätänen et al., 2007).

The magnitude of the MMN is found to be sensitive to thedegree of the difference between standard and deviants. Previousresearch on MMN has demonstrated that the peak amplitude ofMMN is larger and it peaks earlier with an increase in the degreeof deviance between standards and deviants (e.g., Dehaene-Lambertz, 1997; Pakarinen et al., 2009; Pakarinen, Takegata,Rinne, Huotilainen, & Näätänen, 2007; Tiitinen, May,Reinikainen, & Näätänen, 1994).The MMN is also modulated bythe acoustic salience of a sound. That is, a large MMN tends tobe elicited for a salient acoustic feature of a sound (e.g., Deouell,Bentin, & Giard, 1998; Isoguchi, Kanzaki, & Takahashi, 2011;Peltola et al., 2003). Deouell and colleagues (1998), for example,observed a larger MMN in peak amplitude for a change in fre-quency of a tone compared to the MMN found for changes inspatial location, intensity, and stimulus-onset asynchrony (SOA)even when other acoustic differences were matched across devi-ants. Pakarinen et al. (2007) reported a similar finding that fre-quency and duration elicited larger MMNs as compared to theintensity and location deviants. The authors also found that theMMNs elicited by duration peaked earlier and the MMNs elicitedby intensity peaked later than MMNs elicited by frequency andlocation deviants, but the latencies of MMNs elicited by fre-quency and location did not differ from each other. The largerMMN suggests that it is much easier for speakers to detect thedifference in frequency or duration between standard and deviantcompared to the changes in intensity, spatial location, and SOA.Additionally, phonological status of a sound is another critical

factor that affects the magnitude of the MMN, and this influencevaries with listeners’ native language (for a review, seeDehaene-Lambertz & Gliga, 2004). Previous studies have shownthat MMNs are easy to elicit with the same change within oracross syllables in listeners for whom the change is phonemicallyrelevant in their native language in comparison to listeners forwhom the change is not phonemically relevant (e.g., Sharma &Dorman, 2000; Winkler et al., 1999).

MMN in Children

Attention is not necessary to elicit MMN, although some studieshave demonstrated that MMN is somewhat modulated by attention(e.g., Hisagi, Shafer, Strange, & Sussman, 2010; for a review, seeSussman, 2007; Sussman, Winkler, & Wang, 2003). Thus, MMNmay be a useful tool to examine auditory or speech perception inthose who are limited in attention or motivation such as children(e.g., Friederici, Friedrich, & Weber, 2002; Kuhl, 1998; Lee et al.,2012; Shafer et al., 2000, 2010). There has been increasingresearch using the MMN paradigm to investigate auditory andspeech perception in children. However, while MMN has beenidentified as a stable marker of neural discrimination for the dif-ferent features of a sound in adults (for a review see Näätänen et al.,2007), the findings of change-related MMN have not been consist-ent across studies on auditory and speech perception in childrenthus far (e.g., Lee et al., 2012; Shafer, Yan, & Datta, 2010). Someresearchers suggest that the polarity and latency of the MMN mayvary with participants’ maturation (e.g., Lee et al., 2012; Maurer,Bucher, Brem, & Brandeis, 2003). For example, infants and youngchildren tend to exhibit MMNs that have a longer duration ordelayed onset compared to the MMN found in adults (e.g., Alho,Sainio, Sajaniemi, Reinikainen, & Näätänen, 1990; Cheng et al.,2013; Cheour et al., 1997; Cheour-Luhtanen et al., 1995; Lee et al.,2012). For example, Lovio et al. (2009) used a multideviant MMNparadigm to examine the neural response to changes in vowel,vowel duration, consonant, frequency (F0), and intensity inFinnish-speaking 6-year-old children. The authors found that sig-nificant adultlike MMNs were observed for all deviants against astandard, but the MMNs had a longer latency (mean 258–328 ms)compared to adults. However, some studies have shown no changesin MMN latencies from childhood to adulthood (e.g., Csepe,Dieckmann, Hoke, & Ross, 1992; Kraus et al., 1993).

A more dramatic age difference is the observation of a positivemismatch response (p-MMR) in infants and preschool children inresponse to discriminable changes in auditory and speech featuresof a sound. Infants and young children show a coexistence of theMMN and p-MMR in several studies (e.g., Cheng et al., 2013;Cheour et al., 1997; He, Hotson, & Trainor, 2009; Lee et al., 2012;Maurer et al., 2003; Shafer et al., 2010). Maurer and colleagues(2003), for example, used an oddball sequence with short intervals(every .38 s), comparing adults’ and 6- to 7-year-old children’sbrain responses to small frequencies and phoneme deviance (stand-ard: 1000 Hz and /ba/; larger deviance: 1060 Hz, /ta/; smaller devi-ance: 1030 Hz, /da/, for frequency and phoneme, respectively). Theauthors observed a typical frontocentral MMN in the time windowof 129–199 ms in adults, but a frontal p-MMR with posteriornegativity in the time window of 179–207 ms was found in chil-dren. Their source localization analysis demonstrated that thep-MMR in the child group and MMN among the adults originatedfrom superior temporal plane generators but with opposite polarity.They also observed that the p-MMR in children increased with thedegree of phoneme deviance, but not with the degree of tone

2 X. Tong et al.

deviance. Thus, Maurer and colleagues suggested that the p-MMRfound in children might have the same functional nature as thetypical adult MMN, which may reflect additional or increasedneural activation to deviants relative to standards.

In another recent study, Shafer et al. (2010) showed a coexist-ence of the MMN and p-MMR in 4- to 7-year-old children inresponse to an English vowel contrast, /I/ versus /ε/. Specifically,the authors observed an adultlike MMN but in the time window of300–400 ms in 4-, 5-, and 7-year-old children. In an earlier timewindow of 100–300 ms, they observed that children younger than5.5 years old and some of the older children showed a significantp-MMR in response to the vowel contrast. The amplitude of thep-MMR was revealed to diminish in older children. The authorssuggest that the p-MMR observed in younger children might reflecta recovery from refractoriness of the P100 (P1) obligatoryresponse, and the disappearance of the p-MMR in older childrenmight be attributed to “increased specificity of firing of neuronscontributing to the P1 component, and to overlap with the N1b andMMN, as they mature” (p. 734).

To some extent, the findings of Maurer et al. and Shafer et al.were replicated in a more recent study of Chinese young childrenwith Mandarin as a native language (Lee et al., 2012). Mandarin isa tonal language with a relatively simple syllable structure.A Mandarin syllable consists of onset, rime, and tone(McBride-Chang, Lin, Fong, & Shu, 2010). Using a multiple-deviant oddball paradigm, Lee and colleagues (2012) examined thedevelopmental trajectory of lexical tone processing (standard:tone3 (T3), deviants: tone1 (T1) and tone2 (T2)), initial consonants(standard: /ba1/; deviants: /da1/, /ga1/), and vowels (standard:/da1/; deviants: /di1/, /du1/) in three groups of Chinese childrenaged 4, 5, and 6 years. Lee et al. observed the coexistence of MMNand p-MMR in the same age group when responding to three typesof speech features. More specifically, they found that the largercontrast T1/T3 elicited adultlike MMNs in the time window of150–300 ms in the three age groups, but the small contrast T2/T3elicited p-MMRs in the 5- and 6-year-old groups. As for the vowelcontrasts, the large contrast of /du1/ with /da1/ elicited adultlikeMMNs in the three age groups with a distribution in the bilateralfrontocentral sites from 100–200 ms, whereas p-MMRs wereobserved for the small contrast of /di1/ with /da1/ in 5-year-oldchildren with a distribution in the midline and right hemisphericsites. The p-MMR was larger over the left hemispheric recordingsites from 150–200 ms; an adultlike MMN between 100–150 mswas also observed in 6-year-old children. As for the initial con-trasts, p-MMRs were observed for the larger contrast /ga1/ and/ba1/ in the time window of 150–300 ms across three age groups.However, the small contrast /ba1/ versus /da1/ elicited p-MMRonly in the 4-year-old group. These findings demonstrated that,with increasing age, the adultlike MMN becomes more prominent,whereas the p-MMR decreases, suggesting that the MMN mayreflect enhanced and more mature discrimination ability, and thep-MMR may reflect the more difficult discrimination in youngchildren. The authors thus suggest that MMN and p-MMR reflectdifferent functional characteristics.

However, questions such as what the p-MMR reflects and whenit may be present or absent continue to be debated among research-ers (e.g., Cheng et al., 2013; Lee et al., 2012; Maurer et al., 2003).For example, some researchers claim that the p-MMR may act asan analogy of sorts to the adultlike P3a, which is suggested toreflect distractability or an involuntary attention shift or the auto-matic categorization of stimuli (e.g., Escera, Alho, Winkler, &Näätänen, 1998; He et al., 2009; Kushnerenko, Ceponiene, Balan,

Fellman, & Näätänen, 2002; Shestakova, Huotilainen, & Cheour,2003). Other researchers hold the view that the p-MMR mightreflect a recovery from refractoriness, indexing the detection andencoding of the acoustic properties of a stimulus in connections inthe primary auditory cortex (e.g., Dehaene-Lambertz & Dehaene,1994; Friederici et al., 2002; Shafer et al., 2010; Shafer, Yu, &Garrido-Nag, 2012). Additionally, the p-MMR was observed inmost prior studies in infants or preschool children. It is rarelyobserved in school children (e.g., Lee et al., 2012). Nevertheless,more research is needed to understand the functional significanceof the mismatch response (positive or negative) in speech percep-tion in children. In particular, neurophysiology studies of speechperception in Cantonese children are very limited at both the seg-mental and suprasegmental levels.

The Present Study

There were two purposes of the present study. The first purpose wasto examine whether the MMN could be a neural index of Cantoneseinitial consonant and lexical tone discrimination in Cantonese-speaking second graders. The second purpose was to investigatewhether the peak latency and polarity of the MMN would varyacross different types of deviants. We manipulated the VOT andplace of articulation of an initial consonant of Chinese single-syllable and lexical tones including level tone and contour tone.The initial consonant and lexical tone are two components of aCantonese syllable. The consonant in a Cantonese syllable isoptional. That is, syllables may consist of vowels only. In total,there are 20 initial consonants. As a tone language, Cantonese usesvariations in pitch (e.g., high vs. low) or the pitch pattern (e.g.,rising vs. falling) to distinguish lexical meanings of morphemes.Lexical tone is, therefore, obligatory for a Cantonese syllable (Tan,Ching, Chan, Cheng, & Mak, 1995). There are six distinctive tones(up to nine, depending on how one counts it): tone1–high level(55), tone2–high rising (25), tone3–midlevel (33), tone4–lowfalling (21), tone5–low rising (23), tone6–low level (22).1 Amongthe six tones, tone1, tone3, and tone6 are categorized as level tonesin which there are few or no changes of pitch over time; tone2,tone4, and tone5 are contour tones—these pitches change appreci-ably over time (Wang, Yang, & Cheng, 2009).

In fact, there have been a number of studies using an ERP ormagnetoencephalography (MEG) paradigm examining the neuralcorrelates of place of articulation and VOT perception in adults innontonal languages (e.g., Dehaene-Lambertz, 1997; Phillips et al.,1995, 2000; Shafer et al., 2004; Sharma & Dorman, 1999). Ingeneral, significant MMNs are elicited by changes of VOT bothwithin and across consonants, as well as by changes in place ofarticulation in a typical MMN time window of 150–250 ms, andMMNs tend to be larger for the changes across a phonetic boundarythan within a phonetic boundary (Sharma & Dorman, 1999). A fewstudies in lexical tone perception have also reported detectableMMNs for both level tone and contour tone in tonal languages(e.g., Cheng et al., 2013; Lee et al., 2012; Sittiprapaporn,Chindaduangratn, Tervaniemi, & Khotchabhakdi, 2003; Tsanget al., 2011; Xi, Zhang, Shu, Zhang, & Li, 2010).

However, to our knowledge, there are only two develop-mental studies so far examining the trajectory of the MMN in

1. Chao (1980) first transcribed lexical tone in a numerical notationalsystem by using five levels (from lowest 1 to highest 5) to describe relativeheight, shape, and duration of pitch contour. High, middle, and low are thedescriptions of pitch ranges. Level, rising, and falling are the specificationsof the slopes of the pitch contours.

Speech perception in Cantonese children: An ERP study 3

Chinese-speaking children. The first study was conducted by Menget al. (2005). Using the MMN paradigm, Meng et al. (2005) exam-ined the differences in brain responses to pure frequency changes,pure tone changes, an initial consonant change, and a lexical tonechange in typically developing and dyslexic children aged 11 years.They found that all contrasts elicited significant MMNs in typicallydeveloping children, and the dyslexic group showed smallerMMNs in processing initial consonant and vowel syllable contrastsas well as marginally different MMNs in response to tone percep-tion. The other study was that of Lee et al. (2012) as mentionedabove. However, the native language for the participants in both theMeng et al. (2005) and Lee et al. (2012) studies was Mandarin.Although both Mandarin and Cantonese are tonal languages, theydiffer substantially in tone attributes. Specifically, there are onlyfour tones in Mandarin, but there are up to nine tones in Cantonese.The present study aimed to investigate Cantonese-speaking chi-ldren’s preattentive brain responses to changes in consonants andlexical tones to determine the nature of Cantonese-speaking chi-ldren’s neural discrimination of speech sound at both the segmentaland suprasegmental levels.

Based on findings of prior research on consonant and lexicaltone perception, we expected that changes in consonants andlexical tones would result in a significant either negative or positivemismatch response in Cantonese second graders. More specifically,we expected that all four of the deviants, that is, VOT, place ofarticulation, level tone, and contour tone, would elicit significantadultlike MMNs, and the p-MMR, which has been suggested todisappear after 7 years old, would not appear in second gradeCantonese-speaking children. In addition, we hypothesized that thebrain responses to speech sounds for Cantonese second gradersmight peak later or less than those that emerged in previous studiesof Mandarin-speaking children when considering the differences intone attributes between Cantonese and Mandarin. For example,there are only four tones in Mandarin but up to nine tones inCantonese, which may lead to greater density in perceptual spacefor Cantonese relative to Mandarin. There are also certain differ-ences in acoustic features between Cantonese tones and Mandarintones. For example, for Cantonese there are several acoustic cor-relates of lexical tones, that is, pitch height, pitch contour, and pitchonset. Recent evidence shows that pitch contour is important todistinguish tone contrasts differing in pitch contour such as highlevel (55) versus high-rising (25), whereas pitch onset (F0 onset) ismore important to distinguish tone contrasts having the samecontour but different pitch height, such as midlevel (33)-low level(22) (Tong, McBride, & Burnham, 2014). By contrast, pitchcontour is the primary acoustic cue to distinguish Mandarin tonecontrasts (Howie, 1976). In other words, Cantonese children’s per-ceptual system for Cantonese tones is much more fine grained anddelicate relative to the perceptual system for Mandarin tones (seeSo & Best, 2010). Thus, the sophisticated tone system to whichCantonese-speaking children are exposed tunes them in to subtleacoustic details, such as F0 onset. For this reason, we would expectsome perceptual differences between Cantonese- and Mandarin-speaking children.

Method

Participants

Participants were 17 Hong Kong second graders (7 females and 10males) aged between 88 and 99 months (Mage = 93.59 months,SD = 3.52 months). All children were Cantonese-native speakers.

None of the participants had a history of neurological, psychiatric,brain injury, or hearing problems reported by parents.

Materials and Design

The multiple-oddball paradigm was adopted in the present study.The experimental stimuli were five Cantonese syllables: /gaa1/,/gaa2/, /gaa3/, /daa1/, and /kaa1/. The syllable /gaa1/ was used asthe standard; the other four syllables were assigned as deviants.The stimuli were recorded by a female Cantonese speaker. Allstimuli were kept at 75 dB SPL; F0 of the five syllables wasbetween 200.4 and 288.3 Hz. The duration of the stimuli wasnormalized to 170 ms. Thus, in each trial, the stimuli lasted170 ms, with a 600-ms interstimulus interval (ISI). Among the fivestimuli, the contrasts /gaa1/ versus /daa1/, and /gaa1/ versus /kaa1/were classified to reflect the difference in initial consonant definedas segmental level condition; the contrasts /gaa1/ versus /gaa2/, and/gaa1/ and /gaa3/ were categorized as reflecting the difference inlexical tones defined as the suprasegmental level condition. Thecontrast of /gaa1/ and /daa1/ represented the difference in place ofarticulation; that is, the second formants of the stimuli showeddifferent start frequencies (the frequency of F1, F2, and F3 was1336.5 Hz, 2033.3 Hz, and 3063.3 Hz, respectively, for /gaa1/stimuli, and the frequency of F1, F2, and F3 was 1333.8 Hz,2026.4 Hz, and 3049.5 Hz, respectively, for /daa1/ stimuli), and thepitch was matched between /gaa1/ and /daa1/. The contrast of/gaa1/ and /kaa1/ differed in the VOT with a 41.3 ms difference.The contrast of /gaa1/ and /gaa2/ differed in pitch contour, and the/gaa1/ and /gaa3/ contrast differed in pitch height. The averagepitch value of /gaa1/, /gaa2/, and /gaa3/ was 282.9 Hz, 236.0 Hz,and 221.7, respectively. In addition, the minimum and maximumpitch values for /gaa2/ and /gaa3/ were 200.4 Hz and 288.3 Hz,respectively. The acoustic feature of each contrast is shown inFigure 1.

Procedure

All participants were tested individually in a sound-attenuated ERPlab. Before the experiment, we presented caregivers who broughtthe child to our lab with a consent form and parental questionnaireand asked them to complete each before testing began. The parentalquestionnaire was used to screen out children with attention deficithyperactivity disorder (ADHD), and other related learning disabil-ities; this questionnaire also ensured that the first language of allchildren was Cantonese. Following completion of these forms, thepreparation of the electroencephalographic (EEG) recordingbegan. After preparation, one Cantonese-speaking helper, who hadbeen trained before the testing, instructed each participant ondetails of the experimental procedure.

Participants were seated in a comfortable chair in front of acomputer monitor at a distance of 80 cm from the computermonitor. The participants were required to watch a movie, “TheMole,” in silence while listening to the experimental stimuli. Thestimuli were binaurally delivered via headphones. The stimuli werepresented through three blocks, each of which consisted of 515trials, starting with 15 trials of standard, followed by 40% deviants(10% for each deviant), and 60% standard. Thus, there were 150trials for each type of deviant in total. Among all stimuli includingall deviant trials and standard trials, 80% of the stimuli had tone1but with a different onset, that is, either with /g/ or /d/ or /k/; 80%of these had the /g/ onset. The order of the trials within each block

4 X. Tong et al.

was pseudorandomized. Participants were given 2 min for a breakbetween blocks during the experiment.

EEG Recordings

EEG activity was recorded from a 64-channel (Ag-AgCl)Neuroscan system. The 64 channels were arranged accordingto the 10/20 system with the reference electrode located betweenCz and CPz. The electrode located between Fpz and Fz served asthe ground electrode. The vertical electrooculogram (EOG)was obtained from below versus above the left eye, and the leftversus right lateral orbital rim defined the horizontal EOG.During recording, the electrode impedances were kept below15 kΩ. The EEG and EOG signals were amplified with aband-pass of 0.05–70 Hz and digitized at a sampling rate of1000 Hz.

Data Analysis

We analyzed the EEG data offline using Scan 4.4 software. TheEEG data were referenced to the average of all electrodes(Lehmann & Skrandies, 1980). Given that there are no inactivelocations on the head, choosing the average reference avoids

biasing the analysis by the selection of a particular electrode asreference (Koenig & Gianotti, 2009). The continuous data werefiltered with a 0.10–30 Hz band-pass. The filtered data were seg-mented into epochs of 700 ms long, with a 100-ms prestimulusinterval used for baseline correction. The first 15 standard trials andepochs with artifacts exceeding ± 100 μν were discarded (e.g.,Cheng et al., 2013; Lee et al., 2012). Also, the standard trials thatimmediately followed the deviant trials were excluded from aver-aging. The mean numbers of accepted deviants were 122, 123, 124,and 124 for deviants of place of articulation, VOT, level tone, andcontour tone, respectively, and that of the accepted standards was238.

The mean amplitudes of each condition were computed in theepoch from 150–300 ms and 300–500 ms at the electrodes F3, Fz,F4, FC3, FCz, and FC4. Repeated measures analyses of variance(ANOVAs) with experimental condition (standard, deviants), site(frontal, centrofrontal), and hemisphere (left, middle, right) aswithin-subject factors were performed separately for the segmentallevel condition and the suprasegmental level condition in the twotime windows. To examine whether each deviant elicited signifi-cant MMNs, the planned comparisons were performed betweeneach deviant condition and the shared standard condition. For eachANOVA, the Greenhouse-Geisser adjustment to the degrees of

Time (ms)75

500Fr

eque

ncy

(Hz)

a.

/daa1/ /gaa1/

Freq

uenc

y (H

z)

00 170 Time (ms)

5000b.

/gaa1/

/gaa3/

/gaa2/

c.

Time (ms)0 170-400

500

0

16.5

/gaa1/

57.8

/kaa1/

Freq

uenc

y (H

z)

Figure 1. a: Acoustic features of the sounds used in the present study. b: Tone contrast pitch difference. c: Spectrogram of /gaa1/ and /daa1/ showing theformant difference. VOT contrast of /gaa1/ and /kaa1/.

Speech perception in Cantonese children: An ERP study 5

freedom was used to correct for the violations of sphericity asso-ciated with repeated measures.

Results

The mismatch responses at Fz and the topographic voltage mapsobtained by subtracting the standard from the four types of deviantsand difference mismatch potential maps are shown in Figures 2 and3 for all types of stimuli, respectively. As shown in Figure 2, therewas no obvious MMN or p-MMR for place of articulation orcontour tone deviants in the time window from 150–300 ms.However, the level tone deviant elicited a clear MMN in the timewindow of 150–300 ms, whereas a p-MMR was observed in the

VOT deviant. In addition, there appear to be clear MMNs in thetime window from 300–500 ms for each of the deviants.

Segmental-Level Conditions (/daa1/ and /kaa1/ versus /gaa1/ standard)

150–300 ms time window. Statistical analyses revealed that therewas a significant main experimental condition effect in this timewindow, F(2,32) = 14.38, p < .001, ηp

2 47= . . Planned comparisonsdemonstrated that the VOT deviant (/kaa1/) was more positivethan the standard (M = 1.00 μν, SD = .29, and −.12 μν, SD = .30,respectively, p < .001). However, there was no significant differ-ence found between the place of articulation deviant (/daa1/) and

ms

-100 0 100 200 300 400 500 600µV

-5

5

b.

c.

StandardDeviant -Standard

Deviant

d.

a.

Figure 2. ERP waveforms to the standard and deviant stimuli for (a) placement of articulation, (b) VOT, (c) level tone, and (d) contour tone at the electrodeof Fz.

200-249 ms 250-299 ms150-199 ms

a.

b.

c.

d.

-3

+3

300-349 ms 350-399 ms 400-449 ms 450-499 ms

Figure 3. Maps display the topographic distribution of the mean amplitude in the MMN analysis window for the deviant minus standard difference for (a)placement of articulation, (b) VOT, (c) level tone, and (d) contour tone.

6 X. Tong et al.

the standard (M = −36 μν, SD =.28, and −.12 μν, SD = .30, respec-tively, p = .18).

300–500 ms time window. A significant main effect of experi-mental condition was found in this time window, F(2,32) = 9.48,p < .01, ηp

2 37= . . Follow-up pairwise comparisons indicated thatthe mean amplitude of the place of articulation deviant was morenegative than the mean amplitude of the standard (M = −2.85 μν,SD = .34, and −2.02 μν, SD = .20, respectively, p < .01), and themean amplitude of the VOT deviant was more negative thanthe mean amplitude of the standard (M = −3.48 μν, SD = .33, and−2.02 μν, SD = .20, respectively, p < .001).

Suprasegmental-Level Conditions (/gaa2/ and /gaa3/ versus/gaa1/ standard)

150–300 ms time window. For the lexical tone contrasts, a sig-nificant main effect of experimental condition was found in thistime window, F(2,32) = 8.66, p < .01, ηp

2 35= . . Follow-up pairwisecomparisons indicated that the mean amplitude of the level tonedeviant was more negative than the mean amplitude of the standard(M = −.86 μν, SD = .27, and −.12 μν, SD = .30, respectively,p < .05), but there was no significant difference found between thecontour tone deviant and standard (M = .15 μν, SD = .27, and−.12 μν, SD = .30, respectively, p = .25). In addition, a significantinteraction effect of Site × Condition was found in this timewindow, F(2,32) = 3.36, p < .05, ηp

2 17= . . Follow-up analysisshowed that the mean amplitude of the level tone was more nega-tive than the standard in the frontal site (p < .01), and there wasonly a marginally significant MMN found for the level tone in thefrontal central site (p = .06).

300–500 ms time window. In this time window, the ANOVA per-formed on the mean amplitude of the MMNs revealed that therewas a significant main effect of experimental condition,F(2,32) = 4.80, p < .05, ηp

2 23= . . Planned comparisons showedthat the level tone deviant was more negative than the standard(M = −2.89 μν, SD = .37, and −2.02 μν, SD = .20, respectively,p < .01). The mean amplitude elicited by the contour tone deviantwas also more negative than the mean amplitude elicited by thestandard (M = −2.58 μν, SD = .31, and −2.02 μν, SD = .20, forcontour tone deviant and standard, respectively, p < .05). Nohemisphere asymmetry effect was found in either of the two timewindows across deviants for either the segmental andsuprasegmental analyses in the present study.

Discussion

In this study, we used the ERP approach with an auditory multiple-deviant oddball paradigm to investigate Hong Kong secondgraders’ Cantonese speech perception at both the segmental andsuprasegmental levels. Specifically, we recorded second graders’brain responses to place of articulation and VOT at the segmentallevel, and to level tone and contour tone at the suprasegmentallevel. Interestingly, we found that the segmental level feature ofVOT elicited a robust positive response (p-MMR) relative to thestandard between 150 and 300 ms; in contrast, the suprasegmentalfeature of level tone elicited a significant negative response (typicalMMN) at the same time window. No detectable p-MMR or MMNwas found in the other two contrasts, that is, the place of articula-tion versus standard and the contour tone versus standard in thetime window of 150–300 ms. However, in a later time window of

300–500 ms, all deviants elicited significant MMNs, presumablyrelated to a late MMN (Korpilahti, Lang, & Aaltonen, 1995;Maurer et al., 2003) or the late discriminative negativity (Cheour,Korpilahti, Martynova, & Lang, 2001) as reported in previousstudies. Our findings suggest that Cantonese-speaking secondgraders are sensitive to automatic neural discrimination of theacoustic differences between the standard and deviants includingVOT, place of articulation, and lexical tones. We may also inferfrom our results that second graders might be more sensitive to theacoustic change in VOT compared to place of articulation, reflectedby a robust p-MMR found in the VOT deviant in a relatively earlytime window.

Our finding that robust MMNs were elicited for Cantonese-speaking second graders in response to all deviants including thesegmental level (i.e., VOT and place of articulation) and thesuprasegmental level (i.e., level tone and contour tone in the timewindow of 300–500 ms) is consistent with results found in previ-ous MMN research on speech perception in children (e.g., Maureret al., 2003; Shafer et al., 2010). For example, Shafer et al. (2010)found that 4- to 5- and 6- to 7-year-olds showed a robust MMN inthe time window between 300–400 ms in response to an Englishvowel contrast (e.g., /I/ versus /ε/); they also found a p-MMR in allyounger children and half of the older children in the time windowof 100–200 ms, and the younger children showed a bit longerlatency (up to 260 ms). However, the MMNs to all deviants foundin our present study were a bit different from the MMN found inadults in terms of peak latency. That is, the MMNs found in oursecond graders peaked in a relatively late time window, namely,from 300–500 ms, whereas MMN is often found to peak in the timewindow between 150–250 ms in adults (for a review, see Näätänenet al., 2007). Previous research on MMN in adults suggests that thelatency and amplitude of the MMN reflect the degree of partici-pants’ discriminability of speech sounds. With the increasing dif-ficulty between the standard and the deviant, “the MMN shifts laterin time and becomes smaller in amplitude” (Shafer et al., 2010; p.736). Thus, on the one hand, our finding suggests that Cantonese-speaking second graders show neural discrimination of speechsounds at both the segmental and suprasegmental levels; on theother hand, the MMN peaking in a relatively late time window mayindicate that Cantonese-speaking second graders may still not beable to discriminate speech sounds automatically at the preattentivestage as accurately as adults. This seems consistent with findingsfrom behavioral research that children performed worse than adultsin discrimination of fine-grained differences in speech stimuli(Elliott & Hammer, 1988).

In accordance with the findings from Mandarin-speaking chil-dren of the ages of 4, 5 and 6 years by Lee et al. (2012), the VOTdeviant elicited a detectable p-MMR in the time window of 150–300 ms in the present study. That is, p-MMR and MMN werecoexistent in Cantonese-speaking second graders when discrimi-nating the VOT deviant. Notably, the p-MMR is generallyobserved in oddball studies of newborns, infants, and young chil-dren when responding to auditory or speech stimuli (e.g, Alhoet al., 1990; Cheng et al., 2013; Korpilahti et al., 2001; Lee et al.,2012; Maurer et al., 2003). For example, Kurtzberg, Vaughan,Kreuzer, & Fliegler (1995) reported a p-MMR in newborns whenresponding to a pitch change among stimuli presented with 750and 1,000 ISIs. Morr, Shafer, Kreuzer, & Kurtzberg (2002)observed larger p-MMRs to two types of pitch deviants in infantsbetween 2 and 12 months using a 750 ms ISI. In a very recentstudy, Cheng et al. (2013) found a p-MMR in newborns, but anadultlike MMN was observed in infants from the age of 6 months

Speech perception in Cantonese children: An ERP study 7

when responding to a larger contrast between Mandarin lexicaltone1 and tone3, whereas the small contrast between lexicaltone2 and tone3 was only elicited in infants but not in newborns.Some studies have shown a robust p-MMR in young children(e.g., Lee et al., 2012; Maurer et al., 2003), and a few studieshave shown a p-MMR in children aged 7 years in relation toauditory and speech perception (e.g., Shafer et al., 2000). More-over, a significant p-MMR was elicited by the vowel /i/ inChinese tone1 and tone4 in Mandarin-speaking participants agedbetween 18 to 24 years in a recent study (Kuo et al., 2013). Thus,the existence of the p-MMR is not a phenomenon that is specificto infants and preschoolers only.

However, there is still a debate regarding the nature of thep-MMR in the literature on auditory and speech perception (Leeet al., 2012; Maurer et al., 2003; Shafer et al., 2010). One explana-tion for the presence of the p-MMR is that the p-MMR may reflectthe immature neural change-detection processes in infants andyoung children (e.g., Lee et al., 2012; Maurer et al., 2003).Perhaps, with maturation, the p-MMR is progressively replaced bythe MMN by about 4 years old. However, this explanation issomewhat questionable because the p-MMR has been observed inseveral studies in older children as well as in adults, as mentionedabove (e.g., Lee et al., 2012; Maurer et al., 2003; Shafer et al.,2010; Kuo, Lee, Chen, Liu, & Cheng, 2013). Thus, the presence ofthe p-MMR found in the present study of Cantonese-speakingsecond graders with a mean age of 7.8 years when discriminatingthe VOT deviant may not be accounted for by the immatureresponse hypothesis.

There are also some other different accounts proposed byresearchers to explain the presence of the p-MMR. For example,the p-MMR might be an immature type of P3a. P3a is a compo-nent with a frontocentral scalp distribution and is suggested toreflect the process of involuntary attention shifting; it is usuallypresent after the MMN component in research focused on MMN(e.g., He et al., 2009; Kushnerenko et al., 2002; Lee et al., 2012;Maurer et al., 2003). As argued by Shafer et al. (2010), if theprocesses indexed by P3a are dependent on discrimination carriedout in the system indexed by MMN, the p-MMR, which ishypothesized to be an immature type of P3a, should be presentafter the MMN component. However, the p-MMR has beenobserved in an earlier time window preceding the MMN inseveral recent studies (e.g., Lee et al., 2012; Maurer et al., 2003;Shafer et al., 2010). The MMN also came first in the presentstudy. Because of the debate regarding whether the elicitation ofP3a requires MMN, the p-MMR found in the present study maynot have the same function as the P3a. Another account, whichwe favor, is that the p-MMR may reflect the recovery fromrefractoriness of the P100 (P1) response with a source in auditorycortex in infants and young children (e.g., Mikkola et al., 2007;Shafer et al., 2010). This component might be canceled by theMMN for a specific contrast or at a specific time point (e.g., Leeet al., 2012; Shafer et al., 2010). We thus propose that thep-MMR found in the present study may be a recovery of P1. AsP1 may reflect very early auditory sensory encoding, the p-MMRmay index detection and encoding of the acoustic properties of astimulus in the present study.

Another interesting finding in the present study is that theMMNs emerged continuously in the 150–300 ms and 300–500 ms time windows for the level tone (tone3), but they wereonly observed in the time window of 300–500 ms for the contourtone (tone2) in Cantonese-speaking second graders. This findingmay suggest that the temporal course of the level tone and

contour tone perceptions may be different from each other inCantonese-speaking second graders. This may be because of thedifferent attributes of level tone and contour tone in Cantonese.As shown in Figure 1, the pitch height was held constant through-out the duration for level tone3, but the pitch height was changedwith the unfolding of time for contour tone2. In other words, thechange of tone3 with time was static but the change of tone2 wasdynamic. Previous behavioral studies on speech perception ateither the segmental or suprasegmental levels have revealed thatdynamic and static cues have different decay rates in auditoryworking memory, and a number of studies have shown that theperceptual trace of dynamic cues decays faster than that of staticcues (e.g., Pisoni, 1973; Tong, Francis, & Gandour, 2008). TheMMN is believed to reflect a comparison process between amemory constructed to the standard stimulus and to each incom-ing stimulus; thus, the MMN observed in the time window of150–300 ms in the present study may reflect the fact thatCantonese-speaking second graders were able to detect the dif-ference between tone3 and tone1 within a short period after theonset of the deviant, and they may monitor the process until theoffset of the stimuli, which may be reflected by the MMN foundin the time window of 300–500 ms. In addition, tone3 is a leveltone, whereas tone2 is a rising tone in Cantonese. Behavioralresearch on tone acquisition and perception has suggested that therising tone is more difficult to perceive than are the level tone andfalling tone for Chinese (e.g., Li & Thompson, 1977; Snow,1998; Varley & So, 1995). Thus, the absence of a significantMMN for tone2 in a relatively early time window (i.e., 150–300 ms) also suggests that children may have difficulty in pro-cessing tone2, which is a rising tone, leading to no detectableMMN found in a relatively early time window.

It is worth noting that, to some extent, Cantonese-speakingchildren appear to show brain responses that are similar to those ofMandarin-speaking children, at least as demonstrated by Lee et al.(2012), to speech sounds at both the segmental and suprasegmentallevels. For example, as in the present study, Lee et al. (2012) foundthat Mandarin-speaking children at 4, 5, and 6 years old were ableto discriminate changes in initial consonants, vowels, and lexicaltones, reflected by either positive or negative mismatch responses.However, some differences were also revealed in brain responses tochanges in initial consonants and lexical tones between Mandarin-speaking and Cantonese-speaking children. For example, thecoexistence of the p-MMR and MMN was found for Cantonese-speaking children for the VOT deviant, but only the p-MMR wasfound for the Mandarin-speaking children. In addition, onlyMMNs were found in tone perception for Cantonese-speaking chil-dren. In contrast, Lee et al. (2012) found a coexistence of thep-MMR and MMN in Mandarin young children in tone perception.Also, the MMNs found in Cantonese-speaking second gradersappear to peak a bit later compared to the MMNs found inMandarin-speaking young children for tone perception. Forexample, MMNs were found in the window of 150–200 ms in6-year-old children in the study by Lee et al. for the contrastbetween tone1 and tone2.

Two possibilities may account for those differences. First, asdiscussed above, the presence and/or absence of the p-MMRmight be associated with a change of age. Thus, age differencesfor the participants between the two studies may be one of thefactors leading to those differences. In our study, children werethose with a mean age of 7.8 years old. In contrast, the partici-pants were children 4, 5, and 6 years old in Lee et al.’s study(2012). A second reason may be the result of the differences in

8 X. Tong et al.

the phonological systems between Cantonese and Mandarin. Forexample, our finding that MMNs for our Cantonese-speaking par-ticipants peaked a bit later compared to the MMNs found inyoung Mandarin-speaking children for tone perception may beaccounted for by the differences between Mandarin and Canton-ese tones. Cantonese has more tones than Mandarin. Thus, theperceptual space may be less dense in Mandarin relative to Can-tonese. In addition, Cantonese tones also differ from Mandarintones in acoustic features. For example, in Cantonese, tone3 isthe midlevel tone in which the pitch height is held constant, butthe tone3 in Mandarin is a low dipping tone in which the pitchheight changes with the unfolding of time. We propose that thedifferences in perceptual space and acoustic features betweenCantonese and Mandarin might also result in differences in neuraldiscrimination in speech information between Cantonese andMandarin speakers.

In fact, developmental studies have demonstrated thatCantonese children acquire all tones later than Mandarin-speaking children do. Thus, Cantonese-speaking second graderswith a mean age of 7.8 years old may still not be able to dis-criminate speech sounds completely automatically at thepreattentive stage. This hypothesis is consistent with the evidencefrom behavioral studies on the development of Cantonese toneperception showing that Cantonese children achieve adultlike per-formance in lexical perception by the age of 10 (Ciocca & Lui,2003).

Conclusions

Our study was among the first to provide neuropsychologicalevidence for speech perception at both the segmental andsuprasegmental levels in Cantonese-speaking children. Results con-firmed that Cantonese-speaking second graders were sensitive toneural discriminations of changes in VOT, place of articulation, andlexical tones, including both level and contour tones, which werereflected by the robust adultlike MMNs, but with a relatively longerlatency found in all deviants. In addition, the p-MMR was observedin those children who were more than 7 years old. This may suggestthat the perception of the phonological contrasts used in our presentstudy is not mature by 7 years of age for Cantonese speakers. Thiscomponent may reflect a recovery from refractoriness of the P1component (Shafer et al., 2010). However, there is still a debateregarding the nature of the p-MMR in the literature on auditory andspeech perception (Lee et al., 2012; Maurer et al., 2003). Futurework might attempt to replicate and extend our findings on Canton-ese speech perception in children of different ages. Future researchshould also help in further determining when and what factors mayaffect the presence or attenuation/absence of the p-MMR in chil-dren. Moreover, our finding that the significant MMN could befound in both early and late time windows for level tone but only inthe late time window for contour tone may suggest that Hong KongCantonese-speaking second graders show differential sensitivitiesto pitch height and pitch contour in tone perception.

References

Alho, K., Sainio, K., Sajaniemi, N., Reinikainen, K., & Näätänen, R.(1990). Event-related brain potential of human newborns to pitchchange of an acoustic stimulus. Electroencephalography and ClinicalNeurophysiology, 77, 151–155. doi: 10.1016/0168-5597(90)90031-8

Chandrasekaran, B., Gandour, J. T., & Krishnan, A. (2007). Neuroplasticityin the processing of pitch dimensions: A multidimensional scalinganalysis of the mismatch negativity. Restorative Neurology & Neuro-science, 25, 195–210. doi: 10.1016/j.cogbrainres.2003.10.007

Chandrasekaran, B., Krishnan, A., & Gandour, J. T. (2007). Mismatchnegativity to pitch contours is influenced by language experience. BrainResearch, 1128, 148–56. doi: 10.1016/j.brainres.2006.10.064

Chao, Y. R. (1980). A system of tone letters. Fangyan (Dialect), 02, 81–83.

Cheng, Y.-Y., Wu, H.-C., Tzeng, Y.-L., Yang, M.-T., Zhao, L.-L., & Lee,C.-Y. (2013). The development of mismatch responses to Mandarinlexical tones in early infancy. Developmental Neuropsychology, 38,281–300. doi: 10.1080/87565641.2013.799672

Cheour, M., Alho, K., Sainio, K., Reinikainen, K., Renlund, M., Aaltonen,O., & Näätänen, R. (1997). The mismatch negativity to changes inspeech sounds at the age of three months. DevelopmentalNeuropsychology, 13, 167–174. doi: 10.1080/87565649709540676

Cheour, M., Korpilahti, P., Martynova, O., & Lang, A. H. (2001). Mismatchnegativity and late discriminative negativity in investigating speechperception and learning in children and infants. Audiology andNeurotology, 6(1), 2–11. doi: org/10.1159/000046804

Cheour, M., Leppänen, P. H., & Kraus, N. (2000). Mismatch negativity(MMN) as a tool for investigating auditory discrimination and sensorymemory in infants and children. Clinical Neurophysiology, 111, 4–16.doi: 10.1016/S1388-2457(99)00191-1

Cheour-Luhtanen, M., Alho, K., Kujala, T., Sainio, K., Reinikainen, K.,Renlund, M., & Näätänen, R. (1995). Mismatch negativity indicatesvowel discrimination in newborns. Hearing Research, 82, 53–58. doi:10.1016/0378-5955(94)00164-l

Ciocca, V., & Lui, J. (2003).The development of the perception of Canton-ese lexical tones. Journal of Multilingual Communication Disorders, 1,141–147. doi: 10.1080/1476967031000090971

Crystal, D. (2003). A dictionary of linguistics and phonetics (5th ed.).Malden, MA: Blackwell.

Csepe, V., Dieckmann, B., Hoke, M., & Ross, B. (1992). Mismatch nega-tivity to pitch change of acoustic stimuli in preschool- and school-agechildren. Proceedings of EPIC, 10, 32. doi: 10.1016/0168-5597(93)90063-u

Dehaene-Lambertz, G. (1997). Electrophysiological correlates of catego-rical phoneme perception in adults. NeuroReport, 8, 919–924.doi: 10.1097/00001756-199703030-00021

Dehaene-Lambertz, G., & Dehaene, S. (1994). Speed and cerebral corre-lates of syllable discrimination in infants. Nature, 370(6487), 292–295.doi: org/10.1038/370292a0

Dehaene-Lambertz, G., & Gliga, T. (2004). Common neural basis forphoneme processing in infants and adults. Journal of Cognitive Neuro-science, 16, 1375–1387. doi: 10.1162/0898929042304714

Deouell, L. Y., Bentin, S., & Giard, M. H. (1998). Mismatch negativity indichotic listening: Evidence for interhemispheric differences and multi-ple generators. Psychophysiology, 35, 355–365. doi: 10.1111/1469-8986.3540355

Elliott, L., & Hammer, M. (1988).Longitudinal changes in auditory dis-crimination in normal children and children with language-learningproblems. Journal of Speech Hearing Research, 53, 467–474.doi: 10.1016/0165-5876(89)90100-6

Escera, C., Alho, K., Winkler, I., & Näätänen, R. (1998). Neural mecha-nisms of involuntary attention to acoustic novelty and change. Journalof Cognitive Neuroscience, 10, 590–604. doi: 10.1162/089892998562997

Fox, A. (2000). Prosodic features and prosodic structure: The phonology ofsuprasegmentals. Oxford, UK: Oxford University Press

Friederici, A. D., Friedrich, M., & Weber, C. (2002). Neural manifestationof cognitive and precognitive mismatch detection in early infancy.NeuroReport, 13, 1251–1254. doi: 10.1097/00001756-200207190-00006

Gomes, H., Sussman, E., Ritter, W., Kurtzberg, D., Cowan, N., &Vaughan, H. G., Jr. (1999). Electrophysiological evidence ofdevelopmental changes in the duration of auditory sensory memory.Developmental Psychology, 35, 294–302. doi: 10.1037/0012-1649.35.1.294

Grimm, S., Widmann, A., & Schröger, E. (2004). Differential processing ofduration changes within short and long sounds in humans. NeuroscienceLetters, 356, 83–86. doi: 10.1016/j.neulet.2003.11.035

Speech perception in Cantonese children: An ERP study 9

He, C., Hotson, L., & Trainor, L. J. (2009). Maturation of cortical mismatchresponses to occasional pitch change in early infancy: Effects of pres-entation rate and magnitude of change. Neuropsychologia, 47, 218–229.doi: 10.1016/j.neuropsychologia.2008.07.019

Hisagi, M., Shafer, V. L., Strange, W., & Sussman, E. S. (2010). Perceptionof a Japanese vowel length contrast by Japanese and AmericanEnglish listeners: Behavioral and electrophysiological measures. BrainResearch, 1360, 89–105. doi: 10.1016/j.brainres.2010.08.092

Howie, J. M. (1976). Acoustical studies of Mandarin vowels and tones(No. 6). New York, NY: Cambridge University Press.

Isoguchi, T., Kanzaki, R., & Takahashi, H. (2011). Information processingfor acoustic saliency and emotional valence of sound in rat auditorycortex. Neuroscience Research, 71, e355. doi: 10.1016/j.neures.2011.07.1556

Jacobsen, T., & Schröger, E. (2001). Is there pre-attentive memory-basedcomparison of pitch? Psychophysiology, 38, 723–727. doi: 10.1111/1469-8986.3840723

Jaramillo, M., Paavilainen, P., & Näätänen, R. (2000). Mismatch negativityand behavioural discrimination in humans as a function of the magni-tude of change in sound duration. Neuroscience Letters, 290, 101–104.doi: 10.1016/s0304-3940(00)01344-6

Kisley, M. A., Noecker, T. L., & Guinther, P. M. (2004). Comparison ofsensory gating to mismatch negativity and self-reported perceptual phe-nomena in healthy adults. Psychophysiology, 41(4), 604–612. doi:10.1111/j.1469-8986.2004.00191.x

Koenig, T., & Gianotti, L. R. R. 2009. Scalp field maps and their charac-terization. In C. M. Michel, T. Koenig, D. Brandeis, L. R. R. Gianotti,& J. Wackermann (Eds.). Electrical neuroimaging (pp. 145–169).Cambridge, UK: University Press.

Korpilahti, P., Krause, C. M., Holopainen, I., & Lang, A. H. (2001). Earlyand late mismatch negativity elicited by words and speech-like stimuliin children. Brain and Language, 76, 332–339. doi: 10.1006/brln.2000.2426

Korpilahti, P., Lang, H., & Aaltonen, O. (1995). Is there a late-latencymismatch negativity (MMN) component? Electroencephalographyand Clinical Neurophysiology, 95, 95–97. doi: 10.1016/0013-4694(95)90016-g

Kraus, N., McGee, T., Micco, A., Sharma, A., Carrell, T., & Nicol, T.(1993). Mismatch negativity in school-age children to speech stimulithat are just perceptibly different. Electroencephalography and ClinicalNeurophysiology, 88, 123–130. doi: 10.1016/0168-5597(93)90063-u

Kuhl, P. K. (1998). Effects of language experience on speech perception.Journal of the Acoustical Society of America, 103, 1601–1602. doi:10.1121/1.422159

Kuo, Y. C., Lee, C. Y., Chen, M. C., Liu, T. L., & Cheng, S. K. (2013). Theimpact of spectral resolution on the mismatch response to MandarinChinese tones: An ERP study of cochlear implant simulations. ClinicalNeurophysiology. Advance online publication. doi: 10.1016/j.clinph.2013.11.035

Kurtzberg, D., Vaughan, H. G., Jr., Kreuzer, J. A., & Fliegler, K. Z. (1995).Developmental studies and clinical application of mismatch negativity:Problems and prospects. Ear and Hearing, 16, 105–117. doi. 10.1097/00003446-199502000-00008

Kushnerenko, E., Ceponiene, R., Balan, P., Fellman, V., & Näätänen, R.(2002). Maturation of the auditory change detection response in infants:A longitudinal ERP study. NeuroReport, 13, 1843–1848. doi: 10.1097/00001756-200210280-00002

Lehmann, D., & Skrandies, W. (1980). Reference-free identification ofcomponents of checkerboard-evoked multichannel potential fields.Electroencephalography Clinical Neurophysiology, 48, 609–621. doi:10.1016/0013-4694(80)90419-8

Lee, C.-Y., Yen, H.-L., Yeh, P.-W., Lin, W.-H., Cheng, Y.-Y., Tzeng, Y.-L.,& Wu, H.-C. (2012). Mismatch responses to lexical tone, initial conso-nant, and vowel in Mandarin-speaking preschoolers. Neuropsychologia,50, 3228–3239. doi: 10.1016/j.neuropsychologia.2012.08.025

Li, C. N., & Thompson, S. A. (1977). The acquisition of tone inMandarin-speaking children. Journal of Child Language, 4, 185–199.doi: 10.1017/s0305000900001598

Lovio, R., Pakarinen, S., Huotilainen, M., Alku, P., Silvennoinen, S.,Näätänen, R., & Kujala, T. (2009). Auditory discrimination profiles ofspeech sound changes in 6-year-old children as determined with themulti-feature MMN paradigm. Clinical Neurophysiology, 120, 916–921. doi: 10.1016/j.clinph.2009.03.010

Maurer, U., Bucher, K., Brem, S., & Brandeis, D. (2003). Development ofthe automatic mismatch response: From frontal positivity in kindergar-

ten children to the mismatch negativity. Clinical Neurophysiology, 114,808–817. doi: 10.1016/S1388-2457(03)00032-4

McBride-Chang, C., Lin, D., Fong, Y. C., & Shu, H. (2010). Basics ofChinese literacy. In M. H. Bond (Ed.), The Oxford handbook ofChinese psychology, (pp. 93–107). Oxford, UK: Oxford UniversityPress.

Meng, X., Sai, X., Wang, C., Wang, J., Sha, S., & Zhou, X. (2005).Auditory and speech processing and reading development in Chineseschool children: Behavioural and ERP evidence. Dyslexia, 11, 292–310.doi: 10.1002/dys.309

Mikkola, K., Kushnerenko, E., Partanen, E., Serenius-Sirve, S., Leipälä, J.,Huotilainen, M., & Fellman, V. (2007). Auditory event-relatedpotentials and cognitive function of preterm children at five years ofage. Clinical Neurophysiology, 118(7), 1494–1502. doi: 10.1016/j.clinph.2007.04.012

Miller, J. (1978). Interactions in processing segmental and suprasegmentalfeatures of speech. Perception & Psychophysics, 24, 175–180.doi: 10.3758/bf03199546

Morr, M. L., Shafer, V. L., Kreuzer, J. A., & Kurtzberg, D. (2002). Matu-ration of mismatch negativity in typically developing infants andpreschool children. Ear and Hearing, 23, 118–136. doi: 10.1097/00003446-200204000-00005

Näätänen, R. (2001). The perception of speech sounds by the human brainas reflected by the mismatch negativity (MMN) and its magnetic equiva-lent (MMNm). Psychophysiology, 38, 1–21. doi: 10.1111/1469-8986.3810001

Näätänen, R., Jacobsen, T., & Winkler, I. (2005). Memory-based or afferentprocesses in mismatch negativity (MMN): A review of the evi-dence. Psychophysiology, 42, 25–32. doi: 10.1111/j.1469-8986.2005.00256.x

Näätänen, R., Lehtokoski, A., Lennest, M., Luuki, A., Alliki, J., Sinkkonen,J., & Alho, K. (1997). Language-specific phoneme representationsrevealed by electric and magnetic brain responses. Nature, 385, 432–434. doi: 10.1038/385432a0

Näätänen, R., Paavilainen, P., Rinne, T., & Alho, K. (2007). The mismatchnegativity (MMN) in basic research of central auditory processing: Areview. Clinical Neurophysiology, 118, 2544–2590. doi: 10.1016/j.clinph.2007.04.026

Näätänen, R., Pakarinen, S., Rinne, T., & Takegata, R. (2004). Themismatch negativity (MMN): Towards the optimal paradigm.Clinical Neurophysiology, 115, 140–144. doi: 10.1016/j.clinph.2003.04.001

Paavilainen, P., Simola, J., Jaramillo, M., Näätänen, R., & Winkler, I.(2001). Preattentive extraction of abstract feature conjunctions fromauditory stimulation as reflected by the mismatch negativity (MMN).Psychophysiology, 38, 359–365. doi: 10.1111/1469-8986.3820359

Pakarinen, S., Lovio, R., Huotilainen, M., Alku, P., Näätänen, R., & Kujala,T. (2009). Fast multi-feature paradigm for recording several mismatchnegativities (MMNs) to phonetic and acoustic changes in speechsounds. Biological Psychology, 82, 219–226. doi: 10.1016/j.biopsycho.2009.07.008

Pakarinen, S., Takegata, R., Rinne, T., Huotilainen, M., & Näätänen, R.(2007). Measurement of extensive auditory discrimination profilesusing the mismatch negativity (MMN) of the auditory event-relatedpotential (ERP). Clinical Neurophysiology, 118, 177–185. doi: 10.1016/j.clinph.2006.09.001

Peltola, M. S., Kujala, T., Tuomainen, J., Ek, M., Aaltonen, O., & Näätänen,R. (2003). Native and foreign vowel discrimination as indexed by themismatch negativity (MMN) response. Neuroscience Letters, 352,25–28. doi: 10.1016/s0304-3940(03)00997-2

Phillips, C., Marantz, A., McGinnis, M., Pesetsky, D., Wexler, K., Yellin,A., & Rowley, H. (1995). Brain mechanisms of speech perception:A preliminary report. MIT Working Papers in Linguistics, 26, 125–163.

Phillips, C., Pellathy, T., Marantz, A., Yellin, E., Wexler, K., Poeppel, D., &Roberts, T. (2000). Auditory cortex accesses phonological categories:An MEG mismatch study. Journal of Cognitive Neuroscience, 12,1038–1055. doi: 10.1162/08989290051137567

Pisoni, D. B. (1973). Auditory and phonetic memory codes in the discrimi-nation of consonants and vowels. Perception & Psychophysics, 13,253–260. doi: 10.3758/bf03214136

Roeber, U., Widmann, A., & Schröger, E. (2003). Auditory distraction byduration and location deviants: A behavioral and event-related potentialstudy. Cognitive Brain Research, 17, 347–357. doi: 10.1016/s0926-6410(03)00136-8

10 X. Tong et al.

Sharma, A., & Dorman, M. F. (1999). Cortical auditory evoked potentialcorrelates of categorical perception of voice-onset time. Journal of theAcoustical Society of America, 106, 1078–1083. doi: 10.1121/1.428048

Sharma, A., & Dorman, M. F. (2000). Neurophysiologic correlates ofcross-language phonetic perception. Journal of the Acoustical Society ofAmerica, 107, 2697–2703. doi: 10.1121/1.428655

Shafer, V. L., Morr, M. L., Kreuzer, J. A., & Kurtzberg, D. (2000). Matu-ration of mismatch negativity in school-age children. Ear and Hearing,21, 242–251. doi: 10.1097/00003446-200006000-00008

Shafer, V. L., Schwartz, R. G., & Kurtzberg, D. (2004). Language-specificmemory traces of consonants in the brain. Cognitive Brain Research,18, 242–254. doi: 10.1016/j.cogbrainres.2003.10.007

Shafer, V. L., Yan, H. Y., & Datta, H. (2010). Maturation of speech dis-crimination in 4- to 7-yr-old children as indexed by event-related poten-tial mismatch responses. Ear and Hearing, 31, 735–745. doi: 10.1097/aud.0b013e3181e5d1a7

Shafer, V. L., Yu, Y. H., & Garrido-Nag, K. (2012). Neural mismatchindices of vowel discrimination in monolingually and bilinguallyexposed infants: Does attention matter? Neuroscience Letters, 526,10–14. doi: 10.1016/j.neulet.2012.07.064

Shestakova, A., Huotilainen, M., & Cheour, M. (2003). Event-relatedpotentials associated with second language learning in children. Clini-cal Neurophysiology, 114, 1507–1512. doi: 10.1016/s1388-2457(03)00134-2

Sittiprapaporn, W., Chindaduangratn, C., Tervaniemi, M., &Khotchabhakdi, N. (2003). Preattentive processing of lexical tone per-ception by the human brain as indexed by the mismatch negativityparadigm. Annals of the New York Academy of Sciences, 999, 199–203.doi: 10.1196/annals.1284.029

Snow, D. (1998). Children’s imitations of intonation contours: Are risingtones more difficult than falling tones? Journal of Speech, Language,and Hearing Research, 41, 576–587.

So, C. K., & Best, C. T. (2010). Cross-language perception of non-nativetonal contrasts: Effects of native phonological and phonetic influences.Language and Speech, 53, 273–293. doi: org/10.1177/0023830909357156

Sussman, E. S. (2007). A new view on the MMN and attention debate.Journal of Psychophysiology, 21, 164–175. doi: 10.1027/0269-8803.21.34.164

Sussman, E., Winkler, I., & Wang, W. (2003). MMN and attention: Com-petition for deviance detection. Psychophysiology, 40, 430–435. doi:10.1111/1469-8986.00045

Tan, L., Ching, P. C., Chan, L. W., Cheng, Y. H., & Mak, B. (1995). Tonerecognition of isolated Cantonese syllables. IEEE Transactions onSpeech and Audio Processing, 3, 204–209. doi: 10.1109/89.388147

Tiitinen, H., May, P., Reinikainen, K., & Näätänen, R. (1994). Attentivenovelty detection in humans is governed by pre-attentive sensorymemory. Nature, 372, 90–92. doi: 10.1038/372090a0

Tong, X., McBride, C. & Burnham, D. (2014). Cues for lexical tone per-ception in children: Acoustic correlates and phonetic context effects.Journal of Speech, Language, and Hearing Research. Advance onlinepublication. doi: 10.1044/2014_JSLHR-S-13-0145

Tong, Y., Francis, A. L., & Gandour, J. T. (2008). Processing dependenciesbetween segmental and suprasegmental features in Mandarin Chinese.Language and Cognitive Processes, 23, 689–708. doi: 10.1080/01690960701728261

Tsang, Y.-K., Jia, S., Huang, J., & Chen, H.-C. (2011). ERP correlates ofpre-attentive processing of Cantonese lexical tones: The effects of pitchcontour and pitch height. Neuroscience Letters, 487, 268–272. doi:10.1016/j.neulet.2010.10.035

Varley, R., & So, L. K. H. (1995). Age effects in tonal comprehension inCantonese. Journal of Chinese Linguistics, 23, 76–97.

Wang, M., Yang, C., & Cheng, C. (2009). The contributions of phonology,orthography, and morphology in Chinese–English biliteracy acquisi-tion. Applied Psycholinguistics, 30, 291–314. doi: 10.1017/S0142716409090122

Winkler, I., Kujala, T., Tiitinen, H., Sivonen, P., Alku, P., Lehtokoski, A., &Näätänen, R. (1999). Brain responses reveal the learning of foreignlanguage phonemes. Psychophysiology, 36, 638–642. doi: 10.1111/1469-8986.3650638

Xi, J., Zhang, L., Shu, H., Zhang, Y., & Li, P. (2010). Categorical perceptionof lexical tones in Chinese revealed by mismatch negativity. Neuro-science, 170, 223–231. doi: 10.1016/j.neuroscience.2010.06.077

(Received October 23, 2013; Accepted May 14, 2014)

Speech perception in Cantonese children: An ERP study 11