NeuroImage 23 (2004) 344–357

Hemispheric roles in the perception of speech prosody ☆

Jackson Gandour a,*, Yunxia Tong a, Donald Wong b, Thomas Talavage c, Mario Dzemidzic d, Yisheng Xu a, Xiaojian Li e, and Mark Lowe f

a Department of Audiology and Speech Sciences, Purdue University, West Lafayette, IN 47907-2038, USA
b Department of Anatomy and Cell Biology, Indiana University School of Medicine, IN 46202-5120, USA
c School of Electrical and Computer Engineering, Purdue University, IN 47907-2035, USA
d MDZ Consulting Inc., Greenwood, IN 46143, USA
e South China Normal University, Guangzhou, PR China
f Cleveland Clinic Foundation, Cleveland, OH 44195, USA

Received 11 March 2004; revised 2 June 2004; accepted 2 June 2004

Speech prosody is processed in neither a single region nor a specific hemisphere, but engages multiple areas comprising a large-scale spatially distributed network in both hemispheres. It remains to be elucidated whether hemispheric lateralization is based on higher-level prosodic representations or lower-level encoding of acoustic cues, or both. A cross-language (Chinese; English) fMRI study was conducted to examine brain activity elicited by selective attention to Chinese intonation (I) and tone (T) presented in three-syllable (I3, T3) and one-syllable (I1, T1) utterance pairs in a speeded-response discrimination paradigm. The Chinese group exhibited greater activity than the English group in a left inferior parietal region across tasks (I1, I3, T1, T3). Only the Chinese group exhibited a leftward asymmetry in inferior parietal and posterior superior temporal (I1, I3, T1, T3), anterior temporal (I1, I3, T1, T3), and frontopolar (I1, I3) regions. Both language groups shared a rightward asymmetry in the mid portions of the superior temporal sulcus and middle frontal gyrus, irrespective of prosodic unit or temporal interval. Hemispheric laterality effects enable us to distinguish brain activity associated with higher-order prosodic representations in the Chinese group from that associated with lower-level acoustic/auditory processes that are shared among listeners regardless of language experience. Lateralization is influenced by language experience, which shapes the internal prosodic representation of an external auditory signal. We propose that speech prosody perception is mediated primarily by the RH, but is left-lateralized to task-dependent regions when language processing is required beyond the auditory analysis of the complex sound.

© 2004 Elsevier Inc. All rights reserved.

Keywords: fMRI; Human auditory processing; Speech perception; Selective attention; Laterality; Language; Prosody; Intonation; Tone; Chinese

doi:10.1016/j.neuroimage.2004.06.004

☆ Supplementary data associated with this article can be found in the online version at doi:10.1016/j.neuroimage.2004.06.004.

* Corresponding author. Department of Audiology and Speech Sciences, Purdue University, 1353 Heavilon Hall, 500 Oval Drive, West Lafayette, IN 47907-2038. Fax: +1-765-494-0771. E-mail address: [email protected] (J. Gandour).

Introduction

The differential roles of the left (LH) and right (RH) cerebral hemispheres in the processing of prosodic information have received considerable attention over the last several decades. Evidence supporting an RH role in the perception of prosodic units at phrase- and sentence-level structures has been wide-ranging, including dichotic listening (Blumstein and Cooper, 1974; Shipley-Brown et al., 1988), lesion deficit (Baum and Pell, 1999; Brådvik et al., 1991; Pell, 1998; Pell and Baum, 1997; Weintraub et al., 1981), and functional neuroimaging (Gandour et al., 2003; George et al., 1996; Meyer et al., 2003; Plante et al., 2002; Wildgruber et al., 2002). Involvement of the LH in the perception of prosodic units at syllable- or word-level structures has also been compelling, with converging evidence from dichotic listening (Moen, 1993; Van Lancker and Fromkin, 1973; Wang et al., 2001), lesion deficit (Eng et al., 1996; Gandour and Dardarananda, 1983; Hughes et al., 1983; Yiu and Fok, 1995), and neuroimaging (Gandour et al., 2000, 2003; Hsieh et al., 2001; Klein et al., 2001).

The precise mechanisms underlying functional asymmetry for speech prosody remain a matter of debate. Task-dependent hypotheses focus on functional properties (e.g., tone vs. intonation) of the speech stimuli (Van Lancker, 1980), whereas cue-dependent hypotheses are directed to particular physical properties (e.g., temporal vs. spectral) of the acoustic signal (Ivry and Robertson, 1998; Poeppel, 2003; Schwartz and Tallal, 1980; Zatorre and Belin, 2001). Speech prosody is predicted to be right-lateralized by cue-dependent hypotheses. Hemispheric specialization, however, appears to be sensitive to language-specific factors irrespective of neural mechanisms underlying lower-level auditory processing (Gandour et al., 2002).

The Chinese (Mandarin) language can be exploited to address questions of functional asymmetry underlying prosodic processing that involve primarily variations in pitch. Chinese has four lexical tones (e.g., ma [tone 1] "mother", ma [tone 2] "hemp", ma [tone 3] "horse", ma [tone 4] "scold"). Tones 1–4 can be described phonetically as high level, high rising, falling-rising, and high falling, respectively (Howie, 1976). They are manifested at the level of the syllable, the smallest structural unit for carrying prosodic features, on a time scale of 200–350 ms. Intonation, on the other hand, is manifested at the phrase or sentence level, typically on a time scale of seconds. In Chinese, interrogative intonation exhibits a higher pitch contour than that of its declarative counterpart (Shen, 1990), as well as a wider pitch range for sentence-final tones (Yuan et al., 2002). In English, interrogative sentences do not have overall higher pitch contours than declarative sentences, nor do they show any effects of tone–intonation interaction in sentence-final position. Chinese interrogative intonation with a final rising tone has a rising end, which is similar to English, whereas interrogative intonation with a final falling tone often has a falling end (Yuan et al., 2002).

In a previous fMRI study of Chinese tone and intonation (Gandour et al., 2003), both tone and intonation were judged in sentences presented at a fixed length (three words), and we observed left-lateralized lexical tone perception in comparison to intonation. However, the prosodic unit listeners selectively attended to and the temporal interval of attentional focus were coterminous. In judgments of lexical tone, the focus of attention was on the final word only, whereas judgments of intonation required that the focus be directed to the entire sentence. Whether the principal driving force in hemispheric lateralization of speech prosody is the temporal interval of attentional focus rather than the hierarchical level of linguistic units is not yet well established. The aim of the present study is to determine whether the temporal interval in which prosodic units are presented influences the neural substrates used in prosodic processing. As such, participants are asked to make perceptual judgments of tone and intonation in one-syllable and three-syllable Chinese utterances. By comparing activation in homologous regions of both hemispheres, we can assess the extent to which hemispheric laterality for speech prosody is driven by the temporal interval, the prosodic unit, or both. Only native Chinese speakers possess implicit knowledge that relates external auditory cues to internal representations of tone and intonation. By employing two language groups, one consisting of Chinese speakers, the other of English speakers, we are able to determine whether activation of particular brain areas is sensitive to language experience.

Materials and methods

Subjects

Ten native speakers of Mandarin (five male, five female) and ten native speakers of American English (five male, five female) were closely matched in age and years of education (Chinese: M = 29 years of age, 19 years of education; English: M = 27, 19). All subjects were strongly right-handed (Oldfield, 1971) and exhibited normal hearing sensitivity. All subjects gave informed consent in compliance with a protocol approved by the Institutional Review Board of Indiana University–Purdue University Indianapolis and Clarian Health.

Stimuli

Stimuli consisted of 36 pairs of three-syllable Chinese utterances and 44 pairs of one-syllable Chinese utterances. Utterances were designed with two intonation patterns (declarative, interrogative) in combination with the four Chinese tones on the utterance-final syllable (Fig. 1). Focus was held constant on the utterance-final syllable. No adjacent syllables in the three-syllable utterances formed bisyllabic words, so as to minimize lexical-semantic processing. Tone and intonation each differed in 36% of the pairs for the one-syllable utterances and 39% of the pairs for the three-syllable utterances. Stimuli that were identical in both tone and intonation comprised 28% and 22% of the pairs in one-syllable and three-syllable utterances, respectively.

Recording procedure

A 52-year-old male native speaker of Mandarin was instructed to read the one- and three-syllable utterances at a conversational speaking rate in a declarative and an interrogative sentence mood. A reading task was chosen to maximize the likelihood of simulating normal speaking conditions while at the same time controlling the syntactic, prosodic, and segmental characteristics of the spoken sentences. To enhance the naturalness of the three-syllable utterances, he was told to treat them as SVO (subject-verb-object) sentences with non-emphatic stress placed on the final syllable. All items in the list were typed in Chinese characters. A sufficient pause was provided between items to ensure that the speaker maintained a uniform speaking rate. By controlling the pace of presentation, we maximized the likelihood of obtaining consistent, natural-sounding productions. To avoid list-reading effects, extra items were placed at the top and bottom of the list. Recordings were made in a double-walled soundproof booth using an AKG C410 headset-type microphone and a Sony TCD-D8 digital audio tape recorder. The subject was seated and wore a custom-made headband that maintained the microphone at a distance of 12 cm from the lips.

Prescreening identification procedure

All one- and three-syllable utterances were presented individually in random order for identification by five native speakers of Chinese who were naive to the purposes of the experiment. They were asked to respond whether they heard a declarative or interrogative intonation and to indicate the tone occurring on the final syllable. Only those stimuli that achieved a perfect (100%) recognition score for both intonation and tone were retained for possible use as stimuli in our training and experimental sessions.

Task procedure

The experimental paradigm consisted of four active tasks (Table 1) and a passive listening task. The active tasks required discrimination judgments of intonation (I) and tone (T) in paired three-syllable (I3, T3) and one-syllable (I1, T1) utterances. Subjects were instructed to focus their attention on either the utterance-level intonation or the lexical tone of the final syllable, make discrimination judgments, and respond by pressing a mouse button (left = same; right = different). The control task involved passive listening to the same utterances, either one-syllable utterances (L1) or three-syllable utterances (L3). Subjects responded by alternately pressing the left and right mouse buttons after each trial.

A scanning sequence consisted of two tasks presented in a blocked format alternating with rest periods (Fig. 2). The one-syllable and three-syllable utterance blocks contained 11 and 9 trials, respectively. The order of scanning runs and of trials within blocks was randomized for each subject. Instructions were delivered to subjects in their native language via headphones during the rest periods immediately preceding each task: "listen" for passive listening to speech stimuli, "intonation" for same–different judgments on Chinese intonation, and "tone" for same–different judgments on Chinese tone. Average trial duration was about 2.9 and 3.5 s, respectively, for the one-syllable and three-syllable utterance blocks, including a response interval of 2 s.

Fig. 1. Acoustic features of sample Chinese speech stimuli. Broad-band spectrograms (SPG: 0–8 kHz) and voice fundamental frequency contours (F0: 0–400 Hz) are displayed for utterance pairs consisting of same tone/different intonation in three-syllable utterances (top left), same tone/different intonation in one-syllable utterances (top right), different tone/same intonation in three-syllable utterances (bottom left), and different tone/same intonation in one-syllable utterances (bottom right).

Fig. 2. Sequence and timing of conditions in each of the four functional imaging runs. I3 and I1 stand for intonation in three-syllable and one-syllable Chinese utterances, respectively; T3 and T1 stand for tone in three-syllable and one-syllable Chinese utterances, respectively; R = rest interval; L3 and L1 stand for passive listening to three-syllable and one-syllable Chinese utterances, respectively.

Table 1. Samples of Chinese tone and intonation stimuli for tasks involving one-syllable and three-syllable utterances. Note. I1 (T1) and I3 (T3) represent intonation (tone) tasks in one-syllable and three-syllable utterances, respectively.

All speech stimuli were digitally edited to have equal maximum energy level in dB SPL. Auditory stimuli were presented binaurally using a computer playback system (E-Prime) and a pneumatic-based audio system (Avotec). The plastic sound conduction tubes were threaded through tightly occlusive foam eartips inside the earmuffs, which attenuated the average sound pressure level of the continuous scanner noise by approximately 30 dB. Average intensity of all experimental stimuli was 92 dB SPL as compared to 80 dB SPL scanner noise.
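
The stimulus-level editing described above can be illustrated with a short sketch. This is not the authors' procedure (the paper does not say what editor or algorithm was used): it equalizes the digital peak level of each file as a proxy for "equal maximum energy level," since absolute dB SPL depends on playback calibration. The soundfile library and the file names are our assumptions.

    import numpy as np
    import soundfile as sf  # assumed I/O library; any WAV reader would do

    def peak_dbfs(x):
        """Peak level of a waveform in dB relative to digital full scale."""
        return 20.0 * np.log10(np.max(np.abs(x)) + 1e-12)

    def equalize_peak(in_path, out_path, target_dbfs=-3.0):
        x, fs = sf.read(in_path)
        gain_db = target_dbfs - peak_dbfs(x)  # gain needed to reach the target peak
        sf.write(out_path, x * 10.0 ** (gain_db / 20.0), fs)

    # Hypothetical stimulus files; every stimulus ends up with the same peak level.
    for name in ["tone_pair_01.wav", "intonation_pair_01.wav"]:
        equalize_peak(name, "eq_" + name)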

Accuracy, reaction time, and subjective ratings of task difficulty were used to measure task performance. Each task was self-rated by listeners on a 5-point scale of difficulty (1 = easy; 3 = medium; 5 = hard) at the end of the scanning session. Before scanning, subjects were trained to a high level of accuracy using stimuli different from those presented during the scanning runs: I3 (Chinese, 93% correct; English, 88%); I1 (Chinese, 92%; English, 77%); T3 (Chinese, 99%; English, 82%); T1 (Chinese, 99%; English, 85%).

Imaging protocol

Scanning was performed on a 1.5-T GE Signa LX Horizon scanner (Waukesha, WI) equipped with a birdcage transmit–receive radiofrequency head coil. Each of four 200-volume echo-planar imaging (EPI) series began with a rest interval consisting of 8 baseline volumes (16 s), followed by 184 volumes during which the two comparison tasks (32 s each) alternated with intervening 16-s rest intervals, and ended with a rest interval of 8 baseline volumes (16 s) (Fig. 2). Functional data were acquired using a gradient-echo EPI pulse sequence with the following parameters: repetition time (TR) 2 s; echo time (TE) 50 ms; matrix 64 × 64; flip angle (FA) 90°; field of view (FOV) 24 × 24 cm. Fifteen 7.5-mm-thick contiguous axial slices were used to image the entire cerebrum. Before the functional imaging runs, high-resolution anatomic images were acquired in 124 contiguous axial slices using a 3D Spoiled-GRASS (3D SPGR) sequence (slice thickness 1.2–1.3 mm; TR 35 ms; TE 8 ms; 1 excitation; FA 30°; matrix 256 × 128; FOV 24 × 24 cm) for purposes of anatomic localization and coregistration to a standard stereotactic system (Talairach and Tournoux, 1988). Subjects were scanned with eyes closed and room lights dimmed. The effects of head motion were minimized by using a head–neck pad and a dental bite bar.

Imaging analysis

Image analysis was conducted using the AFNI software package (Cox, 1996). All data for a given subject were motion-corrected to the fourth acquired volume of the first functional imaging run. To remove differences in global intensity between runs, the signal in each voxel was detrended across each functional scan to remove scanner signal drift, and then normalized to its mean intensity. Each of the four functional runs was analyzed to obtain the cross-correlation of each of three reference waveforms with the measured fMRI time series for each voxel. The first reference waveform corresponded to one of the four active conditions (I1, I3, T1, T3) presented in a single run (Fig. 2). The second and third reference waveforms corresponded to the two control conditions, L1 and L3, respectively, presented during the two runs with the same temporal interval as the intonation and tone conditions (L1 for I1 and T1; L3 for I3 and T3). After the resulting EPI volumes were transformed to 1-mm isotropic voxels in Talairach coordinate space (Talairach and Tournoux, 1988), the correlation coefficients were converted to z scores for purposes of analyzing multisubject fMRI data (Bosch, 2000), and spatially smoothed with a 5.2-mm FWHM Gaussian filter to account for intersubject variation in brain anatomy and to enhance the signal-to-noise ratio.

Direct comparison of active conditions (I1, I3, T1, T3) across runs was accomplished by computing the average z score for each of the four active conditions relative to its corresponding control condition. Averaged z scores for the control conditions were subtracted from those obtained for their corresponding intonation or tone conditions (e.g., ΔzI1 = zI1 − zL1; ΔzI3 = zI3 − zL3). Evaluating each active condition against a control of the same temporal interval also makes it possible to compare active conditions across temporal intervals (e.g., ΔzI1 vs. ΔzI3).
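
A minimal sketch of this Δz arithmetic, assuming z maps (as NumPy arrays) keyed by condition from the previous step; the random arrays here are placeholders, not study data:

    import numpy as np

    # Placeholder z maps keyed by condition (in practice, the smoothed,
    # Talairach-space z maps from the correlation step above).
    rng = np.random.default_rng(0)
    z = {c: rng.standard_normal((64, 64, 15))
         for c in ("I1", "T1", "I3", "T3", "L1", "L3")}

    # Each active condition is referenced to the control with the same
    # temporal interval: L1 for I1 and T1, L3 for I3 and T3.
    control = {"I1": "L1", "T1": "L1", "I3": "L3", "T3": "L3"}
    dz = {task: z[task] - z[control[task]] for task in control}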

Within- and between-group random effects maps (I1 vs. L1, T1 vs. L1, I3 vs. L3, T3 vs. L3) were also generated for display purposes by applying voxel-wise ANOVAs on the z values (e.g., Chinese zI1 vs. Chinese zL1) and Δz values (e.g., Chinese ΔzI1 vs. English ΔzI1), respectively. The individual voxel threshold for between-group maps was set at P = 0.01. For within-group maps, significantly activated voxels (P < 0.001) located within a radius of 7.6 mm were grouped into clusters, with a minimum cluster size threshold corresponding to four original-resolution voxels. According to a Monte Carlo simulation (AlphaSim), this clustering procedure yielded a false-positive alpha level of 0.04 (a sketch of this clustering step appears after Table 2).

Table 2
Center coordinates and extents of 5-mm spherical ROIs

Region    BA     x    y    z    Description
Frontal
  aMFG    10    ±32  +50   +4
  mMFG    46/9  ±45  +32  +22
  pMFG    9/6   ±44  +10  +33
  FO      45/13 ±37  +25  +14   centered deep within the frontal operculum of the inferior frontal gyrus, extending dorsally to the lower bank of the inferior frontal sulcus, ventrally to the bordering edge of the anterior insula
Parietal
  IPS     40/7  ±32  −48  +43   centered in and confined to the intraparietal sulcus
  IPL     40    ±50  −31  +28   centered in anteroventral aspects of the supramarginal gyrus, extending ventrally into the bordering edge of the Sylvian fissure
Temporal
  aSTG    38    ±55   +9   −8   centered in the temporal pole and wholly confined to the STG; posterior border (y = +5) was about 20 mm anterior to the medial end of the first transverse temporal sulcus (TTS)
  mSTS    22    ±49  −20   −3   centered in the STS, encompassing both the upper and lower banks of the STS; anterior border (y = −16) was contiguous with the medial border of TTS
  pSTG    22    ±56  −38  +12   centered in the STG, extending ventrally into the STS; anterior border (y = −35) was about 20 mm posterior to the medial border of TTS

Notes. Stereotaxic coordinates (mm) are derived from the human brain atlas of Talairach and Tournoux (1988). a, anterior; m, middle; p, posterior; FO, frontal operculum; MFG, middle frontal gyrus; IPS, intraparietal sulcus; IPL, inferior parietal lobule; STG, superior temporal gyrus; STS, superior temporal sulcus. Right hemisphere ROIs were generated by reflecting the left hemisphere location across the midline.
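
The clustering step referenced before Table 2 can be sketched as follows. This is an approximation for illustration only: scipy.ndimage's nearest-neighbour connected-components labeling stands in for AFNI's 7.6-mm-radius grouping, and the thresholds are parameters, not the authors' code.

    import numpy as np
    from scipy import ndimage

    def cluster_threshold(stat_map, voxel_thresh, min_cluster_voxels=4):
        """Zero out suprathreshold voxels that fall in clusters smaller than
        min_cluster_voxels; nearest-neighbour connectivity approximates
        AFNI's radius-based grouping."""
        supra = np.abs(stat_map) >= voxel_thresh   # voxelwise threshold
        labels, n_clusters = ndimage.label(supra)  # connected components
        keep = np.zeros_like(supra)
        for i in range(1, n_clusters + 1):
            cluster = labels == i
            if cluster.sum() >= min_cluster_voxels:
                keep |= cluster                    # retain sufficiently large clusters
        return stat_map * keep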

ROI analysis

Nine anatomically constrained 5-mm-radius spherical regions of interest (ROIs) were examined along with other regions. We chose ROIs that have been implicated in previous studies of phonological processing (Burton, 2001; Hickok and Poeppel, 2000; Hickok et al., 2003), speech perception (Binder et al., 2000; Davis and Johnsrude, 2003; Giraud and Price, 2001; Scott, 2003; Scott and Johnsrude, 2003; Scott et al., 2000; Zatorre et al., 2002), attention (Corbetta, 1998; Corbetta and Shulman, 2002; Corbetta et al., 2000; Shaywitz et al., 2001; Shulman et al., 2002), and working memory (Braver and Bongiolatti, 2002; Chein et al., 2003; D'Esposito et al., 2000; Jonides et al., 1998; Newman et al., 2002; Paulesu et al., 1993; Smith and Jonides, 1999). ROIs were placed symmetrically in nonoverlapping frontal, temporal, and parietal regions of both hemispheres (see Table 2 and Fig. 3). All center coordinates were derived by averaging over peak location coordinates reported in previous studies. They were then slightly adjusted to avoid overlap between ROIs and crossing of major anatomical boundaries. Of these coordinates, 26 out of 27 (9 ROIs × 3 coordinates) fell within 1 SD, and 1 (the x coordinate of mSTS) within 2 SD, of the mean published values. Similar results were obtained with 7-mm-radius ROIs, but we chose to present only the 5-mm-radius results because larger ROIs would have had to be shifted to avoid crossing of anatomical boundaries.
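
To make the ROI construction concrete, here is a minimal sketch of building 5-mm spherical masks from the Table 2 centers on a 1-mm isotropic grid. The grid shape and origin are our own illustrative assumptions (the authors used Talairach-transformed volumes, but the paper does not specify a bounding box):

    import numpy as np

    ROI_CENTERS = {  # left-hemisphere Talairach centers (x, y, z) in mm; see Table 2
        "aMFG": (-32, 50, 4), "mMFG": (-45, 32, 22), "pMFG": (-44, 10, 33),
        "FO": (-37, 25, 14), "IPS": (-32, -48, 43), "IPL": (-50, -31, 28),
        "aSTG": (-55, 9, -8), "mSTS": (-49, -20, -3), "pSTG": (-56, -38, 12),
    }

    GRID_SHAPE = (161, 191, 151)  # hypothetical 1-mm bounding box
    GRID_ORIGIN = (80, 110, 70)   # assumed voxel index of Talairach (0, 0, 0)

    def sphere_mask(center_mm, radius_mm=5.0):
        xs, ys, zs = np.ogrid[:GRID_SHAPE[0], :GRID_SHAPE[1], :GRID_SHAPE[2]]
        cx, cy, cz = (c + o for c, o in zip(center_mm, GRID_ORIGIN))
        d2 = (xs - cx) ** 2 + (ys - cy) ** 2 + (zs - cz) ** 2
        return d2 <= radius_mm ** 2

    masks = {}
    for name, (x, y, zc) in ROI_CENTERS.items():
        masks["L_" + name] = sphere_mask((x, y, zc))
        masks["R_" + name] = sphere_mask((-x, y, zc))  # reflect x across the midline

    # Usage: mean Δz within an ROI for one subject, e.g.
    # mean_dz_I1_left_IPL = dz["I1"][masks["L_IPL"]].mean()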

The mean Δz (I1, I3, T1, T3) was calculated for each ROI and every subject. These mean Δz values within each ROI were analyzed using repeated measures mixed-model ANOVAs (SAS) to compare activation between tasks (I1, T1, I3, T3), hemispheres (LH, RH), and groups (Chinese, English). Tasks and hemispheres were treated as fixed, within-subjects effects; group as a fixed, between-subjects effect. Subjects were nested within groups as a random effect. It may seem reasonable to use stimulus length as a separate factor in the ANOVA, treating one-syllable and three-syllable as two levels of this factor. However, as pointed out in the Introduction, although each stimulus contained three syllables in both the I3 and T3 tasks, T3 differed from I3 with respect to attentional demands. In T3, participants had to pay attention to the last syllable only, whereas in I3, they had to focus their attention on all three syllables. Treating stimulus length as a separate factor would therefore have confounded length (1, 3) and prosodic unit (I, T).
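
A rough Python analogue of this model structure is sketched below. The authors fit the model in SAS; statsmodels' MixedLM is not equivalent to SAS PROC MIXED, and the data here are random placeholders, so this only illustrates how the factors enter the model:

    import numpy as np
    import pandas as pd
    import statsmodels.formula.api as smf

    # Long-format table of per-subject mean dz values for one ROI
    # (values are random placeholders, not the study's data).
    rng = np.random.default_rng(1)
    rows = [
        {"group": g, "subject": f"{g}_{s}", "task": t, "hemi": h, "dz": rng.normal()}
        for g in ("Chinese", "English")
        for s in range(10)                 # ten subjects per group
        for t in ("I1", "T1", "I3", "T3")
        for h in ("LH", "RH")
    ]
    df = pd.DataFrame(rows)

    # Fixed effects: task, hemisphere, group, and their interactions; a random
    # intercept per subject stands in for "subjects nested within groups".
    model = smf.mixedlm("dz ~ task * hemi * group", df, groups=df["subject"])
    print(model.fit().summary())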

Results

Behavioral performance

Behavioral measures of task performance by the Chinese and English groups are given in Table 3. A repeated measures ANOVA was conducted with Group as a between-subjects factor (Chinese, English) and Task as a within-subjects factor (I1, I3, T1, T3). Results revealed significant task × group interactions on self-ratings of task difficulty [F(1,18) = 3.14, P = 0.0325], accuracy [F(3,54) = 18.33, P < 0.0001], and reaction time (RT) [F(3,54) = 8.68, P < 0.0001]. Tests of simple main effects indicated that, for between-group comparisons, the tone task was judged to be easier by Chinese than by English listeners (T1, P < 0.0001; T3, P = 0.0004); Chinese listeners performed all tasks at a higher level of accuracy than English listeners (P < 0.01); and RTs were longer for English than for Chinese listeners when making tonal judgments (T1, P = 0.0281; T3, P = 0.007). Regardless of language background, intonation judgments took longer in the one-syllable (I1) than in the three-syllable (I3) utterances (Chinese, P = 0.0003; English, P = 0.0421). In the Chinese group, I1 was judged to be more difficult than T1 (P = 0.0003); more errors were made in I1 than in T1 (P = 0.0001); and RTs were longer in I1 compared to T1 (P < 0.0001) and in I3 compared to T3 (P = 0.0339). In contrast, the English group achieved a higher level of accuracy in I3 than in any of the other three tasks (P < 0.01).

Between group comparisons

ROI-based ANOVAs revealed that the Chinese group exhibited significantly (P < 0.001) greater activity, as measured by Δz, in the left IPL relative to the English group regardless of task (I1, I3, T1, T3) (Figs. 4f and 5; Table 4). No other ROIs in either the LH or RH elicited significantly more activity in the Chinese group as compared to the English group.

In contrast, the English group showed significantly greater bilateral or right-sided activity in frontal, parietal, and temporal ROIs relative to the Chinese group (Fig. 6; Table 4). In the frontal lobe, all four ROIs (Figs. 4a–d) were more active bilaterally for the tone tasks (T1, T3). In the parietal lobe, IPS (Fig. 4e) activity was greater in both the LH and RH for T1. In the temporal lobe, the pSTG (Fig. 4i) was more active bilaterally across tasks (I1, I3, T1, T3), whereas greater activity in the aSTG (Fig. 4g) was observed across tasks in the RH only.

Fig. 3. Location of fixed spherical ROIs in frontal (open circle), parietal (checkered circle), and temporal (barred circle) regions displayed in left sagittal sections (top and middle panels), and on the lateral surface of both hemispheres (bottom panels). LH = left hemisphere; RH = right hemisphere. Stereotactic x coordinates that appear in the top and middle panels are derived from the human brain atlas of Talairach and Tournoux (1988). See also Table 2.

Table 3
Behavioral performance and self-ratings of task difficulty

Language group  Task  Accuracy (%)  Reaction time (ms)  Difficulty a
Chinese         I1    91.2 (1.5)    682 (48)            3.3 (0.4)
                T1    97.3 (0.9)    504 (41)            1.4 (0.16)
                I3    93.9 (1.3)    559 (36)            2.7 (0.3)
                T3    96.9 (1.2)    485 (26)            1.8 (0.25)
English         I1    76.8 (2.5)    668 (49)            4.0 (0.30)
                T1    70.9 (2.9)    642 (53)            3.3 (0.37)
                I3    85.3 (2.4)    565 (40)            3.1 (0.28)
                T3    72.2 (3.0)    656 (47)            3.4 (0.27)

Note. Values are expressed as mean and standard error (in parentheses). See also note in Table 1.
a Scalar units are from 1 to 5 (1 = easy; 3 = medium; 5 = hard) for self-ratings of task difficulty.

Within group comparisons

Hemisphere effects for the Chinese group revealed complementary leftward and rightward asymmetries, as measured by Δz, depending on ROI and task (Table 5). Laterality differences favored the LH in the frontal aMFG (Figs. 4a and 7, upper panel) for intonation tasks only, irrespective of temporal interval (I1, I3). In the parietal lobe, significantly more activity was observed in the left IPL (Figs. 4f and 5) across tasks, and in the left IPS (Figs. 4e and 7, lower panel) for T3 (cf. Gandour et al., 2003). In the temporal lobe, activity was greater in the left pSTG (Figs. 4i and 8) and aSTG (Fig. 4g) across tasks regardless of temporal interval. In contrast, laterality differences favored the RH in the frontal mMFG (Fig. 4b) and temporal mSTS (Figs. 4h and 8) across tasks.

Hemisphere effects for the English group were restricted to frontal and temporal ROIs in the RH (Table 5). Rightward asymmetries were observed in the frontal mMFG (Fig. 4b) and temporal mSTS (Fig. 4h) across tasks. These functional asymmetries favoring the RH were identical to those for the Chinese group. No significant leftward asymmetries were observed for any task across ROIs.

Fig. 4. Comparison of mean Δz scores between language groups (Chinese, English) per task (I1, T1, I3, T3) and hemisphere (LH, RH) within each ROI. Frontal lobe, a–d; parietal, e–f; temporal, g–i. I1 is measured by ΔzI1; T1 by ΔzT1; I3 by ΔzI3; T3 by ΔzT3. Error bars represent ±1 SE.

Fig. 5. A random effects fMRI activation map obtained from comparison of discrimination judgments of intonation in one-syllable utterances (I1) relative to passive listening to the same stimuli (L1) between the two language groups (ΔzI1 Chinese vs. ΔzI1 English). Left/right sagittal sections through stereotaxic space are superimposed onto a representative brain anatomy. The Chinese group shows increased activation in the left IPL, as compared to the English group, centered in ventral aspects of the supramarginal gyrus and extending into the bordering edge of the Sylvian fissure. Similar activation foci in the IPL are also observed in the I3 vs. L3, T1 vs. L1, and T3 vs. L3 comparisons. See also Fig. 4.

Task effects for the Chinese group revealed laterality differences, as measured by Δz, related to the prosodic unit. Intonation (I1, I3), when compared to tone (T1, T3), favored the LH in the aMFG (Figs. 4a and 7). In the pMFG (Fig. 4c), I3 was greater than T3 in the RH; I1 was greater than T1 in both hemispheres.

For both groups, a cluster analysis revealed significant (P < 0.001) activation in the supplementary motor area across tasks. The Chinese group showed predominantly right-sided activation in the lateral cerebellum across tasks. In the caudate and thalamus, increased activation was observed in the Chinese group for the intonation tasks only (I1, I3), but across tasks in the English group.

Discussion

Hemispheric roles in speech prosody

The major findings of this study demonstrate that Chinese tone and intonation are best thought of as a mosaic of multiple local asymmetries, allowing for the possibility that different regions may be differentially weighted in laterality depending on language-, modality-, and task-related features (Ide et al., 1999). Earlier hypotheses that focus on hemispheric function capture only part of, but not the whole, phenomenon. Not all aspects of speech prosody are lateralized to the RH. Cross-language differences in laterality of particular brain regions depend on a listener's implicit knowledge of the relation between external stimulus features (acoustic/auditory) and internal conceptual representations (linguistic/prosodic). All regions in the frontal, temporal, and parietal lobes that are lateralized to the LH in response to all tasks or subsets of tasks are found in the Chinese group only (Fig. 9). Conversely, the two regions in the temporal and frontal lobes that are lateralized to the RH are found in both language groups. We infer that LH laterality reflects higher-order processing of internal representations of Chinese tone and intonation, whereas RH laterality reflects lower-order processing of complex auditory stimuli.

Table 4
Group effects per task and hemisphere from statistical analyses on mean Δz within each spherical ROI

Group  Hemi  Task  aMFG  mMFG  pMFG  FO   IPS  IPL  aSTG  mSTS  pSTG
C > E  LH    I1                                ***
             T1                                ***
             I3                                ***
             T3                                ***
       RH    I1
             T1
             I3
             T3
E > C  LH    I1                                                  **
             T1    ***   *     ***   **   ***                    **
             I3                                                  **
             T3    **    *     **    *                           **
       RH    I1                                     *            **
             T1    ***   *     ***   **   ***      *             **
             I3                                     *            **
             T3    **    *     **    *              *            **

Note. C = Chinese group; E = English group; Hemi = hemisphere; LH = left hemisphere; RH = right hemisphere. *F(1,18), P < 0.05; **F(1,18), P < 0.01; ***F(1,18), P < 0.001. See also notes to Tables 1 and 2.

Fig. 6. Random effects fMRI activation map obtained from comparison of discrimination judgments of tone in one-syllable utterances (T1) relative to passive listening to the same stimuli (L1) between the two language groups (ΔzT1 English vs. ΔzT1 Chinese). An axial section reveals increased activation bilaterally in both frontal and parietal regions, as well as in the supplementary motor area, for the English group relative to the Chinese group. Similar activation foci are also observed in the T3 vs. L3 comparison. See also Fig. 4.

Previous models of speech prosody processing in the brain have focused on either linguistics or acoustics as the driving force underlying hemispheric lateralization. In this study, tone and intonation are lateralized to the LH for the Chinese group. Despite their functional differences from a linguistic perspective, they both recruit shared neural mechanisms in frontal, temporal, and parietal regions of the LH. The finding that intonation is lateralized to the LH cannot be accounted for by a model that claims that "suprasegmental sentence level information of speech comprehension is subserved by the RH" (Friederici and Alter, 2004, p. 268). Neither can this finding be explained by a hypothesis based on the size of the temporal integration window (short → LH; long → RH) (Poeppel, 2003). In spite of the fact that both intonation and tone meet his criteria for a long temporal integration window, they are lateralized to the LH instead of the RH.

Instead of viewing hemispheric roles as being derived from either acoustics or linguistics independently, we propose that both linguistics and acoustics, in addition to task demands (Plante et al., 2002), are all necessary ingredients for developing a neurobiological model of speech prosody. This model relies on dynamic interactions between the two hemispheres. Whereas the RH is engaged in pitch processing of complex auditory signals, including speech, we speculate that the LH is recruited to process categorical information to support phonological processing, or even syntactic and semantic processing (cf. Friederici and Alter, 2004). With respect to task demands, I1 elicits greater activation than T1 in the left aMFG and bilaterally in the pMFG. These differences cannot be explained by "prosodic frame length" (Dogil et al., 2002), since both tone and intonation are presented in an identical temporal context (one syllable). Nor can these findings be explained by a model that claims that segmental, lexical (i.e., tone), and syntactic information is processed in the LH, and suprasegmental sentence-level information (i.e., intonation) in the RH (Friederici and Alter, 2004). Rather, they most likely reflect task demands related to retrieval of the internal representations associated with tone and intonation.

Table 5
Within-group hemisphere effects per task from statistical analyses on mean Δz within each spherical ROI

Group  Hemi     Task  aMFG  mMFG  pMFG  FO  IPS  IPL  aSTG  mSTS  pSTG
C      LH > RH  I1    +                          **               *
                T1                               **               *
                I3    ++                         **               *
                T3                               **               *
       RH > LH  I1          *                               *
                T1          *                               *
                I3          *                               *
                T3          *                               *
E      LH > RH  I1
                T1
                I3
                T3
       RH > LH  I1          *                               *
                T1          *                               *
                I3          *                               *
                T3          *                               *

Note. *F(1,9), P < 0.05; **F(1,9), P < 0.01; + Tukey-adjusted t(9), P < 0.05; ++ Tukey-adjusted t(9), P < 0.01. See also notes to Tables 2 and 4.

Functional heterogeneity within a spatially distributed network

Frontal lobe

Activation in the frontopolar cortex (BA 10) was bilateral across all tasks for English listeners, but predominantly left-sided in the intonation tasks (I1, I3) for Chinese listeners (Table 5). The frontopolar region has extensive interconnections with auditory regions of the superior temporal gyrus (Petrides and Pandya, 1984). Bilateral activation of frontopolar cortex has been reported in a verbal working memory paradigm when subjects were presented with a competing articulatory suppression task (Gruber, 2001). Its functional role is inferred to be that of integrating working memory with the allocation of attentional resources (Koechlin et al., 1999), or of applying greater effort in memory retrieval (Buckner et al., 1996; Schacter et al., 1996).

These cross-language differences in frontopolar activation likely result from the differing linguistic function of suprasegmental information in Chinese and English. As measured by RT and accuracy, Chinese listeners take longer and are less proficient in judging intonation than tone. The relatively greater difficulty of intonation judgments presumably reflects the fact that in Chinese, all syllables obligatorily carry tonal contours. Tones are likely to be processed first, as compared to intonation, due to this syllable-by-syllable processing. By comparison, intonation contours play a relatively minor role in signaling differences in sentence mood. In this study, the unmarked (i.e., lacking a sentence-final particle) yes–no interrogatives are known to carry a light functional load (Shen, 1990).

In the present study, subjects were required to keep the tone or intonation information from the first stimulus of a pair in working memory while concurrently accessing tone or intonation identification of the second stimulus. Because of the functional difference between tone and intonation for Chinese listeners, intonation judgment of the second stimulus competes for more attentional resources and leads to greater effort in memory retrieval of intonation from the first stimulus. This process presumably elicits greater activity in the left frontopolar region for intonation tasks in Chinese listeners. English listeners, on the other hand, employ a different processing strategy regardless of linguistic function. Without prior knowledge of the Chinese language, retrieving auditory information from working memory and making discrimination judgments is presumed to be equally difficult for tone and intonation, resulting in bilateral activation of frontopolar cortex for all tasks.

Dorsolateral prefrontal cortex, including BA 46 and BA 9, is involved in controlling the attentional demands of tasks and maintaining information in working memory (Corbetta and Shulman, 2002; Knight et al., 1999; MacDonald et al., 2000; Mesulam, 1981). The rightward asymmetry in the mMFG (BA 46) that is observed in all tasks (I1, I3, T1, T3) in both language groups (Table 5) points to a stage of processing that involves auditory attention and working memory. Functional neuroimaging data reveal that auditory selective attention tasks elicit increased activity in right dorsolateral prefrontal cortex (Zatorre et al., 1999). In the music domain, perceptual analysis and short-term maintenance of the pitch information underlying melodies recruit neural systems within the right prefrontal and temporal cortex (Zatorre et al., 1994). In this study, activation of the prefrontal mMFG and temporal mSTS is similarly lateralized to the RH across tasks in both language groups. These data are consistent with the idea that the right dorsolateral prefrontal area (BA 46/9) plays a role in auditory attention that modulates pitch perception in sensory representations beyond the lateral belt of the auditory cortex, and actively retains pitch information in auditory working memory (cf. Plante et al., 2002). Although our stimuli are in the speech domain, this frontotemporal network in the RH serves to maintain pitch information regardless of its linguistic relevance. A frontotemporal network for auditory short-term memory is further supported by epileptic patients who show significant deficits in retention of tonal information after unilateral excisions of right frontal or temporal regions (Zatorre and Samson, 1991). In nonhuman primates, a processing stream for sound-object identification has been proposed that projects anteriorly along the lateral temporal cortex (Rauschecker and Tian, 2000), leading to the lateral prefrontal cortex (Hackett et al., 1999; Romanski et al., 1999a,b). A similar anterior processing stream destined for the lateral prefrontal cortex in humans presumably underlies a frontotemporal network, at least in the RH, for low-level auditory processing of complex pitch information.

Fig. 7. Random effects fMRI activation maps obtained from comparison of discrimination judgments of intonation (I3; upper panel) and tone (T3; bottom panel) in three-syllable utterances relative to passive listening to the same stimuli (L3) for the Chinese group (zI3 vs. zL3; zT3 vs. zL3). In I3 vs. L3 and I1 vs. L1 (not shown), increased activity in frontopolar cortex (aMFG) shows a leftward asymmetry (upper panel; x = −35), whereas activation of the middle (mMFG) region of dorsolateral prefrontal cortex shows the opposite laterality effect (upper panel; x = +35, +40, +45). In T3 vs. L3, IPS activity is predominant in the LH (bottom panel; x = −35, −40, −45). In I3 (upper panel; x = +35, +40, +45) vs. T3 (lower panel; x = +35, +40, +45), activation of the right pMFG is greater in the I3 than the T3 task. See also Fig. 4.

Intonation elicited greater activity relative to tone in the pMFG (BA 9), bilaterally in the one-syllable condition and right-sided only in the three-syllable condition (Fig. 4c). The fact that I3 elicited greater activity than T3 in the posterior MFG of the RH replicates Gandour et al. (2003). One possible explanation focuses on the prosodic units themselves: tones are processed in the LH, intonation predominantly in the RH. However, this account is untenable because I1 elicits greater activation bilaterally as compared to T1. Moreover, the intonation (I1, I3) and tone (T1, T3) tasks separately elicit no hemispheric laterality effects in the pMFG. Another possible explanation has to do with the temporal interval. One might argue that the difference between I3 and T3 is due to the time interval of focused attention for the prosodic unit: I3 = three syllables; T3 = last syllable only. On this view, shorter prosodic frames are processed in the LH, longer frames in the RH. This alternative account of pMFG activity is also ruled out, because I1 elicits hemispheric laterality effects similar to those of I3. Instead, differential pMFG activity related to direct comparisons between intonation and tone most likely reflects task demands (cf. Plante et al., 2002). As measured by RT and self-ratings of task difficulty, intonation tasks are more difficult than tone tasks for Chinese listeners (Table 3). Equally significant is the fact that the English group shows greater activation for tonal processing (T1, T3) than the Chinese group in the pMFG bilaterally (Table 4). Together, these findings are consistent with the idea that the pMFG coordinates the attentional resources required by the task.

Fig. 8. A random effects fMRI activation map obtained from comparison of discrimination judgments of intonation in one-syllable utterances (I1) relative to passive listening to the same stimuli (L1) for the Chinese group (zI1 vs. zL1). Left/right sagittal sections reveal increased mSTS activity in the RH, projecting both ventrally and dorsally into the MTG and STG, respectively. pSTG activity shows the opposite hemispheric effect, part of a continuous swath of activation extending caudally from middle regions of the STG/STS. Similar activation foci are also observed in T1 vs. L1, I3 vs. L3, and T3 vs. L3. See also Fig. 4.

Fig. 9. Laterality effects for ROIs in the Chinese group only, and in both Chinese and English groups, rendered on a three-dimensional LH template for common reference. In the Chinese group (top panel), IPL, aSTG, and pSTG are left-lateralized (LH > RH) across tasks; aMFG (I1, I3) and IPS (T3) are left-lateralized for specific tasks. In both language groups (bottom panel), mMFG and mSTS are right-lateralized (RH > LH) across tasks. Other ROIs do not show laterality effects. No ROI elicited either a rightward asymmetry for the Chinese group only, or a leftward asymmetry for both Chinese and English groups. See also Table 5.


The fronto-opercular region (FO, BA 45/13) is activated bilaterally in both language groups (Table 5), with similar activation levels across tasks (I1, I3, T1, T3). Recent neuroimaging studies (Meyer et al., 2002, 2003) also show bilateral FO activation in a prosodic speech condition in which a speech utterance is reduced to speech melody by removal of all lexical and syntactic information. Increased FO activity is presumed to reflect increased effort in extracting syntactic, lexical-semantic, or slow pitch information from degraded speech signals (Meyer et al., 2002, 2003), or in discriminating sequences of melodic pitch patterns (Zatorre et al., 1994). Similarly, our tasks require increased cognitive effort to extract tone and intonation from the auditory stream and to maintain this information in working memory.

Parietal lobe

There appear to be at least two distinct regions of activation in the parietal cortex: one located more superiorly (IPS), in the intraparietal sulcus and adjacent aspects of the superior parietal lobule, and another more inferiorly (IPL), in the anterior supramarginal gyrus (SMG) near the parietotemporal boundary (cf. Becker et al., 1999). Our findings show greater activation in the IPS bilaterally in T1 for the English group compared to the Chinese group (Table 4). It has been proposed that this area supports voluntary focusing and shifting of attentional scanning across activated memory representations (Chein et al., 2003; Corbetta and Shulman, 2002; Corbetta et al., 2000; Cowan, 1995; Mazoyer et al., 2002). The efficacy of selective attention depends on how external stimuli are encoded into internal phonological representations. English listeners experienced more difficulty in focusing and shifting attention in T1 because lexically relevant pitch variations do not occur in English monosyllables.

In contrast, the Chinese group shows left-sided activity in the IPS for T3 (Table 5). This finding replicates our previous study of Chinese tone and intonation (Gandour et al., 2003), reinforcing the view that a left frontoparietal network is recruited for the processing of lexical tones (Li et al., 2003). In T1, listeners extract tone from isolated monosyllables. In T3, they extract tone from a fixed position in a sequence of syllables, which causes repeated shifts in attention from one item to another. These laterality differences between T3 and T1 indicate that selective attention to discrete linguistic constructs is a gradient neurophysiological phenomenon shaped by task-specific demands.

The Chinese group, as compared to the English group, shows greater activation across tasks (I1, I3, T1, T3) in ventral aspects of the left IPL (BA 40) near the parietotemporal boundary (Table 4). Within the Chinese group, relatively greater IPL activation on the left is observed across tasks, without regard to prosodic unit (I, T) or temporal interval (1, 3). Perhaps it is the "categoricalness," or phonological significance, of the auditory stimuli that triggers activation in this area (Jacquemot et al., 2003). This language-specific effect can be understood from the conceptualization of the IPL as part of an auditory-motor integration circuit in speech perception (Hickok and Poeppel, 2000; Wise et al., 2001). Chinese listeners possess articulatory-based representations of Chinese tones and intonation; English listeners do not. Consequently, no activation of this area is observed in the English group. Its LH activity co-occurs with a leftward asymmetry in the pSTG across tasks. Co-activation of the IPL reinforces the view that it is part of an auditory–articulatory processing stream that connects posterior temporal and inferior prefrontal regions. An alternative conceptualization is that the phonological storage component of verbal working memory resides in the IPL (Awh et al., 1996; Paulesu et al., 1993). This notion predicts that both passive listening and verbal working memory tasks should elicit activation in this region, since auditory verbal information has obligatory access to the store (Chein et al., 2003). Because I1, I3, T1, and T3 were all derived by subtracting their corresponding passive listening control conditions, this notion would wrongly predict no increased IPL activation in our contrasts.

Temporal lobe

The anterior superior temporal gyrus (aSTG) displays an LH advantage in the Chinese group across tasks (Table 5). Reduced RH, rather than increased LH, aSTG activation appears to underlie this hemispheric asymmetry across all tasks. Since intelligible speech is used in all tasks, phonological input alone may be sufficient to explain the leftward asymmetry in the Chinese group (Scott and Johnsrude, 2003; Scott et al., 2000). This asymmetry is also consistent with the notion that this region maps acoustic–phonetic cues onto linguistic representations as part of a larger auditory-semantic integration circuit in speech perception (Giraud and Price, 2001; Scott and Johnsrude, 2003; Scott et al., 2003). In contrast, English listeners do not have knowledge of these prosodic representations. Consequently, they employ a nonlinguistic pitch processing strategy across tasks and fail to show any hemispheric asymmetry.

A language group effect is not found in hemispheric laterality of the mSTS (BA 22/21). Both groups show greater RH activity in the mSTS across tasks (Table 5), suggesting that this area is sensitive to particular acoustic features of the speech signal irrespective of language experience. The rightward asymmetry may reflect shared mechanisms underlying early attentional modulation in the processing of complex pitch patterns. In this study, subjects were required to direct their attention to slow modulations of pitch patterns (i.e., ≈300–1000 ms) underlying either Chinese tone or intonation. This interpretation is consistent with the hemispheric roles hypothesized for auditory processing of complex sounds in the temporal lobe: RH for spectral processing, LH for temporal processing (Poeppel, 2003; Zatorre and Belin, 2001; Zatorre et al., 2002). Moreover, it is consistent with the view that right auditory cortex is most important in the processing of dynamic pitch variation (Johnsrude et al., 2000). Because both groups show greater activation in the right mSTS, we infer that this activity reflects a complex aspect of pitch processing that is independent of language experience.

A left-asymmetric activation of the posterior part of the superior temporal gyrus (pSTG; BA 22) across tasks is observed in the Chinese group only (Table 5). It has been suggested that the left pSTG, as part of a posterior processing stream, is involved in prelexical processing of phonetic cues and features (Scott, 2003; Scott and Johnsrude, 2003; Scott and Wise, 2003). English listeners, however, show no leftward asymmetry in the pSTG (Table 5). Moreover, they show greater activation bilaterally relative to the Chinese group (Table 4). Therefore, auditory phonetic cues that are of phonological significance in one's native language may be primarily responsible for this leftward asymmetry.

These findings collectively support functional segregation of

temporal lobe regions, and their functional integration as part of a

temporofrontal network (Davis and Johnsrude, 2003; Scott, 2003;

Scott and Johnsrude, 2003; Specht and Reul, 2003). LH networks

in the temporal lobe that are sensitive to phonologically relevant

parameters from the auditory signal are in anterior and posterior, as

opposed to central, regions of the STG/STS (Giraud and Price,

2001). The anterior region appears to be part of an auditory-

semantic processing stream, the posterior region part of an audi-

tory-motor processing stream. Both processing streams, in turn,

project to convergence areas in the frontal lobe.

Effects of task performance on hemispheric asymmetry

In this study, the BOLD signal magnitude depends on the

participant’s proficiency in a particular phonological task (Chee et

al., 2001). The two groups differ maximally in relative language

proficiency: Chinese group, 100%; English group, 0%. As reflected

in behavioral measures of task performance (Table 3), perceptual

judgments of Chinese tones require more cognitive effort from English monolinguals because lexical tones are unfamiliar to them. This unfamiliarity results in greater BOLD activation for T1 and T3, either bilaterally or in the RH only (cf.

Chee et al., 2001). The effect of minimal language proficiency

applies only to lexical tone. Intonation, on the other hand, elicits

bilateral activation for both groups in the posterior MFG, frontal

operculum, and intraparietal sulcus (Table 4; Fig. 4). This common

frontoparietal activity implies that processing of intonation requires

similar cognitive effort for Chinese and English participants.

Conclusions

Cross-language comparisons provide unique insights into the

functional roles of different areas of the cortical network that are

recruited for processing different aspects of speech prosody (e.g.,

auditory, phonological). By using tone and intonation tasks, we are

able to distinguish hemispheric roles of areas sensitive to linguistic

levels of processing (LH) from those sensitive to lower-level

acoustical processing (RH). Rather than attribute processing of

speech prosody to RH mechanisms exclusively, our findings

suggest that lateralization is influenced by language experience

that shapes the internal prosodic representation of an external

auditory signal. This emerging model assumes a close interaction

between the two hemispheres via the corpus callosum. In sum, we

propose a more comprehensive model in which speech prosody perception is mediated primarily by RH regions for complex-sound analysis, but is lateralized to task-dependent regions in the LH when language processing is required.

Acknowledgments

Funding was provided by a research grant from the National

Institutes of Health R01 DC04584-04 (JG) and an NIH

postdoctoral traineeship (XL). We are grateful to J. Lowe, T.

Osborn, and J. Zimmerman for their technical assistance in the

MRI laboratory. Portions of this research were presented at the 11th

annual meeting of the Cognitive Neuroscience Society, San

Francisco, April 2004. Correspondence should be addressed to

Jack Gandour, Department of Audiology and Speech Sciences,


Purdue University, West Lafayette, IN 47907-2038, or via email:

[email protected].

References

Awh, E., Jonides, J., Smith, E.E., Schumacher, E.H., Koeppe, R.A., Katz,

S., 1996. Dissociation of storage and rehearsal in verbal working mem-

ory. Psychol. Sci. 7 (1), 25–31.

Baum, S., Pell, M., 1999. The neural bases of prosody: insights from lesion

studies and neuroimaging. Aphasiology 13, 581–608.

Becker, J., MacAndrew, D., Fiez, J., 1999. A comment on the functional

localization of the phonological storage subsystem of working memory.

Brain Cogn. 41, 27–38.

Binder, J., Frost, J., Hammeke, T., Bellgowan, P., Springer, J., Kaufman, J.,

Possing, E., 2000. Human temporal lobe activation by speech and non-

speech sounds. Cereb. Cortex 10 (5), 512–528.

Blumstein, S., Cooper, W.E., 1974. Hemispheric processing of intonation

contours. Cortex 10, 146–158.

Bosch, V., 2000. Statistical analysis of multi-subject fMRI data: assessment

of focal activations. J. Magn. Reson. Imaging 11 (1), 61–64.

Brådvik, B., Dravins, C., Holtas, S., Rosen, I., Ryding, E., Ingvar, D., 1991.

Disturbances of speech prosody following right hemisphere infarcts.

Acta Neurol. Scand. 84 (2), 114–126.

Braver, T.S., Bongiolatti, S.R., 2002. The role of frontopolar cortex in

subgoal processing during working memory. NeuroImage 15 (3),

523–536.

Buckner, R.L., Raichle, M.E., Miezin, F.M., Petersen, S.E., 1996. Func-

tional anatomic studies of memory retrieval for auditory words and

visual pictures. J. Neurosci. 16 (19), 6219–6235.

Burton, M., 2001. The role of the inferior frontal cortex in phonological

processing. Cogn. Sci. 25 (5), 695–709.

Chee, M.W., Hon, N., Lee, H.L., Soon, C.S., 2001. Relative language

proficiency modulates BOLD signal change when bilinguals perform

semantic judgments. NeuroImage 13 (6 Pt 1), 1155–1163.

Chein, J.M., Ravizza, S.M., Fiez, J.A., 2003. Using neuroimaging to eval-

uate models of working memory and their implications for language

processing. J. Neurolinguist. 16, 315–339.

Corbetta, M., 1998. Frontoparietal cortical networks for directing attention

and the eye to visual locations: identical, independent, or overlapping

neural systems? Proc. Natl. Acad. Sci. U. S. A. 95 (3), 831–838.

Corbetta, M., Shulman, G.L., 2002. Control of goal-directed and stimulus-

driven attention in the brain. Nat. Rev. Neurosci. 3 (3), 201–215.

Corbetta, M., Kincade, J.M., Ollinger, J.M., McAvoy, M.P., Shulman, G.L.,

2000. Voluntary orienting is dissociated from target detection in human

posterior parietal cortex. Nat. Neurosci. 3 (3), 292–297.

Cowan, N., 1995. Sensory memory and its role in information processing.

Electroencephalogr. Clin. Neurophysiol., Suppl. 44, 21–31.

Cox, R.W., 1996. AFNI: software for analysis and visualization of func-

tional magnetic resonance neuroimages. Comput. Biomed. Res. 29 (3),

162–173.

Davis, M.H., Johnsrude, I.S., 2003. Hierarchical processing in spoken

language comprehension. J. Neurosci. 23 (8), 3423–3431.

D’Esposito, M., Postle, B.R., Rypma, B., 2000. Prefrontal cortical contri-

butions to working memory: evidence from event-related fMRI studies.

Exp. Brain Res. 133 (1), 3–11.

Dogil, G., Ackermann, H., Grodd, W., Haider, H., Kamp, H., Mayer, J.,

Riecker, A., Wildgruber, D., 2002. The speaking brain: a tutorial intro-

duction to fMRI experiments in the production of speech, prosody and

syntax. J. Neurolinguist. 15, 59–90.

Eng, N., Obler, L., Harris, K., Abramson, A., 1996. Tone perception def-

icits in Chinese-speaking Broca’s aphasics. Aphasiology 10, 649–656.

Friederici, A.D., Alter, K., 2004. Lateralization of auditory language func-

tions: a dynamic dual pathway model. Brain Lang. 89 (2), 267–276.

Gandour, J., Dardarananda, R., 1983. Identification of tonal contrasts in

Thai aphasic patients. Brain Lang. 18 (1), 98–114.

Gandour, J., Wong, D., Hsieh, L., Weinzapfel, B., Van Lancker, D., Hutch-

ins, G.D., 2000. A crosslinguistic PET study of tone perception.

J. Cogn. Neurosci. 12 (1), 207–222.

Gandour, J., Wong, D., Lowe, M., Dzemidzic, M., Satthamnuwong, N.,

Tong, Y., Li, X., 2002. A cross-linguistic FMRI study of spectral and

temporal cues underlying phonological processing. J. Cogn. Neurosci.

14 (7), 1076–1087.

Gandour, J., Dzemidzic, M., Wong, D., Lowe, M., Tong, Y., Hsieh, L.,

Satthamnuwong, N., Lurito, J., 2003. Temporal integration of speech

prosody is shaped by language experience: an fMRI study. Brain Lang.

84 (3), 318–336.

George, M.S., Parekh, P.I., Rosinsky, N., Ketter, T.A., Kimbrell, T.A.,

Heilman, K.M., Herscovitch, P., Post, R.M., 1996. Understanding emo-

tional prosody activates right hemisphere regions. Arch. Neurol. 53 (7),

665–670.

Giraud, A.L., Price, C.J., 2001. The constraints functional neuroimaging

places on classical models of auditory word processing. J. Cogn. Neuro-

sci. 13 (6), 754–765.

Gruber, O., 2001. Effects of domain-specific interference on brain activa-

tion associated with verbal working memory task performance. Cereb.

Cortex 11 (11), 1047–1055.

Hackett, T.A., Stepniewska, I., Kaas, J.H., 1999. Prefrontal connections

of the parabelt auditory cortex in macaque monkeys. Brain Res. 817

(1–2), 45–58.

Hickok, G., Poeppel, D., 2000. Towards a functional neuroanatomy of

speech perception. Trends Cogn. Sci. 4 (4), 131–138.

Hickok, G., Buchsbaum, B., Humphries, C., Muftuler, T., 2003. Auditory-

motor interaction revealed by fMRI: speech, music, and working mem-

ory in area Spt. J. Cogn. Neurosci. 15 (5), 673–682.

Howie, J.M., 1976. Acoustical Studies of Mandarin Vowels and Tones.

Cambridge University Press, New York.

Hsieh, L., Gandour, J., Wong, D., Hutchins, G.D., 2001. Functional het-

erogeneity of inferior frontal gyrus is shaped by linguistic experience.

Brain Lang. 76 (3), 227–252.

Hughes, C.P., Chan, J.L., Su, M.S., 1983. Aprosodia in Chinese patients

with right cerebral hemisphere lesions. Arch. Neurol. 40 (12), 732–736.

Ide, A., Dolezal, C., Fernandez, M., Labbe, E., Mandujano, R., Montes, S.,

Segura, P., Verschae, G., Yarmuch, P., Aboitiz, F., 1999. Hemispheric

differences in variability of fissural patterns in parasylvian and cingulate

regions of human brains. J. Comp. Neurol. 410 (2), 235–242.

Ivry, R., Robertson, L., 1998. The Two Sides of Perception. MIT Press,

Cambridge, MA.

Jacquemot, C., Pallier, C., LeBihan, D., Dehaene, S., Dupoux, E., 2003.

Phonological grammar shapes the auditory cortex: a functional magnetic

resonance imaging study. J. Neurosci. 23 (29), 9541–9546.

Johnsrude, I.S., Penhune, V.B., Zatorre, R.J., 2000. Functional specificity

in the right human auditory cortex for perceiving pitch direction. Brain

123 (Pt 1), 155–163.

Jonides, J., Schumacher, E.H., Smith, E.E., Koeppe, R.A., Awh, E.,

Reuter-Lorenz, P.A., Marshuetz, C., Willis, C.R., 1998. The role of

parietal cortex in verbal working memory. J. Neurosci. 18 (13),

5026–5034.

Klein, D., Zatorre, R., Milner, B., Zhao, V., 2001. A cross-linguistic PET

study of tone perception in Mandarin Chinese and English speakers.

NeuroImage 13 (4), 646–653.

Knight, R.T., Staines, W.R., Swick, D., Chao, L.L., 1999. Prefrontal cortex

regulates inhibition and excitation in distributed neural networks. Acta

Psychol. (Amst.) 101 (2–3), 159–178.

Koechlin, E., Basso, G., Pietrini, P., Panzer, S., Grafman, J., 1999. The role

of the anterior prefrontal cortex in human cognition. Nature 399 (6732),

148–151.

Li, X., Gandour, J., Talavage, T., Wong, D., Dzemidzic, M., Lowe, M.,

Tong, Y., 2003. Selective attention to lexical tones recruits left dorsal

frontoparietal network. NeuroReport 14 (17), 2263–2266.

MacDonald III, A.W., Cohen, J.D., Stenger, V.A., Carter, C.S., 2000. Dis-

sociating the role of the dorsolateral prefrontal and anterior cingulate

cortex in cognitive control. Science 288 (5472), 1835–1838.


Mazoyer, P., Wicker, B., Fonlupt, P., 2002. A neural network elicited by

parametric manipulation of the attention load. NeuroReport 13 (17),

2331–2334.

Mesulam, M.M., 1981. A cortical network for directed attention and uni-

lateral neglect. Ann. Neurol. 10 (4), 309–325.

Meyer, M., Alter, K., Friederici, A.D., Lohmann, G., von Cramon, D.Y.,

2002. fMRI reveals brain regions mediating slow prosodic modulations

in spoken sentences. Hum. Brain Mapp. 17 (2), 73–88.

Meyer, M., Alter, K., Friederici, A.D., 2003. Functional MR imaging

exposes differential brain responses to syntax and prosody during au-

ditory sentence comprehension. J. Neurolinguist. 16, 277–300.

Moen, I., 1993. Functional lateralization of the perception of Norwegian

word tones—Evidence from a dichotic listening experiment. Brain

Lang. 44 (4), 400–413.

Newman, S.D., Just, M.A., Carpenter, P.A., 2002. The synchronization of

the human cortical working memory network. NeuroImage 15 (4),

810–822.

Oldfield, R.C., 1971. The assessment and analysis of handedness: the

Edinburgh inventory. Neuropsychologia 9 (1), 97–113.

Paulesu, E., Frith, C.D., Frackowiak, R.S., 1993. The neural correlates

of the verbal component of working memory. Nature 362 (6418),

342–345.

Pell, M.D., 1998. Recognition of prosody following unilateral brain lesion:

influence of functional and structural attributes of prosodic contours.

Neuropsychologia 36 (8), 701–715.

Pell, M.D., Baum, S.R., 1997. The ability to perceive and comprehend

intonation in linguistic and affective contexts by brain-damaged adults.

Brain Lang. 57 (1), 80–99.

Petrides, M., Pandya, D.N., 1984. Association fiber pathways to the frontal

cortex from the superior temporal region in the rhesus monkey.

J. Comp. Neurol. 273, 52–66.

Plante, E., Creusere, M., Sabin, C., 2002. Dissociating sentential prosody

from sentence processing: activation interacts with task demands.

NeuroImage 17 (1), 401–410.

Poeppel, D., 2003. The analysis of speech in different temporal integration

windows: cerebral lateralization as ‘asymmetric sampling in time’.

Speech Commun. 41 (1), 245–255.

Rauschecker, J.P., Tian, B., 2000. Mechanisms and streams for processing

of ‘‘what’’ and ‘‘where’’ in auditory cortex. Proc. Natl. Acad. Sci. U. S.

A. 97 (22), 11800–11806.

Romanski, L.M., Bates, J.F., Goldman-Rakic, P.S., 1999a. Auditory belt

and parabelt projections to the prefrontal cortex in the rhesus monkey.

J. Comp. Neurol. 403 (2), 141–157.

Romanski, L.M., Tian, B., Fritz, J., Mishkin, M., Goldman-Rakic, P.S.,

Rauschecker, J.P., 1999b. Dual streams of auditory afferents target mul-

tiple domains in the primate prefrontal cortex. Nat. Neurosci. 2 (12),

1131–1136.

Schacter, D.L., Alpert, N.M., Savage, C.R., Rauch, S.L., Albert, M.S.,

1996. Conscious recollection and the human hippocampal formation:

evidence from positron emission tomography. Proc. Natl. Acad. Sci.

U. S. A. 93 (1), 321–325.

Schwartz, J., Tallal, P., 1980. Rate of acoustic change may underlie hemi-

spheric specialization for speech perception. Science 207, 1380–1381.

Scott, S., 2003. How might we conceptualize speech perception? The view

from neurobiology. J. Phon. 31, 417–422.

Scott, S.K., Johnsrude, I.S., 2003. The neuroanatomical and functional

organization of speech perception. Trends Neurosci. 26 (2), 100–107.

Scott, S.K., Wise, R., 2003. PET and fMRI studies of the neural basis of

speech perception. Speech Commun. 41, 23–34.

Scott, S.K., Blank, C.C., Rosen, S., Wise, R.J., 2000. Identification of a

pathway for intelligible speech in the left temporal lobe. Brain 123

(Pt 12), 2400–2406.

Scott, S.K., Leff, A.P., Wise, R.J., 2003. Going beyond the information

given: a neural system supporting semantic interpretation. NeuroImage

19 (3), 870–876.

Shaywitz, B.A., Shaywitz, S.E., Pugh, K.R., Fulbright, R.K., Skudlarski,

P., Mencl, W.E., Constable, R.T., Marchione, K.E., Fletcher, J.M.,

Klorman, R., et al., 2001. The functional neural architecture of

components of attention in language-processing tasks. NeuroImage

13 (4), 601–612.

Shen, X.-N., 1990. The Prosody of Mandarin Chinese. University of Cal-

ifornia Press, Berkeley, CA.

Shipley-Brown, F., Dingwall, W.O., Berlin, C.I., Yeni-Komshian, G., Gor-

don-Salant, S., 1988. Hemispheric processing of affective and linguistic

intonation contours in normal subjects. Brain Lang. 33 (1), 16–26.

Shulman, G.L., d’Avossa, G., Tansy, A.P., Corbetta, M., 2002. Two atten-

tional processes in the parietal lobe. Cereb. Cortex 12 (11), 1124–1131.

Smith, E.E., Jonides, J., 1999. Storage and executive processes in the

frontal lobes. Science 283, 1657–1661.

Specht, K., Reul, J., 2003. Functional segregation of the temporal lobes

into highly differentiated subsystems for auditory perception: an audi-

tory rapid event-related fMRI-task. NeuroImage 20 (4), 1944–1954.

Talairach, J., Tournoux, P., 1988. Co-planar Stereotaxic Atlas of the Human

Brain: 3-Dimensional Proportional System: An Approach to Cerebral

Imaging. Thieme Medical Publishers, New York.

Van Lancker, D., 1980. Cerebral lateralization of pitch cues in the linguistic

signal. Pap. Linguist. 13 (2), 201–277.

Van Lancker, D., Fromkin, V., 1973. Hemispheric specialization for pitch

and tone: evidence from Thai. J. Phon. 1, 101–109.

Wang, Y., Jongman, A., Sereno, J., 2001. Dichotic perception of Mandarin

tones by Chinese and American listeners. Brain Lang. 78, 332–348.

Weintraub, S., Mesulam, M.M., Kramer, L., 1981. Disturbances in prosody.

A right-hemisphere contribution to language. Arch. Neurol. 38 (12),

742–744.

Wildgruber, D., Pihan, H., Ackermann, H., Erb, M., Grodd, W., 2002.

Dynamic brain activation during processing of emotional intonation:

influence of acoustic parameters, emotional valence, and sex. Neuro-

Image 15 (4), 856–869.

Wise, R.J., Scott, S.K., Blank, S.C., Mummery, C.J., Murphy, K., Warbur-

ton, E.A., 2001. Separate neural subsystems within ‘Wernicke’s area’.

Brain 124 (Pt 1), 83–95.

Yiu, E., Fok, A., 1995. Lexical tone disruption in Cantonese aphasic

speakers. Clin. Linguist. Phon. 9, 79–92.

Yuan, J., Shih, C., Kochanski, G., 2002. Comparison of declarative and

interrogative intonation in Chinese. In: Bel, B., Marlien, I. (Eds.), Pro-

ceedings of the First International Conference on Speech Prosody. Aix-

en-Provence, France, pp. 711–714 (April).

Zatorre, R.J., Belin, P., 2001. Spectral and temporal processing in human

auditory cortex. Cereb. Cortex 11 (10), 946–953.

Zatorre, R., Samson, S., 1991. Role of the right temporal neocortex in

retention of pitch in auditory short-term memory. Brain 114 (Pt 6),

2403–2417.

Zatorre, R.J., Evans, A.C., Meyer, E., 1994. Neural mechanisms under-

lying melodic perception and memory for pitch. J. Neurosci. 14 (4),

1908–1919.

Zatorre, R.J., Mondor, T.A., Evans, A.C., 1999. Auditory attention to space

and frequency activates similar cerebral systems. NeuroImage 10 (5),

544–554.

Zatorre, R.J., Belin, P., Penhune, V.B., 2002. Structure and function of

auditory cortex: music and speech. Trends Cogn. Sci. 6 (1), 37–46.