
Individual Differences in the Multisensory Temporal Binding Window Predict Susceptibility to Audiovisual Illusions

Ryan A. Stevenson, Raquel K. Zemtsov, and Mark T. Wallace
Vanderbilt University

Human multisensory systems are known to bind inputs from the different sensory modalities into a unified percept, a process that leads to measurable behavioral benefits. This integrative process can be observed through multisensory illusions, including the McGurk effect and the sound-induced flash illusion, both of which demonstrate the ability of one sensory modality to modulate perception in a second modality. Such multisensory integration is highly dependent upon the temporal relationship of the different sensory inputs, with perceptual binding occurring within a limited range of asynchronies known as the temporal binding window (TBW). Previous studies have shown that this window is highly variable across individuals, but it is unclear how these variations in the TBW relate to an individual's ability to integrate multisensory cues. Here we provide evidence linking individual differences in multisensory temporal processes to differences in the individual's audiovisual integration of illusory stimuli. Our data provide strong evidence that the temporal processing of multiple sensory signals and the merging of multiple signals into a single, unified perception are highly related. Specifically, the width of the right side of an individual's TBW, where the auditory stimulus follows the visual, is significantly correlated with the strength of illusory percepts, as indexed via both an increase in the strength of binding synchronous sensory signals and an improvement in correctly dissociating asynchronous signals. These findings are discussed in terms of their possible neurobiological basis, relevance to the development of sensory integration, and possible importance for clinical conditions in which there is growing evidence that multisensory integration is compromised.

Keywords: multisensory integration, cross-modal, McGurk, sound-induced flash illusion, perception, temporal processing

Author Note

This article was published Online First March 5, 2012. Ryan A. Stevenson, Department of Hearing and Speech Sciences, Vanderbilt University Medical Center; Raquel K. Zemtsov, Undergraduate Neuroscience Program, Vanderbilt University; Mark T. Wallace, Department of Hearing and Speech Sciences, Vanderbilt University Medical Center, Vanderbilt University Kennedy Center, Department of Psychology, Vanderbilt Brain Institute, Vanderbilt University.

This research was supported in part by NIH NIDCD Research Grants 1R34 DC010927 and 1F32 DC011993. Thanks to Laurel Stevenson and Jeanne Wallace for their support, to Lena Quinto for the McGurk stimuli, to Juliane Krueger Fister, Justin Siemann, and Aaron Nidiffer for assistance running subjects, to Zachary Barnett for technical assistance, and to our reviewers for their insights on this work.

Correspondence concerning this article should be addressed to Ryan A. Stevenson, 7110 MRB III BioSci Bldg, 465 21st Ave. South, Nashville, TN 37232. E-mail: [email protected]

Journal of Experimental Psychology: Human Perception and Performance, 2012, Vol. 38, No. 6, 1517–1529. © 2012 American Psychological Association. DOI: 10.1037/a0027339

Human sensory systems have the ability to transduce and process information from distinct types of energy in the environment. In addition to the workings of the individual sensory systems, multisensory networks have the ability to bind these signals into a coherent, unified perception of external events (Gaver, 1993). This integration of information across the senses improves our ability to interact with the environment, allowing for enhanced detection (Lovelace, Stein, & Wallace, 2003; Stein & Wallace, 1996), more accurate localization (Nelson et al., 1998; Wilkinson, Meredith, & Stein, 1996), and faster reactions (Diederich & Colonius, 2004; Hershenson, 1962). In addition to these highly adaptive benefits, multisensory integration can also be seen in a host of perceptual illusions, where the presentation of a signal in one sensory modality can modulate perception in a second sensory modality. One of the more compelling of these illusions is the sound-induced flash illusion, in which an observer is presented with a single visual flash paired with multiple auditory cues (i.e., beeps) in rapid succession. The participant is instructed to ignore the beeps and to report only the number of visual flashes. Despite the presence of only a single visual flash, observers often report seeing multiple flashes when paired with two or more auditory cues (Shams, Kamitani, & Shimojo, 2000). Whereas this illusion takes advantage of very low-level stimulus pairings (i.e., flashes and beeps), other illusions can be demonstrated using stimuli rich in semantic content. In the speech realm, this is best illustrated by the McGurk effect, in which the combination of a visual /ga/ and an auditory /ba/ often results in the perception of a novel syllable (i.e., /da/; McGurk & MacDonald, 1976).

These interactions between the senses, including the illusory examples described above, are highly dependent upon the characteristics of the stimuli that are combined. These findings were originally reported in animal models with neuronal recordings, showing that the more spatially congruent (Meredith & Stein, 1986a) and temporally synchronous (Meredith, Nemitz, & Stein, 1987) the stimuli are to one another, the larger the change (i.e., gain) seen under multisensory circumstances. In addition, the less effective the individual stimuli are in generating a response, the greater the multisensory effect when they are combined (Meredith & Stein, 1986b). These effects have been extended to human psychophysical and neuroimaging studies with a great deal of interest directed of late to the temporal features of these multisensory interactions. In large measure these studies have paralleled the animal model work referenced above, showing that the largest multisensory interactions occur when stimuli are combined in close temporal proximity. Evidence for this has been gathered in behavioral studies (Conrey & Pisoni, 2004; Conrey & Pisoni, 2006; Dixon & Spitz, 1980; Foss-Feig et al., 2010; Hillock, Powers, & Wallace, 2011; Keetels & Vroomen, 2005; Powers, Hillock, & Wallace, 2009; van Atteveldt, Formisano, Blomert, & Goebel, 2007; van Wassenhove, Grant, & Poeppel, 2007; Wallace et al., 2004; Zampini, Guest, Shore, & Spence, 2005), in event-related potentials (Schall, Quigley, Onat, & Konig, 2009; Senkowski, Talsma, Grigutsch, Herrmann, & Woldorff, 2007; Talsma, Senkowski, & Woldorff, 2009), and in functional MRI studies (James & Stevenson, 2012; Macaluso, George, Dolan, Spence, & Driver, 2004; Miller & D'Esposito, 2005; Stevenson, Altieri, Kim, Pisoni, & James, 2010; Stevenson, VanDerKlok, Pisoni, & James, 2011). In addition to these findings, it has been shown that multisensory stimuli presented in close temporal proximity are often integrated into a single, unified percept (Andersen, Tiippana, & Sams, 2004; McGurk & MacDonald, 1976; Shams et al., 2000; Stein & Meredith, 1993). This perceptual binding over a given temporal interval is best captured in the construct of a multisensory temporal binding window (TBW; Colonius & Diederich, 2004; Foss-Feig et al., 2010; Hairston, Burdette, Flowers, Wood, & Wallace, 2005; Powers et al., 2009).

The TBW is a probabilistic concept, reflecting the likelihood that two stimuli from different modalities will or will not result in alterations in behavior and/or perception that likely reflect the perceptual binding of these stimuli across a range of stimulus asynchronies. The TBW typically exhibits a number of characteristic features (for review, see Vroomen & Keetels, 2010). First, as the temporal interval between the stimuli (i.e., the stimulus onset asynchrony, SOA) increases, the likelihood of multisensory interactions decreases (Conrey & Pisoni, 2006; Hirsh & Fraisse, 1964; Keetels & Vroomen, 2005; Miller & D'Esposito, 2005; Powers et al., 2009; Spence, Baddeley, Zampini, James, & Shore, 2003; Stevenson et al., 2010; van Atteveldt et al., 2007; van Wassenhove et al., 2007; Vatakis & Spence, 2006). This reflects the statistics of the natural environment, where sensory inputs which are not closely related in time are less likely to have originated from a single event. Second, the TBW is often asymmetrical, with the right side (reflecting conditions in which visual stimuli precede auditory stimuli) being wider (Conrey & Pisoni, 2006; Dixon & Spitz, 1980; Hillock et al., 2011; Stevenson et al., 2010; van Atteveldt et al., 2007; van Wassenhove et al., 2007; Vroomen & Keetels, 2010). This asymmetry has been attributed to the difference in propagation times of visual and auditory stimulus energies, where auditory lags increase as stimulus distance grows (Pöppel, Schill, & von Steinbuchel, 1990). In addition, there are substantial differences in the timing of the acoustic and visual transduction processes, as well as in neural conduction times for these modalities (Corey & Hudspeth, 1979; King & Palmer, 1985; Lamb & Pugh, 1992; Lennie, 1981). As such, the SOA at which the probability of multisensory interactions peaks is often also asymmetrical, generally being greatest when the auditory signal slightly lags the visual signal (Dixon & Spitz, 1980; Meredith et al., 1987; Roach, Heron, Whitaker, & McGraw, 2011; Seitz, Holloway, & Watanabe, 2006; Zampini et al., 2005; Zampini, Shore, & Spence, 2003, 2005). Third, the width of the TBW varies with stimulus type, with simple stimuli with sharp rise times generally giving rise to narrower TBWs (Hirsh & Sherrick, 1961; Keetels & Vroomen, 2005; Zampini et al., 2005; Zampini et al., 2003) relative to those observed with more complex stimuli such as speech (Conrey & Pisoni, 2006; Miller & D'Esposito, 2005; Stevenson et al., 2010; van Atteveldt et al., 2007; Van der Burg, Cass, Olivers, Theeuwes, & Alais, 2010; van Wassenhove et al., 2007; Vatakis & Spence, 2006). Finally, as a prelude to the current study, in many of the aforementioned studies it is clear that there are striking individual differences in each of these attributes of the TBW, the most salient of which are its width, asymmetry, and the SOA at which the probability of integration peaks (Miller & D'Esposito, 2005).

While it is clear that more synchronous stimulus presentations are typically associated with increased multisensory integration, it is unclear how individual differences in the TBW are related to an individual's integrative abilities. We suspect that the TBW and the capacity for and magnitude of multisensory integration may be related based upon three premises. First, the developmental chronology of the TBW (Hillock et al., 2011) and the susceptibility to multisensory illusions such as the McGurk effect (McGurk & MacDonald, 1976) follow a similar time course. Second, a number of clinical populations exhibit comorbid impairments in both the TBW and in the perception of multisensory illusions. These include autism (Foss-Feig et al., 2010; Kwakye, Foss-Feig, Cascio, Stone, & Wallace, 2011; Taylor, Isaac, & Milne, 2010), dyslexia (Bastien-Toniazzo, Stroumza, & Cave, 2009; Hairston et al., 2005), and schizophrenia (Foucher, Lacambre, Pham, Giersch, & Elliott, 2007; Pearl et al., 2009). Finally, the neural underpinnings of the sound-induced flash illusion (Bolognini, Rossetti, Casati, Mancini, & Vallar, 2011; Watkins, Shams, Josephs, & Rees, 2007; Watkins, Shams, Tanaka, Haynes, & Rees, 2006) and the McGurk effect (Beauchamp, Nath, & Pasalar, 2010; Nath & Beauchamp, 2011b; Sekiyama, Kanno, Miura, & Sugita, 2003) share common substrates that show modulations in functional activity with variations in real and perceived asynchrony (Macaluso et al., 2004; Miller & D'Esposito, 2005; Stevenson et al., 2010; Stevenson et al., 2011) and with multisensory temporal processing (Dhamala, Assisi, Jirsa, Steinberg, & Kelso, 2007; Noesselt et al., 2007). Our results will be discussed specifically in light of each of these previous findings with the overarching hypothesis that individuals with a narrower TBW will encounter fewer occurrences of stimulus inputs that are perceived to be temporally synchronous and that perceived synchronous events will be more tightly bound. Such a view results in the predictions that those with greater precision in the temporal constraints of their multisensory networks (i.e., narrower temporal binding windows) will show (a) an increased ability to dissociate, or failure to bind, sensory signals that are asynchronous (i.e., a correct rejection); and (b) greater enhancements associated with integration of synchronous sensory inputs.

In the experiments described here, we seek to provide the first comprehensive evidence linking individual differences in multisensory temporal processes (i.e., the TBW) to differences in individuals' audiovisual integration as indexed through two robust multisensory illusions. Specifically, we determined individuals' TBWs through the use of a simultaneity judgment task employing low-level visual and auditory stimuli (i.e., flashes and beeps) and, in the same individuals, measured the strength of the McGurk effect and the sound-induced flash illusion. Our hypothesis that individuals with narrower TBWs would exhibit greater magnitude of multisensory integration with synchronous inputs predicts that these individuals should show a stronger McGurk effect. Conversely, the hypothesis that individuals with narrower windows should be more temporally precise and subsequently fail to integrate asynchronous stimulus inputs leads to the prediction that these individuals will be less susceptible to the sound-induced flash illusion, which requires the integration of auditory and visual inputs that are not temporally aligned. Our results suggest that this pattern is indeed seen in the relationship between the TBW and the ability to form fused, unified percepts from multisensory inputs, a finding that has important mechanistic implications for the neural networks that subserve multisensory binding.

Method

Participants

Participants included 31 right-handed, native-English-speaking Vanderbilt students (13 male; mean age = 21 years, SD = 4 years, age range = 18–35 years) who were compensated with class credit. Participants reported normal hearing and normal or corrected-to-normal vision. All recruitment and experimental procedures were approved by the Vanderbilt University Institutional Review Board.

Stimuli

All stimuli throughout the study were presented using MATLAB (MathWorks Inc., Natick, MA) software with the Psychophysics Toolbox extensions (Brainard, 1997; Pelli, 1997). Visual stimuli were presented on a NEC MultiSync FE992 monitor at 100 Hz at a distance of approximately 60 cm from the participants. Auditory stimuli were presented binaurally via Philips noise-cancelling SBC HN-110 headphones. The duration of all visual and auditory stimuli, as well as the SOAs, was confirmed using a Hameg 507 oscilloscope with a photovoltaic cell and microphone.

Two categories of audiovisual stimuli were presented: simple paired flash-beep stimuli for the 2-AFC simultaneity judgment task and sound-induced flash illusion task, and single-syllable utterances for the McGurk task. The visual component of the sound-induced flash illusion task (Figure 1A) consisted of a white ring circumscribing the visual fixation cross on a black background. Visual stimulus duration was 10 ms. Auditory stimuli consisted of a 3500 Hz pure tone with a duration of 7 ms. When multiple flashes were presented in the sound-induced flash illusion task, they were separated by 43 ms intervals (Figure 1A).
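As a concrete illustration of these stimulus parameters, a minimal MATLAB sketch of the tone and flash timing is given below. The 44.1 kHz sampling rate, the variable names, and the absence of an onset/offset ramp are assumptions made for illustration; only the tone frequency, the durations, and the 100 Hz refresh rate come from the text.

% Sketch of the 7 ms, 3500 Hz auditory beep described above.
fs       = 44100;                     % assumed sampling rate (Hz)
dur      = 0.007;                     % beep duration, 7 ms
t        = 0:1/fs:dur - 1/fs;         % time base in seconds
toneWave = sin(2*pi*3500*t);          % 3500 Hz pure tone

% Visual flash: a 10 ms illumination of the annulus. At a 100 Hz monitor
% refresh (10 ms per frame), this corresponds to a single frame.
flashFrames = round(0.010 * 100);     % = 1 frame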

Single-syllable utterances for the McGurk task were selected from a stimulus set that has been previously used successfully in studies of multisensory integration (Quinto, Thompson, Russo, & Trehub, 2010). Stimuli consisted of two audiovisual clips of a female speaker uttering single instances of the syllables /ga/ and /ba/ (Figures 1B and 1C, respectively). Visual stimuli were cropped to square, down-sampled to a resolution of 400 × 400 pixels spanning 18.25 cm per side, and converted from color to grayscale. Presentations were shortened to 2 s, with each presentation containing the entire articulation of the syllable, including prearticulatory gestures.

Procedure

Each participant completed three separate experiments within a larger battery of experiments that were spread over 4 days. Experiment orders were randomized across participants.

The first task was a two-alternative forced-choice (2AFC) simultaneity judgment, used to derive the TBW. In this task, participants were presented with a series of flash-beep stimuli at parametrically varied SOAs. SOAs were 0, 10, 20, 50, 80, and 100–300 ms in 50 ms increments, with both auditory-preceding-visual and visual-preceding-auditory stimulus presentations. Thus, a total of 21 SOAs were used, with 20 trials per condition. The participants were asked to report whether each presentation was temporally synchronous or asynchronous via a button press.

Figure 1. Stimuli used for the simultaneity judgment, sound-induced flash illusion, and McGurk tasks. A. The stimuli used for the simultaneity judgment task and the sound-induced flash illusion task are identical, and include the illumination of a white annulus (left) and a pure-tone beep (right). B and C. Frames from the visual stimuli as well as waveforms and spectrograms of the auditory stimuli used for the two syllables (/ga/ and /ba/) in the McGurk task.

The second task was the sound-induced flash illusion paradigm. Participants were presented with a number of flashes and beeps and were asked to indicate how many flashes they perceived. Participants were explicitly instructed to respond according to their visual perception only. Trials included a single flash presented with 0–4 beeps as well as control presentations of 2–4 flashes with either a single beep or no beep. Thus, a total of 11 conditions were presented with 25 trials per condition. Participants responded via button press. In all conditions with auditory stimuli, the first auditory beep was presented synchronously with the visual flash.

The third and final task was a test of the McGurk effect. Twenty-six of the participants completed this task. Participants were presented with audio-only (with the fixation cross remaining on the screen), visual-only, and audiovisual versions of the /ba/ and /ga/ stimuli described above. They were also presented with a McGurk stimulus in which the visual /ga/ was presented with the auditory /ba/. Thus, a total of seven stimulus conditions were presented, with 20 trials in each condition. Participants were asked to report what syllable they perceived by indicating what the first letter of the syllable was via button press, with the options to respond with /ba/, /ga/, /da/, and /tha/.

For all tasks, participants were seated inside an unlit sound-attenuating WhisperRoom™ (Model SE 2000; Whisper Room Inc, Morristown, TN), were asked to fixate a central cross, and were monitored by closed-circuit infrared cameras throughout the experiment to ensure fixation. Each task began with an instruction screen, after which the participant was asked if he or she understood the instructions. Each trial began with a fixation screen for 500 ms plus a random jitter ranging from 1 to 1000 ms, after which the stimulus or stimuli were presented, followed by an additional 250 ms fixation screen. A response screen was then presented during which the participant responded via button press ("Was the presentation synchronous," "How many flashes did you see," or "What did she say," for the perceived simultaneity, sound-induced flash illusion, and McGurk tasks, respectively). Following the response, the fixation screen beginning the subsequent trial was presented. The order of trial types was randomized within all tasks, and was randomly generated for each participant for each experiment.
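For readers interested in the mechanics of this design, a minimal MATLAB sketch of how such a randomized, jittered trial sequence might be constructed is shown below. The condition and trial counts and the timing values are taken from the text; the variable names and the uniform distribution of the jitter are assumptions, not the authors' code.

% Build a randomized trial order for the simultaneity judgment task:
% 21 SOA conditions x 20 trials per condition, randomly interleaved.
nConds         = 21;                              % SOA conditions (synchronous plus offsets)
nTrialsPerCond = 20;
conds = repmat(1:nConds, 1, nTrialsPerCond);      % condition label for every trial
order = conds(randperm(numel(conds)));            % shuffled presentation order

% Pre-trial fixation duration: 500 ms plus a random jitter of 1-1000 ms
% (a uniform jitter is assumed here), expressed in seconds.
fixDur = 0.500 + (0.001 + 0.999 * rand(1, numel(order)));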

Analysis

Responses from the simultaneity judgment task, in which participants reported whether they perceived the audiovisual presentation as either simultaneous or not, were used to calculate TBWs for each participant. The first step in this process was to calculate a rate of perceived simultaneity at each SOA, which was simply the percentage of trials in a given condition in which the individual reported that the presentation was simultaneous. Two psychometric sigmoid functions were then fit to the rates of perceived simultaneity across SOAs, one to the audio-first presentations and a second to the visual-first presentations, which will be referred to as the left (AV) and right (VA) windows, respectively (Figure 2A). The data used to create both best-fit sigmoids included the simultaneous, 0 ms SOA condition. These best-fit functions were calculated using the glmfit function in MATLAB. Each participant's individual left and right TBWs were then estimated as the SOA at which the best-fit sigmoid's y-value equaled a 50% rate of perceived simultaneity (Stevenson, Zemtsov, & Wallace, under review). Group TBWs were then calculated by taking the arithmetic mean of the respective left and right TBWs from each participant.
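A minimal MATLAB sketch of this estimation procedure for one side of the window is given below. The use of glmfit follows the text; the example SOAs and response counts are hypothetical, and reading the 50% point off the fitted logistic as −b(1)/b(2) is one straightforward way to implement the criterion described above, not necessarily the authors' exact implementation.

% Fit a logistic (sigmoid) function to the rate of perceived simultaneity
% for the visual-leading (right) side of the distribution and estimate the
% SOA at which the fitted curve crosses 50%.
soa    = [0 10 20 50 80 100 150 200 250 300]';   % visual-leading SOAs (ms)
nSync  = [20 19 19 17 15 12 8 5 3 1]';           % hypothetical "simultaneous" reports
nTotal = 20 * ones(size(soa));                   % 20 trials per SOA

b    = glmfit(soa, [nSync nTotal], 'binomial', 'link', 'logit');  % logistic regression
pHat = glmval(b, soa, 'logit');                  % fitted rate of perceived simultaneity

% The logistic crosses 50% where b(1) + b(2)*SOA = 0.
rightTBW = -b(1) / b(2);                         % estimated right window width (ms)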

Figure 2. Temporal binding windows defined using the simultaneity judgment task. A. Group mean curves used to derive each individual's temporal binding windows (standard deviations shown in gray). Note that the total window is comprised of individually defined and fitted left and right windows. B. Correlation in the size of each individual's left and right temporal binding windows.

Responses to the sound-induced flash illusion were calculated using trials in which a single flash was presented with varying numbers of beeps (0–4). For each of these five trial types, each participant's mean response was calculated, and a group average response was calculated as a mean of individuals' means (Figure 3A). An additional index was calculated to describe each individual's susceptibility to the illusion. This index consisted of the individual's average change from the nonillusory condition (one flash paired with one beep) to each of the three illusory conditions (one flash paired with 2–4 beeps), relative to the number of illusion-inducing beeps. The usefulness of this index is twofold. First, it accounts for any individuals who may have consistently reported perceiving multiple flashes when there was only a single flash presented. Second, this index gives a single numerical index for each individual across conditions, ensuring that the pattern of correlations reported here is driven by a consistent relationship within each individual (e.g., ensuring that the correlations between the 2-beep and 3-beep illusory conditions and the TBW are not each driven by different individual outliers in those specific conditions). The susceptibility index was calculated using the following equation,

SI = [ (R2 − R1)/1 + (R3 − R1)/2 + (R4 − R1)/3 ] / 3,

where Rn represents the individual's mean response to the condition with one flash and n beeps. This index gives a single metric for the individual's susceptibility to the illusion, where an index of 1 would reflect a scenario in which each additional auditory cue increased the number of perceived flashes by 1, and an index of 0 would reflect a scenario in which the addition of auditory cues did not change the visual perception. It should be noted that it was possible that participants could deduce that every time multiple beeps were heard, the correct response was to report a single visual flash; however, these results suggest that this did not occur.
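As a worked example of this index, the short MATLAB snippet below computes SI from a hypothetical set of mean responses; the specific values of R are illustrative only.

% Susceptibility index for the sound-induced flash illusion.
% R(n) is the mean reported number of flashes when one flash was paired
% with n beeps (hypothetical example values).
R  = [1.10 1.60 1.85 1.95];                      % R1..R4

SI = ((R(2)-R(1))/1 + (R(3)-R(1))/2 + (R(4)-R(1))/3) / 3;
% For these values, SI = (0.50/1 + 0.75/2 + 0.85/3)/3, approximately 0.39:
% each added beep increased the reported number of flashes by ~0.4 on average.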

Responses to the visual-only conditions were calculated using trials in which 1–4 flashes were presented without beeps. For each of these four trial types, each participant's mean responses were calculated, and group average responses were calculated as a mean of individuals' means (Figure 3B).

Responses with the speech stimuli were used to calculate the strength of each participant's McGurk effect. First, a baseline was calculated for each participant equal to the percentage of unisensory trials in which they perceived a unisensory /ba/ or /ga/ as either /da/ or /tha/ (Figure 4B). Next, the percentage of trials in which each participant reported the fused (i.e., McGurk) percept, that is, perceived an auditory /ba/ paired with a visual /ga/ as either /da/ or /tha/, was calculated (Figure 4A). For simplicity, we will henceforth refer to illusory perceptions only as /da/. Each participant's McGurk score was calculated as the percentage of perceived McGurk illusions relative to their unisensory baseline using the equation:

p(AV McGurk) × (1 − p(Unisensory /da/)),

where p(AV McGurk) is the individual's rate of McGurk percepts with audiovisual McGurk stimuli, and p(Unisensory /da/) is the rate at which the individual reported perceiving /da/ with unisensory /ba/ and /ga/ stimuli. This choice of baseline was made due to the ambiguity of the visual /ga/, which was often misperceived as /da/ (Figure 4A), a misperception that has been frequently reported in the literature (Erber, 1972; Massaro, 1998; Massaro, 2004; Summerfield, 1987). When all unisensory components produce highly accurate responses, no such unisensory baseline is needed and one can simply compare the illusory percepts (for an example, see Grant, Walden, & Seitz, 1998). However, our choice of baseline ensures that a participant's perceptions of the McGurk effect are not due to incorrect perceptions of the unisensory components of the McGurk stimuli. Additionally, percent correct responses were calculated for the audiovisual congruent /ba/ and /ga/ presentations to ensure recognition of the utterances (Figure 4B).

Figure 3. Sound-induced flash illusion and its relationship to the TBW. A. Group mean responses show a rise in the perceived number of flashes as a function of the number of beeps. B. Control trials show a rise in the number of perceived flashes when the number of visual stimuli is increased. C. Correlations between the strength of each individual's perceived illusion (i.e., perceived number of flashes) and left and right TBW width for the 1, 2, 3, and 4 beep conditions. Note the presence of significant correlations for each of the illusory (i.e., ≥ 2 beep) conditions. D. Correlation between individuals' sound-induced flash illusion susceptibility index and their left and right temporal binding windows.

Results

Determination of the Temporal Binding Window

The temporal binding window (TBW) was calculated for each participant on the basis of their simultaneity judgment responses, with the full window being created by merging the sigmoid functions used to create the left (AV) and right (VA) sides of the response distributions (see Method for additional detail on the creation of these windows). On average, the width of the right TBW (215 ± 19 ms SEM) was significantly larger than that of the left TBW (156 ± 10 ms SEM; t = 4.56, p < .0001; Figure 2A). The relationship between left and right TBWs for individuals was also tested, with the two showing a significant correlation (see Figure 2B for detailed statistics).
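For readers who wish to reproduce this style of comparison, a minimal MATLAB sketch is shown below. The data vectors are placeholders, and ttest and corr are standard Statistics Toolbox routines; the specific call pattern is an assumption, not the authors' code.

% Compare right and left window widths across participants and test
% whether the two sides are related. leftTBW and rightTBW are 1 x N
% vectors of per-participant window widths in ms (placeholder data).
leftTBW  = 100 + 60*rand(1, 31);                     % hypothetical left windows
rightTBW = 140 + 80*rand(1, 31);                     % hypothetical right windows

[~, pPaired, ~, stats] = ttest(rightTBW, leftTBW);   % paired t test of widths
[rLR, pLR] = corr(leftTBW', rightTBW');              % correlation between sides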

Correlation Between the TBW and Sound-Induced Flash Illusion Performance

In these same individuals, the pattern of results on the sound-induced flash illusion task revealed results in keeping with the original description of the illusion (Shams et al., 2000). Thus, and as illustrated in Figure 3A and Table 1, the pairing of a single flash with multiple beeps drove perceptual reports toward multiple flashes. Trials on which a single flash was presented concomitant with either no beep or a single auditory beep were associated with the fewest perceived flashes, and did not differ from each other. Trials with multiple auditory beeps were associated with a greater number of perceived flashes than in the 0- and 1-beep conditions. Additionally, each addition of a beep significantly increased the number of perceived flashes. For detailed statistics of each pairwise comparison, see Table 1.

Responses with the control, visual-only conditions that were interleaved with the sound-induced flash illusion illusory trials were also analyzed (Figure 3B). As for the illusory trials, results were in keeping with the original report (Shams et al., 2000), with parametric increases in perceived flashes as the actual number of flashes increased. For detailed statistics of each pairwise comparison, see Table 2.

Each individual's left and right TBWs were then correlated with their sound-induced flash illusion responses as measured with one, two, three, and four beeps (Figure 3C). No significant correlation was seen in the nonillusory, single-flash, single-beep condition. However, in each of the illusory conditions, the width of the TBW was significantly correlated with sound-induced flash illusion strength at an α-value of 0.05. When corrected for multiple comparisons (10 comparisons including the susceptibility index correlations, Greenhouse-Geisser corrected α = .005), only the right TBW was significantly correlated with sound-induced flash illusion strength. In all cases, the strength of the illusion decreased with narrower TBWs. See Figure 3C for detailed statistics for each correlation.

Figure 4. McGurk effect and its relationship to the TBW. A. Group mean responses to the incongruent pairing of auditory /ba/ with visual /ga/ show a high incidence of reports of the illusory fused percept /da/. B. Group mean responses to the congruent multisensory and the individual unisensory components. C. Correlation of each individual's McGurk effect score (i.e., % of fusions) with their left and right temporal binding windows.

In addition to these singular measures, a more global susceptibility index for the illusion was calculated (see Method) and was also found to vary greatly between individuals, with a range from 0 (no illusory reports) to 0.85 (high degree of bias associated with the number of beeps). The mean susceptibility index was 0.38 with a standard deviation of 0.24. Supporting the individual measures, this index showed a correlation with both the left and right TBW (Figure 3D), with the correlation only being significant for the right TBW after correction for multiple comparisons. For detailed statistics, see Figure 3D.

Correlation Between the TBW and McGurk Performance

In addition to sound-induced flash illusion performance, behavioral responses associated with the presentation of auditory, visual, and audiovisual speech syllables were also analyzed for interactions. The paradigm was structured to examine the McGurk effect, in which the presentation of an auditory /ba/ and a visual /ga/ often gives rise to a novel and synthetic percept (i.e., /da/). Of the individuals that reported such fusions (n = 21), the illusory /da/ percept was reported on 58.4% of trials when the auditory /ba/ and visual /ga/ were paired (Figure 4A). In contrast to these incongruent trials, accuracy rates for congruent /ba/ and /ga/ multisensory trials were near 100%. In addition, accuracy rates were very high for unisensory (i.e., visual alone, auditory alone) presentations, with the exception of the visual /ga/, which was more ambiguous (Figure 4B). When correcting for these inaccurate perceptions of the unisensory components of the McGurk stimuli (see Method), the average rate of perceptual fusion was 52.3% (SE = 6.7%). Additionally, with McGurk stimulus presentations, participants were more likely to perceive /da/ than either /ba/ or /ga/ (p = 8.99e-5, t = 4.88; p = .03, t = 2.44, respectively).

Similar to the sound-induced flash illusion analysis, the rate of each individual's perception of the McGurk effect was then correlated with his or her left and right TBWs (Figure 4C). There was no correlation between left TBWs and a participant's rate of reporting the McGurk effect, even at an uncorrected α of .05, but there was a significant negative correlation between the width of the participant's right TBW and his or her rate of McGurk perception. Intriguingly, this relationship is the opposite of that seen for the sound-induced flash illusion, with the rate of the McGurk effect decreasing with wider right TBWs. See Figure 4C for detailed statistics.

Correlations Between the Sound-Induced Flash Illusion and McGurk Tasks

Finally, the rate of each individual's perception of the McGurk effect was correlated with their perception of the sound-induced flash illusion (see Figure 5). Correlations relating the McGurk effect with the 1–4 beep conditions as well as the susceptibility index were calculated. Significant negative correlations were found with the 2–4 beep conditions as well as with the susceptibility index (five comparisons, corrected α = .01). See Figure 5 for detailed statistics of each correlation.

Discussion

In these experiments, we have described the relationship between individuals' temporal constraints in integrating multisensory cues (i.e., the temporal binding window) and their strength of integrating or binding these cues as indexed through illusory percepts. Three main conclusions can be drawn from these data.

Table 1
Sound-Induced Flash Illusion Responses

# beeps   Mean (SEM) perceived flashes   Δ from 1 beep       Δ from 2 beeps      Δ from 3 beeps      Δ from 4 beeps
                                          t      p            t      p            t      p            t      p
0         1.12 (0.03)                     0.08   n.s.         9.95   3.33e-10     7.77   3.98e-8      7.41   7.20e-8
1         1.12 (0.03)                     —      —            9.41   1.87e-10     7.49   2.40e-8      7.33   3.63e-8
2         1.66 (0.07)                     9.41   1.87e-10     —      —            2.71   0.02         3.03   0.005
3         1.81 (0.10)                     7.49   2.40e-8      2.71   0.02         —      —            2.24   0.04
4         1.87 (0.12)                     7.33   3.63e-8      3.03   0.005        2.24   0.04         —      —

Table 2
Responses to Visual-Only Control Trials

# flashes   Mean (SEM) perceived flashes   Δ from 1 flash       Δ from 2 flashes     Δ from 3 flashes     Δ from 4 flashes
                                            t       p            t       p            t       p            t       p
1           1.12 (0.03)                     —       —            12.04   5.04e-13     20.37   4.07e-19     21.93   5.05e-20
2           1.74 (0.05)                     12.04   5.04e-13     —       —            10.43   2.49e-11     15.06   2.99e-15
3           2.43 (0.06)                     20.37   4.07e-19     10.43   2.49e-11     —       —            11.73   2.56e-12
4           2.87 (0.07)                     21.93   5.05e-20     15.06   2.99e-15     11.73   2.56e-12     —       —


First, there is a clear and significant correlation between the width of the temporal binding window (TBW), as measured through a simultaneity judgment task, and the strength of audiovisual integration with synchronous sensory stimuli, as measured via the McGurk effect. Second, the width of the TBW was correlated with the temporal precision of individuals' multisensory integration, in that individuals with narrower TBWs were better able to dissociate asynchronous audiovisual inputs, as measured via the sound-induced flash illusion. Third, and perhaps most intriguing, these relationships are specific to the right side of the TBW (i.e., reflecting when an auditory stimulus lags a visual stimulus).

The TBW as a construct relates to the ability of our perceptual systems to bind inputs from multiple sensory modalities into a singular perceptual gestalt. This ability is constrained by the temporal relationship of the unisensory components of an audiovisual stimulus pair; within a limited temporal range of offsets, binding or integration is highly likely to occur. However, as the offset increases, the probability of binding decreases. Both current and prior work has shown marked variability in the size of the TBW across individuals, suggesting that the degree of tolerance for temporal asynchrony in multisensory inputs is highly variable (Conrey & Pisoni, 2006; Dixon & Spitz, 1980; Miller & D'Esposito, 2005; Powers et al., 2009; Stevenson et al., 2010). The observation of these striking differences coupled with previous findings led to the underlying hypotheses that motivated the current study: that a narrower TBW will be associated with an increased strength of integration for synchronous sensory (i.e., audiovisual) inputs due to an increase in the uniqueness of such perceived synchronous events. The corollary to this increase in integration of synchronous stimuli is that these individuals should also be better able to parse, or correctly fail to bind, stimulus inputs that are not synchronous. Together these improvements in temporal precision would manifest as greater gains in accuracy, reaction time (RT), and other measures that can be indexed via multisensory tasks. These three previous findings included (a) the concurrent developmental trajectories of the TBW and susceptibility to multisensory illusions, (b) clinical populations with comorbid impairments in multisensory integration and temporal processing, discussed below, and (c) overlapping neural architecture supporting the perception of multisensory illusions, perception of multisensory asynchrony, and multisensory temporal processing. Support for such a relationship can be found in the current data relating TBW width (specifically the width of the right TBW) with the rate of perceiving the McGurk effect. Individuals with narrower TBWs, and thus those individuals for whom sensory signals that are perceived as synchronous in the natural environment are more unique, showed increases in their rates of integration as indexed by the McGurk effect.

Support for the idea that narrower TBWs are associated with a better ability to dissociate inputs that are not synchronous is found in the current data relating the TBW and the sound-induced flash illusion. This illusion contains an inherent asynchrony. Consequently, to perceive the illusion an individual must perceptually bind an audiovisual stimulus pair that is temporally asynchronous. Individuals with narrower windows were found to be less likely to perceive the illusion, suggesting that they are more likely to dissociate temporally asynchronous inputs. More specifically, the sound-induced flash illusion is comprised of the simultaneous presentation of a flash and a beep followed by one to three successive beeps. As such, there is an inherent asynchrony within the stimulus structure used to generate the illusion. Given the requirement that an individual must bind temporally nonsimultaneous events in order to perceive the illusion, an individual who has a wider TBW (and thus is more likely to bind nonsimultaneous inputs) is more likely to perceive the illusion. In the case of the three- and four-beep conditions, the time interval between the first and last auditory onsets was 100 and 150 ms, respectively, outside of many individuals' TBWs measured in this study and in previous studies using simple stimuli (Hirsh & Sherrick, 1961; Keetels & Vroomen, 2005; Zampini et al., 2005; Zampini et al., 2003). This was further supported by the finding that as the number of beeps increased from two to four, and thus the inherent asynchrony in the sound-induced flash illusion increased, the correlation between the TBW and the strength of the illusion became stronger.

Figure 5. Relationship between the McGurk effect and the sound-induced flash illusion. Individuals' rates of perception of the McGurk effect were inversely correlated with the strength of their perception of the sound-induced flash illusion in conditions with 2–4 beeps, as well as with the sound-induced flash illusion susceptibility index.

The finding that the strength of the McGurk effect and sound-induced flash illusion were both significantly correlated with the right but not the left TBW is intriguing. In terms of multisensory events that occur in the natural environment, auditory and visual signals frequently have significant differences in their time of impact on neural circuits, a function of the different propagation times for their energies and the differences in the transduction and conduction times of signals in the different sensory systems. Thus, as a visual-auditory event happens further from the observer, the arrival time of auditory information (regardless of brain structure) is increasingly delayed. Given that real world events are specified in this way (i.e., where auditory signals are progressively delayed relative to visual signals), it should come as little surprise that the right TBW (in which auditory signals lag visual signals) is wider than the left, or that the right TBW is related to the strength of the integration of auditory and visual signals. Further evidence for the right TBW being more ethologically relevant is seen in the ability of perceptual learning to narrow the right but not the left TBW (Powers et al., 2009).

In addition to these observations in adults, developmental studies also highlight the differences between the left and right halves of the temporal binding window. Early in development, the TBW is markedly wider than the mature TBW. As maturation progresses with exposure to the inherent temporal statistics of the environment, an asymmetry develops through a narrowing of the TBW, with narrowing of the left side preceding the right (Hillock et al., 2011). Furthermore, the development of the TBW appears to parallel the development of the perception of the McGurk effect. While children as young as four to five months have been reported to perceive the McGurk effect (Burnham & Dodd, 2004; Rosenblum, Schmuckler, & Johnson, 1997), younger children are less likely to perceive the effect than adolescents and adults (Massaro, 1984; Massaro, Thompson, Barron, & Laren, 1986; McGurk & MacDonald, 1976; Tremblay et al., 2007), and adolescents are less likely to perceive the effect than adults (Hillock et al., 2011). These parallels in the developmental trajectories of the TBW and the perception of multisensory illusions provide further support that the two may be mechanistically related.

The body of literature focusing on the neural architecture underlying these processes also suggests that there may be a relationship between the TBW and the strength of multisensory illusions. One of the primary brain regions implicated in the fusion of auditory and visual information is the posterior superior temporal sulcus (pSTS; Beauchamp, 2005; Beauchamp, Argall, Bodurka, Duyn, & Martin, 2004; Beauchamp, Lee, Argall, & Martin, 2004; James, Stevenson, & Kim, in press; James, Stevenson, & Kim, 2009; James, VanDerKlok, Stevenson, & James, 2011; Macaluso et al., 2004; Miller & D'Esposito, 2005; Nath & Beauchamp, 2011a; Stevenson, Geoghegan, & James, 2007; Stevenson & James, 2009; Stevenson, Kim, & James, 2009; Stevenson et al., 2011; Wallace & Murray, 2011; Werner & Noppeney, 2009, 2010). This region shows greater activation when individuals are presented with multisensory stimuli relative to unisensory stimuli, and the activity in the pSTS is specifically modulated when the auditory and visual presentations are bound into a unified percept. Furthermore, fMRI studies of the McGurk effect have also implicated the pSTS as a region in which neural activity is correlated with the perception (or lack thereof) of the McGurk effect (Beauchamp et al., 2010; Nath & Beauchamp, 2011b; Sekiyama et al., 2003). Likewise, a series of imaging studies of the sound-induced flash illusion have implicated the STS (in addition to occipital regions including V1) as a region in which neural activity is correlated with the perception (or lack thereof) of the audiovisual illusion (Bolognini et al., 2011; Watkins et al., 2007; Watkins et al., 2006).

Studies of the impact that temporal simultaneity has on multisensory processing have also implicated the pSTS. Thus, variations in the temporal disparity between cross-modal stimuli have been shown to modulate the amount of activity measured in the pSTS in a number of studies (Macaluso et al., 2004; Miller & D'Esposito, 2005; Stevenson et al., 2010; Stevenson et al., 2011). The findings that variations in both the temporal structure of multisensory stimuli and reports of multisensory illusions modulate neural activity in similar brain regions further suggest a link between the width of the TBW and the strength of the binding processes that are indexed by multisensory illusions.

Finally, a connection between the width of an individual's TBW and that person's ability to perceive multisensory illusions has also been seen in clinical populations in which impairments in multisensory processing have been seen, including autism spectrum disorders (ASD; Foss-Feig et al., 2010; Kwakye et al., 2011), dyslexia (Bastien-Toniazzo et al., 2009), and schizophrenia (Bleich-Cohen, Hendler, Kotler, & Strous, 2009; Stone et al., 2011; Williams, Light, Braff, & Ramachandran, 2010). In each of these conditions, sensory impairments include dysfunctions in perceptual fusion. Perhaps the strongest evidence has been seen in autism, in which individuals are often characterized by an impaired ability to combine multisensory information into a unified percept and, more specifically, show a weakened ability to perceive the McGurk effect (Mongillo et al., 2008; Taylor et al., 2010; Williams, Massaro, Peel, Bosseler, & Suddendorf, 2004). Recent evidence also shows that autistic individuals exhibit an atypically wide TBW (Foss-Feig et al., 2010; Kwakye et al., 2011). As in the evidence discussed above for typical populations, the relationship between the TBW and the ability to create unified multisensory perceptions in these clinical populations converges with neural evidence. Studies investigating the neural correlates of autism show irregularities in regions known to be involved with multisensory integration such as the pSTS (Boddaert et al., 2003; Boddaert et al., 2004; Boddaert & Zilbovicius, 2002; Gervais et al., 2004; Levitt et al., 2003; Pelphrey & Carter, 2008a, 2008b). Again, this finding of atypically wide TBWs being associated with decreases in the perception of multisensory illusions provides additional evidence for a mechanistic relationship between TBWs and the strength of an individual's multisensory integration, and also suggests it may have clinical implications. This evidence has led multiple investigators to propose that the impairments in multisensory binding stem from deficits in temporal processing or that both deficits have a common etiology (Brunelle, Boddaert, & Zilbovicius, 2009; Kwakye et al., 2011; Stevenson et al., 2011; Zilbovicius et al., 2000). Given this relationship between temporal processing and multisensory integration, and the known behavioral and perceptual benefits of multisensory integration, perceptual training focused on improving multisensory temporal processing, as has been carried out in typically developing (TD) individuals (Powers et al., 2009), may prove to be a useful means to improve sensory function, and the higher order processes dependent on multisensory binding, in these populations.

Outside of the relationship between multisensory illusions and the TBW, the measurement of the TBW produced several additional noteworthy findings. Our data replicated the commonly shown finding that the right side of the TBW is wider than the left side (Figure 2A). Our data also show that while the absolute values of these two sides are different, these widths remain proportionate across individuals, as can be seen in the correlation between individuals' left and right TBWs (Figure 2B). This finding suggests that while there are notable differences in synchrony perception and temporal binding dependent upon which stimulus modality is presented first, the width of the total TBW reflects a single underlying cognitive process.

The data presented here provide strong supportive evidence that the temporal processing of multiple sensory signals and the merging of these signals into a single, unified perception are highly related. Individuals with a narrower TBW will statistically encounter fewer occurrences of stimulus inputs that are perceived to be temporally synchronous; subsequently, perceived synchronous sensory signals will be more unique, leading to greater perceptual binding of synchronous inputs and an increased ability to dissociate asynchronous inputs. These findings have strong relevance for the neural networks subserving multisensory processing, in that they suggest a common mechanistic link. In addition, the clinical implications of this work, coupled with previous demonstrations highlighting the plasticity of the TBW, suggest that through manipulations of the TBW one may be able to strengthen the ability of an individual to integrate information from multiple sensory modalities.

References

Andersen, T. S., Tiippana, K., & Sams, M. (2004). Factors influencingaudiovisual fission and fusion illusions. Cognitive Brain Research, 21,301–308. doi:S0926641004001636[pii]10.1016/j.cogbrainres.2004.06.004

Bastien-Toniazzo, M., Stroumza, A., & Cave, C. (2009). Audio-visualperception and integration in developmental dyslexia: An exploratorystudy using the McGurk effect. Current Psychology Letters, 25, 1–14.

Beauchamp, M. S. (2005). Statistical criteria in FMRI studies of multisen-sory integration. Neuroinformatics, 3, 93–113. doi:10.1385/NI:3:2:093

Beauchamp, M. S., Argall, B. D., Bodurka, J., Duyn, J. H., & Martin, A.(2004). Unraveling multisensory integration: Patchy organization withinhuman STS multisensory cortex. Nature Neuroscience, 7, 1190–1192.doi:10.1038/nn1333

Beauchamp, M. S., Lee, K. E., Argall, B. D., & Martin, A. (2004).Integration of auditory and visual information about objects in superiortemporal sulcus. Neuron, 41, 809 – 823. doi:10.1016/S0896-6273(04)00070-4

Beauchamp, M. S., Nath, A. R., & Pasalar, S. (2010). fMRI-Guidedtranscranial magnetic stimulation reveals that the superior temporal

sulcus is a cortical locus of the McGurk effect. Journal of Neuroscience,30, 2414–2417. doi:30/7/2414[pii]10.1523/JNEUROSCI.4865-09.2010

Bleich-Cohen, M., Hendler, T., Kotler, M., & Strous, R. D. (2009).Reduced language lateralization in first-episode schizophrenia: An fMRIindex of functional asymmetry. Psychiatry Research, 171, 82–93. doi:S0925-4927(08)00046-2[pii]10.1016/j.pscychresns.2008.03.002

Boddaert, N., Belin, P., Chabane, N., Poline, J. B., Barthelemy, C.,Mouren-Simeoni, M. C., . . . Zilbovicius, M. (2003). Perception ofcomplex sounds: Abnormal pattern of cortical activation in autism.American Journal of Psychiatry, 160, 2057–2060. doi:10.1176/appi.ajp.160.11.2057

Boddaert, N., Chabane, N., Gervais, H., Good, C. D., Bourgeois, M.,Plumet, M. H., . . . Zilbovicius, M. (2004). Superior temporal sulcusanatomical abnormalities in childhood autism: A voxel-based morphom-etry MRI study. Neuroimage, 23, 364–369. doi:10.1016/j.neuroimage.2004.06.016 S1053811904003258 [pii]

Boddaert, N., & Zilbovicius, M. (2002). Functional neuroimaging andchildhood autism. Pediatric Radiology, 32, 1–7. doi:10.1007/s00247-001-0570-x

Bolognini, N., Rossetti, A., Casati, C., Mancini, F., & Vallar, G. (2011).Neuromodulation of multisensory perception: A tDCS study of thesound-induced flash illusion. Neuropsychologia, 49, 231–237. doi:S0028-3932(10)00495-1[pii]10.1016/j.neuropsychologia.2010.11.015

Brainard, D. H. (1997). The psychophysics toolbox. Spatial Vision, 10,433–436. doi:10.1163/156856897X00357

Brunelle, F., Boddaert, N., & Zilbovicius, M. (2009). Autisme et imagerie cerebrale [Autism and brain imaging]. Bulletin De l'Academie Nationale De Medecine, 193, 287–297; discussion 297–298.

Burnham, D., & Dodd, B. (2004). Auditory-visual speech integration by prelinguistic infants: Perception of an emergent consonant in the McGurk effect. Developmental Psychobiology, 45, 204–220. doi:10.1002/dev.20032

Colonius, H., & Diederich, A. (2004). Multisensory interaction in saccadic reaction time: A time-window-of-integration model. Journal of Cognitive Neuroscience, 16, 1000–1009. doi:10.1162/0898929041502733

Conrey, B. L., & Pisoni, D. B. (2004). Detection of auditory-visual asynchrony in speech and nonspeech signals. In D. B. Pisoni (Ed.), Research on spoken language processing (Vol. 26, pp. 71–94). Bloomington, IN: Indiana University.

Conrey, B. L., & Pisoni, D. B. (2006). Auditory-visual speech perception and synchrony detection for speech and nonspeech signals. Journal of the Acoustical Society of America, 119, 4065–4073. doi:10.1121/1.2195091

Corey, D. P., & Hudspeth, A. J. (1979). Response latency of vertebrate hair cells. Biophysical Journal, 26, 499–506. doi:10.1016/S0006-3495(79)85267-4

Dhamala, M., Assisi, C. G., Jirsa, V. K., Steinberg, F. L., & Kelso, J. A. (2007). Multisensory integration for timing engages different brain networks. Neuroimage, 34, 764–773. doi:10.1016/j.neuroimage.2006.07.044

Diederich, A., & Colonius, H. (2004). Bimodal and trimodal multisensory enhancement: Effects of stimulus onset and intensity on reaction time. Perception and Psychophysics, 66, 1388–1404. doi:10.3758/BF03195006

Dixon, N. F., & Spitz, L. (1980). The detection of auditory visual desynchrony. Perception, 9, 719–721. doi:10.1068/p090719

Erber, N. P. (1972). Auditory, visual, and auditory-visual recognition of consonants by children with normal and impaired hearing. Journal of Speech and Hearing Research, 15, 413–422.

Foss-Feig, J. H., Kwakye, L. D., Cascio, C. J., Burnette, C. P., Kadivar, H., Stone, W. L., & Wallace, M. T. (2010). An extended multisensory temporal binding window in autism spectrum disorders. Experimental Brain Research, 203, 381–389. doi:10.1007/s00221-010-2240-4

Foucher, J. R., Lacambre, M., Pham, B. T., Giersch, A., & Elliott, M. A. (2007). Low time resolution in schizophrenia: Lengthened windows of simultaneity for visual, auditory and bimodal stimuli. Schizophrenia Research, 97, 118–127. doi:10.1016/j.schres.2007.08.013

Gaver, W. W. (1993). What in the world do we hear?: An ecological approach to auditory event perception. Ecological Psychology, 5, 1–29. doi:10.1207/s15326969eco0501_1

Gervais, H., Belin, P., Boddaert, N., Leboyer, M., Coez, A., Sfaello, I., . . . Zilbovicius, M. (2004). Abnormal cortical voice processing in autism. Nature Neuroscience, 7, 801–802. doi:10.1038/nn1291

Grant, K. W., Walden, B. E., & Seitz, P. F. (1998). Auditory-visual speech recognition by hearing-impaired subjects: Consonant recognition, sentence recognition, and auditory-visual integration. Journal of the Acoustical Society of America, 103, 2677–2690. doi:10.1121/1.422788

Hairston, W. D., Burdette, J. H., Flowers, D. L., Wood, F. B., & Wallace, M. T. (2005). Altered temporal profile of visual-auditory multisensory interactions in dyslexia. Experimental Brain Research, 166, 474–480. doi:10.1007/s00221-005-2387-6

Hershenson, M. (1962). Reaction time as a measure of intersensory facilitation. Journal of Experimental Psychology, 63, 289–293. doi:10.1037/h0039516

Hillock, A. R., Powers, A. R., & Wallace, M. T. (2011). Binding of sights and sounds: Age-related changes in multisensory temporal processing. Neuropsychologia, 49, 461–467. doi:10.1016/j.neuropsychologia.2010.11.041

Hirsh, I. J., & Fraisse, P. (1964). Simultaneite et succession de stimuli heterogenes [Simultaneity and succession of heterogeneous stimuli]. L'Annee Psychologique, 64, 1–19. doi:10.3406/psy.1964.27144

Hirsh, I. J., & Sherrick, C. E., Jr. (1961). Perceived order in different sense modalities. Journal of Experimental Psychology, 62, 423–432. doi:10.1037/h0045283

James, T. W., & Stevenson, R. A. (2012). The use of fMRI to assess multisensory integration. In M. H. Wallace & M. M. Murray (Eds.), Frontiers in the neural basis of multisensory processes (pp. 131–146). London, UK: Taylor & Francis.

James, T. W., Stevenson, R. A., & Kim, S. (in press). Inverse effectiveness in multisensory processing. In B. E. Stein (Ed.), The new handbook of multisensory processes. Cambridge, MA: MIT Press.

James, T. W., Stevenson, R. A., & Kim, S. (2009, October). Assessing multisensory integration with additive factors and functional MRI. Paper presented at the International Society for Psychophysics, Dublin, Ireland.

James, T. W., VanDerKlok, R. M., Stevenson, R. A., & James, K. H. (2011). Multisensory perception of action in posterior temporal and parietal cortices. Neuropsychologia, 49, 108–114. doi:10.1016/j.neuropsychologia.2010.10.030

Keetels, M., & Vroomen, J. (2005). The role of spatial disparity and hemifields in audio-visual temporal order judgments. Experimental Brain Research, 167, 635–640. doi:10.1007/s00221-005-0067-1

King, A. J., & Palmer, A. R. (1985). Integration of visual and auditory information in bimodal neurones in the guinea-pig superior colliculus. Experimental Brain Research, 60, 492–500. doi:10.1007/BF00236934

Kwakye, L. D., Foss-Feig, J. H., Cascio, C. J., Stone, W. L., & Wallace, M. T. (2011). Altered auditory and multisensory temporal processing in autism spectrum disorders. Frontiers in Integrative Neuroscience, 4, 129. doi:10.3389/fnint.2010.00129

Lamb, T. D., & Pugh, E. N., Jr. (1992). A quantitative account of the activation steps involved in phototransduction in amphibian photoreceptors. Journal of Physiology, 449, 719–758.

Lennie, P. (1981). The physiological basis of variations in visual latency. Vision Research, 21, 815–824.

Levitt, J. G., Blanton, R. E., Smalley, S., Thompson, P. M., Guthrie, D., McCracken, J. T., . . . Toga, A. W. (2003). Cortical sulcal maps in autism. Cerebral Cortex, 13, 728–735. doi:10.1093/cercor/13.7.728

Lovelace, C. T., Stein, B. E., & Wallace, M. T. (2003). An irrelevant light enhances auditory detection in humans: A psychophysical analysis of multisensory integration in stimulus detection. Cognitive Brain Research, 17, 447–453.

Macaluso, E., George, N., Dolan, R., Spence, C., & Driver, J. (2004). Spatial and temporal factors during processing of audiovisual speech: A PET study. Neuroimage, 21, 725–732. doi:10.1016/j.neuroimage.2003.09.049

Massaro, D. W. (1984). Children's perception of visual and auditory speech. Child Development, 55, 1777–1788. doi:10.2307/1129925

Massaro, D. W. (1998). Perceiving talking faces: From speech perception to a behavioral principle. Cambridge, MA: MIT Press.

Massaro, D. W. (2004). From multisensory integration to talking heads and language learning. In G. Calvert, C. Spence, & B. E. Stein (Eds.), The handbook of multisensory processes (pp. 153–176). Cambridge, MA: MIT Press.

Massaro, D. W., Thompson, L. A., Barron, B., & Laren, E. (1986). Developmental changes in visual and auditory contributions to speech perception. Journal of Experimental Child Psychology, 41, 93–113.

McGurk, H., & MacDonald, J. (1976). Hearing lips and seeing voices. Nature, 264, 746–748. doi:10.1038/264746a0

Meredith, M. A., Nemitz, J. W., & Stein, B. E. (1987). Determinants of multisensory integration in superior colliculus neurons. I. Temporal factors. Journal of Neuroscience, 7, 3215–3229.

Meredith, M. A., & Stein, B. E. (1986a). Spatial factors determine the activity of multisensory neurons in cat superior colliculus. Brain Research, 365, 350–354. doi:10.1016/0006-8993(86)91648-3

Meredith, M. A., & Stein, B. E. (1986b). Visual, auditory, and somatosensory convergence on cells in superior colliculus results in multisensory integration. Journal of Neurophysiology, 56, 640–662.

Miller, L. M., & D'Esposito, M. (2005). Perceptual fusion and stimulus coincidence in the cross-modal integration of speech. Journal of Neuroscience, 25, 5884–5893. doi:10.1523/JNEUROSCI.0896-05.2005

Mongillo, E. A., Irwin, J. R., Whalen, D. H., Klaiman, C., Carter, A. S., & Schultz, R. T. (2008). Audiovisual processing in children with and without autism spectrum disorders. Journal of Autism and Developmental Disorders, 38, 1349–1358. doi:10.1007/s10803-007-0521-y

Nath, A. R., & Beauchamp, M. S. (2011a). Dynamic changes in superior temporal sulcus connectivity during perception of noisy audiovisual speech. Journal of Neuroscience, 31, 1704–1714. doi:10.1523/JNEUROSCI.4853-10.2011

Nath, A. R., & Beauchamp, M. S. (2011b). A neural basis for interindividual differences in the McGurk effect, a multisensory speech illusion. Neuroimage, 59, 781–787. doi:10.1016/j.neuroimage.2011.07.024

Nelson, W. T., Hettinger, L. J., Cunningham, J. A., Brickman, B. J., Haas, M. W., & McKinley, R. L. (1998). Effects of localized auditory information on visual target detection performance using a helmet-mounted display. Human Factors, 40, 452–460. doi:10.1518/001872098779591304

Noesselt, T., Rieger, J. W., Schoenfeld, M. A., Kanowski, M., Hinrichs, H., Heinze, H. J., & Driver, J. (2007). Audiovisual temporal correspondence modulates human multisensory superior temporal sulcus plus primary sensory cortices. Journal of Neuroscience, 27, 11431–11441. doi:10.1523/JNEUROSCI.2252-07.2007

Pearl, D., Yodashkin-Porat, D., Katz, N., Valevski, A., Aizenberg, D., Sigler, M., . . . Kikinzon, L. (2009). Differences in audiovisual integration, as measured by McGurk phenomenon, among adult and adolescent patients with schizophrenia and age-matched healthy control groups. Comprehensive Psychiatry, 50, 186–192. doi:10.1016/j.comppsych.2008.06.004

Pelli, D. G. (1997). The VideoToolbox software for visual psychophysics: Transforming numbers into movies. Spatial Vision, 10, 437–442. doi:10.1163/156856897X00366

Pelphrey, K. A., & Carter, E. J. (2008a). Brain mechanisms for social perception: Lessons from autism and typical development. Annals of the New York Academy of Sciences, 1145, 283–299. doi:10.1196/annals.1416.007

Pelphrey, K. A., & Carter, E. J. (2008b). Charting the typical and atypical development of the social brain. Development and Psychopathology, 20, 1081–1102. doi:10.1017/S0954579408000515

Pöppel, E., Schill, K., & von Steinbuchel, N. (1990). Sensory integration within temporally neutral systems states: A hypothesis. Naturwissenschaften, 77, 89–91. doi:10.1007/BF01131783

Powers, A. R., III, Hillock, A. R., & Wallace, M. T. (2009). Perceptual training narrows the temporal window of multisensory binding. Journal of Neuroscience, 29, 12265–12274. doi:10.1523/JNEUROSCI.3501-09.2009

Quinto, L., Thompson, W. F., Russo, F. A., & Trehub, S. E. (2010). A comparison of the McGurk effect for spoken and sung syllables. Attention, Perception and Psychophysics, 72, 1450–1454. doi:10.3758/APP.72.6.1450

Roach, N. W., Heron, J., Whitaker, D., & McGraw, P. V. (2011). Asynchrony adaptation reveals neural population code for audio-visual timing. Proceedings of the Royal Society, 278, 9. doi:10.1098/rspb.2010.1737

Rosenblum, L. D., Schmuckler, M. A., & Johnson, J. A. (1997). The McGurk effect in infants. Perception and Psychophysics, 59, 347–357. doi:10.3758/BF03211902

Schall, S., Quigley, C., Onat, S., & Konig, P. (2009). Visual stimulus locking of EEG is modulated by temporal congruency of auditory stimuli. Experimental Brain Research, 198, 137–151. doi:10.1007/s00221-009-1867-5

Seitz, A. R., Nanez, J. E., Sr., Holloway, S. R., & Watanabe, T. (2006). Perceptual learning of motion leads to faster flicker perception. PLoS ONE, 1, e28. doi:10.1371/journal.pone.0000028

Sekiyama, K., Kanno, I., Miura, S., & Sugita, Y. (2003). Auditory-visual speech perception examined by fMRI and PET. Neuroscience Research, 47, 277–287.

Senkowski, D., Talsma, D., Grigutsch, M., Herrmann, C. S., & Woldorff, M. G. (2007). Good times for multisensory integration: Effects of the precision of temporal synchrony as revealed by gamma-band oscillations. Neuropsychologia, 45, 561–571. doi:10.1016/j.neuropsychologia.2006.01.013

Shams, L., Kamitani, Y., & Shimojo, S. (2000). Illusions. What you see is what you hear. Nature, 408, 788. doi:10.1038/35048669

Spence, C., Baddeley, R., Zampini, M., James, R., & Shore, D. I. (2003). Multisensory temporal order judgments: When two locations are better than one. Perception and Psychophysics, 65, 318–328. doi:10.3758/BF03194803

Stein, B., & Meredith, M. A. (1993). The merging of the senses. Boston, MA: MIT Press.

Stein, B. E., & Wallace, M. T. (1996). Comparisons of cross-modality integration in midbrain and cortex. Progress in Brain Research, 112, 289–299. doi:10.1016/S0079-6123(08)63336-1

Stevenson, R. A., Altieri, N. A., Kim, S., Pisoni, D. B., & James, T. W. (2010). Neural processing of asynchronous audiovisual speech perception. Neuroimage, 49, 3308–3318. doi:10.1016/j.neuroimage.2009.12.001

Stevenson, R. A., Geoghegan, M. L., & James, T. W. (2007). Superadditive BOLD activation in superior temporal sulcus with threshold non-speech objects. Experimental Brain Research, 179, 85–95. doi:10.1007/s00221-006-0770-6

Stevenson, R. A., & James, T. W. (2009). Audiovisual integration in human superior temporal sulcus: Inverse effectiveness and the neural processing of speech and object recognition. Neuroimage, 44, 1210–1223. doi:10.1016/j.neuroimage.2008.09.034

Stevenson, R. A., Kim, S., & James, T. W. (2009). An additive-factors design to disambiguate neuronal and areal convergence: Measuring multisensory interactions between audio, visual, and haptic sensory streams using fMRI. Experimental Brain Research, 198, 183–194. doi:10.1007/s00221-009-1783-8

Stevenson, R. A., VanDerKlok, R. M., Pisoni, D. B., & James, T. W. (2011). Discrete neural substrates underlie complementary audiovisual speech integration processes. Neuroimage, 55, 1339–1345. doi:10.1016/j.neuroimage.2010.12.063

Stevenson, R. A., Zemtsov, R. K., & Wallace, M. T. (under review). Consistency in individuals' multisensory temporal binding windows across task and stimulus complexity. Experimental Brain Research.

Stone, D. B., Urrea, L. J., Aine, C. J., Bustillo, J. R., Clark, V. P., & Stephen, J. M. (2011). Unisensory processing and multisensory integration in schizophrenia: A high-density electrical mapping study. Neuropsychologia. doi:10.1016/j.neuropsychologia.2011.07.017

Summerfield, Q. (1987). Some preliminaries to a comprehensive account of audio-visual speech perception. In B. Dodd & B. A. Campbell (Eds.), Hearing by eye: The psychology of lip reading (pp. 3–52). London, UK: Erlbaum.

Talsma, D., Senkowski, D., & Woldorff, M. G. (2009). Intermodal attention affects the processing of the temporal alignment of audiovisual stimuli. Experimental Brain Research, 198, 313–328. doi:10.1007/s00221-009-1858-6

Taylor, N., Isaac, C., & Milne, E. (2010). A comparison of the development of audiovisual integration in children with autism spectrum disorders and typically developing children. Journal of Autism and Developmental Disorders, 40, 1403–1411. doi:10.1007/s10803-010-1000-4

Tremblay, C., Champoux, F., Voss, P., Bacon, B. A., Lepore, F., & Theoret, H. (2007). Speech and non-speech audio-visual illusions: A developmental study. PLoS ONE, 2, e742. doi:10.1371/journal.pone.0000742

van Atteveldt, N. M., Formisano, E., Blomert, L., & Goebel, R. (2007). The effect of temporal asynchrony on the multisensory integration of letters and speech sounds. Cerebral Cortex, 17, 962–974. doi:10.1093/cercor/bhl007

Van der Burg, E., Cass, J., Olivers, C. N., Theeuwes, J., & Alais, D. (2010). Efficient visual search from synchronized auditory signals requires transient audiovisual events. PLoS ONE, 5, e10664. doi:10.1371/journal.pone.0010664

van Wassenhove, V., Grant, K. W., & Poeppel, D. (2007). Temporal window of integration in auditory-visual speech perception. Neuropsychologia, 45, 598–607. doi:10.1016/j.neuropsychologia.2006.01.001

Vatakis, A., & Spence, C. (2006). Audiovisual synchrony perception for music, speech, and object actions. Brain Research, 1111, 134–142. doi:10.1016/j.brainres.2006.05.078

Vroomen, J., & Keetels, M. (2010). Perception of intersensory synchrony: A tutorial review. Attention, Perception and Psychophysics, 72, 871–884. doi:10.3758/APP.72.4.871

Wallace, M. H., & Murray, M. M. (Eds.). (2011). Frontiers in the neural basis of multisensory processes. London, UK: Taylor & Francis.

Wallace, M. T., Roberson, G. E., Hairston, W. D., Stein, B. E., Vaughan, J. W., & Schirillo, J. A. (2004). Unifying multisensory signals across time and space. Experimental Brain Research, 158, 252–258. doi:10.1007/s00221-004-1899-9

Watkins, S., Shams, L., Josephs, O., & Rees, G. (2007). Activity in human V1 follows multisensory perception. Neuroimage, 37, 572–578. doi:10.1016/j.neuroimage.2007.05.027

Watkins, S., Shams, L., Tanaka, S., Haynes, J. D., & Rees, G. (2006). Sound alters activity in human V1 in association with illusory visual perception. Neuroimage, 31, 1247–1256. doi:10.1016/j.neuroimage.2006.01.016

Werner, S., & Noppeney, U. (2009). Superadditive responses in superior temporal sulcus predict audiovisual benefits in object categorization. Cerebral Cortex, 20, 1829–1842. doi:10.1093/cercor/bhp248

Werner, S., & Noppeney, U. (2010). Distinct functional contributions of primary sensory and association areas to audiovisual integration in object categorization. Journal of Neuroscience, 30, 2662–2675. doi:10.1523/JNEUROSCI.5091-09.2010

Wilkinson, L. K., Meredith, M. A., & Stein, B. E. (1996). The role of anterior ectosylvian cortex in cross-modality orientation and approach behavior. Experimental Brain Research, 112, 1–10. doi:10.1007/BF00227172

Williams, J. H., Massaro, D. W., Peel, N. J., Bosseler, A., & Suddendorf, T. (2004). Visual-auditory integration during speech imitation in autism. Research in Developmental Disabilities, 25, 559–575. doi:10.1016/j.ridd.2004.01.008

Williams, L. E., Light, G. A., Braff, D. L., & Ramachandran, V. S. (2010). Reduced multisensory integration in patients with schizophrenia on a target detection task. Neuropsychologia, 48, 3128–3136. doi:10.1016/j.neuropsychologia.2010.06.028

Zampini, M., Guest, S., Shore, D. I., & Spence, C. (2005). Audio-visual simultaneity judgments. Perception and Psychophysics, 67, 531–544. doi:10.3758/BF03193329

Zampini, M., Shore, D. I., & Spence, C. (2003). Audiovisual temporal order judgments. Experimental Brain Research, 152, 198–210. doi:10.1007/s00221-003-1536-z

Zampini, M., Shore, D. I., & Spence, C. (2005). Audiovisual prior entry. Neuroscience Letters, 381, 217–222. doi:10.1016/j.neulet.2005.01.085

Zilbovicius, M., Boddaert, N., Belin, P., Poline, J. B., Remy, P., Mangin, J. F., . . . Samson, Y. (2000). Temporal lobe dysfunction in childhood autism: A PET study. American Journal of Psychiatry, 157, 1988–1993. doi:10.1176/appi.ajp.157.12.1988

Received August 9, 2011
Revision received December 27, 2011
Accepted January 5, 2012
