Spatial attention and audiovisual processing


19 Spatial Attention and Audiovisual Processing

Valerio Santangelo and Emiliano Macaluso

In everyday life a number of external signals continuously reach our different senses. Sometimes these multisensory inputs need to be integrated into a unique percept because they belong to the same external object (such as the sound associated with a bouncing ball). Other times, visual and auditory signals might be elicited by different external objects, but the brain still has to perform some multisensory analysis to reach such a conclusion. Many different regions of the brain are known to contribute to these processes, including cortical areas (such as the parietal, e.g., Duhamel, Colby, & Goldberg, 1998; Mullette-Gillman, Cohen, & Groh, 2005; and premotor cortex, e.g., Kohler et al., 2002; Pesaran, Nelson, & Andersen, 2006) as well as subcortical regions (such as the superior colliculus; Stein & Meredith, 1993). Multisensory neurons in these areas typically exhibit enhanced responses to spatially and temporally coincident multisensory stimulation, responses that are significantly greater than the sum of the responses evoked by the component unimodal stimuli (especially when the stimuli are presented at near-threshold levels; e.g., Wallace, Meredith, & Stein, 1998). These effects have often been proposed as a neural index of integrative processes (e.g., multisensory feature integration, signal detection, motor orienting) associated with multisensory stimuli.
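To make this response criterion concrete, here is a minimal Python sketch of the two measures commonly used to quantify multisensory enhancement in single-neuron studies (cf. Stein & Meredith, 1993); the firing rates and function names are purely hypothetical examples, not data from any of the studies cited above.

```python
# Illustrative sketch of common multisensory-enhancement measures.
# All firing rates below are hypothetical example values.

def enhancement_index(av: float, a: float, v: float) -> float:
    """Percent gain of the bimodal response over the most effective
    unimodal response: 100 * (AV - max(A, V)) / max(A, V).
    Positive values indicate multisensory enhancement."""
    best = max(a, v)
    return 100.0 * (av - best) / best

def is_superadditive(av: float, a: float, v: float) -> bool:
    """Additive criterion: the bimodal response exceeds the sum of the
    unimodal responses, as typically observed near threshold."""
    return av > a + v

# Hypothetical responses (spikes/s) of a superior colliculus neuron.
a, v, av = 4.0, 5.0, 15.0
print(enhancement_index(av, a, v))  # 200.0 -> 200% enhancement
print(is_superadditive(av, a, v))   # True -> 15.0 exceeds A + V (9.0)
```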

Traditionally, multisensory integration (MI) has been thought to occur automatically (or preattentively), with the incoming sensory input, rather than the "internal state" of the brain, determining the level of multisensory interaction (e.g., the spatiotemporal correspondence of the unimodal inputs). However, recent evidence suggests that multisensory processing can be modulated by several cognitive factors that do not strictly depend only on the external input. In particular, we will review functional imaging data indicating that the brain response to the very same (multi)sensory audiovisual (AV) input can change as a function of attention. We will discuss factors concerning the type of attentional deployment (spatial vs. nonspatial, endogenous vs. exogenous; e.g., Klein & Shore, 2000) and the nature of sensory input (e.g., speech vs. nonspeech stimuli) that may determine the role of attention for the processing of AV stimuli.

Orienting of Attention in Space and across Sensory Modalities

The ability to select relevant information from the surrounding environment while at the same time filtering out all potential distracters is the key element of what is usually referred to as selective attention. Selective attention can be oriented toward a source of interest (e.g., a spatial location, a visual or auditory stimulus) by means of voluntary (endogenous) or reflexive (exogenous) mechanisms. In the following sections, we highlight the neural correlates involved in the endogenous and exogenous orienting of spatial attention for vision and audition.

Control of Unimodal Spatial Attention in Vision and Audition

Orienting of spatial attention has been studied extensively in both the visual and the auditory modality. In vision, the most popular approach involves cuing attention to one location and presenting visual targets either at the cued location (valid trials) or at a different location (invalid trials; Posner, Snyder, & Davidson, 1980). In the so-called endogenous version of this task, a central symbolic cue (e.g., an arrow pointing to the left or right side) predicts the target location on the majority of trials (i.e., 75–80% of the trials are valid). The target is then presented after a stimulus onset asynchrony (SOA) of about 800–1000 msec. In the exogenous version of the task, a peripheral spatial cue precedes the target at a shorter SOA (around 100–200 msec). Critically, in this case the cue is nonpredictive with regard to the location of the forthcoming target (50% valid, 50% invalid trials), thus tapping bottom-up/reflexive, rather than strategic, control of spatial attention. Behaviorally, both procedures lead to faster and more accurate responses for valid than invalid trials, suggesting that the cues generate (voluntarily, for the endogenous version; and reflexively, for the exogenous version) a shift of spatial attention toward the cued location. This facilitates target processing on valid trials (attentional benefits) while requiring additional reorienting of attention from the cued location to the new target location on invalid trials (attentional costs; see Driver, 2001, for a review).
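For concreteness, the following is a minimal sketch (in Python; all function and field names are illustrative, not from any published task code) of how trial lists for the two versions of the paradigm could be generated, using the validity proportions and SOA ranges given above.

```python
# Illustrative trial-list generator for the spatial cuing paradigm
# (Posner et al., 1980). Parameter values follow the text above.
import random

def make_trials(n_trials, p_valid, soa_ms):
    """Return a shuffled list of trials, each with a cue side, a target
    side, a validity flag, and a cue-target SOA drawn from soa_ms."""
    trials = []
    for _ in range(n_trials):
        cue = random.choice(["left", "right"])
        valid = random.random() < p_valid
        target = cue if valid else {"left": "right", "right": "left"}[cue]
        trials.append({"cue": cue, "target": target, "valid": valid,
                       "soa": random.randint(*soa_ms)})
    random.shuffle(trials)
    return trials

# Endogenous version: predictive central cue (75-80% valid), long SOA.
endogenous = make_trials(n_trials=160, p_valid=0.75, soa_ms=(800, 1000))
# Exogenous version: nonpredictive peripheral cue (50% valid), short SOA.
exogenous = make_trials(n_trials=160, p_valid=0.50, soa_ms=(100, 200))
```

The same trial structure applies whether the cue is a central arrow or a peripheral flash; only the cue's predictiveness and the SOA change between the two versions.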

Functional imaging studies have used many different versions of these paradigms, highlighting the key role of frontoparietal (FP) cortices for visuospatial attention control. Activation in dorsal frontoparietal regions (FEF, the frontal eye fields; and IPS, the intraparietal sulcus) has been associated with voluntary/endogenous orienting (e.g., Corbetta, Kincade, Ollinger, McAvoy, & Shulman, 2000; Corbetta, Miezin, Shulman, & Petersen, 1993; Kincade, Abrams, Astafiev, Shulman, & Corbetta, 2005; see Corbetta & Shulman, 2002, for a review), whereas a ventral frontoparietal network (TPJ, the temporoparietal junction; and IFG, the inferior frontal gyrus) activates primarily for targets presented at the unattended location (reorienting on invalid trials: Arrington, Carr, Mayer, & Rao, 2000; Corbetta et al., 2000). More recently, studies in the visual modality have also indicated that the two systems do not operate fully independently; rather, they interact in a dynamic manner, integrating endogenous and exogenous factors to select the currently relevant location (He et al., 2007; Natale, Marzi, & Macaluso, 2009; Shulman et al., 2009; see also Corbetta, Patel, & Shulman, 2008, for a review).

Dorsal and ventral frontoparietal regions have also been activated in unimodal studies of auditory spatial attention (e.g., Mayer, Harrington, Adair, & Lee, 2006; Mayer, Harrington, Stephen, Adair, & Lee, 2007; Salmi, Rinne, Koistinen, Salonen, & Alho, 2009; Tzourio et al., 1997; Wu, Roberts, & Woldorff, 2007), although less work has so far been dedicated to the distinction between endogenous and exogenous factors for this modality. Early neuroimaging studies of auditory spatial orienting used experimental designs likely to combine both endogenous and exogenous factors (Mayer et al., 2006; Tzourio et al., 1997), but more recently these components have been separated. For example, Wu and colleagues used central symbolic cues (a human voice saying "left" or "right") to instruct spatial orienting toward one or the other side, thus specifically evoking endogenous auditory spatial attention. Brain activity associated with these spatial orienting cues was compared with activity elicited by neutral cues (a human voice saying "past"). This highlighted activation of a large set of regions including, dorsally, the superior frontal gyrus and the superior parietal lobe, but also, ventrally, the inferior parietal lobe. Wu et al. also performed a comparative analysis with an equivalent study in the visual modality (see Woldorff et al., 2004), revealing overlapping activation in the superior parietal lobe (SPL) and FEF for the two modalities. This latter result provides initial evidence for, at least to some extent, common neural substrates for endogenous spatial attention control in audition and vision (see also Smith and colleagues, 2009, who investigated visual and auditory endogenous orienting within the same experiment, also revealing common activations in the superior frontal gyrus, SPL, and middle frontal gyrus).

Fewer neuroimaging studies have investigated exogenous spatial attention in the auditory modality. Mayer and colleagues (2007) used nonpredictive monaural (i.e., lateralized) sounds followed by same- or opposite-side auditory targets. The fMRI analyses revealed that exogenous reorienting of auditory spatial attention (invalid vs. valid trials) activated frontal oculomotor regions and bilateral middle and left inferior frontal gyri but failed to highlight a core region of the ventral frontoparietal network such as the TPJ (Corbetta & Shulman, 2002). Additional evidence about possible substrates of exogenous auditory orienting comes from paradigms that used combinations of endogenous and exogenous orienting signals (see also the later section here on frontoparietal networks for shifting spatial attention). For example, Salmi and colleagues (2009) asked participants to selectively attend to the left or to the right ear (according to a cue-guided attention shift) and to respond to loudness-deviating tones at the attended ear only. However, on some trials deviant tones were presented at the unattended ear, capturing spatial attention in a reflexive/exogenous manner. The fMRI analysis revealed that top-down controlled attention shifts activated the bilateral SPL, IPS, TPJ, and IFG/MFG. Interestingly, deviant tones on the unattended side recruited a largely overlapping frontoparietal network. These results suggest that bottom-up and top-down shifting of auditory spatial attention engage a common system, including both ventral and dorsal regions of the FP network.

To summarize, unimodal studies of visual and auditory attention revealed the key role of frontoparietal areas during spatial orienting in both sensory modalities. In the visual modality, extensive investigation of the contribution of endogenous and exogenous factors for spatial orienting revealed some segregation between control functions in dorsal and ventral FP regions, respectively (Arrington et al., 2000; Corbetta et al., 1993, 2000; Kincade et al., 2005; but see also He et al., 2007, for interactions between these networks). In the auditory modality, this segregation appears to be less pronounced, often with both systems activating together in studies of endogenous attention (Smith et al., 2009; Wu et al., 2007) and in studies of exogenous attention (Mayer et al., 2007; Salmi et al., 2009).

Attention Control when Stimulating Vision and Audition Bimodally

The studies described above indicate overlapping activation for visual and auditory attention studied in isolation, but multisensory processing in everyday life typically involves co-occurring signals in both modalities. Hence, we now turn to studies of spatial attention involving selection of location and/or modality on presentation of multisensory AV stimuli.

Studies of endogenous intermodal attention typically involve asking subjects to sustain attention to one modality while ignoring simultaneous stimulation in a different modality (e.g., Degerman et al., 2007; Johnson & Zatorre, 2006; Laurienti et al., 2002; Santangelo, Fagioli, & Macaluso, 2010). For example, Laurienti and colleagues asked participants to attend either to vision or to audition (or to both stimuli in a third condition). Visual occipital cortex and auditory temporal cortex showed significant activation when attention was directed toward the corresponding sensory modality. By contrast, deactivation was observed in sensory-specific areas when attention was focused on the other modality (e.g., deactivation in visual occipital cortex while attending to audition). These results suggest that under conditions of bimodal stimulation, selective intermodal attention operates by boosting activity within regions processing the relevant stimuli and, at the same time, by suppressing activation in areas dedicated to the processing of the irrelevant stimuli/modality (see also Baier, Kleinschmidt, & Müller, 2006; Ciaramitaro, Buracas, & Boynton, 2007; Degerman et al., 2007; Johnson & Zatorre, 2005; Petkov et al., 2004).

However, a different picture emerged in studies that manipulated the spatial configuration of the stimuli. Adapting the exogenous version of the classical visuospatial cuing paradigm (Posner et al., 1980; Spence, McDonald, & Driver, 2004), fMRI and ERP studies compared trials with vision and audition on the same side (50% valid trials) versus trials with the two stimuli presented at different locations (50% invalid trials). For example, McDonald and Ward (2000) presented left or right visual targets preceded by a nonpredictive tone on either the same or the opposite side. Although the tone was entirely task-irrelevant, it was found to modulate the ERP components elicited by the subsequent visual target over sensory-specific (extrastriate) visual cortex. This indicates that spatial orienting toward stimuli in one modality (sounds in this case) can boost sensory processing of stimuli at the same location even when these occur in a different modality (i.e., the visual targets here; see Macaluso, Frith, & Driver, 2000, for similar findings between vision and touch).

To sum up, attentional selection in conditions of bimodal AV stimulation can have different consequences depending on the specific task requirements. When "space" is not relevant for the current task (e.g., AV stimuli are presented centrally), selective intermodal attention operates by enhancing activity within sensory regions processing the relevant stimuli while simultaneously suppressing activation in areas dedicated to the processing of the irrelevant stimuli/modality (Johnson & Zatorre, 2005; Laurienti et al., 2002). By contrast, when space is task-relevant (e.g., target detection or discrimination has to be carried out at a specific location), attention operates by enhancing activity related to stimuli presented at the attended spatial location, even when these are in different modalities (Macaluso et al., 2000; McDonald & Ward, 2000).

Classical Evidence for Automatic Integration of AV Signals

The literature reviewed so far indicates that attentional selection can significantly affect the processing of incoming auditory and visual signals and that the distribution of spatial attention in one modality can affect the processing of stimuli in a different modality, thus suggesting the existence of supramodal systems of spatial attention control (Eimer, van Velzen, Forster, & Driver, 2003; Farah, Wong, Monheit, & Morrow, 1989; see also McDonald & Ward, 2000). Hence, the question arises whether attention may affect not only the processing of the two unimodal components but also how these interact with each other; that is, whether attentional selection and MI are two separate processes or whether they co-occur, determining in an interactive manner the fate of the incoming multisensory input. In this section, we review some of the classical behavioral evidence suggesting that multisensory interactions take place preattentively, overall supporting the notion that attention and MI are separate (noninteracting) processes.

The McGurk Effect

In the nonspatial domain, MI between vision and audition has often been studied by means of the so-called McGurk effect (McGurk & MacDonald, 1976). This is an illusion that arises when people see and hear conflicting speech information. When lip movements corresponding to the syllable /ga/ are presented simultaneously with the sound of the syllable /ba/, people often report having heard the syllable /da/, thus showing a form of integration between the two unimodal sources of information (see also Manuel, Repp, Studdert-Kennedy, & Liberman, 1983). This AV effect is traditionally thought to take place unavoidably and irrespective of attentional constraints (see Navarra, Alsius, Soto-Faraco, & Spence, 2010, for a recent review). In particular, the McGurk effect is not abolished by awareness of the mismatch between what the observers are seeing and what they are hearing (Manuel et al., 1983; McGurk & MacDonald, 1976); the illusion still takes place when the auditory and visual signals are desynchronized (e.g., Munhall, Gribble, Sacco, & Ward, 1996; Soto-Faraco, Navarra, & Alsius, 2004; though see Soto-Faraco & Alsius, 2009) or under conditions of conflicting auditory information, as given by the combination of a male voice and a female face (Green, Kuhl, Meltzoff, & Stevens, 1991; although see Vatakis & Spence, 2007). Moreover, the magnitude of the McGurk effect has been shown to be unaffected by explicitly asking participants to attend to one or the other modality (Massaro, 1987) or by the degree of spatial separation (varying from 0° to 90°) between the auditory and visual speech signals when they were presented synchronously (Jones & Munhall, 1997; see also Soto-Faraco et al., 2004).

The Ventriloquism Effect

A second line of evidence, now in the spatial domain, for automatic mechanisms of MI between audition and vision comes from the so-called ventriloquism effect (e.g., Jack & Thurlow, 1973). When visual and auditory stimuli are presented at the same time (i.e., synchronously) but at different spatial locations, people tend to hear the sound as originating from the location of the visual stimulus. This effect is exploited by professional ventriloquists, who are able to produce speech without any visible facial movements, with the voice appearing to come from a puppet they move in synchrony with the speech. It should be noted that this illusion can arise also for nonlinguistic stimuli, and experimental manipulations have often employed simple sounds (e.g., tones) and simple visual stimuli (flashes of light) to investigate this phenomenon (e.g., Bertelson, Vroomen, de Gelder, & Driver, 2000; Vroomen, Bertelson, & de Gelder, 2001).

Several studies have shown that the ventriloquism effect is immune to manipulations of endogenous or exogenous spatial attention, suggesting preattentive integration of AV spatial information in this conflicting bimodal setting. For example, Bertelson et al. (2000) asked participants to localize the source of a sound presented with or without a concurrent flash of light in the left or right hemifield. At the same time, participants were also required to monitor a stream of visual events located either centrally or laterally, and to detect occasional targets there. The aim of this secondary task was to secure visuospatial endogenous attention at one specific location, thus making it possible to study the ventriloquism illusion as a function of the endogenously attended location. The results showed that sound localization was biased toward the position of the peripheral flash (i.e., spatial ventriloquism) and that this pattern of results was unaffected by the location (central or peripheral) where the participants focused endogenous visual attention.

Analogous findings were reported by Vroomen et al. (2001), who manipulated exogenous rather than endogenous attention. The authors used a visual display consisting of a row of four squares, three of the same dimension and one smaller than the others. The latter was presented at either the leftmost or rightmost position and served to reflexively capture spatial attention (i.e., it was a singleton; see Theeuwes, 1993, for a review). The participants' task was to discriminate the left vs. right location of sound bursts. The results showed that side judgments of sound bursts were not biased toward the location of the singleton, although the singleton was shown to successfully capture attention in a follow-up study. Thus, as in the case of the nonspatial (and speech-related) McGurk effect, behavioral studies seem to suggest that spatial interactions between vision and audition are unaffected by attentional manipulations, indicating a high degree of independence between MI and attention control.

Evidence for Attentional Modulation

In the previous section, we reviewed behavioral literature indicating that AV integration is a mandatory and preattentive process. However, more recent studies suggest that attentional selection can, in some cases, modulate the integration of AV signals (Alsius, Navarra, Campbell, & Soto-Faraco, 2005; Tiippana, Andersen, & Sams, 2004). Alsius and colleagues reported a study in which participants were required to repeat back the words uttered by a speaker in a video clip. The words were dubbed so as to potentially elicit the McGurk illusion at unpredictable times. Half of the participants simply repeated back the words they had heard; the other half, together with the primary task, also performed a concurrent, highly demanding task (i.e., the detection of repetitions in a rapidly presented sequence of visual or auditory events). The results showed a reduction of the McGurk illusion when participants performed the concurrent task (visual or auditory detection), suggesting attentional influences on the integration process. Following perceptual load theory (Lavie, 2005), it might be hypothesized that when the amount of resources required to perform a cognitive operation does not exceed the system's capacity, the remaining attentional resources may spill over to other stimuli or processes, even when they are entirely task-irrelevant. Given that attentional demands in many paradigms that failed to show any attentional modulation were relatively low (e.g., Massaro, 1987; McGurk & MacDonald, 1976; Soto-Faraco et al., 2004), it can be hypothesized that, in those studies, spare attentional resources may have contributed to the integration of the AV stimuli.

Functional Imaging of Endogenous Spatial Attention and MI in Speech

Further evidence for modulatory effects of attention on multisensory processing comes from neuroimaging studies. Using fMRI, Fairhall and Macaluso (2009) tested the effect of endogenous visuospatial attention on the processing of AV speech signals. Participants were presented with two simultaneous visual streams (one in each hemifield) consisting of speaking lips, plus a central audio stream consisting of spoken words. The audio stream could be congruent with either the left or the right visual stream, and participants had to covertly attend to one or the other visual stream (or just to central fixation, in the baseline condition). It should be noted that the sensory input (and any congruence/incongruence between vision and audition) was held constant across the critical experimental conditions. This allowed the assessment of any modulatory effect of attending to the visual component of a bimodal stimulus that was either temporally/semantically congruent or incongruent with the task-irrelevant auditory stream. The imaging data revealed that attending to the congruent visual stimulus, which should trigger greater integration with the auditory stream, resulted in increased activation in an extensive network of cortical and subcortical regions. This included the superior temporal sulcus (typically involved in MI of AV speech stimuli; see, e.g., Beauchamp, Argall, Bodurka, Duyn, & Martin, 2004; Beauchamp, Lee, Argall, & Martin, 2004; Calvert, Campbell, & Brammer, 2000), striate and extrastriate visual cortex (demonstrating an interplay between attention and MI in sensory-specific areas; see also Van Atteveldt, Formisano, Goebel, & Blomert, 2007), and the superior colliculus (a key region involved in MI; Stein & Meredith, 1993).

Functional Imaging of Endogenous Spatial Attention and MI Using Nonspeech Material

Additional evidence of the modulatory effect of attention on AV interactions comes from studies that manipulated endogenous spatial attention using nonspeech stimuli. Using functional imaging, Busse and colleagues (Busse, Roberts, Crist, Weissman, & Woldorff, 2005) demonstrated that spatial attention can modulate AV integration and multisensory perception (see also Talsma, Senkowski, & Woldorff, 2009, for converging evidence from an ERP study that manipulated the temporal synchrony/asynchrony of AV stimuli rather than their spatial alignment). Busse et al. presented visual stimuli in either the left or the right hemifield, asking participants to covertly attend to one side only (blocked endogenous attention). Half of the visual stimuli were accompanied by a simultaneous task-irrelevant tone presented centrally. Behavioral results showed that participants detected more visual targets when these were accompanied by the irrelevant tone than when they were presented alone (see Van der Burg, Olivers, Bronkhorst, & Theeuwes, 2008; Fiebelkorn, Foxe, Schwartz, & Molholm, 2010, for similar results). Contrasting fMRI responses for these AV events with those corresponding to the visual-alone stimuli, Busse et al. found enhanced activity in the auditory cortex. Importantly, this activation was significantly larger when the auditory stimuli were synchronized with attended as compared to unattended visual stimuli. These data indicate that the activity related to the processing of identical auditory stimuli (centrally presented) was modulated by endogenous attention directed toward left or right visual stimuli. The authors proposed that this effect is a consequence of the integration of the AV components into two different multisensory "objects": left visual stimulus plus central auditory stimulus, or right visual stimulus plus central auditory stimulus, depending on the visually attended side (see also Ciaramitaro et al., 2007).
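The logic of the critical comparison can be sketched as follows; this is an illustrative Python fragment with made-up response estimates, not the authors' data or analysis pipeline.

```python
# Hypothetical auditory cortex response estimates (arbitrary units) for
# the four event types in a Busse et al. (2005)-style design.
betas = {
    "V_attended": 0.20,     # visual stimulus alone, attended side
    "AV_attended": 0.95,    # visual stimulus + central tone, attended side
    "V_unattended": 0.15,   # visual stimulus alone, unattended side
    "AV_unattended": 0.55,  # visual stimulus + central tone, unattended side
}

# Effect of the (physically identical) central tone, per attention condition.
tone_attended = betas["AV_attended"] - betas["V_attended"]        # 0.75
tone_unattended = betas["AV_unattended"] - betas["V_unattended"]  # 0.40

# Attention-by-sound interaction: the tone-related auditory response is
# larger when the tone accompanies the attended visual stimulus.
print(tone_attended - tone_unattended)  # 0.35
```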

Frontoparietal Networks for Shifting Spatial Attention in AV Contexts

The imaging studies described above demonstrate that endogenous voluntary attention can modulate the processing of AV stimuli; more specifically, that attention can affect how AV signals spatially interact with each other. However, these paradigms did not highlight any interactions within the frontoparietal networks that are thought to control attentional deployment (see, for reviews, Corbetta et al., 2008; Macaluso, 2009; see also Mayer et al., 2007; Salmi et al., 2009; Smith et al., 2009; Wu et al., 2007, for studies in the auditory modality).


One reason for this might be that the endogenous studies discussed above compared situations with attention spatially focused and maintained at a single position, whereas frontoparietal attention control areas are particularly active when attention is shifted between different locations (e.g., Hopfinger & West, 2006; Natale et al., 2009, for endogenous and exogenous shifts).

A few recent neuroimaging and ERP studies investigated the orienting of spatial attention using combinations of stimuli in different modalities to trigger shifts of spatial attention and activation within the FP networks (Santangelo et al., 2009; Serences & Yantis, 2007; Shomstein & Yantis, 2004). For instance, we used a modified version of the classical spatial cuing paradigm (Posner et al., 1980) that many previous imaging studies had related to activation of the ventral FP attentional network (e.g., Arrington et al., 2000; Corbetta et al., 2000; Kincade et al., 2005), and we asked whether task-irrelevant audition can affect the reorienting of visuospatial attention and activity in the ventral FP (Santangelo et al., 2009).

As noted at the beginning of the chapter, the ventral FP attention network does not merely activate for task-irrelevant stimuli presented outside the focus of attention (as pure "exogenous" nonpredictive cues; Kincade et al., 2005; see also Indovina & Macaluso, 2007) but rather requires these stimuli to be also task-set relevant (Corbetta et al., 2008). These effects led Corbetta and colleagues to hypothesize that filtering mechanisms prevent fully task-irrelevant signals from entering the ventral FP network via inhibition of the functional connection between visual cortex and the temporoparietal junction in the ventral FP (see Corbetta et al., 2008, for more details on this). Thus, the question arises whether stimuli in a modality other than vision are also under analogous filtering/inhibitory control or rather may have access to the ventral spatial reorienting system irrespective of task-set relevance. Santangelo et al. (2009) explored this issue using a multisensory variation of the double-cuing paradigm (see also Berger, Henik, & Rafal, 2005, for the original behavioral implementation of this approach, but in the visual modality only). Specifically, we used central predictive cues to induce endogenous shifts of visual attention toward the most likely location of a subsequent visual target (75% validity). In the interval between the endogenous cue and the visual target, nonpredictive auditory stimuli were briefly presented in either the left or the right hemifield, thus allowing us to assess whether completely task-irrelevant auditory signals modulate activity in the ventral FP network. Behavioral results showed that both the endogenous visual and the exogenous auditory cues affected target discrimination, with an overall pattern of reaction times analogous to that seen in purely visual double-cue experiments (Berger et al., 2005; see also Natale et al., 2009). As expected, fMRI results revealed activation of the ventral FP when the visual targets were presented at the uncued side compared with validly cued targets (main effect of endogenous invalid cues), thus confirming that the ventral FP activates for spatial reorienting of attention toward task-relevant visual stimuli (Indovina & Macaluso, 2007; see also Corbetta et al., 2008). But critically, the side of the auditory stimuli was found to modulate activity in the ventral FP network. Reorienting-related activation in the right TPJ (endogenous invalid minus valid trials) was reduced when the auditory stimulus was on the same side as the upcoming visual target, thus anticipating the position of the invalidly cued target. This demonstrates that, unlike nonpredictive visual cues (see Natale et al., 2009, who used an analogous double-cuing paradigm, but within vision only), auditory stimuli are able to affect reorienting processes in the ventral FP network despite being completely task-irrelevant.

These findings demonstrate that vision and audition can also spatially interact within the ventral FP network, with constraints that differ from those observed for purely visual attention (i.e., the spatial influence of fully irrelevant audition). These data suggest that multisensory stimuli can bypass the filtering/inhibitory mechanisms proposed for the visual modality (Corbetta et al., 2008; Corbetta & Shulman, 2002), leading to AV interactions that are less under set-relevance control, consistent with a special status of attentional orienting in multisensory contexts (Macaluso, 2009).

Frontoparietal Networks for Dividing Attention between Locations and Modalities

The previous sections described experiments involving multisensory AV stimulation with spatial attention either focused at one location (e.g., Busse et al., 2005) or shifted from one location to another (e.g., Serences & Yantis, 2007). However, everyday life situations can also require the selection of relevant (visual or auditory) information arising from spatially separate locations (i.e., divided spatial attention). In the visual modality, monitoring multiple streams at different locations consistently results in behavioral costs (e.g., Castiello & Umiltà, 1992) and typically leads to activation of the dorsal frontoparietal network (Fagioli & Macaluso, 2009; Hahn et al., 2008; Nebel et al., 2005) plus some modulation in sensory cortex (see McMains & Somers, 2004, 2005). Activation of higher-order associative regions (i.e., the dorsal frontoparietal network) has been associated with the greater processing requirements of monitoring multiple sensory streams (Fagioli & Macaluso, 2009; Hahn et al., 2008). Also in the multisensory context, monitoring two modalities at the same time, compared with monitoring a single modality, was found to activate the dorsolateral prefrontal cortex (Johnson & Zatorre, 2006; see also Johnson, Strafella, & Zatorre, 2007, for a related TMS study), suggesting common supramodal attentional resources to deal with these high-processing-load situations.

In a recent study, we asked how factors of space (attend to two vs. one location) and modality (monitor two vs. one modality) contribute to the activation of dorsal frontoparietal areas (Santangelo et al., 2010). Participants were presented with four concurrent streams of stimuli (vision and audition, in the left and right hemifields). In different blocks, the volunteers were asked to attend to either one or two modalities, either at one single location (focused spatial attention) or in the two opposite hemifields (divided spatial attention). The behavioral results showed that dividing attention in space yielded smaller costs of monitoring two versus one modality, compared with the same costs assessed while participants attended to a single spatial location. The fMRI data showed that activity in dorsal FP regions increased both for attending to multiple locations (divided vs. focused spatial attention) and for monitoring multiple modalities (attend to two vs. one modality). But critically, these factors of space and modality were seen to interact in the posterior parietal cortex, where the activation associated with simultaneous monitoring of two modalities was larger when attention was spatially divided (i.e., the spatial condition yielding smaller behavioral costs of monitoring two vs. one modality). Accordingly, activity in posterior parietal cortex increased relatively more when subjects monitored the two modalities at separate locations compared to monitoring two modalities at one single location. This pattern of activation cannot simply be explained as an overall increase of task difficulty, because the behavioral costs of monitoring two versus one modality were largest in the focused attention conditions. Rather, the increased activation in posterior parietal cortex can be related to the engagement of additional processing resources when monitoring two modalities at spatially separate locations (cf. the lower behavioral costs in this condition). We suggest that, when attention is spatially divided, additional modality-specific resources can be engaged efficiently by focusing independently on different spatial locations. By contrast, in the spatially focused condition (greater behavioral costs), supramodal processing resources suffer from high levels of competition between the signals arising in the two task-relevant modalities at the same attended location (see also the discussion below).
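The space-by-modality interaction at the heart of this design can be sketched as a simple 2 x 2 cost computation; the following Python fragment uses made-up reaction times, not the study's data.

```python
import numpy as np

# Hypothetical mean RTs (msec) for the 2 x 2 design:
# rows = spatial attention (focused, divided),
# columns = modalities monitored (one, two). Values are illustrative.
rt = np.array([[520.0, 610.0],   # focused at one location
               [540.0, 575.0]])  # divided across hemifields

# Cost of monitoring two vs. one modality, per spatial condition.
modality_cost = rt[:, 1] - rt[:, 0]
print(modality_cost)  # [90. 35.] -> smaller cost with divided attention

# Interaction term: how much the modality cost shrinks when attention
# is divided in space (the pattern reported behaviorally in the study).
print(modality_cost[0] - modality_cost[1])  # 55.0
```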

In the final section of this chapter, we further discuss these findings, suggesting that spatial aspects of MI and spatial attention control jointly contribute to selecting relevant locations in space, yielding beneficial or detrimental interactions depending on the stimulus characteristics and task constraints.

Discussion

Traditional views of multisensory processing emphasize that stimuli in different modalities are integrated in a bottom-up manner and that any effect of attention control will follow these initial automatic interactions. In this chapter we have reviewed several studies showing effects of attention at multiple levels of the processing of multisensory stimuli, suggesting a close interplay between attention control and some of the processes related to MI (e.g., spatial covert orienting with multisensory stimuli, integration of multisensory signals for object perception, AV integration for the comprehension of speech material). First, we highlighted that the same frontoparietal attention control systems (dorsal for endogenous and ventral for exogenous attention) activate in spatial orienting tasks for both the visual and the auditory modality, which is indicative of supramodal control of spatial attention (Eimer et al., 2003; Farah et al., 1989). On bimodal stimulation (i.e., situations involving bottom-up multisensory interactions), selective attention increases activation in sensory areas processing the relevant modality (e.g., occipital areas when attending to vision) and suppresses activity in areas dedicated to other modalities (e.g., temporal auditory cortex when attending to vision; Johnson & Zatorre, 2005; Laurienti et al., 2002). However, when space is a relevant dimension (e.g., attend to one modality at one location), the effect of spatial selective attention can spread across modalities, boosting sensory responses for stimuli in the irrelevant modality as well, but at the attended location (e.g., increased activation in right occipital visual cortex when attending to left audition; Ciaramitaro et al., 2007). Moreover, attention not only modulates each unimodal component of a bimodal stimulus but can also affect how these components interact with each other. Within sensory areas, attention can influence multisensory interactions that arise as a result of the spatial and/or temporal relationship between auditory and visual stimuli (Busse et al., 2005; Fairhall & Macaluso, 2009); and the spatial relationship between AV stimuli can modulate activations related to spatial orienting in dorsal (endogenous) and ventral (exogenous) attentional networks (Santangelo et al., 2009; Santangelo et al., 2010).

Most of the evidence indicating that attention and MI are separate, unrelated processes comes from the "linguistic" domain (Green et al., 1991; Jones & Munhall, 1997; Manuel et al., 1983; Massaro, 1987; McGurk & MacDonald, 1976; Munhall et al., 1996; Soto-Faraco et al., 2004; but see also Bertelson et al., 2000; Vroomen et al., 2001, who used nonlinguistic material). Studies using linguistic stimuli are likely to entail some influence of participants' a priori semantic knowledge and might be considered a "special case" of multisensory processing (e.g., Massaro, 2004). However, here we suggest that it may not be the type of material (i.e., speech) per se that favors a certain resistance to attentional modulations, but rather the high degree of association between auditory and visual signals in speech stimuli (e.g., highly congruent visual lip movements and corresponding speech sounds; see also Soto-Faraco & Alsius, 2009). Indeed, Baier and colleagues (2006) showed that the level of association between AV signals can critically affect multisensory interactions (enhanced vs. suppressed), even when nonspeech stimuli are used. Specifically, when subjects expected highly "associated" AV signals (auditory pitch and visual tilt associated on 90% of the trials), activity increased in both auditory and visual areas; by contrast, when they expected "nonassociated" stimuli, activity increased only in the sensory cortex processing the task-relevant modality.

More recently, Fiebelkorn, Foxe, and Molholm (2010) extended these findings, suggesting the existence of a dual mechanism for AV integration, likewise dependent on the level of association between the two unimodal signals. In their study, Fiebelkorn and colleagues presented multiple exemplars of either semantically congruent multisensory objects (e.g., dogs with barks) or semantically incongruent multisensory objects (e.g., guitars with barks) and assessed the amplitude of the auditory N1 event-related component. Participants were asked to detect any consecutive repetition of the same visual object (e.g., two consecutive pictures of a dog) while ignoring the auditory stimuli. This study found a "stimulus-driven" spread of attention from the attended visual stimulus to the task-irrelevant sound (see also Busse et al., 2005), here indexed by a larger (more negative) N1 component when the task-irrelevant sound was presented simultaneously with the attended visual stimulus compared with auditory stimuli alone. This stimulus-driven effect was found irrespective of the congruency between the visual and the auditory stimuli, indicating that AV interactions can also take place for weakly associated unisensory components (i.e., the semantically incongruent condition). However, the same study also found a further modulation of the auditory N1 component that was selective for the semantically congruent conditions entailing high levels of association: N1 for the congruent conditions was more negative following the presentation of visual targets (e.g., dogs with barks when dogs were the target objects) than following the presentation of visual nontargets (dogs with barks, i.e., the same AV object, when guitars were the target objects). By contrast, N1 for the semantically incongruent conditions was identical irrespective of the presentation of visual targets or nontargets. The authors interpreted this additional effect as evidence for a "representation-driven" spread of attention associated with stored representations of multisensory object features (see also Molholm, Martinez, Shpaner, & Foxe, 2007). Thus, in the semantically congruent conditions, stimulus-driven and representation-driven mechanisms would work together, facilitating the processing of well-learned AV associations.

On the basis of this literature, we suggest that in situations of high association between the auditory and visual input (e.g., as in natural speech or well-learned AV associations), automatic factors dominate multisensory processing, with attention having a modest influence on the processing of the stimuli. By contrast, in conditions of low association, which often also entail a greater uncertainty regarding the external input (e.g., as in spatial cuing studies using nonpredictive cues; see below), attention plays a more prominent role, affecting both the processing of the two unimodal components and how these interact with each other.

In the context of exogenous spatial attention, many studies have now demonstrated that nonpredictive stimuli in one modality (e.g., auditory cues) can modulate responses to subsequent targets in another modality (e.g., visual targets; Spence et al., 2004). In these studies the spatial association between the cues and targets is weak, with half of the trials entailing same-side AV stimulation and the other half entailing opposite-side stimuli, randomly intermixed in nonpredictable sequences. Accordingly, bottom-up sensory signals do not provide any consistent information about the spatial layout of the multisensory input, and spatial attention is seen to modulate sensory responses cross-modally (e.g., auditory cues boost responses to visual targets in occipital cortex; McDonald & Ward, 2000). Combining exogenous nonpredictive auditory cues and an endogenous spatial cuing visual task, we have been able to demonstrate cross-modal effects within the ventral frontoparietal network as well (Santangelo et al., 2009). In fact, auditory nonpredictive cues modulated activity in the TPJ, a key area of the reorienting system that engages when subjects expect a visual target in one position but it is presented somewhere else (invalid trials). This effect of the auditory stimuli indicates not only that the ventral system can engage for both visual (e.g., Corbetta et al., 2000) and auditory (e.g., Salmi et al., 2009) attention but also that the two modalities interact in the ventral attentional system to guide spatial selection (see also Macaluso, 2009, for further discussion of this point).

Attention was also found to affect the processing of multisensory signals in conditions of endogenous/strategic attention, provided that there is a weak association between the unimodal components of the multisensory stimulation. In the nonspeech domain, we discussed the study of Busse and colleagues (2005) showing that directing endogenous visual attention toward one side can modulate the processing of a centrally presented tone coupled with a lateralized (attended vs. nonattended) visual target. Also in this study, the association between vision and audition was weak: visual and auditory signals were never presented at the same location (as would instead happen for any multisensory object in real life), and the tones were coupled with left visual targets on half of the trials and with right targets on the other half (i.e., the spatial association between the two modalities was fully unpredictable). The results showed modulation in the auditory cortex, demonstrating that endogenous attention can affect multisensory signals in primary sensory cortex.

Related results were obtained by Fairhall and Macaluso (2009), but using speech material. It might be argued that in this study there was a high level of association between vision and audition, because one of the two visual streams was temporally synchronous (and semantically related) with the central auditory stream. Thus, low-level "preattentive" integration should dominate the processing of the multisensory input, with little chance for attention to exert any influence. On the contrary, endogenous spatial attention was found to modulate processing at multiple cortical and subcortical levels, including visual occipital cortex, the superior temporal sulcus, and the superior colliculus. One possible explanation for this is that, although temporal and semantic information were highly related in the two modalities, the spatial relationship between audition and vision was weak. The auditory stream was presented centrally, at an equal distance between the two visual sources. We suggest that the weak spatial association between the two semantically congruent streams (i.e., central audition and the congruent, but lateralized, visual stream) reduced any preattentive AV association, enabling endogenous visuospatial attention to modulate the AV interactions (i.e., greater activation when attending to the visual stream that was synchronous with the auditory stream). Accordingly, the results of this study would also be in agreement with the proposal that conditions of low association between the two unimodal components of a multisensory stimulus may enable/facilitate the influence of endogenous attention on the processing of the multisensory input.

Finally, using a paradigm of divided spatial attention, we have shown that endogenous attention and multisensory processing can also interact within the dorsal attention system (Santangelo et al., 2010). Using fully unrelated AV inputs, this study demonstrated that the costs of monitoring two sensory modalities simultaneously decreased when attention was divided in space, as compared to attention focused at one location, and that this recruited a specific region of the posterior parietal cortex. By contrast, when attention was spatially focused, this study revealed side-specific modulations in extrastriate visual cortex irrespective of the modality that the participants were asked to judge. These findings lead us to hypothesize that the focusing of spatial attention is mediated by supramodal attention control (Eimer et al., 2003; Farah et al., 1989) via multisensory representations of space (see Graziano, Taylor, Moore, & Cooke, 2002; Pesaran et al., 2006). This would yield an automatic tendency to boost the processing of all signals arising from a single external location irrespective of sensory modality (see also Ciaramitaro et al., 2007), which in turn would make it difficult to judge unrelated signals at the attended location.

By contrast, when attention is distributed over more than one location (divided spatial attention), spatial selection may involve multiple sensory-specific representations of space (e.g., Andersen & Buneo, 2002; Mullette-Gillman et al., 2005; Xing & Andersen, 2000) and the engagement of modality-specific attentional resources (e.g., Rizzolatti, Riggio, & Sheliga, 1994). Modality-specific control would make "in-parallel" monitoring of signals in different senses more efficient (see also Talsma, Doty, Stroud, & Woldorff, 2006, who reported greater attentional capacity for processing simultaneous stimuli in different modalities rather than within a single modality). However, note that this advantage would hold specifically when the signals in the two modalities are independent. By contrast, supramodal attention control would facilitate the monitoring of highly associated and congruent multisensory signals, if these have already been integrated by preattentive mechanisms.

In summary, in this chapter we have reviewed evidence that spatial attention can affect the processing of AV signals. Attention can affect multisensory processing in sensory cortices, with differential effects depending on whether "space" is a relevant task dimension or not. Thus, attending to one location in one modality can boost the processing not only of stimuli in the relevant modality but also of stimuli in other modalities when these are located at the attended position (Macaluso et al., 2000; McDonald & Ward, 2000). Conversely, monitoring a stimulus modality irrespective of spatial location leads to an enhancement of activity associated with the relevant modality but a reduction of activity for stimuli in the other modalities (Johnson & Zatorre, 2005; Laurienti et al., 2002). Spatial attention can also affect how AV stimuli are processed within attention control regions of the ventral and dorsal FP networks (e.g., Santangelo et al., 2009, 2010). We suggest that spatially focused attention may operate supramodally via multimodal representations of space (Graziano et al., 2002), whereas conditions of divided spatial attention may entail the recruitment of modality-specific resources (Andersen & Buneo, 2002).

Overall, the picture emerging from our review is that the interplay between vision and audition can affect many different processes (e.g., signal detection, spatial orienting, object categorization) and can affect brain activity at different cortical and subcortical levels (see also Werner & Noppeney, 2010). At some of these levels, attention also plays a role: attention can modulate the responses associated with the unisensory components and influence how these interact with each other. Here we put forward the hypothesis that the level of association between the unisensory inputs may be a significant factor in determining to what extent attention can influence multisensory interactions. Specifically, we propose that weakly associated unimodal components (e.g., situations entailing spatially misaligned, temporally asynchronous, and/or semantically unrelated AV signals) give rise to minimal preattentive interactions and are more likely to be affected by attentional factors.

References

Alsius, A., Navarra, J., Campbell, r., & Soto-Faraco, S. (2005). Audiovisual integration of speech falters under high atten-tion demands. Current Biology, 15, 839–843.

Andersen, r. A., & Buneo, C. A. (2002). Intentional maps in posterior parietal cortex. Annual Review of Neuroscience, 25, 189–220.

Arrington, C. M., Carr, T. H., Mayer, A. r., & rao, S. M. (2000). Neural mechanisms of visual attention: object-based selection of a region in space. Journal of Cognitive Neuroscience, 2, 106–117.

Baier, B., Kleinschmidt, A., & Müller, N. G. (2006). Cross-modal processing in early visual and auditory cortices depends on expected statistical relationship of multisen-sory information. Journal of Neuroscience, 26, 12260–12265.

Beauchamp, M. S., Argall, B. D., Bodurka, J., Duyn, J. H., & Martin, A. (2004). Unraveling multisensory integration: patchy organization within human STS multisensory cortex. Nature Neuroscience, 7, 1190–1192.

Beauchamp, M. S., Lee, K. E., Argall, B. D., & Martin, A. (2004). Integration of auditory and visual information about objects in superior temporal sulcus. Neuron, 41, 809–823.

Berger, A., Henik, A., & Rafal, R. (2005). Competition between endogenous and exogenous orienting of visual attention. Journal of Experimental Psychology. General, 134, 207–221.

Bertelson, P., Vroomen, J., de Gelder, B., & Driver, J. (2000). The ventriloquist effect does not depend on the direction of deliberate visual attention. Perception & Psychophysics, 62, 321–332.

Busse, L., Roberts, K. C., Crist, R. E., Weissman, D. H., & Woldorff, M. G. (2005). The spread of attention across modalities and space in a multisensory object. Proceedings of the National Academy of Sciences of the United States of America, 102, 18751–18756.

Calvert, G. A., Campbell, R., & Brammer, M. J. (2000). Evidence from functional magnetic resonance imaging of crossmodal binding in the human heteromodal cortex. Current Biology, 10, 649–657.

Castiello, U., & Umiltà, C. (1992). Splitting focal attention. Journal of Experimental Psychology. Human Perception and Performance, 18, 837–848.

Ciaramitaro, V. M., Buracas, G. T., & Boynton, G. M. (2007). Spatial and cross-modal attention alter responses to unattended sensory information in early visual and auditory human cortex. Journal of Neurophysiology, 98, 2399–2413.

Corbetta, M., Kincade, J. M., Ollinger, J. M., McAvoy, M. P., & Shulman, G. L. (2000). Voluntary orienting is dissociated from target detection in human posterior parietal cortex. Nature Neuroscience, 3, 292–297.

Corbetta, M., Miezin, F. M., Shulman, G. L., & Petersen, S. E. (1993). A PET study of visuospatial attention. Journal of Neuroscience, 13, 1202–1226.

Corbetta, M., Patel, G., & Shulman, G. L. (2008). The reorienting system of the human brain: from environment to theory of mind. Neuron, 58, 306–324.

Corbetta, M., & Shulman, G. L. (2002). Control of goal-directed and stimulus-driven attention in the brain. Nature Reviews. Neuroscience, 3, 201–215.

Degerman, A., Rinne, T., Pekkola, J., Autti, T., Jääskeläinen, I. P., Sams, M., et al. (2007). Human brain activity associated with audiovisual perception and attention. NeuroImage, 34, 1683–1691.

Driver, J. (2001). A selective review of selective attention research from the past century. British Journal of Psychology, 92, 53–78.

Duhamel, J. R., Colby, C. L., & Goldberg, M. E. (1998). Ventral intraparietal area of the macaque: congruent visual and somatic response properties. Journal of Neurophysiology, 79, 126–136.

Eimer, M., van Velzen, J., Forster, B., & Driver, J. (2003). Shifts of attention in light and in darkness: an ERP study of supramodal attentional control and crossmodal links in spatial attention. Brain Research. Cognitive Brain Research, 15, 308–323.

Fagioli, S., & Macaluso, E. (2009). Attending to multiple visual streams: interactions between location-based and category-based attentional selection. Journal of Cognitive Neuroscience, 21, 1628–1641.

Fairhall, S., & Macaluso, E. (2009). Spatial attention can modulate audiovisual integration at multiple cortical and subcortical sites. European Journal of Neuroscience, 29, 1247–1257.

Farah, M. J., Wong, A. B., Monheit, M. A., & Morrow, L. A. (1989). Parietal lobe mechanisms of spatial attention: modality-specific or supramodal? Neuropsychologia, 27, 461–470.

Fiebelkorn, I. C., Foxe, J. J., & Molholm, S. (2010). Dual mechanisms for the cross-sensory spread of attention: how much do learned associations matter? Cerebral Cortex, 20, 109–120.

Fiebelkorn, I. C., Foxe, J. J., Schwartz, T. H., & Molholm, S. (2010). Staying within the lines: the formation of visuospatial boundaries influences multisensory feature integration. European Journal of Neuroscience, 31, 1737–1743.

Graziano, M. S., Taylor, C. S., Moore, T., & Cooke, D. F. (2002). The cortical control of movement revisited. Neuron, 36, 349–362.

Green, K. P., Kuhl, P. K., Meltzoff, A. N., & Stevens, E. B. (1991). Integrating speech information across talkers, gender, and sensory modality: female faces and male voices in the McGurk effect. Perception & Psychophysics, 50, 524–536.

Hahn, B., Wolkenberg, F. A., Ross, T. J., Myers, C. S., Heishman, S. J., Stein, D. J., et al. (2008). Divided versus selective attention: evidence for common processing mechanisms. Brain Research, 1215, 137–146.

He, B. J., Snyder, A. Z., Vincent, J. L., Epstein, A., Shulman, G. L., & Corbetta, M. (2007). Breakdown of functional connectivity in frontoparietal networks underlies behavioral deficits in spatial neglect. Neuron, 53, 905–918.

Hopfinger, J. B., & West, V. M. (2006). Interactions between endogenous and exogenous attention on cortical visual processing. NeuroImage, 31, 774–789.

Indovina, I., & Macaluso, E. (2007). Dissociation of stimulus relevance and saliency factors during shifts of visuospatial attention. Cerebral Cortex, 17, 1701–1711.

Jack, C. E., & Thurlow, W. R. (1973). Effects of degree of visual association and angle of displacement on the "ventriloquism" effect. Perceptual and Motor Skills, 38, 967–979.

Johnson, J. A., Strafella, A. P., & Zatorre, R. J. (2007). The role of the dorsolateral prefrontal cortex in bimodal divided attention: two transcranial magnetic stimulation studies. Journal of Cognitive Neuroscience, 19, 907–920.

Johnson, J. A., & Zatorre, R. J. (2005). Attention to simultaneous unrelated auditory and visual events: behavioral and neural correlates. Cerebral Cortex, 15, 1609–1620.

Johnson, J. A., & Zatorre, R. J. (2006). Neural substrates for dividing and focusing attention between simultaneous auditory and visual events. NeuroImage, 31, 1673–1681.

Jones, J. A., & Munhall, K. G. (1997). The effects of separating auditory and visual sources on audiovisual integration of speech. Canadian Acoustics, 25, 13–19.

Kincade, J. M., Abrams, R. A., Astafiev, S. V., Shulman, G. L., & Corbetta, M. (2005). An event-related functional magnetic resonance imaging study on voluntary and stimulus-driven orienting of attention. Journal of Neuroscience, 25, 4593–4604.

Klein, R. M., & Shore, D. (2000). Relations among modes of visual orienting. In S. Monsell & J. Driver (Eds.), Attention and performance XVIII: Control of cognitive processes (pp. 195–208). Cambridge, MA: MIT Press.

Kohler, E., Keysers, C., Umiltà, M. A., Fogassi, L., Gallese, V., & Rizzolatti, G. (2002). Hearing sounds, understanding actions: action representation in mirror neurons. Science, 297, 846–848.

Laurienti, P. J., Burdette, J. H., Wallace, M. T., Yen, Y. F., Field, A. S., & Stein, B. E. (2002). Deactivation of sensory-specific cortex by cross-modal stimuli. Journal of Cognitive Neuroscience, 14, 420–429.

Lavie, N. (2005). Distracted and confused? Selective attention under load. Trends in Cognitive Sciences, 9, 75–82.

Macaluso, E. (2009). Orienting of spatial attention and the interplay between the senses. Cortex, 46, 282–297.

Macaluso, E., Frith, C. D., & Driver, J. (2000). Modulation of human visual cortex by crossmodal spatial attention. Science, 289, 1206–1208.

Manuel, S. Y., Repp, B., Studdert-Kennedy, M., & Liberman, A. (1983). Exploring the "McGurk effect." Journal of the Acoustical Society of America, 74, S66.

Massaro, D. W. (1987). Speech perception by ear and eye: a paradigm for psychological inquiry (pp. 66–74). Hillsdale, NJ: Lawrence Erlbaum Associates.

Massaro, D. W. (2004). From multisensory integration to talking heads and language learning. In G. Calvert, C. Spence, & B. E. Stein (Eds.), The handbook of multisensory processes (pp. 153–176). Cambridge, MA: MIT Press.

Mayer, A. R., Harrington, D. L., Adair, J. C., & Lee, R. R. (2006). The neural networks underlying endogenous auditory covert orienting and reorienting. NeuroImage, 30, 938–949.

Mayer, A. R., Harrington, D. L., Stephen, J., Adair, J. C., & Lee, R. R. (2007). An event-related fMRI study of exogenous facilitation and inhibition of return in the auditory modality. Journal of Cognitive Neuroscience, 19, 455–467.

McDonald, J. J., & Ward, L. M. (2000). Involuntary listening aids seeing: evidence from human electrophysiology. Psychological Science, 11, 167–171.

McGurk, H., & MacDonald, J. (1976). Hearing lips and seeing voices. Nature, 264, 746–748.

McMains, S. A., & Somers, D. C. (2004). Multiple spotlights of attentional selection in human visual cortex. Neuron, 42, 677–686.

McMains, S. A., & Somers, D. C. (2005). Processing efficiency of divided spatial attention mechanisms in human visual cortex. Journal of Neuroscience, 25, 9444–9448.

Molholm, S., Martinez, A., Shpaner, M., & Foxe, J. J. (2007). Object-based attention is multisensory: co-activation of an object’s representations in ignored sensory modalities. European Journal of Neuroscience, 26, 499–509.

Mullette-Gillman, O. A., Cohen, Y. E., & Groh, J. M. (2005). Eye-centered, head-centered, and complex coding of visual and auditory targets in the intraparietal sulcus. Journal of Neurophysiology, 94, 2331–2352.

Munhall, K. G., Gribble, P., Sacco, L., & Ward, M. (1996). Temporal constraints on the McGurk effect. Perception & Psychophysics, 58, 351–362.

Natale, E., Marzi, C. A., & Macaluso, E. (2009). FMRI correlates of visuo-spatial reorienting investigated with an attention-shifting double-cue paradigm. Human Brain Mapping, 30, 2367–2381.

Navarra, J., Alsius, A., Soto-Faraco, S., & Spence, C. (2010). Assessing the role of attention in the audiovisual integration of speech. Information Fusion, 11, 4–11.

Nebel, K., Wiese, H., Stude, P., de Greiff, A., Diener, H. C., & Keidel, M. (2005). On the neural basis of focused and divided attention. Brain Research. Cognitive Brain Research, 25, 760–776.

Pesaran, B., Nelson, M. J., & Andersen, R. A. (2006). Dorsal premotor neurons encode the relative position of the hand, eye, and goal during reach planning. Neuron, 51, 125–134.

Petkov, C. I., Kang, X., Alho, K., Bertrand, O., Yund, E. W., & Woods, D. L. (2004). Attentional modulation of human auditory cortex. Nature Neuroscience, 7, 658–663.

Posner, M. I., Snyder, C. R., & Davidson, B. J. (1980). Attention and the detection of signals. Journal of Experimental Psychology. General, 109, 160–174.

Rizzolatti, G., Riggio, L., & Sheliga, B. M. (1994). Space and selective attention. In C. Umiltà & M. Moscovitch (Eds.), Attention and performance XV: Conscious and nonconscious information processing (pp. 231–265). Cambridge, MA: MIT Press.

Salmi, J., Rinne, T., Koistinen, S., Salonen, O., & Alho, K. (2009). Brain networks of bottom-up triggered and top-down controlled shifting of auditory attention. Brain Research, 1286, 155–164.

Santangelo, V., Fagioli, S., & Macaluso, E. (2010). The costs of monitoring simultaneously two sensory modalities decrease when dividing attention in space. NeuroImage, 49, 2717–2727.

Santangelo, V., Olivetti Belardinelli, M., Spence, C., & Macaluso, E. (2009). Interactions between voluntary and stimulus-driven spatial attention mechanisms across sensory modalities. Journal of Cognitive Neuroscience, 21, 2384–2397.

Serences, J. T., & Yantis, S. (2007). Spatially selective representations of voluntary and stimulus-driven attentional priority in human occipital, parietal, and frontal cortex. Cerebral Cortex, 17, 284–293.

Shomstein, S., & Yantis, S. (2004). Control of attention shifts between vision and audition in human cortex. Journal of Neuroscience, 24, 10702–10706.

Shulman, G. L., Astafiev, S. V., Franke, D., Pope, D. L., Snyder, A. Z., McAvoy, M. P., et al. (2009). Interaction of stimulus-driven reorienting and expectation in ventral and dorsal frontoparietal and basal ganglia-cortical networks. Journal of Neuroscience, 29, 4392–4407.

Smith, D. V., Davis, B., Niu, K., Healy, E. W., Bonilha, L., Fridriksson, J., et al. (2009). Spatial attention evokes similar activation patterns for visual and auditory stimuli. Journal of Cognitive Neuroscience, 22, 347–361.

Soto-Faraco, S., & Alsius, A. (2009). Deconstructing the McGurk-MacDonald illusion. Journal of Experimental Psychology. Human Perception and Performance, 35, 580–587.

Soto-Faraco, S., Navarra, J., & Alsius, A. (2004). Assessing automaticity in audiovisual speech integration: evidence from the speeded classification task. Cognition, 92, B13–B23.

Spence, C., McDonald, J., & Driver, J. (2004). Exogenous spatial-cuing studies of human crossmodal attention and multisensory integration. In C. Spence & J. Driver (Eds.), Crossmodal space and crossmodal attention (pp. 277–320). Oxford: Oxford University Press.

Stein, B. E., & Meredith, M. A. (1993). The merging of the senses. Cambridge, MA: MIT Press.

Talsma, D., Doty, T. J., Strowd, R., & Woldorff, M. G. (2006). Attentional capacity for processing concurrent stimuli is larger across sensory modalities than within modalities. Psychophysiology, 43, 541–549.

Talsma, D., Senkowski, D., & Woldorff, M. G. (2009). Intermodal attention affects the processing of the temporal alignment of audiovisual stimuli. Experimental Brain Research, 198, 313–328.

Theeuwes, J. (1993). Visual selective attention: a theoretical analysis. Acta Psychologica, 83, 93–154.

Tiippana, K., Andersen, T. S., & Sams, M. (2004). Visual attention modulates audiovisual speech perception. European Journal of Cognitive Psychology, 16, 457–472.

Tzourio, N., Massioui, F. E., Crivello, F., Joliot, M., Renault, B., & Mazoyer, B. (1997). Functional anatomy of human auditory attention studied with PET. NeuroImage, 5, 63–77.

Van Atteveldt, N. M., Formisano, E., Goebel, R., & Blomert, L. (2007). Top-down task effects overrule automatic multisensory responses to letter-sound pairs in auditory association cortex. NeuroImage, 36, 1345–1360.

Van der Burg, E., Olivers, C. N., Bronkhorst, A. W., & Theeuwes, J. (2008). Pip and pop: nonspatial auditory signals improve spatial visual search. Journal of Experimental Psychology. Human Perception and Performance, 34, 1053–1065.

Vatakis, A., & Spence, C. (2007). Crossmodal binding: evaluating the “unity assumption” using audiovisual speech and non-speech stimuli. Perception & Psychophysics, 69, 744–756.

Vroomen, J., Bertelson, P., & de Gelder, B. (2001). The ventriloquist effect does not depend on the direction of automatic visual attention. Perception & Psychophysics, 63, 651–659.

Wallace, M. T., Meredith, M. A., & Stein, B. E. (1998). Multisensory integration in the superior colliculus of the alert cat. Journal of Neurophysiology, 80, 1006–1010.

Werner, S., & Noppeney, U. (2010). Distinct functional contributions of primary sensory and association areas to audiovisual integration in object categorization. Journal of Neuroscience, 30, 2662–2675.

Woldorff, M. G., Hazlett, C. J., Fichtenholtz, H. M., Weissman, D. H., Dale, A. M., & Song, A. W. (2004). Functional parcellation of attentional control regions of the brain. Journal of Cognitive Neuroscience, 16, 149–165.

Wu, C.-T., Weissman, D. H., Roberts, K. C., & Woldorff, M. G. (2007). The neural circuitry underlying the executive control of auditory spatial attention. Brain Research, 1134, 187–198.

Xing, J., & Andersen, R. A. (2000). Models of the posterior parietal cortex which perform multimodal integration and represent space in several coordinate frames. Journal of Cognitive Neuroscience, 12, 601–614.
