Spence, C., & Deroy, O. (2013b). How automatic are crossmodal correspondences?



Review

How automatic are crossmodal correspondences?

Charles Spence a,*, Ophelia Deroy b

a Crossmodal Research Laboratory, Department of Experimental Psychology, University of Oxford, Oxford, UK
b Centre for the Study of the Senses, University of London, London, UK

Article info

Article history: Received 18 August 2012

Keywords: Crossmodal correspondence; Automaticity; Strategic; Voluntary; Stimulus-driven; Synaesthesia

Abstract

The last couple of years have seen a rapid growth of interest (especially amongst cognitive psychologists, cognitive neuroscientists, and developmental researchers) in the study of crossmodal correspondences – the tendency for our brains (not to mention the brains of other species) to preferentially associate certain features or dimensions of stimuli across the senses. By now, robust empirical evidence supports the existence of numerous crossmodal correspondences, affecting people’s performance across a wide range of psychological tasks – in everything from the redundant target effect paradigm through to studies of the Implicit Association Test, and from speeded discrimination/classification tasks through to unspeeded spatial localisation and temporal order judgment tasks. However, one question that has yet to receive a satisfactory answer is whether crossmodal correspondences automatically affect people’s performance (in all, or at least in a subset of tasks), as opposed to reflecting more of a strategic, or top-down, phenomenon. Here, we review the latest research on the topic of crossmodal correspondences to have addressed this issue. We argue that answering the question will require researchers to be more precise in terms of defining what exactly automaticity entails. Furthermore, one’s answer to the automaticity question may also hinge on the answer to a second question: Namely, whether crossmodal correspondences are all ‘of a kind’, or whether instead there may be several different kinds of crossmodal mapping (e.g., statistical, structural, and semantic). Different answers to the automaticity question may then be revealed depending on the type of correspondence under consideration. We make a number of suggestions for future research that might help to determine just how automatic crossmodal correspondences really are.

© 2013 Elsevier Inc. All rights reserved.

Contents

1. Introduction 246
2. Automaticity: Defining features 248
3. Reviewing the evidence concerning the automaticity of crossmodal correspondences 250
   3.1. Goal independence and intentionality 250
   3.2. The problem of stimulus salience 251
   3.3. Speed 253
      3.3.1. The cognitive neuroscience of crossmodal correspondences 254
4. On automaticity and different kinds of crossmodal correspondence 255
5. Conclusions 256

1053-8100/$ - see front matter © 2013 Elsevier Inc. All rights reserved.
http://dx.doi.org/10.1016/j.concog.2012.12.006

* Corresponding author. Address: Crossmodal Research Laboratory, Department of Experimental Psychology, University of Oxford, South Parks Road, Oxford OX1 3UD, UK. Fax: +44 1865 310447.

E-mail address: [email protected] (C. Spence).

Consciousness and Cognition 22 (2013) 245–260


   5.1. Closing comments: Where do we stand with respect to the notion of automaticity? 257
Acknowledgment 258
References 258

1. Introduction

The term ‘‘crossmodal correspondences’’ is but one of a range of terms that has been used over the years by researchers in order to refer to our brain’s tendency to systematically associate certain features or dimensions of stimuli across the senses (see Marks, 2004; Spence, 2011, for reviews). Crossmodal correspondences have now been documented between many different pairs of stimulus dimensions: So, for example, auditory pitch has been shown to map onto visual elevation (see Ben-Artzi & Marks, 1995; Bernstein & Edelstein, 1971; Evans & Treisman, 2010; Melara & O’Brien, 1987; Miller, 1991; Patching & Quinlan, 2002; Proctor & Cho, 2006; Rusconi, Kwan, Giordano, Umiltà, & Butterworth, 2006), brightness and lightness (Hubbard, 1996; Ludwig, Adachi, & Matzuzawa, 2011; Marks, 1987; Martino & Marks, 1999; Melara, 1989; Mondloch & Maurer, 2004), size (Bien, ten Oever, Goebel, & Sack, 2012; Evans & Treisman, 2010; Gallace & Spence, 2006; Mondloch & Maurer, 2004; Parise & Spence, 2009, 2012), angularity of shape (Marks, 1987; Parise & Spence, in press), direction of movement (Clark & Brownell, 1976; Maeda, Kanai, & Shimojo, 2004; Sadaghiani, Maier, & Noppeney, 2009), and even spatial frequency (Evans & Treisman, 2010; Heron, Roach, Hanson, McGraw, & Whitaker, 2012).

The majority of the studies of crossmodal correspondences that have been published to date have involved the presentation of auditory and visual stimuli. That said, similar crossmodal correspondences also exist between auditory pitch and the elevation of tactile stimuli (Occelli, Spence, & Zampini, 2009), not to mention the size of objects experienced haptically (Walker & Smith, 1985),1 and between tastes/odours and the angularity of visual stimuli or the pitch of auditory stimuli (Belkin, Martin, Kemp, & Gilbert, 1997; Crisinel & Spence, 2010, 2011, 2012; Deroy & Valentin, 2011; Hanson-Vaux, Crisinel, & Spence, 2013; see Deroy, Crisinel, & Spence, in press; Spence & Ngo, 2012a, for reviews).

One important, but as yet unconvincingly answered, question in the area of crossmodal correspondences research concerns whether they affect performance (in tasks involving, for example, participants having to make speeded responses) in an automatic manner, or whether instead they affect performance in more of a strategic manner, emerging only as a function of the specific task demands and instructions imposed on the participant by the experimenter. Addressing the issue of the automaticity of crossmodal correspondences means, however, breaking the notion of automaticity down into a number of distinct sub-components (see Section 2) and then trying to make sense of the apparently contradictory results that have been published in the area recently (see Section 3). This exercise will further help to draw attention to the differences that exist between synaesthesia and crossmodal correspondences (see Section 4) while agreeing that, as there certainly are various types of crossmodal correspondence, one perhaps needs to accept that one’s answer to the automaticity question might vary as a function of the type of crossmodal correspondence under consideration. This said, the review of the literature relevant to the automaticity claim outlined here leads to the generation of a number of specific hypotheses that deserve further testing in future research on crossmodal correspondences (see Section 5).

The original evidence that prompted researchers to make the automaticity claim came from the many speeded classification studies demonstrating that the speeded discrimination of target stimuli in one modality (e.g., discriminating larger vs. smaller circles, for visual stimuli presented on a monitor) was affected by the presentation of a completely task-irrelevant auditory stimulus that varied randomly on a trial-by-trial basis between high and low pitch (see Marks, 2004; Spence, 2011, for reviews). However, the suggested automaticity of crossmodal correspondences has been questioned by a series of negative results from studies that have sometimes failed to show any difference in behaviour between those conditions in which congruent vs. incongruent pairs of visual and auditory stimuli have been presented (see also Chiou & Rich, 2012a; Heron et al., 2012; Klapetek, Ngo, & Spence, 2012; Klein, Brennan, D’Aloisio, D’Entremont, & Gilani, 1987; Klein, Brennan, & Gilani, 1987; Sweeny, Guzman-Martinez, Ortega, Grabowecky, & Suzuki, 2012). Explaining why such differences between studies have been obtained represents a worthwhile endeavour: And, what is more, in answering the question of the degree of automaticity of crossmodal correspondences, two further related questions also come to the fore, as detailed below.
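The dependent measure at stake in these speeded classification studies can be sketched as a simple congruency contrast. The following sketch uses made-up reaction times and a hypothetical `congruency_effect` helper (our illustration, not an analysis taken from any of the studies cited); a reliably positive value under task-irrelevant sound variation is the kind of result that has been read as evidence for automaticity.

```python
# Illustrative sketch (not the authors' analysis) of how a crossmodal
# congruency effect is typically quantified in speeded classification.
# All reaction times (ms) are hypothetical; assume correct trials only.

def mean(xs):
    return sum(xs) / len(xs)

def congruency_effect(congruent_rts, incongruent_rts):
    """Return the mean RT cost (ms) of crossmodal incongruence.

    A positive value means responses were slower when the task-irrelevant
    sound mismatched the visual target (e.g., a low-pitched tone paired
    with a small circle is 'incongruent' under the pitch-size mapping).
    """
    return mean(incongruent_rts) - mean(congruent_rts)

# Hypothetical RTs for discriminating larger vs. smaller circles while a
# task-irrelevant high/low tone varies randomly from trial to trial.
congruent = [412, 398, 430, 405, 441, 389]    # tone matches circle size
incongruent = [455, 462, 438, 470, 449, 466]  # tone mismatches circle size

effect = congruency_effect(congruent, incongruent)
print(f"Congruency effect: {effect:.1f} ms")  # prints: Congruency effect: 44.2 ms
```

The same contrast computed only over fast responses (e.g., those under 400 ms) is the kind of analysis Parise and Spence used to argue that the effect arises too early to be purely strategic.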

The first question concerns the link between crossmodal correspondences and other phenomena such as coloured-hearing synaesthesia,2 where the presence, or experience, of a stimulus in one modality (for instance, audition) induces a conscious concurrent in another, unstimulated modality (for instance, vision). Crossmodal ‘mappings’ or ‘correspondences’ between, say, pitch and brightness can, at first, sometimes appear just as surprising as synaesthesia. In particular, it may not always be immediately obvious whether (or that) they are tracking, or picking-up on, some statistical regularity of the environment (see Spence & Deroy, 2012). The initially unexplainable nature of at least certain crossmodal correspondences has led to their being

1 Here it is worth noting that auditory stimuli tend to be assigned to specific elevations even in the absence of any stimuli being presented in another sensory modality (e.g., see Cabrera & Morimoto, 2007; Pedley & Harper, 1959; Pratt, 1930; Roffler & Butler, 1968; Trimble, 1934). The matching of auditory pitch to elevation has also been demonstrated under those conditions in which the participants have to respond to (i.e., discriminate) a centrally-presented visual target by pressing one of two vertically-arrayed buttons, while the pitch of an accessory sound is varied (see Keller & Koch, 2006).

2 Canonical cases of synaesthesia include such examples as coloured-hearing, tasted shapes, etc. (see Ward, 2012, for a recent review).

246 C. Spence, O. Deroy / Consciousness and Cognition 22 (2013) 245–260

described by a handful of researchers as synaesthetic correspondences (e.g., Crisinel & Spence, 2010; Walker et al., 2010), or even, on occasion, to their being subsumed under the heading of synaesthesia proper (see Martino & Marks, 2001; Rader & Tellegen, 1987; Rudmin & Cappelli, 1983; see Deroy & Spence, in press, for a discussion).

Elsewhere, we have argued against this growing tendency to place crossmodal correspondences and synaesthesia on one and the same continuum, together with the related assumption that both phenomena can be explained by the same underlying neural mechanisms (e.g., Bien et al., 2012; Chiou & Rich, 2012a; Ward, Huckstep, & Tsakanikos, 2006). A key difference here concerns the fact that crossmodal correspondences are not necessarily associated with a conscious sensory concurrent (though see also Spence & Deroy, 2013). However, using this characteristic to distinguish between the two phenomena soon becomes complicated: First, because of the practical difficulty associated with deciding how to assess the occurrence of a conscious sensory concurrent or with sorting conscious cases of mental imagery guided by crossmodal correspondences from the supposedly distinct synaesthetic cases (Spence & Deroy, 2013); And, second, because of controversies surrounding the possibility of unconscious synaesthesia in certain difficult cases (e.g., see Cohen Kadosh & Terhune, 2011; Deroy & Spence, submitted for publication, for a discussion).

Another important place to look, then, in order to try and distinguish synaesthesia from crossmodal correspondences concerns the automaticity of the process that, in both cases, ties together the two sensory stimuli. In the case of synaesthesia (at least in sensory as opposed to conceptual forms of synaesthesia), the processing of the synaesthetic concurrent is largely involuntary (with only a limited degree of control over the presence/experience of the concurrent being reported by some synaesthetes, see Rich, Bradshaw, & Mattingley, 2005; Rich & Mattingley, 2003; though see Price & Mattingley, in press), at least once the inducer has been attended to (e.g., Mattingley, Payne, & Rich, 2006; Rich & Mattingley, 2003, 2010; Sagiv, Heer, & Robertson, 2006; Ward, 2012, pp. 321–322). This is where deciding on the automaticity or involuntariness of the occurrence of crossmodal correspondences matters: For should crossmodal correspondences turn out not to be automatic, that would provide additional grounds for distinguishing them from synaesthesia.

Answering the automaticity question turns out to be important not only when it comes to trying to distinguish crossmodal correspondences from canonical cases of synaesthesia, but also because it may help researchers to assess how crossmodal correspondences fit more generally into the framework of multisensory integration research (e.g., Bremner, Lewkowicz, & Spence, 2012; Calvert, Spence, & Stein, 2004; Stein, 2012). The idea that crossmodal correspondences may influence multisensory integration, and that their effect is, or at least can be, perceptual rather than necessarily just decisional (e.g., possibly reflecting some sort of response bias) in nature is a relatively recent one (see Gallace & Spence, 2006; Parise & Spence, 2009; Sweeny et al., 2012). That said, it makes sense in those cases in which a perceptual effect of crossmodal correspondences has been demonstrated in the laboratory to ask whether or not the multisensory integration of the component signals is influenced by the focus of a participant’s attention.

However, little is currently known about this topic, and the investigation is made all the more complicated by the large number of correspondences that have been reported to date, and which perhaps belong to different kinds (see Deroy et al., in press; Sadaghiani et al., 2009; Spence, 2011). What is more, a wide variety of tasks has been used to test each phenomenon (see Spence, 2011, for a review).3 Here, we suggest narrowing the discussion down to audiovisual correspondences, which are the best documented to date, and which perhaps represent the most likely place to find automaticity in crossmodal correspondences.4

One reason for considering audiovisual correspondences as representing one of the best potential candidates for being ‘automatic’ is that they have been suggested to operate as ‘coupling priors’ in Bayesian Decision Theory models of multisensory integration (see Ernst, 2007; Spence, 2011). Many coupling priors are considered to operate in an automatic manner: Helbig and Ernst (2008), for example, have demonstrated that the integration of visual and haptic shape information is unaffected by the focus of a participant’s attention to a specific sensory modality by means of an attentional load manipulation. This result then suggests some degree of automaticity for this particular form of multisensory integration.5 Another popular example of multisensory integration that is seemingly immune to spatial, or modality-based, attentional manipulations is the audiovisual ventriloquism effect (see Bertelson, Vroomen, de Gelder, & Driver, 2000; Vroomen, Bertelson, & de Gelder, 2001; though see Fairhall & Macaluso, 2009; Röder & Büchel, 2009). The kinds of multisensory integration that underlie these two effects therefore appear to operate in a fairly automatic manner, at least in the sense of their not being intentional, nor under an observer’s conscious control.
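To make the ‘coupling prior’ idea concrete, here is a minimal sketch of reliability-weighted audiovisual fusion in which a single coupling parameter sets how strongly the two signals are bound. The numbers and the `fuse` helper are our own hypothetical illustration of the general Bayesian cue-combination scheme, not a model taken from Ernst (2007) or any of the studies cited.

```python
# Minimal sketch of reliability-weighted cue combination with a
# 'coupling' parameter. All values are hypothetical illustrations.

def fuse(est_a, var_a, est_v, var_v, coupling=1.0):
    """Combine auditory and visual estimates of the same property.

    coupling = 1.0 -> full maximum-likelihood integration;
    coupling = 0.0 -> no integration (each estimate kept as-is).
    Returns (auditory_percept, visual_percept).
    """
    # Optimal fused estimate: each cue weighted by its reliability (1/variance).
    w_a = (1 / var_a) / (1 / var_a + 1 / var_v)
    fused = w_a * est_a + (1 - w_a) * est_v
    # The coupling prior pulls each modality's percept toward the fused value.
    return (est_a + coupling * (fused - est_a),
            est_v + coupling * (fused - est_v))

# Ventriloquism-style example: vision is the more reliable cue, so under
# full coupling the auditory percept is captured by the visual location.
audio, visual = fuse(est_a=10.0, var_a=4.0, est_v=0.0, var_v=1.0, coupling=1.0)
print(audio, visual)  # both equal the fused location: 2.0
```

On this picture, asking whether a crossmodal correspondence is automatic amounts to asking whether the coupling is fixed by the stimuli themselves or can be modulated by the observer's goals and attention.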

By contrast, the audiovisual integration at stake in the McGurk effect is modulated by variations in cognitive load (see Alsius, Navarra, Campbell, & Soto-Faraco, 2005; Alsius, Navarra, & Soto-Faraco, 2007, for an exception; see Navarra, Alsius, Soto-Faraco, & Spence, 2009, for a review). Such mixed results suggest a range of degrees of automaticity in which to locate these crossmodal correspondences which can also act as ‘coupling priors’. In other words, it is important to see whether these crossmodal correspondences exhibit a different degree or kind of automaticity than other factors that are known to

3 Of course, the same problem raises its head for those researchers who are interested in assessing the automaticity of synaesthesia (Blake, Palmeri, Marois, & Kim, 2005; Esterman, Verstynen, Ivry, & Robertson, 2006; Lupiáñez & Callejas, 2006; Treisman, 2005).

4 Note here also that the majority of synaesthesia researchers who have attempted to tackle the automaticity question have tended to focus on just a few specific cases (or kinds of synaesthesia), normally those cases involving a visual concurrent (e.g., Blake et al., 2005; Esterman et al., 2006; Lupiáñez & Callejas, 2006; Treisman, 2005).

5 We would argue that the matching (or integration) of shape information across modalities should be considered as an example of amodal stimulus matching, rather than as an example of crossmodal correspondence (though not every researcher necessarily makes such a distinction, see Maurer & Mondloch, 2005).


influence the way in which our brains combine sensory cues. In the next section, we highlight the way in which researchers have broken the notion of automaticity down into a number of distinct sub-components (or criteria).

2. Automaticity: Defining features

Recent studies have delivered seemingly contradictory evidence concerning the automaticity of crossmodal correspondences (see Chiou & Rich, 2012a; Evans & Treisman, 2010; Klapetek et al., 2012; Parise & Spence, 2012; Peiffer-Smadja, 2010). These studies have utilised a variety of different experimental paradigms, including speeded classification (Evans & Treisman, 2010), exogenous spatial attentional cuing (Chiou & Rich, 2012a; Mossbridge, Grabowecky, & Suzuki, 2011), visual search (Klapetek et al., 2012), and a simplified variant of the Implicit Association Test (IAT; Parise & Spence, 2012; Peiffer-Smadja, 2010). Making comparison of these results somewhat harder, these studies have also tested several different audiovisual crossmodal correspondences, and have used a variety of stimuli (see Table 1).

The authors of these various studies have come to conclusions that, at least on the face of it, appear to be mutually inconsistent: Chiou and Rich (2012a), for instance, have argued recently that crossmodal correspondences are not automatic in the sense that they are ‘primarily mediated by cognitive processes after initial sensory encoding’ and occur at a ‘relatively late stage of voluntary attention orienting’ (see Chiou & Rich, 2012a, p. 339). Similarly, Klapetek et al. (2012, p. 1161) have suggested that the crossmodal correspondence between auditory pitch and visual brightness operates ‘‘at a more strategic (i.e., rather than at an automatic or involuntary) level.’’

By contrast, Evans and Treisman (2010), Parise and Spence (2012), and Peiffer-Smadja (2010) all argue that the available evidence suggests that crossmodal correspondences are automatic. Evans and Treisman, for example, suggest that crossmodal correspondences ‘‘happen in an automatic fashion’’ (Evans & Treisman, 2010, p. 1) and later in their article state that ‘‘They are certainly automatic and independent of attention.’’ (Evans & Treisman, 2010, p. 10). Meanwhile, Parise and Spence point to the fact that crossmodal correspondences influenced even the fastest of their participants’ discrimination responses (i.e., those occurring within 400 ms of stimulus onset) and suggest that such evidence is at least consistent with claims regarding the automaticity of crossmodal correspondences.

At this point, one might ask which criteria should apply to evaluating these claims regarding the automaticity of crossmodal correspondences. This is by no means a simple question to answer given the widely-publicised difficulty associated with any attempt to define automaticity and to draw a clear line between those processes that are automatic and those that are not (e.g., Bargh, 1992, 1994; Logan, 1985; MacLeod & Dunbar, 1988; Schneider, Dumais, & Shiffrin, 1984; Shiffrin, 1988; see Moors & De Houwer, 2006, for a review). As Moors and De Houwer (2006, p. 297) succinctly put it: ‘‘Despite its central nature, there is no consensus about what automaticity means.’’

Following on from Moors and De Houwer (2006), though, it seems both theoretically and pragmatically more appropriate not to try and choose between the various criteria that have been put forward by researchers over the years, but rather to consider automaticity as an umbrella term which encompasses distinct features (or sometimes sets of closely related features). According to Moors and De Houwer, there are four, non-overlapping diagnostic features: the goal-independence criterion; the non-conscious criterion; the load-insensitivity criterion; and the speed criterion (cf. Santangelo & Spence, 2008; Treisman, 2005). One obvious advantage of this pluralist approach is to stress that these features are assessed separately and that specific experimental protocols usually only establish or measure one aspect or part of automaticity. Another theoretical advantage (although one might call it a challenge) is to raise the question regarding the way in which these various features relate to one another, whether they recommend the breaking of automaticity into degrees or distinct kinds, and how these degrees or features relate to the degree or kind of control that can be exerted on a certain process. Leaving these larger issues aside, here we are interested in determining whether crossmodal correspondences might satisfy all, or a subset, of these criteria, or satisfy them to varying degrees.

The first of the criteria (the goal-independence criterion) eliminates goal-dependent or strategic processes from being categorised as automatic. A goal-directed process can be defined as one in which a person engages with the intention of pursuing a particular goal and over which s/he will exert a degree of control in terms of whether or not that goal is achieved. An automatic qua goal-independent process, then, has to be non-intentional: An individual cannot voluntarily prevent an automatic process from taking place. It also has to be out of the individual’s cognitive control. Note here that these two aspects can come apart: That is, a process can be automatic in the sense that an individual cannot prevent it from occurring but s/he may still be able to exert some control over it once it has started (cf. Mattingley, 2009; Ward, 2012).

In terms of empirical testing, the intentionality of a process can be assessed by demonstrating that it only occurs when participants are instructed (or decide) to engage in a certain task. Assessing the controlled vs. uncontrolled character of a process is rather more complicated: For one thing, no process seems to be totally uncontrollable. Even classic involuntary responses, such as, for example, the knee-jerk, turn out to be under at least some degree of voluntary control (Matthews, 1991). Methodologically, then, the most appropriate solution here is, by default, to consider a process as non-controlled unless there are clear signs that the process cannot be completed without monitoring or evidence of control (see Moors & De Houwer, 2006, for further discussion of this point).

The non-conscious criterion at first looks to be closely related to the goal-independent criterion, at least in the sense that by intentional and controlled, one often means consciously intentional and under conscious control. However, there are reasons to believe that unconscious control and unconscious decisions also exist, and hence to try and draw a distinction between the consciousness and goal-dependent aspects of automaticity. There are at least two ways in which to assess the non-conscious character of a given process: In a strong version, the non-conscious character comes from demonstrating that a process is pre-attentive. The mere presence of the stimulus (or target) is sufficient to start the process, while awareness of its presence is unnecessary. This can, for example, be tested by looking for pop-out effects in visual search paradigms (Mattingley, 2009; Treisman, 2005; Ward, Jonas, Dienes, & Seth, 2010; though see Mack & Rock, 1998).
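The pop-out test of pre-attentive processing is usually operationalised as the slope of the search function: mean response time plotted against display set size. The sketch below uses hypothetical data and a hypothetical `search_slope` helper (our illustration, not an analysis from any study cited here) to show the contrast between a flat, ‘pop-out’ function and a steep, serial one.

```python
# Sketch of the logic behind a pop-out test in visual search. A slope
# near zero ms per additional item suggests the target is detected
# pre-attentively; a steep slope suggests serial, attentive search.
# All data are hypothetical.

def search_slope(set_sizes, mean_rts):
    """Least-squares slope of mean RT (ms) against display set size."""
    n = len(set_sizes)
    mx = sum(set_sizes) / n
    my = sum(mean_rts) / n
    num = sum((x - mx) * (y - my) for x, y in zip(set_sizes, mean_rts))
    den = sum((x - mx) ** 2 for x in set_sizes)
    return num / den

set_sizes = [4, 8, 12, 16]
pop_out_rts = [452, 455, 451, 458]  # essentially flat search function
serial_rts = [480, 590, 700, 810]   # RT grows steeply with set size

print(search_slope(set_sizes, pop_out_rts))  # 0.35 ms/item: pop-out
print(search_slope(set_sizes, serial_rts))   # 27.5 ms/item: serial search
```

In a crossmodal variant, the question would be whether a correspondence-congruent sound flattens the search function for the matching visual target.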

In a weaker version, at stake in a variety of tasks such as the IAT (Greenwald, McGhee, & Schwartz, 1998) and its variants (e.g., Demattè, Sanabria, & Spence, 2007; Parise & Spence, 2012; Peiffer-Smadja, 2010), all that is needed for a process to count as non-conscious is for it to occur without the participant’s conscious volition or control once the stimulus/target has been attended to (cf. Chen, Yeh, & Spence, 2011). Indeed, this seems to be very much the reasoning behind Evans and Treisman’s (2010) assertion that the various audiovisual correspondences that they studied were automatic. Their claim was that since the crossmodal correspondence between the auditory and visual stimuli affected participants’ performance even when their presence was completely irrelevant to a participant’s task (that is, they occurred without monitoring, to use Tzelgov’s, 1997, terminology) it therefore meant that they were unconscious and non-intentional.

According to the load-insensitivity criterion, a process is automatic if it is not hindered when the simultaneous information load goes up – such as, for example, when the perceptual load of a participant’s task is increased (Lavie, 2005). This is usually assessed by means of performance in dual-task interference paradigms: For example, by investigating whether performance in one task is affected by varying the perceptual resources that simultaneously need to be allocated by a participant to a second task (Alsius et al., 2005, 2007; Eramudugolla, Kamke, Soto-Faraco, & Mattingley, 2011; Helbig & Ernst, 2008; Santangelo & Spence, 2008; Spence, 2010).

According to the fourth and final criterion, the speed criterion, a process is more likely to be automatic if it can be demonstrated that it affects the very earliest stages of information processing. Hackley (1993), for example, suggests that information processing is strongly automatic until 15 ms for audition and about 80 ms for vision (see Moors & De Houwer, 2006, for further discussion of this criterion).

In summary, the evidence reviewed in this section supports two conclusions: (1) Researchers disagree about whether crossmodal correspondences are automatic or not; and (2) While difficult to define, four criteria (goal-independence, non-conscious, load-insensitivity, and speed) seem critical to evaluating claims that a particular cognitive process, or phenomenon, is automatic. That said, the speed criterion appears to be the weakest of the four criteria, and so should perhaps be weighted as somewhat less important than the others when it comes to assessing the automaticity of a given cognitive process.

With these various criteria in mind, we can now turn to the empirical evidence concerning audiovisual crossmodal correspondences. As mentioned already, we will focus on those correspondences that have been postulated to be statistical in origin. The studies reviewed here have forwarded somewhat different conclusions regarding the automaticity question (in part, or so we will try to show below, because the researchers concerned have been using the term ‘automaticity’ to mean rather different things).

Table 1
Summary of recent studies relevant to assessing claims regarding the automaticity of crossmodal correspondences (see text for details).

Study | Crossmodal correspondence with auditory pitch | Task | Did the crossmodal correspondence affect performance? | Auditory pure tone stimuli (and their duration)
Klein, Brennan, D’Aloisio, et al. (1987) and Klein, Brennan, and Gilani (1987) | Visual elevation | Speeded detection | No | Rising (700–1200 Hz) vs. declining (900–400 Hz); 250 ms
Evans and Treisman (2010) | Visual elevation; visual size; visual spatial frequency | Speeded classification | Yes | 1000 vs. 1500 Hz; 120 ms
Mossbridge et al. (2011) | Visual elevation | Go/No Go | Yes | Two ascending (from 300 to 450 Hz or from 450 to 600 Hz); two descending (from 450 to 300 Hz or from 600 to 450 Hz)
Chiou and Rich (2012a) | Visual elevation | Speeded detection | Yes | 300 vs. 1500 Hz; 200 ms
Chiou and Rich (2012a) | Visual elevation | Speeded detection | No | 300 vs. 400 Hz; 200 ms
Chiou and Rich (2012a) | Visual elevation | Speeded detection | Yes | 100 vs. 900 Hz; 900 vs. 1700 Hz; 200 ms
Chiou and Rich (2012a) | Visual elevation | Speeded detection | Yes | Pure tone, 250 vs. 2500 Hz; 50 ms
Fernández-Prieto et al. (2012) | Visual elevation | Speeded detection | Yes | Rising (200–700 Hz) vs. falling tone (700–200 Hz); 200 ms
Klapetek et al. (2012) | Visual brightness | Speeded visual search | Yes | 250 vs. 2000 Hz; 60 ms
Parise and Spence (2012) | Visual size; visual angularity | Speeded IAT | Yes | 300 vs. 4500 Hz; 300 ms
Heron et al. (2012) | Visual spatial frequency | Speeded detection (asynchrony) | No | 500 vs. 2000 Hz; 20 ms

C. Spence, O. Deroy / Consciousness and Cognition 22 (2013) 245–260 249

3. Reviewing the evidence concerning the automaticity of crossmodal correspondences

3.1. Goal independence and intentionality

One crossmodal correspondence that has attracted perhaps more research interest than any other is the crossmodal mapping of auditory pitch onto visual elevation (see Ben-Artzi & Marks, 1995; Bernstein & Edelstein, 1971; Evans & Treisman, 2010; Klein, Brennan, D’Aloisio, et al., 1987; Klein, Brennan, & Gilani, 1987; Melara & O’Brien, 1987; Miller, 1991; Patching & Quinlan, 2002; Pedley & Harper, 1959). Specifically, people appear to associate higher-pitched sounds with higher visual stimuli while associating lower-pitched sounds with lower visual stimuli. People have also been shown to associate auditory stimuli with an ascending pitch with higher positions than auditory stimuli whose pitch descends (e.g., Fernández-Prieto, Vera-Constán, García-Morera, & Navarra, 2012; Mossbridge et al., 2011).

A recent study relevant to the question of the automaticity of crossmodal correspondences investigated this mapping between auditory pitch and visual elevation. In particular, Chiou and Rich (2012a) conducted a series of exogenous crossmodal spatial cuing studies (see Spence, 2010, for a review) in which the participants had to make simple speeded detection responses to a series of visual targets presented randomly from either above or below a central fixation point. Shortly before the presentation of each target, an auditory cue was presented from the centre of the display. The idea was that if higher-pitched sounds are associated with higher spatial locations, then an exogenous crossmodal spatial cuing effect might be observed (such that participants would detect upper targets more rapidly following the presentation of a higher-pitched than a lower-pitched auditory cue). The results (see Fig. 1) revealed that participants responded significantly more rapidly to the upper visual targets following the presentation of the higher-pitched auditory cue (1500 Hz), whereas lower visual targets were detected more rapidly following the presentation of the lower-pitched auditory cue (300 Hz).

Previous research has revealed that exogenous spatial cuing effects usually dissipate within a few hundred milliseconds of the presentation of a lateralised auditory cue (see Spence, 2010; Spence, McDonald, & Driver, 2004, for reviews). Interestingly, however, Chiou and Rich’s (2012a) data did not reveal a significant interaction between Congruency and SOA. This null result led the authors to suggest that the effect of spatial attention on participants’ performance was just as large across the whole range of SOAs tested. That said, closer inspection of their data (see Fig. 1) suggests that there might have been a different pattern of results (namely, the absence of a spatial cuing effect) at the shortest SOA tested (0 ms). Nevertheless, these results do provide some of the first empirical evidence that the crossmodal mapping of pitch to elevation can affect the spatial allocation of a participant’s exogenous attention.

Similar results have since been reported by Fernández-Prieto et al. (2012) in a study in which a rising (from 200 to 700 Hz) or falling (from 700 to 200 Hz) auditory cue was presented for 200 ms prior to a visual target located in one of four positions arranged in a square around fixation (see also Mossbridge et al., 2011). While the 4 ms spatial cuing effect failed to reach statistical significance at the shorter SOA (400 ms) tested in this study, it did at the longer interval (550 ms; the cuing effect was approximately 7 ms at this SOA). Once again, the participants in this study had to make a speeded simple detection response.6
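The cuing effects reported in these studies are simply mean reaction-time (RT) differences between incongruent and congruent cue–target pairings, computed separately at each cue–target SOA. The sketch below illustrates this computation; the function name, data format, and all RT values are hypothetical and are not taken from any of the studies reviewed:

```python
from statistics import mean

def cuing_effect(trials, soa):
    """Mean RT (incongruent) minus mean RT (congruent) at a given SOA (ms).

    trials: list of dicts with keys 'soa' (ms), 'congruent' (bool), 'rt' (ms).
    A positive value indicates facilitation by the crossmodally congruent cue.
    """
    congruent = [t["rt"] for t in trials if t["soa"] == soa and t["congruent"]]
    incongruent = [t["rt"] for t in trials if t["soa"] == soa and not t["congruent"]]
    return mean(incongruent) - mean(congruent)

# Fabricated example data (a real study would have many trials per cell):
trials = [
    {"soa": 400, "congruent": True, "rt": 312}, {"soa": 400, "congruent": False, "rt": 316},
    {"soa": 550, "congruent": True, "rt": 305}, {"soa": 550, "congruent": False, "rt": 312},
]
print(cuing_effect(trials, 400))  # 4 (ms)
print(cuing_effect(trials, 550))  # 7 (ms)
```

Framing the effect this way makes clear why small cuing effects (a few milliseconds) demand many trials per condition to reach statistical significance.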

In another of Chiou and Rich’s (2012a) experiments, the pitch of the auditory cue was made informative with regard to the likely elevation of the visual target. In particular, a low-pitched (250 Hz) auditory cue predicted that the visual target would be presented from the upper target location on the majority (80%) of trials, whereas the presentation of the high-pitched tone (2500 Hz) predicted that the target would likely appear in the lower position instead. The participants were told about the meaning of the cue (and the probabilities concerned). The results (see Fig. 2) revealed that participants’ attention was directed to the likely (rather than the crossmodally corresponding) target location, though this reversal of the spatial cuing effect (as compared to that reported in Chiou and Rich’s other experiments) took some time to emerge. Chiou and Rich argued that the latter results demonstrated that the triggering of the crossmodal correspondence between auditory pitch and visual elevation was under the participants’ voluntary attentional control, and hence not ‘automatic’. In the context of the present review, we would add here that this pattern of results also implies that the processing of crossmodal correspondences is, to some degree, goal-dependent.

However, it could be argued that there are potentially two influences on participants’ performance: one is the endogenous (or voluntary) shift of attention that is triggered by the informative cue, the other an exogenous (or stimulus-driven) shift of attention elicited by the natural crossmodal correspondence between relative pitch and elevation. When endogenous and exogenous attention pull in opposite directions, as in Chiou and Rich’s (2012a) experiment, it may simply be that the stronger of the two (apparently the endogenous effect) overrides the other (the exogenous effect) in terms of determining where a participant’s attention is ultimately allocated (see Chica, Sanabria, Lupiáñez, & Spence, 2007).

Crucially, though, such a result should not be taken to demonstrate that the exogenous, or natural, crossmodal mapping does not exert any influence on a participant’s behaviour. In order to support such a claim, one would further need to demonstrate that there was no difference in the time-course and magnitude of spatial cuing effects under those conditions in which the participants’ exogenous and endogenous attention pull in the same vs. opposite directions (see Klein, Brennan, D’Aloisio, et al., 1987; Klein, Brennan, & Gilani, 1987). Should there be such a difference between these conditions, it could perhaps be accounted for in terms of a short-lasting exogenous cuing effect triggered by the presentation of the cue that is then overridden by the top-down (or strategic) allocation of a participant’s spatial attention. Therefore, until such empirical data have been collected, the question of whether or not some automatic coding of crossmodal correspondence takes place remains unresolved. Indeed, in the only study we know of to have compared performance in these two conditions, participants did find it easier to direct their attention when the informative value of a rising or falling pitch tone (regarding the likelihood of a visual target appearing above or below fixation) matched the natural crossmodal mapping than when the two were put into opposition (Juckes & Klein, unpublished; Klein & Juckes, 1989), as in Chiou and Rich’s (2012a) study.

6 One unfortunate limitation with all of Chiou and Rich’s (2012a) studies (as well as with the similar studies reported by Fernández-Prieto et al. (2012), Klein, Brennan, D’Aloisio, et al. (1987), and Klein, Brennan, and Gilani (1987)) is that the use of a speeded simple detection response paradigm (even one with catch trials) means that it is impossible to rule out a criterion-shifting explanation of any spatial cuing effects that were observed (see Spence & Driver, 1997). Hence, one cannot know for sure whether any crossmodal cuing effects reported in these studies were decisional or perceptual in nature (cf. Mossbridge et al., 2011).

In conclusion, the evidence reviewed in this section suggests, but does not unequivocally support, the claim that the audiovisual crossmodal correspondence between auditory pitch and visual elevation is goal-dependent. In the next section, we continue to evaluate the goal-independence criterion in light of the problem of stimulus salience.

3.2. The problem of stimulus salience

Klapetek et al. (2012) investigated whether the pitch of an auditory cue would affect participants’ performance in a version of the ‘pip-and-pop’ visual search task (Ngo & Spence, 2010; Van der Burg, Olivers, Bronkhorst, & Theeuwes, 2008). Participants in this study had to search for a horizontal or vertical target bar amongst displays containing 23, 35, or 47 tilted (22° from the horizontal or vertical) distractor items (see Fig. 3). The brightness of the target and distractors changed frequently (between light and dark grey, against a mid-grey background) in a seemingly random manner during the course of each trial. The brightness of the visual target switched between light and dark grey, or vice versa, once every second or so.

Fig. 1. Results of Chiou and Rich’s (2012a; Experiment 1) crossmodal cuing study. Mean reaction times (RTs) are plotted as a function of the SOA and the crossmodal correspondence between the pitch of the centrally-presented auditory cue and the position of the visual target (congruent = open symbols; incongruent = filled symbols). (A) Null crossmodal spatial cuing effect observed when the visual targets were presented to the left or right of fixation; (B) significant crossmodal spatial cuing effect observed when the visual targets were presented above or below fixation instead. Error bars represent 1 SEM. (Reprinted with permission from Chiou and Rich (2012a, Fig. 2).)

Fig. 2. Results of Chiou and Rich’s (2012a; Experiment 4) study of the effects of endogenous attentional orienting and crossmodal correspondence on the magnitude of spatial cuing effects observed in the vertical dimension. Mean RTs are plotted as a function of the SOA and the crossmodal correspondence between the elevation of the visual target and the pitch of the centrally-presented auditory cue (crossmodally congruent = open symbols; crossmodally incongruent = filled symbols). (A) The results when the pitch of the centrally-presented auditory cue was non-predictive with regard to the location (upper vs. lower) of the visual target (essentially replicating the results shown in Fig. 1B). (B) The results when the pitch of the auditory cue was made predictive of the opposite location (opposite in the sense of contradicting the natural crossmodal mapping). Error bars represent 1 SEM. (Reprinted with permission from Chiou and Rich (2012a, Fig. 5).)

On two-thirds of the trials, a task-irrelevant sound with an alternating pitch was presented in time with the changing lightness of the visual target. The auditory stimulus could either be crossmodally congruent with the visual target (the lower-pitched 250 Hz sound synchronised with the presentation of the darker target, and the higher-pitched 2000 Hz tone presented in synchrony with the brighter target) or incongruent (with the mapping reversed). On the remaining third of trials, no auditory cue was presented. The participants had to make a speeded manual discrimination response regarding the orientation of the visual target (i.e., the changing brightness of the target was irrelevant to their task). The results demonstrated that the presentation of the auditory cue resulted in a significant facilitation of participants’ target discrimination latencies (though, interestingly, no change in the search slope). Crucially, however, this crossmodal benefit was not modulated by the crossmodal correspondence between the pitch of the sound and the brightness of the visual target (see Fig. 4A).
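The search slope mentioned here is the per-item cost of adding distractors to the display: the slope of mean RT regressed on set size. A minimal least-squares sketch on entirely fabricated values (not Klapetek et al.’s data) shows how a sound can speed search overall while leaving the slope unchanged:

```python
def search_slope(set_sizes, mean_rts):
    """Least-squares slope of mean RT (ms) over display set size (ms per item)."""
    n = len(set_sizes)
    mx = sum(set_sizes) / n
    my = sum(mean_rts) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(set_sizes, mean_rts))
    var = sum((x - mx) ** 2 for x in set_sizes)
    return cov / var

# Fabricated mean RTs: the sound lowers the intercept (overall RT) but
# leaves the per-item cost (the slope) unchanged.
no_sound = search_slope([24, 36, 48], [1400, 1700, 2000])
with_sound = search_slope([24, 36, 48], [1250, 1550, 1850])
print(no_sound, with_sound)  # 25.0 25.0
```

A flat difference between the two RT functions (same slope, lower intercept) is precisely the pattern of an overall facilitation without any change in search efficiency.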

In a second experiment, however, Klapetek et al. (2012) investigated whether informing the participants about the crossmodal correspondence that existed between the stimuli presented in the auditory and visual modalities, together with blocking the presentation of the crossmodally congruent and incongruent trials (as compared to the random presentation of congruent and incongruent trials in their Experiment 1), would affect the pattern of results that was obtained. Interestingly, under such conditions, the crossmodal correspondence between the auditory cue and the visual target exerted a significant modulatory effect on participants’ performance (see Fig. 4B). Note, here, that exactly the same experimental stimuli were used in both of Klapetek et al.’s studies. All that changed were the instructions given to the participants and the blocking of the trial types.

Taken together, Klapetek et al.’s (2012) results therefore provide further evidence (this time, from a visual search task) that the crossmodal correspondence between auditory pitch and visual brightness isn’t solely stimulus-driven. This, however, still does not demonstrate that the process underlying the effect of crossmodal correspondence is conscious, controlled, and/or goal-dependent. Another way in which to interpret the influence of explicit instruction on participants’ performance is that it simply serves to make a certain correspondence more salient to the participant (cf. Chiou & Rich, 2012a). Here, there is potentially an analogy with what happens when people look at images such as the famous black-and-white picture of the Dalmatian dog hidden amongst a ground of leaves (see Life Magazine, 19th February, 1965, p. 120; Marr, 1982, p. 101, Fig. 3.1; or Ahissar & Hochstein, 2004, for other examples). Initially, many people will fail to recognise the dog. However, once they have ‘seen’ it, then whenever they subsequently look at the picture they will seemingly ‘automatically’ see the animal. In a sense, then, the viewer’s response to the picture is not purely stimulus-driven, since information, or time spent inspecting the figure, can change the way in which they process/respond to it. However, once in that perceptual state, the dog is seen automatically, and the awareness of its presence cannot be suppressed voluntarily.

Although perceptual salience can be modulated by top-down processes (Theeuwes, 2010), its core comes from bottom-up, stimulus-driven signals indicating that a certain location or element in a display is sufficiently different from its surroundings to be worthy of attention (though see also Awh, Belopolsky, & Theeuwes, 2012). It is important, then, to try to understand which differences in the audiovisual context make the corresponding dimension salient – especially given the multidimensional nature of both the auditory and the visual percepts. Relevant here are the results of another of Chiou and Rich’s (2012a) experiments, in which they demonstrated that if the magnitude of the frequency difference between the high- and low-pitched auditory cues (presented, once again, in the context of an exogenous spatial cuing study) was reduced from 1200 Hz down to only 100 Hz, then the crossmodal correspondence between the pitch of the auditory cue and the elevation of the visual target no longer influenced participants’ behavioural responses. That is, the effect of the congruent/incongruent crossmodal mapping between auditory pitch and visual elevation disappeared across the whole range of SOAs (from 100 to 900 ms) that were tested.

Fig. 3. Visual search display with 36 stimuli used in Klapetek et al.’s (2012) studies investigating the effect of the crossmodal correspondence between auditory pitch and the lightness of a visual target in a variant of the ‘pip-and-pop’ visual search task. The visual target (a vertically-oriented bar) is highlighted by a dotted yellow circle (not present in the actual experiment). (Figure reprinted with permission from Klapetek et al. (2012, Fig. 1).) (For interpretation of the references to colour in this figure legend, the reader is referred to the web version of this article.)

Chiou and Rich (2012a) also reported that whether the auditory cue directed the participants’ attention to the upper or lower visual field depended on the stimulus context, or rather, on the range of stimuli that were presented within a block of trials. Thus, while the presentation of a 900 Hz pure tone led to an upward shift of participants’ spatial attention when the other tone in a block of trials was 100 Hz, it gave rise to a borderline-significant trend toward a downward shift of attention when the very same tone was presented amongst 1700 Hz tones instead. Such results confirm previous suggestions that the effects of crossmodal correspondence on a participant’s behaviour appear to be determined in more of a relative than an absolute manner (see Deroy & Spence, in press; Gallace & Spence, 2006; Marks, 1987; Pedley & Harper, 1959; Spence, 2011; though see also Guzman-Martinez, Ortega, Grabowecky, Mossbridge, & Suzuki, 2012, for evidence that at least certain crossmodal correspondences may show an absolute mapping).7

Now, one can legitimately ask whether these results really show that the crossmodal shift of spatial attention elicited by the crossmodal correspondence between the pitch of the auditory cue and the elevation of the subsequently-presented visual target is ‘underpinned by voluntary attention’ (Chiou & Rich, 2012a, p. 348).8 Here we would like to argue that it is not altogether clear what necessary implications the demonstration that an effect is relative and/or context-dependent has in terms of determining whether or not it is voluntary. It could be argued that these are somewhat orthogonal debates (see also Sperber, 2005). What Chiou and Rich’s results more minimally demonstrate is, once again, that the crossmodally corresponding dimensions need to be salient to the participant, and that this salience is determined (in part) by the context of the experiment.

In summary, then, although the research reviewed in this section (and the last) appears to show that the processing of crossmodal correspondences is goal-dependent, and operates only in a strategic, top-down manner, the role of instruction can be more minimally interpreted as a way in which to make the dimensions on which the crossmodal correspondence operates perceptually salient to the participant: once those dimensions have been made salient, the crossmodal correspondence might then operate in a manner that is both automatic and goal-independent.

3.3. Speed

Recently, Parise and Spence (2012) reported a series of five experiments in which they tested various examples of crossmodal correspondences (including several different examples of sound symbolism) between auditory and visual stimuli. The participants in these experiments had to make speeded manual discrimination responses to a random sequence of unimodal auditory and visual target stimuli. So, for example, in one experiment, the participants had to press one key in response to the presentation of the smaller of two target circles and to the higher-pitched of two tones, while pressing the other response key whenever they were presented with either the larger circle or the lower-pitched tone (see Fig. 5). Meanwhile, in

Fig. 4. RTs as a function of sound (congruent, incongruent, no sound) and set size (24, 36, 48) in Klapetek et al.’s (2012) recent study. (A) The crossmodal correspondence between the changing pitch of the auditory stimulus and the changing brightness of the visual target was varied on a trial-by-trial basis in Experiment 1, but failed to modulate participants’ performance on the visual search task. (B) However, in a second experiment, in which the congruent and incongruent trials were blocked, and where the participants were also informed about the crossmodal correspondence at the start of each block of trials, a significant effect of crossmodal correspondence was observed. The error bars represent the standard errors of the means for each combination of the two factors. (Figure reprinted with permission from Klapetek et al. (2012, Fig. 2).)

7 Here one might wonder whether it is the general context that matters, or rather, the context of the transition from the sound presented on the immediately preceding trial (cf. Spence, Nicholls, & Driver, 2001). It is certainly possible to imagine a study of crossmodal correspondences in which a low-, medium-, and high-pitched auditory cue were to be presented randomly on each trial. One could then isolate those trials on which the medium-pitched sound was presented, and look at whether that tone behaves like a high-pitched sound if the tone on the immediately preceding trial was lower in pitch, but like a low-pitched sound if the auditory cue on the immediately preceding trial was high-pitched instead.
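The trial-sorting analysis imagined in this footnote is straightforward to sketch. The function below is purely illustrative (its name, the data format, and the RT values are all hypothetical): it keeps only medium-pitched trials and splits them by the pitch of the cue on the immediately preceding trial:

```python
# Hypothetical sketch of the sequential-context analysis proposed in
# footnote 7: isolate medium-pitched trials and split them according to
# whether the preceding trial's cue was lower or higher in pitch.
def split_medium_by_context(trials):
    """trials: ordered list of dicts with 'pitch' in {'low','medium','high'} and 'rt' (ms)."""
    after_low, after_high = [], []
    for prev, cur in zip(trials, trials[1:]):
        if cur["pitch"] != "medium":
            continue
        if prev["pitch"] == "low":
            after_low.append(cur["rt"])
        elif prev["pitch"] == "high":
            after_high.append(cur["rt"])
    return after_low, after_high

# Fabricated trial sequence:
trials = [
    {"pitch": "low", "rt": 410}, {"pitch": "medium", "rt": 395},
    {"pitch": "high", "rt": 400}, {"pitch": "medium", "rt": 405},
]
print(split_medium_by_context(trials))  # ([395], [405])
```

Comparing the two resulting RT distributions would indicate whether the medium tone behaves like a relatively high or a relatively low pitch given its local context.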

8 In full, Chiou and Rich’s (2012a, p. 348) argument is as follows: ‘‘The finding that contextual relative pitch determines the direction of attention shifts implies that substantial mental processes after early auditory encoding mediate the pitch cuing effect. This leads us to suspect, despite pitch causing a shift in attention when it is non-predictive, the effect may be underpinned by volitional attention.’’


other blocks of experimental trials, the mapping of the auditory and visual stimuli to the two response keys was reversed (so that one response key was associated with larger circles and higher-pitched tones while the other response key was associated with smaller circles and lower-pitched tones). The participants in this study were instructed to respond as rapidly and accurately as possible.

The results of all five of Parise and Spence’s (2012) experiments revealed a significant compatibility effect, with faster (and somewhat more accurate) responses being recorded in those putatively congruent blocks of trials than in those blocks of trials that were expected (according to the findings of previous research) to be crossmodally incongruent instead. What is more, these crossmodal correspondence effects were present in even the fastest of participants’ responses (as revealed by a bin analysis showing that the crossmodal correspondence between the auditory and visual stimuli even influenced those responses that were elicited within 400 ms of stimulus onset). This was the result that led Parise and Spence to argue against a strategic account of such results, and instead, like many before them, to suggest that crossmodal correspondences affect performance in a seemingly automatic manner.
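A bin analysis of this sort sorts each condition’s RTs, divides them into equal-sized speed bins, and computes the congruency effect within each bin; an effect already present in the fastest bin is what motivates the speed argument. A minimal sketch with entirely fabricated RT values (not Parise and Spence’s data, and ignoring the per-participant averaging a real analysis would use):

```python
def bin_means(rts, n_bins):
    """Mean RT within each of n_bins equal-sized bins of the sorted RTs."""
    rts = sorted(rts)
    size = len(rts) // n_bins
    return [sum(rts[i * size:(i + 1) * size]) / size for i in range(n_bins)]

def binwise_effect(congruent_rts, incongruent_rts, n_bins=4):
    """Congruency effect (incongruent minus congruent mean RT) per speed bin."""
    c = bin_means(congruent_rts, n_bins)
    i = bin_means(incongruent_rts, n_bins)
    return [ic - cc for cc, ic in zip(c, i)]

# Fabricated RTs (ms); the first value of the output is the effect in the
# fastest bin.
congruent = [350, 360, 380, 390, 420, 430, 470, 480]
incongruent = [370, 385, 400, 415, 440, 455, 495, 510]
print(binwise_effect(congruent, incongruent))  # [22.5, 22.5, 22.5, 27.5]
```

A non-zero effect in the fastest bin is the pattern Parise and Spence took as evidence against a slow, strategic origin of the compatibility effect.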

While one can certainly question whether the results of the IAT (and its variants) necessarily provide evidence of the implicit processing of the crossmodal correspondence between the auditory and visual stimuli that the participant is instructed to respond to (e.g., see De Houwer, Teige-Mocigemba, Spruyt, & Moors, 2009), Parise and Spence’s (2012) results are, in the context of the present review, nevertheless still interesting for several reasons. First, the modulation of participants’ performance that resulted from the crossmodal correspondence between the stimuli associated with a particular response key occurred under conditions in which only a single unimodal target stimulus was presented on each trial. Such results therefore demonstrate that crossmodal congruency can impact on participants’ performance when an explanation of the effect in terms of a bias in selective attention (to one or other stimulus; Marks, 2004) can effectively be ruled out (since only a single stimulus was presented on each trial, and hence there was no stimulus-driven competition for a participant’s attention). Second, Parise and Spence’s results also rule out an account of crossmodal correspondences solely in terms of multisensory integration. To be clear, since only a single unisensory stimulus was presented on each trial, there was presumably no opportunity for multisensory integration to have influenced participants’ performance in any of Parise and Spence’s experiments (see Ngo & Spence, 2012; Spence & Ngo, 2012b).

However, a potential problem arises when taking Parise and Spence’s (2012) results to support the automaticity of crossmodal correspondences. They based their assertion regarding automaticity on the fact that the influence of crossmodal correspondences was evident in even the fastest of their participants’ behavioural responses. ‘Fast’ in this case, though, actually meant manual responses that were initiated within 400 ms of the onset of the target. Now, while such results are most certainly consistent with the speed criterion of automaticity, it has to be said that they do not provide particularly strong support for it; after all, a lot happens within the first few hundred milliseconds of information processing (cf. Fiebelkorn, Foxe, & Molholm, 2010, 2012; Horowitz, Wolfe, Alvarez, Cohen, & Kuzmova, 2009). Of course, the speed criterion is only one aspect of automaticity, and many very fast effects can be voluntary/optional. That said, much stronger evidence concerning how early in information processing crossmodal correspondences exert their effect could potentially come from neuroimaging studies. It is to the results of these that we turn next.

Fig. 5. The stimulus–response mapping used and the results obtained in one of Parise and Spence’s (2012) recent studies using a variant of the IAT to assess the consequences of varying the stimulus–response mapping between auditory pitch and visual size. The results demonstrate that participants found it easier to pair crossmodally corresponding stimuli with the same response key (e.g., the large circle with the lower-pitched sound). (This figure is reprinted with permission from Parise and Spence (in press, Fig. 3a and b).)

3.3.1. The cognitive neuroscience of crossmodal correspondences

Over the last couple of years or so, researchers have started to investigate crossmodal correspondences using various different cognitive neuroscience techniques (e.g., Bien et al., 2012; Kovic, Plunkett, & Westermann, 2010; Peiffer-Smadja, 2010; Seo et al., 2010; see also Nahm, Tranel, Damasio, & Damasio, 1993). The results of such research hold the potential to provide more robust evidence regarding the time-course of crossmodal correspondences (especially if the earliest neuronal responses were to be affected by the crossmodal congruency between auditory and visual stimuli). So what do the data show?

Sadaghiani et al. (2009) reported an fMRI study in which a rising/falling pitch sound was presented while participants were trying to determine whether an ambiguous visual motion display was drifting upwards or downwards. The crossmodal correspondence between the auditory pitch change and the direction of visual motion modulated neural activity in both visual motion areas (specifically, the left human motion complex; hMT+/V5+) as well as in higher areas (specifically, the right intraparietal sulcus). That said, it was unclear from the results of this study whether this modulation of activity in the visual motion areas resulted from feed-forward vs. feedback interactions. Hence, although certainly intriguing, Sadaghiani et al.’s results do not provide any strong evidence regarding just how ‘early’ the influence of crossmodal correspondences can be picked up in neural information processing. A similar limitation also affects the interpretation of Peiffer-Smadja’s (2010) fMRI study, in which the author looked at the crossmodal correspondence between speech sounds and angularity.

More interesting here are the results of a combined TMS, EEG, and psychophysics study reported by Bien et al. (2012), in which the participants were simultaneously presented with a small or large circle and either a lower- or higher-pitched sound. The lateral separation between the auditory and visual stimuli was varied, and participants had to try to judge whether the sound was presented to the left or right of the visual stimulus (this study was based on an earlier study by Parise and Spence, 2009). On the basis of a combined ERP and repetitive TMS study, the authors concluded that the first neural signs associated with a distinction between congruent and incongruent stimulus pairings started around 250 ms after stimulus onset in the right intraparietal sulcus (and were identified with the parietal P2).9 Note that such a temporal signature would not normally qualify a process as being fast.

Here it should also be noted that the two other published ERP studies of crossmodal correspondences that we are aware of, although studying very different crossmodal correspondences, have come to roughly the same conclusion – namely, that the effect of crossmodal correspondence (matching vs. mismatching) on ERPs first emerges around 150–200 ms after stimulus onset (see Kovic et al., 2010; Seo et al., 2010). While Kovic et al. described their ERP results as showing an ‘early’ multisensory effect, it is worth noting that many other examples of ERP differences between matching and mismatching pairs of auditory and visual stimuli have been documented much earlier in neural information processing (e.g., at 40 ms in Giard and Peronnet, 1999; see also Molholm, Ritter, Javitt, & Foxe, 2004; Molholm et al., 2002).

The involvement of the Lateral Occipital Complex (LOC), the angular gyrus (located within the temporal–parietal–occipital, TPO, region; Ramachandran & Hubbard, 2003), and even prefrontal cortex (in the bouba-kiki effect; Peiffer-Smadja, 2010) in other crossmodal correspondence studies also points to an effect that doesn’t modulate the very earliest stages of neural information processing (see also Fiebelkorn et al., 2010, 2012). At present, it is unclear to what extent any differences in the patterns of neural activation should be attributed to differences in the tasks used or the correspondences investigated in these recent studies.

In conclusion, although crossmodal correspondences can affect the fastest of a participant’s behavioural responses, their influence does not necessarily appear to impact on the earliest of their neural responses, and hence should not be taken as providing strong support for the speed criterion. Note, though, that, contrary to other criteria, such as the consciousness criterion, which has a clear-cut satisfaction condition (a process has to be either conscious or not), the notions of ‘early’ and ‘fast’ are relative, so the conclusion should be that crossmodal correspondences are not among the earliest influences on human information processing. Furthermore, given the lack of a clear-cut satisfaction condition on the speed criterion, this criterion should perhaps be weighted somewhat less heavily than the others when it comes to assessing the automaticity or otherwise of a cognitive process.

4. On automaticity and different kinds of crossmodal correspondence

As has become clear over the course of this review, the fact that no single answer emerges from these studies concerning the automaticity of crossmodal correspondences is a sign that there is no simple answer to this question. Besides differences in the kind of criteria and standard of automaticity one chooses to focus on, and differences in the experimental task used to test it, notable differences can be introduced by the selection of a particular crossmodal correspondence for testing.

First, when it comes to comparing studies that apparently focus on the same crossmodal relation, and that have utilised seemingly identical tasks, as is the case, for example, for the crossmodal correspondence between pitch and elevation (Chiou & Rich, 2012a; Klein, Brennan, D’Aloisio et al., 1987; Klein, Brennan, & Gilani, 1987), there could be subtle differences attributable to the particular stimuli (and, more importantly, the range of stimuli) that researchers happen to have used (see Table 1) which bear on the salience of the relevant dimensions. Pushing this hypothesis one step further, there might even be a difference between the crossmodal correspondence holding between pure tones and elevation (Chiou & Rich, 2012a) and, say, that holding between rising/descending sounds and elevation (Fernández-Prieto et al., 2012; Jeschonek, Pauen, & Babocsai, in press; Mossbridge et al., 2011), or even rising/descending visual stimuli.

9 TMS over parietal cortex disrupts the elicitation of the concurrent at least in certain forms of synaesthesia (see Esterman et al., 2006; Muggleton, Tsakanikos, Walsh, & Ward, 2007). Note, though, that the areas that are critically involved in crossmodal correspondences and synaesthesia appear to be different, with TMS over the right parieto-occipital (PO) junction, but not left or right IPS or left PO, knocking out the synaesthetic concurrent in Muggleton et al.’s study.

C. Spence, O. Deroy / Consciousness and Cognition 22 (2013) 245–260 255

Pushing this one step further, it is important to stress why the question of there being different kinds of crossmodal correspondences matters. Simply as a methodological caution, it would seem sensible to remember that the conclusions drawn concerning the crossmodal correspondence, say, between auditory pitch and visual brightness by Klapetek et al. (2012) should not necessarily be taken to show that another crossmodal correspondence, such as, for instance, the correspondence between pitch and elevation (Chiou & Rich, 2012a), need operate in exactly the same manner. Now, the argument for resisting the tendency to generalise from one type of crossmodal relation to another (even if that seems to be what is done in many discussions of the automaticity of synaesthesia) comes from the more substantial reason that there are differences in the origin and internalisation of the crossmodal correspondences.

According to Spence (2011; see also Deroy et al., in press; Sadaghiani et al., 2009), many crossmodal correspondences likely result from an organism picking up on the statistical regularities that are present in the environment, and occur for pairs of stimulus dimensions that happen to be correlated in nature (Spence & Deroy, 2012). Others, though, are likely to involve further structural or internal determinants that fall naturally out of the organisation of the organism’s mind/perceptual system. The emerging cognitive neuroscience of crossmodal correspondences has only recently started to distinguish between the different areas involved in different audiovisual correspondences (e.g., Bien et al., 2012; Peiffer-Smadja, 2010; Sadaghiani et al., 2009; Spence & Parise, 2012).

Continuing along these lines, one can further distinguish between correspondences that occur because of a common amodal coding (of, for instance, magnitude, shape, intensity, or space; e.g., Walker-Andrews, 1994; Walsh, 2003; see also Cohen, 1934), or because of other indirect comparisons or mappings, in terms of their emotional effect (Palmer & Schloss, 2012; Schifferstein & Tanudjaja, 2004) or linguistic/semantic coding (Long, 1977; Martino & Marks, 1999, 2000; Sadaghiani et al., 2009; Smith & Sera, 1992). What’s more, other differences in the development of specific crossmodal correspondences might have an effect on their internalisation or robustness, and therefore on the degree of automaticity of their effect: One could, for instance, hypothesise that those crossmodal correspondences that are more ‘natural’ and show up at very early stages of human development (such as the correspondence between pitch and brightness or pitch and size; see also Lewkowicz & Turkewitz, 1980) are more strongly internalised than others that may only develop later (for instance, those requiring further conceptual development, such as correspondences involving mass; Simner, Harrold, Creed, Monro, & Foulkes, 2009; Smith, Carey, & Wiser, 1985) and are more automatically (or at least rapidly) processed. Or perhaps it is all just a function of exposure, with increased exposure making the crossmodal correspondence more likely to operate from an earlier stage of information processing. In any case, further research will certainly need to keep these differences in mind: Instead of looking for a general answer to the automaticity question, researchers should perhaps focus instead on trying to understand what kinds and degrees of automaticity in various tasks tell us about the access and control our minds have over kinds of correspondences that are variously internalised.10

In conclusion, the literature reviewed in this section draws attention to the possibility that different answers to the automaticity question may be obtained for different types of crossmodal correspondence (e.g., statistical, structural, and/or semantic). It might even be the case that different answers will be obtained when considering the correspondences that hold between different pairings of sensory modalities.

5. Conclusions

In conclusion, in the present article, we have reviewed a number of recent studies in which crossmodal correspondences have sometimes not impacted on participants’ performance. The interpretation of these failures (or perhaps, better said, null results) has led some researchers to argue that crossmodal correspondences are strategic, goal-dependent, and/or intentional (e.g., Chiou & Rich, 2012a; Klapetek et al., 2012), require conscious control, and operate relatively late in information processing (Fernández-Prieto et al., 2012). Although our purpose in this article has been to qualify this conclusion, it should, at least, be clear by now that the automaticity of the crossmodal correspondences discussed here is not in any way comparable to the sort of automaticity that has been documented in the case of synaesthetic relations. Synaesthesia is largely involuntary (Ward, 2012), meaning that the conscious sensory concurrent is automatically induced by the presentation of the inducing stimulus, just as long as that inducer is processed consciously by the synaesthete. It has been argued that synaesthesia occurs relatively early in information processing (though see also Simner & Hubbard, 2006), and is largely load-insensitive (see Mattingley, 2009; Treisman, 2005, for reviews). In fact, many researchers take the automaticity of the concurrent to be a key defining characteristic of the condition (Hochel & Milán, 2008; see also Ward, 2012).

Note, though, that while some researchers have argued that (pre-attentive) pop-out occurs in the case of synaesthesia, the weight of scientific opinion is now against such claims (Mattingley, 2009; Ward, Jonas, Dienes, & Seth, 2010), leaving synaesthesia as a largely involuntary process. The synaesthetic association of sounds of different pitches with specific elevations in space, as happens in (rare) cases of music-space synaesthesia, has, for instance, been shown to result in automatic processing (at least in the sense of being involuntary and goal-independent) and to lead to compatibility effects in spatial Stroop-like tasks: When both synaesthetic and non-synaesthetic participants are presented with a musical note and have

10 In the future, it may be helpful to try and develop a quantitative account of the acquisition of crossmodal correspondences. Relevant here may be thetraditional models of learning, such as those proposed by Estes (1950) and Bush and Mosteller (1951).


to reach for a visual target with the cursor of a mouse, only the synaesthetes are significantly faster when the target appears in a compatible as compared to an incompatible location (see Linkovski, Akiva-Kabiri, Gertner, & Henik, 2012).

Whereas we agree with the general notion that crossmodal correspondences do not share the kind of automaticity exhibited by synaesthesia, we have also tried to make clear that there are a number of ways in which crossmodal correspondences might still present a high degree of automaticity. They do not qualify as automatic processes in the strong, or old-fashioned, sense of being systematic or necessary (e.g., Logan, 1985, 1989). This view has, though, now largely been given up on by researchers (see Moors & De Houwer, 2006, for a review). That said, crossmodal correspondences can affect a participant’s behaviour without their (conscious) control and in a relatively fast manner (though it should once again be acknowledged that the speed criterion is perhaps the weakest, or least important, of the four criteria when it comes to assessing the automaticity of a given cognitive process). Regarding the intentionality criterion, the evidence that the processing of crossmodal correspondences is goal-dependent and strategic remains weak. Regarding consciousness, we have highlighted how, in the most interesting cases in which crossmodal correspondences appear to be ‘activated’, researchers seem to need to manipulate the perceptual salience of the relevant dimensions, primarily by means of the selection process involved or, on occasion, the verbal instructions given to participants (Klapetek et al., 2012).

We might also ask whether the differences in results between different studies of crossmodal correspondences are not also to be explained by the fact that certain audiovisual correspondences (e.g., those operating between pitch and elevation) are more robust or salient than others. It would be useful in this sense to vary the kind of audiovisual correspondences proposed in most of the reviewed cases – for instance, by contrasting the results obtained with high-pitch-brightness vs. high-pitch-size vs. high-pitch-elevation within the same experimental paradigm (cf. Evans & Treisman, 2010; Parise & Spence, 2012).

When it comes to future investigation, it is worth bearing in mind that researchers studying the automaticity of crossmodal correspondences have typically presented contrasting pairs of stimuli in different sensory modalities and used either explicit or implicit matching tasks. The need to resort to pairs of stimuli (that vary along a single dimension) is, however, perhaps not always the most appropriate strategy, as the task can appear surprising and hence might lead participants to adopt a more reflective strategy, for instance, reasoning by analogy (‘the more on the pitch scale should be matched with the more on the bright scale’; e.g., see Martino & Marks’, 1999, ‘semantic coding hypothesis’). If that is really what is happening, it could be argued that the results of such studies might not reflect internalised or learned crossmodal correspondences. As such, the presentation of only a single stimulus in at least one of the sensory modalities might be a good way in which to investigate those correspondences which have been internalised (see Guzman-Martinez et al., 2012).

Finally, further similarities and differences between the automaticity of crossmodal correspondences, other crossmodal priors, and synaesthesia may emerge if one were to look at the load insensitivity of crossmodal correspondences, and their pre-attentional character. So, for example, it would be interesting to determine whether increasing the perceptual load in a given modality would necessarily reduce the magnitude of any crossmodal correspondence effects that are observed in a specific experimental paradigm (Chiou & Rich, 2012b; cf. Helbig & Ernst, 2008; Lavie, 2005; Mattingley et al., 2006; Sweeny et al., 2012). Furthermore, with regard to the non-conscious criterion, it would be interesting to investigate whether visual stimuli (say, a small or large circle) that a participant is not aware of, because, say, they are presented to one eye under conditions of continuous flash suppression or binocular rivalry (Schwartz, Grimault, Hupé, Moore, & Pressnitzer, 2012; Sweeny et al., 2012), can nevertheless still influence a participant’s categorisation of the pitch of a sound (see Marks, 2004; Spence, 2011).

5.1. Closing comments: Where do we stand with respect to the notion of automaticity?

Ultimately, it should now be clear that the traditional dichotomous, all-or-none view of a given cognitive process as being either automatic or not automatic is no longer tenable. Instead, when it comes to evaluating researchers’ claims regarding the automaticity of a given cognitive process, or phenomenon, such as crossmodal correspondences, we need to consider the extent to which the various defining criteria (goal-independence, non-consciousness, load-insensitivity, and speed) are met. According to this latter proposal, then, a given cognitive process will turn out to be more-or-less automatic, depending on the number and extent of the defining features that characterise the phenomenon (or criteria that are met).

While the latter suggestion might seem an eminently sensible alternative to the traditional dichotomous view, it must be remembered that it leads to its own set of complications: First and foremost, one is immediately faced with the question of how to characterise these ‘‘different degrees of automaticity’’. How, for example, should one decide whether or not a given cognitive process (say, crossmodal sensory correspondences) is more or less automatic than another phenomenon (such as, for example, synaesthesia)? How might one quantify the ‘degree’ to which each criterion is satisfied, and how should the different criteria be weighted in order to compute some absolute measure of ‘overall automaticity’? ‘‘Is so much load-invariance worth so much goal-independence or ‘fastness’?’’, one might well ask.

Here we would like to end by stating our view that defining the common currency with which to quantify the degree of automaticity of a given cognitive process remains a challenge of the first order; furthermore, the challenge is likely to remain significant as long as researchers continue using different behavioural paradigms to study different crossmodal correspondences (see above). However, that said, we leave it for the reader to decide whether or not such difficulties would best be resolved by dispensing with the very notion of ‘automaticity’ (as suggested by one of the original reviewers of this paper), and perhaps simply focusing on the processes that lead to, or control, those patterns of behaviour that we may choose to characterise as being more or less automatic.


Acknowledgment

We would like to thank Cesare Parise for his detailed comments and suggestions on an earlier version of this manuscript.

References

Ahissar, M., & Hochstein, S. (2004). The reverse hierarchy theory of visual perceptive learning. Trends in Cognitive Sciences, 8, 457–464.Alsius, A., Navarra, J., Campbell, R., & Soto-Faraco, S. (2005). Audiovisual integration of speech falters under high attention demands. Current Biology, 15, 1–5.Alsius, A., Navarra, J., & Soto-Faraco, S. (2007). Attention to touch weakens audiovisual speech integration. Experimental Brain Research, 183, 399–404.Awh, E., Belopolsky, A. V., & Theeuwes, J. (2012). Top-down versus bottom-up attentional control: A failed theoretical dichotomy. Trends in Cognitive

Sciences, 16, 437–443.Bargh, J. A. (1992). The ecology of automaticity: Toward establishing the conditions needed to produce automatic processing effects. American Journal of

Psychology, 105, 181–199.Bargh, J. A. (1994). The four horsemen of automaticity: Awareness, intention, efficiency, and control in social cognition. In R. S. Wyer & T. K. Srull (Eds.).

Handbook of social cognition (Vol. 1, pp. 1–40). Hillsdale, NJ: Erlbaum.Belkin, K., Martin, R., Kemp, S. E., & Gilbert, A. N. (1997). Auditory pitch as a perceptual analogue to odor quality. Psychological Science, 8, 340–342.Ben-Artzi, E., & Marks, L. E. (1995). Visual–auditory interaction in speeded classification: Role of stimulus difference. Perception and Psychophysics, 57,

1151–1162.Bernstein, I. H., & Edelstein, B. A. (1971). Effects of some variations in auditory input upon visual choice reaction time. Journal of Experimental Psychology, 87,

241–247.Bertelson, P., Vroomen, J., de Gelder, B., & Driver, J. (2000). The ventriloquist effect does not depend on the direction of deliberate visual attention. Perception

and Psychophysics, 62, 321–332.Bien, N., ten Oever, S., Goebel, R., & Sack, A. T. (2012). The sound of size: Crossmodal binding in pitch-size synesthesia: A combined TMS, EEG, and

psychophysics study. NeuroImage, 59, 663–672.Blake, R., Palmeri, T., Marois, R., & Kim, C.-Y. (2005). On the perceptual reality of synesthesia. In L. C. Robertson & N. Sagiv (Eds.), Synesthesia: Perspectives

from cognitive neuroscience (pp. 47–73). New York: Oxford University Press.Bremner, A., Lewkowicz, D., & Spence, C. (Eds.). (2012). Multisensory development. Oxford: Oxford University Press.Bush, R. R., & Mosteller, F. (1951). A mathematical model for simple learning. Psychological Review, 58, 313–323.Cabrera, D., & Morimoto, M. (2007). Influence of fundamental frequency and source elevation on the vertical localization of complex tones and complex tone

pairs. Journal of the Acoustical Society of America, 122, 478.Calvert, G., Spence, C., & Stein, B. E. (Eds.). (2004). The handbook of multisensory processing. Cambridge, MA: MIT Press.Chen, Y.-C., Yeh, S.-L., & Spence, C. (2011). Crossmodal constraints on human visual awareness: Can auditory semantic context modulate binocular rivalry?

Frontiers in Perception Science, 2, 212.Chica, A., Sanabria, D., Lupiáñez, J., & Spence, C. (2007). Comparing intramodal and crossmodal cuing in the endogenous orienting of spatial attention.

Experimental Brain Research, 179, 353–364, 531.Chiou, R., & Rich, A. N. (2012a). Cross-modality correspondence between pitch and spatial location modulates attentional orienting. Perception, 41, 339–353.Chiou, R., & Rich, A. N. (2012b). Perceptual difficulty and speed pressure reveal different behavioural effects of voluntary and involuntary attention. Poster

session presented at the 39th Australasian Experimental Psychology Conference, Sydney, Australia.Clark, H. H., & Brownell, H. H. (1976). Position, direction, and their perceptual integrality. Perception and Psychophysics, 19, 328–334.Cohen, N. E. (1934). Equivalence of brightness across modalities. American Journal of Psychology, 46, 117–119.Cohen Kadosh, R., & Terhune, D. B. (2011). Redefining synaesthesia? British Journal of Psychology, 103, 20–23.Crisinel, A.-S., & Spence, C. (2010). As bitter as a trombone: Synesthetic correspondences in non-synesthetes between tastes and flavors and musical

instruments and notes. Attention, Perception, and Psychophysics, 72, 1994–2002.Crisinel, A.-S., & Spence, C. (2011). Crossmodal associations between flavoured milk solutions and musical notes. Acta Psychologica, 138, 155–161.Crisinel, A.-S., & Spence, C. (2012). A fruity note: Crossmodal associations between odors and musical notes. Chemical Senses, 37, 151–158.De Houwer, J., Teige-Mocigemba, S., Spruyt, A., & Moors, A. (2009). Implicit measures: A normative analysis and review. Psychological Bulletin, 135, 347–368.Demattè, M. L., Sanabria, D., & Spence, C. (2007). Olfactory-tactile compatibility effects demonstrated using the implicit association task. Acta Psychologica,

124, 332–343.Deroy, O., Crisinel, A., & Spence, C. (in press). Crossmodal correspondences between odours and contingent features: Odours, musical notes, and arbitrary

shapes, Psychonomic Bulletin & Review.Deroy, O., & Spence, C. (submitted for publication) Bordelines cases of crossmodal experiences, mind and language.Deroy, O., & Spence, C. (in press). Why we are not all synaesthetes (not even weakly so), Psychonomic Bulletin & Review.Deroy, O., & Valentin, D. (2011). Tasting liquid shapes: Investigating the sensory basis of cross-modal correspondences. Chemosensory Perception, 4, 80–90.Eramudugolla, R., Kamke, M., Soto-Faraco, S., & Mattingley, J. B. (2011). Perceptual load influences auditory space perception in the ventriloquist aftereffect.

Cognition, 118, 62–74.Ernst, M. O. (2007). Learning to integrate arbitrary signals from vision and touch. Journal of Vision, 7(5/7), 1–14.Esterman, M., Verstynen, T., Ivry, R. B., & Robertson, L. C. (2006). Coming unbound: Disrupting automatic integration of synesthetic color and graphemes by

TMS of the right parietal lobe. Journal of Cognitive Neuroscience, 18, 1570–1576.Estes, W. K. (1950). Toward a statistical theory of learning. Psychological Review, 57, 94–107.Evans, K. K., & Treisman, A. (2010). Natural cross-modal mappings between visual and auditory features. Journal of Vision, 10(1), 1–12 (article no. 6).Fairhall, S. L., & Macaluso, E. (2009). Spatial attention can modulate audiovisual integration at multiple cortical and subcortical sites. European Journal of

Neuroscience, 29, 1247–1257.Fernández-Prieto, I., Vera-Constán, F., García-Morera, J., & Navarra, J. (2012). Spatial recoding of sound: Pitch-varying auditory cues modulate up/down

visual spatial attention. Seeing and Perceiving, 25(Suppl.), 150–151.Fiebelkorn, I. C., Foxe, J. J., & Molholm, S. (2010). Dual mechanisms for the cross-sensory spread of attention: How much do learned associations matter?

Cerebral Cortex, 20, 109–120.Fiebelkorn, I. C., Foxe, J. J., & Molholm, S. (2012). Attention and multisensory feature integration. In B. E. Stein (Ed.), The new handbook of multisensory

processing (pp. 383–394). Cambridge, MA: MIT Press.Gallace, A., & Spence, C. (2006). Multisensory synesthetic interactions in the speeded classification of visual size. Perception and Psychophysics, 68,

1191–1203.Giard, M. H., & Peronnet, F. (1999). Auditory–visual integration during multimodal object recognition in humans: A behavioral and electrophysiological

study. Journal of Cognitive Neuroscience, 11, 473–490.Greenwald, A. G., McGhee, D. E., & Schwartz, J. L. K. (1998). Measuring individual differences in implicit cognition: The implicit association test. Journal of

Personality and Social Psychology, 74, 1464–1480.Guzman-Martinez, E., Ortega, L., Grabowecky, M., Mossbridge, J., & Suzuki, S. (2012). Interactive coding of visual spatial frequency and auditory amplitude-

modulation rate. Current Biology, 22, 383–388.Hackley, S. A. (1993). An evaluation of the automaticity of sensory processing using event-related potentials and brain-stem reflexes. Psychophysiology, 30,

415–428.

258 C. Spence, O. Deroy / Consciousness and Cognition 22 (2013) 245–260

Hanson-Vaux, G., Crisinel, A.-S., & Spence, C. (2013). Smelling shapes: Crossmodal correspondences between odors and shapes. Chemical Senses, 38, 161–166.Helbig, H. B., & Ernst, M. O. (2008). Visual-haptic cue weighting is independent of modality-specific attention. Journal of Vision, 8(10), 1–16 (article no. 2).Heron, J., Roach, N. W., Hanson, J. V. M., McGraw, P. V., & Whitaker, D. (2012). Audiovisual time perception is spatially specific. Experimental Brain Research,

218, 477–485.Hochel, E., & Milán, E. G. (2008). Synaesthesia: The existing state of affairs. Cognitive Neuropsychology, 25, 93–117.Horowitz, T. S., Wolfe, J. M., Alvarez, G. A., Cohen, M. A., & Kuzmova, Y. I. (2009). The speed of free will. Quarterly Journal of Experimental Psychology, 62,

2262–2288.Hubbard, T. L. (1996). Synesthesia-like mappings of lightness, pitch, and melodic interval. American Journal of Psychology, 109, 219–238.Jeschonek, S., Pauen, S., & Babocsai, L. (in press). Cross-modal mapping of visual and acoustic displays in infants: The effect of dynamic and static

components, European, http://dx.doi.org/10.1080/17405629.2012.681590.Keller, P. E., & Koch, I. (2006). Exogenous and endogenous response priming with auditory stimuli. Advances in Cognitive Psychology, 2, 269–276.Klapetek, A., Ngo, M. K., & Spence, C. (2012). Do crossmodal correspondences enhance the facilitatory effect of auditory cues on visual search? Attention,

Perception, and Psychophysics, 74, 1154–1167.Klein, R., Brennan, M., D’Aloisio, A., D’Entremont, B., & Gilani, A. (1987). Covert cross-modality orienting of attention. Unpublished manuscript.Klein, R. M., Brennan, M., & Gilani, A. (1987). Covert cross-modality orienting of attention in space. Paper presented at the annual meeting of the Psychonomics

Society, Seattle (November).Klein, R. M., & Juckes, T. (1989). Can auditory frequency control the direction of visual attention. Paper presented at the Canadian Acoustic Association, Halifax,

NS, October (abstract published in proceedings).Kovic, V., Plunkett, K., & Westermann, G. (2010). The shape of words in the brain. Cognition, 114, 19–28.Lavie, N. (2005). Distracted and confused?: Selective attention under load. Trends in Cognitive Sciences, 9, 75–82.Lewkowicz, D. J., & Turkewitz, G. (1980). Cross-modal equivalence in early infancy: Auditory–visual intensity matching. Developmental Psychology, 16,

597–607.Linkovski, O., Akiva-Kabiri, L., Gertner, L., & Henik, A. (2012). Is it for real? Evaluating authenticity of musical pitch-space synesthesia. Cognitive Processing.

http://dx.doi.org/10.1007/s10339-012-0498-0.Logan, G. D. (1985). Skill and automaticity: Relations, implications, and future directions. Canadian Journal of Psychology, 39, 367–386.Logan, G. D. (1989). Automaticity and cognitive control. In J. S. Uleman & J. A. Bargh (Eds.), Unintended thought (pp. 52–74). New York: Guilford Press.Long, J. (1977). Contextual assimilation and its effect on the division of attention between nonverbal signals. Quarterly Journal of Experimental Psychology, 29,

397–414.Ludwig, V. U., Adachi, I., & Matzuzawa, T. (2011). Visuoauditory mappings between high luminance and high pitch are shared by chimpanzees (Pan

troglodytes) and humans. Proceedings of the National Academy of Sciences USA, 108, 20661–20665.Lupiáñez, J., & Callejas, A. (2006). Automatic perception and synaesthesia: Evidence from colour and photism naming in a Stroop-negative priming task.

Cortex, 42, 204–212.Mack, A., & Rock, I. (1998). Inattentional blindness. Cambridge, MA: MIT Press.MacLeod, C. M., & Dunbar, K. (1988). Training and Stroop-like interference: Evidence for a continuum of automaticity. Journal of Experimental Psychology:

Learning, Memory, and Cognition, 14, 126–135.Maeda, F., Kanai, R., & Shimojo, S. (2004). Changing pitch induced visual motion illusion. Current Biology, 14, R990–R991.Marks, L. E. (1987). On cross-modal similarity: Auditory–visual interactions in speeded discrimination. Journal of Experimental Psychology: Human Perception

and Performance, 13, 384–394.Marks, L. E. (2004). Cross-modal interactions in speeded classification. In G. A. Calvert, C. Spence, & B. E. Stein (Eds.), Handbook of multisensory processes

(pp. 85–105). Cambridge, MA: MIT Press.Marr, D. (1982). Vision: A computational investigation into the human representation and processing of visual information. New York: W.H. Freeman and

Company.Martino, G., & Marks, L. E. (1999). Perceptual and linguistic interactions in speeded classification: Tests of the semantic coding hypothesis. Perception, 28,

903–923.Martino, G., & Marks, L. E. (2000). Cross-modal interaction between vision and touch: The role of synesthetic correspondence. Perception, 29, 745–754.Martino, G., & Marks, L. E. (2001). Synesthesia: Strong and weak. Current Directions in Psychological Science, 10, 61–65.Matthews, P. B. C. (1991). The human stretch reflex and the motor cortex. Trends in Neurosciences, 14, 87–91.Mattingley, J. B. (2009). Attention, automaticity and awareness in synaesthesia. Annals of the New York Academy of Sciences (The Year in Cognitive Science),

1156, 141–167.Mattingley, J. B., Payne, J. M., & Rich, A. N. (2006). Attentional load attenuates synaesthetic priming effects in grapheme-colour synaesthesia. Cortex, 42,

213–221.Maurer, D., & Mondloch, C. J. (2005). Neonatal synaesthesia: A reevaluation. In L. C. Robertson & N. Sagiv (Eds.), Synaesthesia: Perspectives from cognitive

neuroscience (pp. 193–213). Oxford: Oxford University Press.
Melara, R. D. (1989). Dimensional interaction between color and pitch. Journal of Experimental Psychology: Human Perception and Performance, 15, 69–79.
Melara, R. D., & O'Brien, T. P. (1987). Interactions between synesthetically corresponding dimensions. Journal of Experimental Psychology: General, 116, 323–336.
Miller, J. O. (1991). Channel interaction and the redundant targets effect in bimodal divided attention. Journal of Experimental Psychology: Human Perception and Performance, 17, 160–169.
Molholm, S., Ritter, W., Javitt, D. C., & Foxe, J. J. (2004). Multisensory visual–auditory object recognition in humans: A high-density electrical mapping study. Cerebral Cortex, 14, 452–465.
Molholm, S., Ritter, W., Murray, M. M., Javitt, D. C., Schroeder, C. E., & Foxe, J. J. (2002). Multisensory auditory–visual interactions during early sensory processing in humans: A high-density electrical mapping study. Cognitive Brain Research, 14, 115–128.
Mondloch, C. J., & Maurer, D. (2004). Do small white balls squeak? Pitch–object correspondences in young children. Cognitive, Affective, and Behavioral Neuroscience, 4, 133–136.
Moors, A., & De Houwer, J. (2006). Automaticity: A theoretical and conceptual analysis. Psychological Bulletin, 132, 297–326.
Mossbridge, J. A., Grabowecky, M., & Suzuki, S. (2011). Changes in auditory frequency guide visual-spatial attention. Cognition, 121, 133–139.
Muggleton, N., Tsakanikos, E., Walsh, V., & Ward, J. (2007). Disruption of synaesthesia following TMS of the right posterior parietal cortex. Neuropsychologia, 45, 1582–1585.
Nahm, F. K. D., Tranel, D., Damasio, H., & Damasio, A. R. (1993). Cross-modal associations and the human amygdala. Neuropsychologia, 31, 727–744.
Navarra, J., Alsius, A., Soto-Faraco, S., & Spence, C. (2009). Assessing the role of attention in the audiovisual integration of speech. Information Fusion, 11, 4–11.
Ngo, M. K., & Spence, C. (2010). Auditory, tactile, and multisensory cues facilitate search for dynamic visual stimuli. Attention, Perception, and Psychophysics, 72, 1654–1665.
Ngo, M., & Spence, C. (2012). Facilitating masked visual target identification with auditory oddball stimuli. Experimental Brain Research, 221, 129–136.
Occelli, V., Spence, C., & Zampini, M. (2009). Compatibility effects between sound frequencies and tactually stimulated locations on the hand. NeuroReport, 20, 793–797.
Palmer, S. E., & Schloss, K. B. (2012). Color, music and emotion. In École thématique interdisciplinaire CNRS. Couleur: Approches multisensorielles (pp. 43–58). Roussillon: Ôkhra SCIC SA.
Parise, C., & Spence, C. (2009). 'When birds of a feather flock together': Synesthetic correspondences modulate audiovisual integration in non-synesthetes. PLoS ONE, 4(5), e5664.

C. Spence, O. Deroy / Consciousness and Cognition 22 (2013) 245–260 259

Parise, C. V., & Spence, C. (2012). Audiovisual crossmodal correspondences and sound symbolism: An IAT study. Experimental Brain Research, 220, 319–333.
Parise, C. V., & Spence, C. (in press). Audiovisual crossmodal correspondences. In J. Simner & E. Hubbard (Eds.), Oxford handbook of synaesthesia. Oxford: Oxford University Press.
Patching, G. R., & Quinlan, P. T. (2002). Garner and congruence effects in the speeded classification of bimodal signals. Journal of Experimental Psychology: Human Perception and Performance, 28, 755–775.
Pedley, P. E., & Harper, R. S. (1959). Pitch and the vertical localization of sound. The American Journal of Psychology, 72, 447–449.
Peiffer-Smadja, N. (2010). Exploring the bouba/kiki effect: A behavioral and fMRI study. Unpublished MSc thesis, Université Paris V, Descartes, France.
Pratt, C. C. (1930). The spatial character of high and low tones. Journal of Experimental Psychology, 13, 278–285.
Price, M. C., & Mattingley, J. B. (in press). Automaticity in sequence-space synaesthesia: A critical appraisal of the evidence. Cortex.
Proctor, R. W., & Cho, Y. S. (2006). Polarity correspondence: A general principle for performance of speeded binary classification tasks. Psychological Bulletin, 132, 416–442.
Rader, C. M., & Tellegen, A. (1987). An investigation of synesthesia. Journal of Personality and Social Psychology, 52, 981–987.
Ramachandran, V. S., & Hubbard, E. M. (2003). Hearing colors, tasting shapes. Scientific American, 288(May), 43–49.
Rich, A. N., Bradshaw, J. L., & Mattingley, J. B. (2005). A systematic, large-scale study of synaesthesia: Implications for the role of early experience in lexical-colour associations. Cognition, 98, 53–84.
Rich, A. N., & Mattingley, J. B. (2003). The effects of stimulus competition and voluntary attention on colour-graphemic synaesthesia. NeuroReport, 14, 1793–1798.
Rich, A. N., & Mattingley, J. B. (2010). Out of sight, out of mind: Suppression of synaesthetic colours during the attentional blink. Cognition, 114, 320–328.
Röder, B., & Büchel, C. (2009). Multisensory interactions within and outside the focus of visual spatial attention (commentary on Fairhall & Macaluso). European Journal of Neuroscience, 29, 1245–1246.
Roffler, S. K., & Butler, R. A. (1968). Factors that influence the localization of sound in the vertical plane. Journal of the Acoustical Society of America, 43, 1255–1259.
Rudmin, F., & Cappelli, M. (1983). Tone-taste synesthesia: A replication. Perceptual and Motor Skills, 56, 118.
Rusconi, E., Kwan, B., Giordano, B. L., Umiltà, C., & Butterworth, B. (2006). Spatial representation of pitch height: The SMARC effect. Cognition, 99, 113–129.
Sadaghiani, S., Maier, J. X., & Noppeney, U. (2009). Natural, metaphoric, and linguistic auditory direction signals have distinct influences on visual motion processing. Journal of Neuroscience, 29, 6490–6499.
Sagiv, N., Heer, J., & Robertson, L. (2006). Does binding of synesthetic color to the evoking grapheme require attention? Cortex, 42, 232–242.
Santangelo, V., & Spence, C. (2008). Is the exogenous orienting of spatial attention truly automatic? Evidence from unimodal and multisensory studies. Consciousness and Cognition, 17, 989–1015.
Schifferstein, H. N. J., & Tanudjaja, I. (2004). Visualizing fragrances through colors: The mediating role of emotions. Perception, 33, 1249–1266.
Schneider, W., Dumais, S. T., & Shiffrin, R. M. (1984). Automatic and control processing and attention. In R. Parasuraman & D. R. Davies (Eds.), Varieties of attention (pp. 1–27). New York: Academic Press.
Schwartz, J.-L., Grimault, N., Hupé, J.-M., Moore, B. C. J., & Pressnitzer, D. (2012). Multistability in perception: Binding sensory modalities, an overview. Philosophical Transactions of the Royal Society B, 367, 896–905.
Seo, H.-S., Arshamian, A., Schemmer, K., Scheer, I., Sander, T., Ritter, G., et al. (2010). Cross-modal integration between odors and abstract symbols. Neuroscience Letters, 478, 175–178.
Shiffrin, R. M. (1988). Attention. In R. C. Atkinson, R. J. Herrnstein, G. Lindzey, & R. D. Luce (Eds.), Stevens' handbook of experimental psychology (Vol. 2, pp. 739–811). New York: Wiley.
Simner, J., Harrold, J., Creed, H., Monro, L., & Foulkes, L. (2009). Early detection of markers for synaesthesia in childhood populations. Brain, 132, 57–64.
Simner, J., & Hubbard, E. M. (2006). Variants of synesthesia interact in cognitive tasks: Evidence for implicit associations and late connectivity in cross-talk theories. Neuroscience, 143, 805–814.
Smith, C., Carey, S., & Wiser, M. (1985). On differentiation: A case study of the development of the concepts of size, weight, and density. Cognition, 21, 177–237.
Smith, L. B., & Sera, M. D. (1992). A developmental analysis of the polar structure of dimensions. Cognitive Psychology, 24, 99–142.
Spence, C. (2010). Crossmodal spatial attention. Annals of the New York Academy of Sciences (The Year in Cognitive Neuroscience), 1191, 182–200.
Spence, C. (2011). Crossmodal correspondences: A tutorial review. Attention, Perception, and Psychophysics, 73, 971–995.
Spence, C., & Deroy, O. (2012). Are chimpanzees really synaesthetic? i-Perception, 3, 316–318.
Spence, C., & Deroy, O. (2013). Crossmodal mental imagery. In S. Lacey & R. Lawson (Eds.), Multisensory imagery: Theory and applications (pp. 130–159). New York: Springer.
Spence, C., & Driver, J. (1997). Audiovisual links in exogenous covert spatial orienting. Perception and Psychophysics, 59, 1–22.
Spence, C., McDonald, J., & Driver, J. (2004). Exogenous spatial cuing studies of human crossmodal attention and multisensory integration. In C. Spence & J. Driver (Eds.), Crossmodal space and crossmodal attention (pp. 277–320). Oxford, UK: Oxford University Press.
Spence, C., & Ngo, M. (2012a). Capitalizing on shape symbolism in the food and beverage sector. Flavour, 1, 12.
Spence, C., & Ngo, M. K. (2012b). Does attention or multisensory integration explain the crossmodal facilitation of masked visual target identification? In B. E. Stein (Ed.), The new handbook of multisensory processing (pp. 345–358). Cambridge, MA: MIT Press.
Spence, C., Nicholls, M. E. R., & Driver, J. (2001). The cost of expecting events in the wrong sensory modality. Perception and Psychophysics, 63, 330–336.
Spence, C., & Parise, C. V. (2012). The cognitive neuroscience of crossmodal correspondences. i-Perception, 3, 410–412.
Sperber, D. (2005). Modularity and relevance: How can a massively modular mind be flexible and context-sensitive? In P. Carruthers, S. Laurence, & S. P. Stich (Eds.), The innate mind: Structure and content (pp. 53–68). New York: Oxford University Press.
Stein, B. E. (Ed.). (2012). The new handbook of multisensory processing. Cambridge, MA: MIT Press.
Sweeny, T. D., Guzman-Martinez, E., Ortega, L., Grabowecky, M., & Suzuki, S. (2012). Sounds exaggerate visual shape. Cognition, 124, 194–200.
Theeuwes, J. (2010). Top-down and bottom-up control of visual selection. Acta Psychologica, 135, 77–99.
Treisman, A. (2005). Synesthesia: Implications for attention, binding, and consciousness – A commentary. In L. Robertson & N. Sagiv (Eds.), Synaesthesia: Perspectives from cognitive neuroscience (pp. 239–254). Oxford, UK: Oxford University Press.
Trimble, O. C. (1934). Localization of sound in the anterior-posterior and vertical dimensions of auditory space. British Journal of Psychology, 24, 320–334.
Tzelgov, J. (1997). Specifying the relations between automaticity and consciousness: A theoretical note. Consciousness and Cognition, 6, 441–451.
Van der Burg, E., Olivers, C. N. L., Bronkhorst, A. W., & Theeuwes, J. (2008). Non-spatial auditory signals improve spatial visual search. Journal of Experimental Psychology: Human Perception and Performance, 34, 1053–1065.
Vroomen, J., Bertelson, P., & de Gelder, B. (2001). The ventriloquist effect does not depend on the direction of automatic visual attention. Perception and Psychophysics, 63, 651–659.
Walker, P., Bremner, J. G., Mason, U., Spring, J., Mattock, K., Slater, A., et al. (2010). Preverbal infants' sensitivity to synesthetic cross-modality correspondences. Psychological Science, 21, 21–25.
Walker, P., & Smith, S. (1985). Stroop interference based on the multimodal correlates of haptic size and auditory pitch. Perception, 14, 729–736.
Walker-Andrews, A. (1994). Taxonomy for intermodal relations. In D. J. Lewkowicz & R. Lickliter (Eds.), The development of intersensory perception: Comparative perspectives (pp. 39–56). Hillsdale, NJ: Lawrence Erlbaum.
Walsh, V. (2003). A theory of magnitude: Common cortical metrics of time, space and quantity. Trends in Cognitive Sciences, 7, 483–488.
Ward, J. (2012). Synaesthesia. In B. E. Stein (Ed.), The new handbook of multisensory processes (pp. 319–333). Cambridge, MA: MIT Press.
Ward, J., Huckstep, B., & Tsakanikos, E. (2006). Sound-colour synaesthesia: To what extent does it use cross-modal mechanisms common to us all? Cortex, 42, 264–280.
Ward, J., Jonas, C., Dienes, Z., & Seth, A. (2010). Grapheme-colour synaesthesia improves detection of embedded shapes but without pre-attentive 'pop-out' of synaesthetic colour. Proceedings of the Royal Society of London. Section B. Biological Sciences, 277, 1021–1026.
