Cortical speech processing unplugged: a timely subcortico-cortical framework
Sonja A. Kotz and Michael Schwartze
Max Planck Institute for Human Cognitive and Brain Sciences, IRG ‘‘Neurocognition of Rhythm in Communication’’, Stephanstrasse 1a, 04103 Leipzig, Germany
Opinion
Glossary
Oscillation: An oscillation describes the unfolding of repeating events in terms
of frequency, i.e. the number of events repeated in a specific amount of time.
Serial order: Succession of events in the temporal dimension. Serial order
precludes simultaneity and is thus not identical to temporal structure.
Speech event: Speech events manifest as a set of linguistic and paralinguistic
categories (e.g. phoneme, syllable, word, voice, stress or phrase) with partly
overlapping borders that can be combined or decomposed into other speech
events.
Speech processing: Analogous to temporal processing, the term speech
processing comprises all mechanisms involved in the perception and
production of verbal expressions.
Synchronization: Temporal alignment of two or more oscillations. Synchronization is achieved when at least one oscillation adjusts its phase and/or period to match that of another.
Temporal processing: The neurocognitive mechanisms that underlie the
encoding, decoding and evaluation of temporal structure in perception and
production. Temporal processing refers exclusively to duration and temporal
relations but not to the formal aspect of information.
Temporal structure: Arrangement of events in the temporal dimension. An event in the temporal dimension originates from a contrast between static and dynamic changes that result in a subdivision of time. Temporal structure can be characterized in terms of categories such as duration, order, tempo or regularity of events. Duration describes an implicit property of the signal, whereas the latter categories describe temporal relations and can be considered explicit temporal information.
Speech is inherently tied to time. This fundamental quality has long been deemed secondary, and has consequently not received appropriate recognition in speech processing models. We develop an integrative speech processing framework by synthesizing evolutionary, anatomical and neurofunctional concepts of auditory, temporal and speech processing. These processes converge in a network that extends cortical speech processing systems with cortical and subcortical systems associated with motor control. This subcortico-cortical multifunctional network is based on temporal processing and predictive coding of events to optimize interactions between the organism and the environment. The framework we outline provides a novel perspective on speech processing and has implications for future studies on learning, proficient use, and developmental and acquired disorders of speech production and perception.
The temporal nature of speech
Speech essentially conveys patterns of energy distributed over time. However, the temporal nature of speech more or less vanished from linguistics in the wake of structuralist and generative theories of language. This separation renders language a phenomenon independent of temporal and contextual variation. However, neurofunctional data do not consistently support such separation. In the following, we argue that the temporal nature of speech needs to be reappraised to develop a naturalistic model of brain–language function [1].

The speech signal constitutes a rich source of information that is mirrored by sensitive mechanisms for temporal and spectral integration in hearing [2]. The auditory periphery ensures that central processing systems have access to a detailed representation of the acoustic signal. To achieve the main purpose of speech perception (the inference of meaning) and in view of contextual, physiological and temporal variability, it is plausible that speech perception makes immediate and opportunistic use of all information sources available: from sound characteristics to syntax and pragmatics. This perspective implies that nonlinguistic and linguistic processes interact to facilitate this objective. For example, temporal processing mechanisms (i.e. mechanisms underlying the explicit encoding, decoding and evaluation of
Corresponding author: Kotz, S.A. (kotz@cbs.mpg.de).
1364-6613/$ – see front matter © 2010 Elsevier Ltd. All rights reserved.
temporal information) need to be involved in the interpretation of the temporal structure of speech.
Timing is fundamental to efficient behavior and originates in evolutionarily primitive brain structures such as the cerebellum (CE) and the basal ganglia (BG) [3]. During speech acquisition their capacity can be used to establish basic routines that advance more sophisticated behavior. Once these routines are acquired, the BG contribution can be reduced to a supplementary and corrective function, whereas the CE remains actively engaged in the computation of sensory information [4].
In this article we argue that temporal structure is used to support well-studied fronto-temporal speech processing networks that therefore need to be extended by temporal processing systems. We propose that a subcortico-cortical network that includes the CE and BG is engaged in constant attempts to detect temporal regularities in sensory input and to predict the future course of events to optimize cognitive and behavioral performance, including speech processing.
Speech constitutes events in time
Unlike spatial information, acoustic information is entirely dependent on time. Time and temporal structure are coupled to change. In speech, changes generate events – such as vowels, stressed or unstressed syllables, and
doi:10.1016/j.tics.2010.06.005. Trends in Cognitive Sciences 14 (2010) 392–399
Figure 1. Visualizations of the utterance ‘‘Time must flow for sound to exist’’ [5] in the form of a waveform (top) and a spectrogram (bottom) created using the software PRAAT (developed by Paul Boersma and David Weenink of the Institute of Phonetic Sciences of the University of Amsterdam, http://www.praat.org/). This utterance can be decomposed into smaller speech events and categories such as syllables, phonemes or vowels. However, as both visualizations illustrate, neither events nor categories are discrete due to coarticulation, i.e. the anterograde or retrograde effect of continuously moving articulators on the acoustic signal.
Opinion Trends in Cognitive Sciences Vol.14 No.9
phrases – that evolve in time. Depending on the level of analysis, speech events comprise different categories and hierarchies (Figure 1).
Speech events can be combined or decomposed into shorter and longer events (e.g. the onset of a vowel, consonants within a syllable, or words in a phrase). However, the arrangement of these events is not incidental. Concatenated events follow an order that converges into specific speech patterns. The order of events in a pattern can be strictly sequential, with one event determining an immediately following event, and/or hierarchical, with one event determining another in the presence of intervening events. However, patterns in speech also imply serial order. Lashley [6] remarked that serial order in behavior relies on the transition of spatially distributed memory representations into temporal sequence. This operation can be attributed to syntax. In the broadest sense, syntax ‘‘denotes the organization of any combinatorial system in the mind’’ ([7] p. 276). Syntax can thus be defined as a ‘‘set of principles governing the combination of discrete structural elements into sequences’’ [8].
Another important ordering principle is recurrence. Similar to serial order, recurrence allows the generation of predictions about when specific events are likely to occur; thereby temporal regularity can facilitate speech and cognitive processing. In its simplest form, recurrence is periodic and can be depicted as an oscillation whose period reflects the temporal relation of successive events. Notably, this form of oscillation can complement gamma and theta band oscillations that constitute an important component of speech processing [9,10]. However, recognition of relational information, such as temporal regularity, necessitates an explicit internal representation of temporal structure generated by the CE and BG temporal processing systems. We propose that speech processing exploits both: temporal structure even if it is not regular, but also temporal regularity to optimize comprehension. This requires an early and fast interaction of auditory and temporal processing in a neurofunctional network that comprises overlapping neural correlates of auditory and temporal processing (Box 1). This approach also provides a more general perspective on the interaction between an
Box 1. Early auditory input to the temporal processing network
A recent activation-likelihood-estimation meta-analysis [4] showed
that several regions of the CE respond to auditory stimulation. Earlier
tracer studies and unit recordings in animals [11–13] identified a
neural pathway for rapid auditory transmission [14,15] between the
CE and the cochlear nuclei where the fibers of the auditory nerve
terminate. This pathway can transmit auditory information to the
cerebellar temporal processing system. In turn, the nonmotor part of
the cerebellar dentate [16], one of the primary output nuclei to the
thalamus, projects to the frontal cortex [17] (preSMA in monkeys) that
then connects to the BG.
Huang and Liu [18] assume that the CE serves as an interface
between the auditory and motor systems, possibly initiating tracking
behavior. Although the cerebellar neurons seem unfit to process
detailed frequency, intensity or duration information, they display
special sensitivity to temporal and intensity differences [19], functions
that are both important to signal when an event occurs and to track
temporal structure.
Click-evoked responses in the frontal monkey cortex after complete
removal or destruction of the temporal lobe, the CE and the medial
thalamus raise the question whether there are direct projections to the
frontal cortex from the thalamus [20]. The suprageniculate nucleus (SG)
is one possible candidate for such transmission. The SG is responsive
to auditory stimulation [21] and displays fine temporal tuning
characteristics [22]. After injections into the prefrontal cortex, Kobler
et al. [23] found labeled cells in the SG of bats, whereas Kurokawa et al.
[24] found labeled terminals in the frontal cortex and auditory areas of
the temporal cortex after injections into the SG. Reciprocal connections
were identified with injections into both the SG and the fastigial nucleus
of the CE [25] as well as the superior colliculus [26]. Connections from
the SG to the frontal cortex consist of separate neuronal groups of
different sizes and shapes [27] with Fr2, the target location in the frontal
cortex, corresponding to monkey SMA. Thus, the SG could constitute a
relay between the frontal and the cerebellar cortex that is connected to
several cortical auditory fields [28]. These are connected to the pontine
nuclei, indicating cerebellar involvement in a circular architecture. The
CE projects via dentato-thalamo-cortical pathways to areas from which
it receives input via the cortico-ponto-cerebellar pathways [29]. These
pathways form a link between cerebellar, temporal and fronto-striatal
circuitry in which the thalamus plays a pivotal role in earlier and later
processing stages.
ever-changing external environment and an equally dynamic cognitive environment.
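The claim that recurrence licenses temporal predictions can be made concrete in a small computation. The sketch below is our own toy illustration, not a model from the article: the event times, the coefficient-of-variation measure and the extrapolation rule are all illustrative assumptions. It estimates the period of a near-regular train of speech-event onsets and predicts when the next event is due.

```python
# Toy sketch (not from the article): recurrence as a simple periodic
# process. Given hypothetical onset times of speech events (e.g.
# vocalic nuclei, in seconds), estimate the period from inter-onset
# intervals (IOIs) and predict when the next event should occur.

def inter_onset_intervals(onsets):
    """IOIs between successive event onsets."""
    return [b - a for a, b in zip(onsets, onsets[1:])]

def regularity(onsets):
    """Coefficient of variation of the IOIs: 0.0 = perfectly periodic.
    Lower values licence stronger temporal predictions."""
    iois = inter_onset_intervals(onsets)
    mean = sum(iois) / len(iois)
    var = sum((x - mean) ** 2 for x in iois) / len(iois)
    return (var ** 0.5) / mean

def predict_next(onsets):
    """Predict the next onset by extrapolating the mean period."""
    iois = inter_onset_intervals(onsets)
    return onsets[-1] + sum(iois) / len(iois)

# A roughly syllable-like event train (~4 Hz, i.e. ~250 ms period):
onsets = [0.00, 0.25, 0.51, 0.74, 1.00]
print(round(regularity(onsets), 3))   # low CV -> near-periodic
print(round(predict_next(onsets), 2)) # expected around 1.25 s
```

A perfectly regular train yields a regularity score of zero; the noisier the IOIs, the weaker the warrant for prediction, mirroring the graded role the framework assigns to temporal regularity.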
A subcortico-cortical framework of speech perception and production
If auditory processing is indeed coupled with temporal processing, then auditory and temporal processing systems need to interface to create a representation of auditory temporal structure, the backbone of speech. There is some consensus that classical motor systems are involved in temporal processing. Hence, in speech perception we distinguish two parallel auditory pathways: (i) the preattentive encoding of event-based temporal structure in the CE, which forwards temporal information to the frontal cortex via the thalamus and (ii) the retrieval of memory representations in temporal cortex, which are projected to frontal cortex. The presupplementary motor area (preSMA) binds temporal structure [30] and receives information from the CE [17]. It further transmits information to the dorsolateral prefrontal cortex (DLPFC), which then integrates memory representations and temporal information to optimize comprehension. Additionally, the attention-dependent BG temporal processing system and its thalamo-cortical connections continuously evaluate temporal relations and support the extraction of temporal regularity. It also engages in reanalysis and resequencing of information when the temporal structure of a stimulus is unfamiliar or incongruent (Figure 2A).
In the planning of speech production, the preSMA and the BG in concert with the CE serve as a pacemaker that provides basic temporal structure. An interplay of SMA proper, premotor and primary motor cortex then utilizes this temporal structure to guide articulation (Figure 2B). This structural dissociation of the SMA is in line with evidence for preSMA involvement in word selection, encoding of word form and the control of syllable sequencing, whereas the SMA proper supports overt articulation [31].
The preSMA connects to the rostral striatum and to the superior/inferior frontal gyrus, whereas the SMA proper is connected to the caudal putamen, the precentral gyrus and the corticospinal tract [32–34]. Similar circuitry forms the
basis of temporal processing in the BG that depends on ensembles of cortical oscillations conveying a signature of temporal structure to the BG [3]. The CE in turn is functionally connected to the dorsolateral, medial and anterior prefrontal cortex [35], and via the thalamus to the SMA. Evidently, subregions of these classical motor areas (e.g. the nonmotor part of the dentate [16], the rostral striatum or the preSMA) primarily engage in perception, whereas other subregions (e.g. the motor part of the dentate, the caudal putamen and the SMA proper) are involved in production. Crucially, the thalamus mediates information flow in this framework (Box 2).
The most basic function of the motor system (to modify body posture to produce proactive and reactive movements) improves with precise timing. Moreover, the mutual influence of motor and cognitive processes such as temporal processing could represent one of the driving forces in the development of sophisticated motor and cognitive skills such as speech processing.
On the origin and development of speech processing capacities
Ultimately, functional differentiation goes hand in hand with structural change. Converging morphological evidence supports the view that the motor system has reconfigured to meet the challenges posed by developing communicative and cognitive skills [41]. Simultaneous enlargement of the lateral CE and the frontal cortex as well as the formation of a cerebello-cortical loop reflect developing speech processing capacity [42]. The lateral CE engages in cognitive tasks and its increased size in hominids is most likely to be accompanied by a similar increase in the brain's information processing capacity [43,44]. Contralateral connections from the CE to the cortex and from the cortex to the CE substantiate speculations about lateralization at the subcortical and cortical level as evidenced by functional temporal asymmetry in both hemispheres (Box 3).
These reciprocal connections establish a cerebello-thalamo-cortical circuit that is comparable to cortico-striato-thalamo-cortical circuits. Together they provide a powerful
Figure 2. (a) A framework for speech perception. In speech perception, auditory information is transmitted to the auditory cortex via the thalamus (a, blue) and to the CE (b),
where the temporal relationship between successive events is encoded (1) and transmitted to the frontal cortex (red). The seminal AST model on speech perception [50]
accounts for differences in temporal sensitivity (L = left; R = right; orange letters reflect short temporal windows of integration; blue letters reflect longer windows of
integration). Auditory information is mapped onto memory representations (3a) that are transmitted to the frontal cortex (c) to be integrated with temporal event structure
(4) that is conveyed via a cerebello-thalamo-preSMA (3b) pathway (red). Temporal information is transmitted to the BG (5) via connections from the preSMA and from the
frontal cortex (d). The BG evaluate temporal relations and transmit this information back to the cortex (6) via the thalamus. The CE and the thalamus also provide direct
input to the BG (f), thereby possibly modulating BG processing. The descending auditory pathway (g) could modulate processing in the whole network. (b) A framework for
speech production. Memory representations are transmitted from the temporal (1) to the frontal cortex (a, blue), where they are mapped onto temporal event structure (b)
generated by the preSMA (2) in concert with the CE and the BG (4). Furthermore, the CE (5) is involved in the temporal shaping of syllables (d). The integrated information is
then used in motor control of articulation (e, green) interfacing the SMAproper (6), premotor and primary motor cortex. The CE and the thalamus also provide direct input to
the BG (f), thereby possibly modulating BG processing.
computational basis because information in these circuits can be processed rapidly and repeatedly. Moreover, progression from simple to increasingly more complex behavior necessitates additional sequencing and patterning capacity, a quality that has been attributed to the BG [55]. Although each of these brain structures has probably contributed to the evolution of speech in isolation, the combination and resulting functional differentiation of subcortical structures provides a novel perspective for speech processing. Most importantly, this differentiation extends findings on the ontogenetic and phylogenetic development and maturation of white matter fiber tracts responsible for information flow between gray matter cortical areas [56]. A prominent example is the left-hemisphere accentuated arcuate fasciculus that connects Wernicke's and Broca's region. This white matter fiber bundle is only fully developed in humans; in chimpanzees, it is nonexistent [57]. The same can apply to macaques [57] or is at most rudimentary and less specified [58]. The function of these white matter tracts is to convey memory representations to the DLPFC where this formal information can be integrated with explicit temporal structure to either comprehend or to produce speech sequences. In speech perception, temporal structure complements formal predictions about upcoming events in a sequence. In speech production, the preSMA/BG pacemaker generates a grid for the temporal alignment of memory representations. In a similar vein, MacNeilage and Davis [59] propose that speech evolved from a simple biomechanical mechanism (i.e. biphasic opening and closing mouth movements).
Box 2. Two thalamic firing modes
Sherman and Guillery [36] emphasize that understanding of cortical
function depends on knowledge about the nature of thalamic input to
the cortex. As auditory information passes through the thalamus the
question arises as to how it treats this information. Thalamic cells
respond to input in either ‘tonic’ or ‘burst’ mode [36]. Tonic mode
preserves input linearity, whereas burst mode is more efficient in
detecting input. Consequently, thalamic cells in burst mode send a
‘wake-up call’ to the cortex that is evoked by sudden novel or
unexpected input. Importantly, bursts follow the temporal properties
of stimulation and enhance sensory event detection [37]. For
example, in the visual domain, bursts occur approximately at phase
zero of the oscillation underlying a periodic stimulation [36] (Figure I).
Furthermore, they can signal salient input because bursts affect the
postsynaptic neuron more strongly than single spikes [38].
We speculate that burst-firing marks input events characterized by
salient changes at the energy level (e.g. onsets, offsets or more
intense parts of an acoustic signal). For instance, in speech, these
events might correspond to pauses, vowels or stress correlates. In
analogy to visual processing, we consider that bursts preferentially
occur at vowel onsets. Hence, thalamic bursts could transmit the
temporal relation between events for subsequent cortical processing
and also amplify the neural representation of these events.
Computational simulations of a bursting neuron [39] show that the
biophysical mechanisms of spike generation enable individual
neurons to encode different stimulus features into distinct spike
patterns. However, burst timing is more precise than the timing of
single spikes. Accordingly, the driving input from the cerebellar,
event-based temporal processing system to the thalamus could be
encoded via burst firing to forward precise temporal markers of
events. In parallel, a more linear and continuous stimulus representa-
tion, delivered via the auditory pathway, is primarily encoded via
tonic firing to preserve detailed spectro-temporal structure.
Burst firing is characterized by inter-spike intervals of approxi-
mately 100 ms, whereas tonic firing features intervals of around 10–
30 ms. Pöppel [40] hypothesized that temporal processing proceeds in an oscillatory fashion in which sensory input registered within 30 ms is treated as co-temporal. One can speculate that at a sampling rate of approximately 30–40 Hz, perception of temporal order is constrained by thalamic ‘packaging’ of information in tonic mode and cortical sensitivity to these packages (e.g. the phonemes of a syllable). However, a better understanding of thalamic function is clearly necessary to model speech and temporal processing.
Figure I. Responses of lateral geniculate nucleus relay cells of cats during sinusoidal grating stimulation in either tonic (a) or burst (b) mode. Adapted from Sherman and Guillery
[36]. Tonic firing preserves linearity of the input, whereas burst firing selectively encodes parts of the stimulation that correspond to changes in the energy level of a
stimulus.
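The tonic/burst distinction drawn in Box 2 lends itself to a back-of-the-envelope classification. The following sketch is our own toy criterion with illustrative numbers; it is not taken from the cited physiology. It labels a spike train by the fraction of very short inter-spike intervals (ISIs), using the rough figures in the box: tonic ISIs of roughly 10–30 ms versus clusters of near-coincident spikes separated by ~100 ms.

```python
# Illustrative sketch (toy threshold and trains, not measured data):
# classify a thalamic spike train as 'burst'-like or 'tonic'-like
# from its inter-spike intervals.

def isis(spike_times_ms):
    """Inter-spike intervals of a sorted spike-time list (ms)."""
    return [b - a for a, b in zip(spike_times_ms, spike_times_ms[1:])]

def firing_mode(spike_times_ms, burst_isi_ms=4.0):
    """Label the train 'burst' if a sizeable fraction of ISIs are
    very short (spikes riding on a burst), otherwise 'tonic'."""
    intervals = isis(spike_times_ms)
    short = sum(1 for i in intervals if i <= burst_isi_ms)
    return "burst" if short / len(intervals) > 0.3 else "tonic"

tonic_train = [0, 15, 33, 52, 70, 91]        # steady ~10-30 ms ISIs
burst_train = [0, 2, 4, 100, 102, 104, 201]  # clusters ~100 ms apart
print(firing_mode(tonic_train))  # tonic
print(firing_mode(burst_train))  # burst
```

On this reading, the burst train carries its information in the timing of the clusters (one temporal marker per event), whereas the tonic train carries it in the near-linear spike rate.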
At a conceptual level, the biphasic movements of the mouth correspond to a blank ‘syllabic frame’ [60]. Consonant and vowel sounds (e.g. /ba/) constitute content elements that are inserted into the syllabic frame during articulation. Furthermore, the temporal structure of concatenated syllables (e.g. /bababa/) rests on input from the SMA. At this stage, sequencing capacity and precise temporal processing should be coupled to establish the serial order of frames and content elements.
Temporal structure and temporal alignment in speech processing
Which speech events convey suitable temporal structure? One way to classify the temporal structure of speech is to distinguish envelope, periodicity and fine structure levels [61]. In speech production these levels converge into a unitary acoustic signal, whereas in speech perception two aspects are fundamental: (i) the envelope conveys information about duration, rhythm, tempo and stress and (ii) speech can be understood when all other cues besides the slowly varying temporal envelope are degraded [62]. This implies that important speech events are captured by these categories. However, the question remains
which part of the signal is used to assess them. Greenberg [63] developed a syllable-centric framework that describes the syllable as an ‘energy arc’ (spectro-temporal profile). Typically, the syllabic nucleus protrudes as it correlates with maximum ‘oral aperture’ in the articulatory gesture. The prominence of the vocalic nucleus is accompanied by a steep rise and a subsequent peak in acoustic energy because it is typically more intense than a consonant. In addition to duration, a relative increase in intensity also distinguishes rhythmically prominent from nonprominent syllables [64]. Greenberg proposes that the nucleus sets the ‘spectro-temporal register’ on which the rest of the syllable is mapped. This notion seems comparable to MacNeilage's term ‘‘the general-purpose carrier, which we know today as the syllable’’ ([60] p. 105).
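Greenberg's ‘energy arc’ view suggests a simple signal-processing reading: syllabic nuclei should surface as peaks of the slowly varying envelope. The sketch below is our own illustration; the synthetic signal, window size and threshold are arbitrary assumptions rather than parameters from the cited work. It extracts a crude envelope (rectification plus moving-average smoothing) and picks one candidate nucleus per supra-threshold energy hump.

```python
# Toy envelope extraction and syllabic-nucleus picking on a synthetic
# signal: a 100 Hz carrier with two amplitude humps standing in for
# two vocalic nuclei (8 kHz sampling rate; all parameters illustrative).
import math

def envelope(signal, win=160):
    """Rectified signal smoothed with a moving average (a cheap
    low-pass); win=160 samples is 20 ms at 8 kHz."""
    rect = [abs(x) for x in signal]
    half = win // 2
    out = []
    for i in range(len(rect)):
        seg = rect[max(0, i - half):i + half + 1]
        out.append(sum(seg) / len(seg))
    return out

def nucleus_candidates(env, floor=0.1):
    """One candidate per contiguous supra-threshold region:
    the index of that region's envelope maximum."""
    peaks, region = [], []
    for i, v in enumerate(env):
        if v > floor:
            region.append(i)
        elif region:
            peaks.append(max(region, key=lambda j: env[j]))
            region = []
    if region:
        peaks.append(max(region, key=lambda j: env[j]))
    return peaks

sr = 8000
sig = [math.sin(2 * math.pi * 100 * t / sr) *
       (math.exp(-((t - 1000) ** 2) / (2 * 300.0 ** 2)) +
        math.exp(-((t - 3000) ** 2) / (2 * 300.0 ** 2)))
       for t in range(4000)]
peaks = nucleus_candidates(envelope(sig))
print(len(peaks))  # -> 2, one candidate per energy hump
```

The two recovered peaks sit near the centers of the two amplitude humps, which is the intuition behind using envelope maxima as anchors for temporal structure.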
Importantly, there is a related concept in speech perception. Following the notion of perceptual centres or ‘p-centres’ [65], Port [66] describes the vocalic nucleus as the carrier of a perceptual beat that renders this event particularly salient. Port refers to Dynamic Attending Theory (DAT) [67] by linking the saliency of the vowel to the pulse of a ‘neurocognitive attentional oscillation’. DAT proposes that the allocation of attention depends
Box 4. Questions for future research
• What are the anatomical and functional commonalities and differences regarding human and animal temporal processing networks?
• What is the function of direct and reciprocal subcortico-subcortical pathways besides subcortico-cortical connections and do these circuits interact?
• How do temporal processing and predictions generated on the basis of temporal structure relate to other formal and contextual aspects of speech (e.g. phonotactic rules or semantic priming)?
• Is the proposed temporal aspect of the framework the same in speech and music? Moreover, is it restricted to the auditory domain or is it involved across sensory and cognitive domains?
• Are there therapeutic applications (e.g. overemphasizing the predictive value in the context of pathological speech processing, or combining specific temporal patterns with movement)?
Box 3. Lateralized sensitivity to temporal structure
It is well known that the left cortical hemisphere specializes in fine
temporal discrimination, whereas the right hemisphere holds
information over longer periods of time [45]. In a similar vein, the
language-dominant hemisphere differentiates fine temporal input,
whereas the complementary hemisphere integrates information
across longer time spans [46]. More specifically, at the tonal level,
the core bilateral auditory cortices are sensitive to temporal and
spectral variation. Spectral variation is weighted towards the right
hemisphere; the left hemisphere specializes in rapid temporal
processing [47]. In speech (i.e. syllables, words), both auditory
cortices engage in a general temporal processing mechanism. Rapid
variations in temporal sound structure are preferably processed in
the left hemisphere [48], whereas the contour of the speech
envelope concurs with right hemisphere areas [49]. In speech
perception, ‘Asymmetric Sampling in Time’ (AST) [50] ascribes a
short window of integration (20–50 ms) to the left hemisphere, and a
long window (150–250 ms) to the right hemisphere. For example,
spontaneous power fluctuations of intrinsic oscillations in right-
hemisphere regions of Heschl’s gyrus correspond to the dominant
syllabic rate, between 3 and 6 Hz, and to rates between 28 and 40 Hz
in the left hemisphere [10]. Contributions of preceding stages of
auditory processing in addition to those at the cortical level need to
be considered. A similar temporal dissociation is proposed for the
CE. The right CE responds more strongly to high frequency
information and speech, whereas the left CE is more sensitive to
low frequency information and singing [51]. Lateralization also
impacts on auditory information processing at all stages of the
auditory pathway including the thalamus and the brain stem [52].
The bilateral auditory cortices use implicit discharge rate codes
and explicit temporal codes to represent the temporal structure of
auditory signals [53]. Discharge rate codes integrate stimulus
features in discrete 30 ms windows that could reflect cortical
sensitivity to thalamic ‘packaging’ in tonic mode. Animal evidence
[54] indicates a slowdown of temporal response rates along the
ascending auditory pathway from the thalamus (10 ms) to the
auditory cortex (30 ms). Thus, co-temporal information [40] could
correspond to input sampled in a short window of integration within
the left hemisphere.
on synchronization between self-sustained, adaptive internal oscillations and external temporal structure. This synchronization results in stimulus-driven allocation of attention. With respect to speech perception, this implies that we can attempt to synchronize an internal attention oscillation with the temporal structure of external speech events such as the rhythmic succession of vocalic nuclei. Thus, the rise in acoustic energy and the resulting energy maximum in sound structure, and the perceived prominence of the p-centre can constitute complementary phenomena. This event can then be used to: guide attention to points in time when important information appears, to set a spectro-temporal register [63] and to open a frame [59] for the subsequent integration of content elements from memory.
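A minimal computational reading of this synchronization idea can be given as an adaptive oscillator. This is our own toy sketch with hypothetical gain parameters, not the published DAT formalism: after each observed event, the oscillator corrects the phase of its next expected attentional pulse and adapts its period toward the external rhythm.

```python
# Toy adaptive oscillator in the spirit of dynamic attending:
# each observed event onset corrects the phase of the next expected
# pulse (gain alpha) and the oscillator period (gain beta).
# All parameter values are illustrative assumptions.

def entrain(onsets, period=0.30, alpha=0.7, beta=0.3):
    """Track a train of event onsets (seconds). Returns the prediction
    errors over time and the final adapted period."""
    expected = onsets[0] + period   # first pulse after the first event
    errors = []
    for onset in onsets[1:]:
        error = onset - expected    # negative: event came early
        errors.append(error)
        period += beta * error                        # period adaptation
        expected = expected + period + alpha * error  # phase correction
    return errors, period

# A regular 4 Hz 'syllable' train; the oscillator starts too slow:
onsets = [i * 0.25 for i in range(12)]
errors, final_period = entrain(onsets)
print(abs(errors[-1]) < 0.01, abs(final_period - 0.25) < 0.01)  # -> True True
```

Within a few events the prediction error shrinks toward zero and the period settles near 250 ms: attentional pulses come to anticipate the external rhythm rather than lag it, which is the sense in which synchronization yields stimulus-driven allocation of attention.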
Information about temporal structure is then used to align memory representations (temporal cortex) with the point in time that an event is maximally salient to ensure temporal coherence and to optimize speech processing. If temporal structure conveys periodicity and allows the extraction of a regular pattern, then one can conceive of such an alignment as an incidence of synchronization between an external stimulus-inherent oscillation and an internal stimulus-driven oscillation. In line with DAT, information about successive events can provide attractors for an attention oscillation that are used to synchronize attentional resources, and to generate frames for content integration. In other words, speech perception can involve integration of spectro-temporal content or formal information (‘what’) and temporal information delivered via the event-based temporal processing system (‘when’). The formal aspect involves mapping from sound to meaning in the temporal cortex and the transmission of memory content via white matter fiber tracts [68]. This goes hand in hand with the functional differentiation of a ventral and a dorsal stream in speech processing. The dorsal stream maps acoustic or phonological representations to articulatory or motor representations whereas the ventral stream maps sensory or phonological representations onto lexical conceptual representations [50,69]. Formal information transfer via ventral and dorsal pathways is further subdivided into ‘what’, ‘how’ and ‘where’ streams [70].
Furthermore, the cortico-striato-thalamo-cortical attention-dependent temporal processing system adds higher-order processing routines, such as interval estimation and comparison, as well as the extraction of temporal regularity. This information can then be used to generate predictions concerning the temporal locus of important speech events. Once regularity is perceived, the system can tolerate small perturbations whereas strong regularity-based predictions would allow the maintenance of a pattern even in the presence of displaced or omitted events. This function differentiates the proposed role of the motor system in processing temporal aspects such as tracking rhythm, speech rate and turn taking in communication [71,72].
Concluding remarks
In this article we have outlined a neurofunctional framework of speech processing that emphasizes two elementary aspects of speech. First, information conveyed in an acoustic signal is entirely time-dependent. Temporal characteristics of the signal and related temporal processing should therefore play a significant role in both speech production and perception. Second, the evolution of speech as a complex motor behavior originates in subcortico-cortical motor systems and their capacity to temporally structure behavioral sequences. We propose that speech production and perception have retained characteristics of this primordial interaction between motor timing and sequencing
397
Opinion Trends in Cognitive Sciences Vol.14 No.9
capacities, and a developing cognitive competence. Neu-roanatomically, this fundamental interaction can beretraced in ontogenetic and phylogenetic development inwhich primitive subcortical structures set in motion basiccomputational mechanisms in support of refined neocor-tical functions. In line with a recent proposal on speechproduction [73] we highlight the necessary contributions ofcortical and subcortical brain structures to speech proces-sing. We offer a framework within which to: (i) furtherinvestigate how different aspects of uni- and multimodalinformation converge in time to form unitary percepts [74],(ii) explain how developmental and compensatory mech-anisms in speech disorders impact on speech processing(e.g. [75]) and (iii) elucidate how the underlying perspectivetransfers to other domains such as music (Box 4).
Acknowledgments
The authors would like to thank D. Yves von Cramon for expert neuroanatomical input and discussion, and Kathrin Rothermich, Iris N. Knierim, Maren Schmidt-Kassow and Anna S. Hasting for continual feedback. Special thanks to Robert F. Port and Angela D. Friederici as well as to the anonymous reviewers for constructive comments on an earlier version of the manuscript, and Richard Ivry for valuable discussion of the concept. Lastly, thanks to Kerstin Flake for graphics support.
References
1 Poeppel, D. and Hickok, G. (2004) Towards a new functional anatomy of language. Cognition 92, 1–12
2 Moore, B.C.J. (2003) Temporal integration and context effect in hearing. J. Phonetics 31, 563–574
3 Buhusi, C.V. and Meck, W.H. (2005) What makes us tick? Functional and neural mechanisms of interval timing. Nat. Rev. Neurosci. 6, 755–765
4 Petacchi, A. et al. (2005) Cerebellum and auditory function: an ALE meta-analysis of functional neuroimaging studies. Hum. Brain Mapp. 25, 118–128
5 de Cheveigne, A. (2003) Time-domain auditory processing of speech. J. Phonetics 31, 547–561
6 Lashley, K.S. (1951) The problem of serial order in behavior. In Cerebral Mechanisms in Behavior (Jeffress, L.A., ed.), pp. 112–136, Wiley
7 Jackendoff, R. (2002) Foundations of Language, Oxford University Press
8 Patel, A.D. (2003) Language, music, syntax and the brain. Nat. Neurosci. 6, 674–681
9 Ghitza, O. and Greenberg, S. (2009) On the possible role of brain rhythms in speech perception: intelligibility of time-compressed speech with periodic and aperiodic insertions of silence. Phonetica 66, 113–126
10 Giraud, A. et al. (2007) Endogenous cortical rhythms determine cerebral specialization for speech perception and production. Neuron 56, 1127–1134
11 Huang, C.M. et al. (1982) Projections from the cochlear nucleus to the cerebellum. Brain Res. 244, 1–8
12 Woody, C.D. et al. (1998) Acoustic transmission in the dentate nucleus I. Changes in activity and excitability after conditioning. Brain Res. 789, 74–83
13 Morest, D.K. et al. (1997) Neuronal and transneuronal degeneration of auditory axons in the brainstem after cochlear lesions in the chinchilla: cochleotopic and non-cochleotopic patterns. Hearing Res. 103, 151–168
14 Wang, X.F. et al. (1991) The dentate nucleus is a short-latency relay of a primary auditory transmission pathway. Neuroreport 2, 361–364
15 Xi, M.C. et al. (1994) Identification of short latency auditory responsive neurons in the cat dentate nucleus. Neuroreport 5, 1567–1570
16 Dum, R.P. and Strick, P.L. (2003) An unfolded map of the cerebellar dentate nucleus and its projections to the cerebral cortex. J. Neurophysiol. 89, 634–639
17 Akkal, D. et al. (2007) Supplementary motor area and presupplementary motor area: targets of basal ganglia and cerebellar output. J. Neurosci. 27, 10659–10673
18 Huang, C. and Liu, G. (1990) Organization of the auditory area in the posterior cerebellar vermis of the cat. Exp. Brain Res. 81, 377–383
19 Altman, J.A. et al. (1976) Electrical responses of the auditory area of the cerebellar cortex to acoustic stimulation. Exp. Brain Res. 26, 285–298
20 Bignall, K.E. (1970) Auditory input to frontal polysensory cortex of the squirrel monkey: possible pathways. Brain Res. 19, 77–86
21 Benedek, G. et al. (1997) Visual, somatosensory, auditory and nociceptive modality properties in the feline suprageniculate nucleus. Neuroscience 78, 179–189
22 Paroczy, Z. et al. (2006) Spatial and temporal visual properties of single neurons in the suprageniculate nucleus of the thalamus. Neuroscience 137, 1397–1404
23 Kobler, J.B. et al. (1987) Auditory pathways to the frontal cortex of the mustache bat, Pteronotus parnellii. Science 236, 824–826
24 Kurokawa, T. et al. (1990) Frontal cortical projections from the suprageniculate nucleus in the rat, as demonstrated with the PHA-L method. Neurosci. Lett. 120, 259–262
25 Katoh, Y. and Deura, S. (1993) Direct projections from the cerebellar fastigial nucleus to the thalamic suprageniculate nucleus in the cat studied with the anterograde and retrograde axonal transport of wheat germ agglutinin-horseradish peroxidase. Brain Res. 617, 155–158
26 Katoh, Y. et al. (1994) Bilateral projections from the superior colliculus to the suprageniculate nucleus in the cat: a WGA-HRP/double fluorescent tracing study. Brain Res. 669, 298–302
27 Kurokawa, T. and Saito, H. (1995) Retrograde axonal transport of different fluorescent tracers from the neocortex to the suprageniculate nucleus in the rat. Hearing Res. 85, 103–108
28 Budinger, E. (2000) Functional organization of auditory cortex in the Mongolian gerbil (Meriones unguiculatus). IV. Connections with anatomically characterized subcortical structures. Eur. J. Neurosci. 12, 2452–2474
29 Pastor, M.A. et al. (2008) Frequency-specific coupling in the cortico-cerebellar auditory system. J. Neurophysiol. 100, 1699–1705
30 Pastor, M.A. et al. (2006) The neural basis of temporal auditory discrimination. NeuroImage 30, 512–520
31 Alario, F. et al. (2006) The role of the supplementary motor area (SMA) in word production. Brain Res. 1076, 129–143
32 Lehericy, S. et al. (2004) 3-D diffusion tensor axonal tracking shows distinct SMA and Pre-SMA projections to the human striatum. Cereb. Cortex 14, 1302–1309
33 Postuma, R.B. and Dagher, A. (2006) Basal ganglia functional connectivity based on a meta-analysis of 126 positron emission tomography and functional magnetic resonance imaging publications. Cereb. Cortex 16, 1508–1521
34 Middleton, F.A. and Strick, P.L. (2000) Basal ganglia and cerebellar loops: motor and cognitive circuits. Brain Res. Rev. 31, 236–250
35 Krienen, F.M. and Buckner, R.L. (2009) Segregated fronto-cerebellar circuits revealed by intrinsic functional connectivity. Cereb. Cortex 19, 2485–2497
36 Sherman, S.M. and Guillery, R.W. (2005) Exploring the Thalamus and Its Role in Cortical Function, MIT Press
37 He, J. and Hu, B. (2002) Differential distribution of burst and single-spike responses in auditory thalamus. J. Neurophys. 88, 2152–2156
38 Izhikevich, E.M. (2004) Which model to use for cortical spiking neurons? IEEE Trans. Neural Netw. 15, 1063–1070
39 Kepecs, A. and Lisman, J. (2003) Information encoding and computation with spikes and bursts. Netw. Comput. Neural Syst. 14, 103–118
40 Pöppel, E. (1997) A hierarchical model of temporal perception. Trends Cogn. Sci. 1, 56–61
41 Lieberman, P. (2002) On the nature and evolution of the neural bases of human language. Yearb. Phys. Anthropol. 45, 36–62
42 Leiner, H.C. et al. (1993) Cognitive and language functions of the human cerebellum. Trends Neurosci. 16, 444–447
43 MacLeod, C.E. et al. (2003) Expansion of the neocerebellum in Hominoidea. J. Hum. Evol. 44, 401–429
44 Weaver, A.H. (2005) Reciprocal evolution of the cerebellum and neocortex in fossil humans. Proc. Natl. Acad. Sci. U. S. A. 102, 3576–3580
45 Allard, F. and Scott, B.L. (1975) Burst cues, transition cues, and hemispheric specialization with real speech sounds. Q. J. Exp. Psychol. 27, 487–497
46 Hammond, G.R. (1982) Hemispheric differences in temporal resolution. Brain Cognition 1, 95–118
47 Zatorre, R.J. and Belin, P. (2001) Spectral and temporal processing in human auditory cortex. Cereb. Cortex 11, 946–953
48 Liegeois-Chauvel, C. et al. (1999) Specialization of left auditory cortex for speech processing in man depends on temporal coding. Cereb. Cortex 9, 484–496
49 Abrams, D.A. et al. (2008) Right-hemisphere auditory cortex is dominant for coding syllable patterns in speech. J. Neurosci. 28, 3958–3965
50 Hickok, G. and Poeppel, D. (2007) The cortical organization of speech processing. Nat. Rev. Neurosci. 8, 393–402
51 Callan, D.E. et al. (2007) Speech and song: the role of the cerebellum. Cerebellum 6, 321–327
52 Schönwiesner, M. et al. (2007) Hemispheric asymmetry for auditory processing in the human auditory brain stem, thalamus, and cortex. Cereb. Cortex 17, 492–499
53 Wang, X. et al. (2003) Cortical processing of temporal modulations. Speech Commun. 41, 107–121
54 Wang, X. et al. (2008) Neural coding of temporal information in auditory thalamus and cortex. Neuroscience 154, 294–303
55 Graybiel, A.M. (1997) The basal ganglia and cognitive pattern generators. Schizophrenia Bull. 23, 459–469
56 Friederici, A.D. (2009) Pathways to language: fiber tracts in the human brain. Trends Cogn. Sci. 13, 175–181
57 Rilling, J.K. et al. (2008) The evolution of the arcuate fasciculus revealed with comparative DTI. Nat. Neurosci. 11, 426–428
58 Catani, M. et al. (2005) Perisylvian language networks of the human brain. Ann. Neurol. 57, 8–16
59 MacNeilage, P.F. and Davis, B.L. (2001) Motor mechanisms in speech ontogeny: phylogenetic, neurobiological and linguistic implications. Curr. Opin. Neurobiol. 11, 696–700
60 MacNeilage, P.F. (2008) The Origin of Speech, Oxford University Press
61 Rosen, S. (1992) Temporal information in speech: acoustic, auditory and linguistic aspects. Philos. Trans. R. Soc. B 336, 367–373
62 Shannon, R.V. et al. (1995) Speech recognition with primarily temporal cues. Science 270, 303–304
63 Greenberg, S. et al. (2003) Temporal properties of spontaneous speech – a syllable-centric perspective. J. Phonetics 31, 465–485
64 Kochanski, G. and Orphanidou, C. (2008) What marks the beat of speech? J. Acoust. Soc. Am. 123, 2780–2791
65 Morton, J. et al. (1976) Perceptual centres (p-centres). Psychol. Rev. 83, 405–408
66 Port, R.F. (2003) Meter and speech. J. Phonetics 31, 599–611
67 Large, E.W. and Jones, M.R. (1999) The dynamics of attending: how people track time-varying events. Psychol. Rev. 106, 119–159
68 Glasser, M.F. and Rilling, J.K. (2008) DTI tractography of the human brain’s language pathways. Cereb. Cortex 18, 2471–2481
69 Saur, D. et al. (2010) Combining functional and anatomical connectivity reveals brain networks for auditory language comprehension. NeuroImage 49, 3187–3197
70 Rauschecker, J.P. and Scott, S.K. (2009) Maps and streams in the auditory cortex: nonhuman primates illuminate human speech processing. Nat. Neurosci. 12, 718–724
71 Kotz, S.A. et al. (2009) Non-motor basal ganglia functions: a review and proposal for a model of sensory predictability in auditory language perception. Cortex 45, 982–990
72 Scott, S.K. et al. (2009) A little more conversation, a little less action – candidate roles for the motor cortex in speech perception. Nat. Rev. Neurosci. 10, 295–302
73 Guenther, F.H. (2006) Cortical interactions underlying the production of speech sounds. J. Commun. Disord. 39, 350–365
74 Schroeder, C. et al. (2008) Neuronal oscillations and visual amplification of speech. Trends Cogn. Sci. 12, 106–113
75 Corriveau, K.H. and Goswami, U. (2009) Rhythmic motor entrainment in children with speech and language impairment: tapping to the beat. Cortex 45, 119–130