
S. Wermter et al. (Eds.): Emergent Neural Computational Architectures, LNAI 2036, pp. 560-576, 2001. © Springer-Verlag Berlin Heidelberg 2001

Connectionist Neuroimaging

Stephen José Hanson, Michiro Negishi, and Catherine Hanson1

Psychology Department, Rutgers University, Newark, N.J., USA

Abstract. Connectionist modeling and neuroscience have little common ground or mutual influence. Despite impressive algorithms and analysis within connectionism and neural networks, there has been little influence on neuroscience, which remains primarily an empirical science. This chapter advocates two strategies to increase the interaction between neuroscience and neural networks: (1) focus on emergent properties in neural networks that are apparently “cognitive”; (2) take neuroimaging data seriously and develop neural models of dynamics in both the spatial and temporal dimensions.

1 Introduction

In 1990 the then President of the USA, George H. W. Bush, declared the “Decade of the Brain”. This year the “Decade of the Brain” ended (although perhaps George W. Bush, the son, will declare yet another decade), and it is worth asking what happened during the Decade of the Brain and, in particular, what was the influence of neural computation on neuroscience. How did neuroscience help define or delineate aspects of neural computation during this last decade?

Neural Networks have become mainstream engineering tools (IEEE, 1998) and helped stir a resurgence of statistical methods (esp. Bayesian methods). They have been incorporated into a large and diverse application base, from medical to automotive and control applications. And Hollywood continues to believe “intelligence” is some property of a neural network. On the other hand, paradoxically, Neural Networks have had little or no effect on the larger mainstream neuroscience community or field. It is clear over the last decade, despite an increasing sophistication and development of neural network algorithms, that little has changed in the neuroscience field with respect to computation or the representational issues concerning neural tissue. Neuroscientists continue to focus on cell-level mechanisms and generic properties of system-level interaction.

Most people blame neural networks themselves for this lack of impact on neuroscience and for the absence of biological considerations in neural networks. Four reasons are often cited:

• They are not biologically plausible
• They do not scale well with large problems
• They are not new---just statistics
• They don’t process symbols and humans do

1 Also at Telcordia Technologies, Piscataway, New Jersey.


I blame neuroscience. I see three reasons for this lack of connection:

• For nearly 100 years neuroscience has been essentially an empirical enterprise, one that has not easily embraced common abstract principles underlying common behavioral/physiological observations (“splitters” as opposed to “lumpers”).

• System Neuroscience, which should have the greatest impact on computational approaches, is in general the most difficult level at which to obtain the requisite data to constrain network models or provide common principles, owing to the potential complexity of the multiple Neuron-Body problem.

• Most serious is the level of analysis that Neuroscientists tend to cling to: the cellular level (or, god help us, the molecular level). This focus persists notwithstanding the lack of any fundamental identification of this anatomically distinct structure as also a unique unit of computation. It can easily be shown that computational regularity at the behavioral level does not force unique implementations at the neural level.

We suggest there are two strategies to encourage more connections between neuroscience and neural networks.

• One: Attempt to show emergent behavior that is similar to human COGNITIVE performance: analyze the network representations to understand the nature of the interactions between learning and representations.

• Two: Take neuroimaging data seriously and model it with neural networks (that embody dynamical systems), for example, as opposed to doing inferential statistics. Neuroimaging data could be seen as spatio-temporal multivariate data, representing some time-dynamical system distributed through a 3-d volume.

In the end we shall suggest it is also productive to look for ways to combine cognitively suggestive models with the data-rich methods of neuroimaging such as EEG and fMRI.

2 Network Emergent Behavior: The Case of Symbol Learning

We argue it is useful to demonstrate Emergent behaviors in Networks that were not programmed, engineered, or previously constrained by

• choice of architecture,
• learning rule, or
• distributional properties of the data.

It is known that Recurrent Neural Networks can induce regular grammars from exposure to valid strings drawn from the grammar. However, it has been claimed that neural networks cannot learn symbols independent of rules (see Pinker). A basic puzzle in the cognitive neurosciences (30) is how the simple associationist learning which has been proposed to exist at the cellular and synaptic levels of a brain can be used to construct known properties of cognition which appear to require abstract reference, variable binding, and symbols. The ability of humans to parse sentences and to abstract knowledge from specific examples appears to be inconsistent with local associationist algorithms for knowledge representation (3, 8, 16, 20, 21, 22; but see 11). Part of the puzzle is how neuron-like elements could, from simple signal-processing properties, emulate symbol-like behavior. Properties of symbols include the following (14).


a set of arbitrary "physical tokens" (scratches on paper, holes on a tape, events in a digital computer) manipulated on the basis of "explicit rules" that are likewise physical tokens and strings of tokens. The rule-governed symbol-token manipulation is based purely on the shape of the symbol tokens (not their "meaning"), i.e., it is purely syntactic, and consists of "rulefully combining" and recombining symbol tokens. There are primitive atomic symbol tokens and composite symbol-token strings. The entire system and all its parts -- the atomic tokens, the composite tokens, the syntactic manipulations both actual and possible, and the rules -- are all "semantically interpretable": the syntax can be systematically assigned a meaning (e.g., as standing for objects, as describing states of affairs).

As this definition implies, a key element in the acquisition of symbolic structure involves a type of independence from the task the symbols are found in and the vocabulary they represent. Fundamental to this type of independence is the ability of the learning system to factor the generic nature of the task or rules it confronts from the aspect of the symbols or vocabulary set, which are arbitrarily bound to the input description or external referents of the task. In this report we describe a series of experiments with an associationist neural network that creates abstract structure that is context sensitive, hierarchical, and extensible.

[Figure: a recurrent network diagram showing an Input Layer, Hidden Layer, Output Layer, and Feedback Layer linked by second-order connections.]

Fig. 1. The Recurrent Network Architecture used in the simulations. This is a simple neural network learning architecture that possesses a simple memory. All weights are subject to adaptation or learning; there are no fixed structures in the RNN prior to or during learning.
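The second-order connections of Fig. 1 couple hidden-state and input activities multiplicatively before weighting them. A minimal sketch of one such update step, in our own notation (the sizes and weight scales here are arbitrary, not the chapter's):

```python
import numpy as np

def second_order_step(W, state, x):
    """One step of a second-order recurrent layer: each weight W[i, j, k]
    couples hidden unit j and input symbol k to next-state unit i."""
    pre = np.einsum('ijk,j,k->i', W, state, x)   # bilinear combination
    return 1.0 / (1.0 + np.exp(-pre))            # logistic activation

rng = np.random.default_rng(0)
n_hidden, n_symbols = 5, 7                       # illustrative sizes
W = rng.normal(scale=0.5, size=(n_hidden, n_hidden, n_symbols))

state = np.full(n_hidden, 0.5)                   # initial hidden state
x = np.zeros(n_symbols); x[2] = 1.0              # one-hot input symbol
state = second_order_step(W, state, x)
print(state.shape)                               # (5,)
```

With a one-hot input, the bilinear form effectively selects one hidden-to-hidden weight matrix per symbol, which is what lets the attractor structure mirror a state machine's transitions.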

Consider the simple problem of learning a grammar from valid, or positive-set-only, sentences consisting of strings of symbols drawn randomly from an infinite population of such valid strings. This sort of learning might very well underlie the acquisition of language in children from exposure to grammatically correct sentences during normal


discourse with their language community2. It is well known now that neural networks3 can induce the structure of the FSM (Finite State Machine; for example see Fig. 2) only from presentation of strings drawn from the FSM (9, 25). In fact, recently (2) it has been shown that the underlying attractors of neural networks have no choice but to be in a one-to-one correspondence with the states of the state machine from which the strings are sampled. This surprising theorem is the precedent for proposing that a neural network embodying the underlying rules of a state machine in its attractor space could also learn to ‘‘factor’’ the input encoding or external symbols. In the present report, we employ a Recurrent Neural Network (RNN, see Fig. 1) with a standard learning algorithm developed by Williams and Zipser, extended to second-order connections (10). The network was trained with newly generated sentences until performance of the network met a learning criterion4. Input sentences were limited to 20 symbols long and were constructed from a local binary encoding of each symbol.5 All weights in the RNN were candidates for adaptation; no structures were fixed prior to or during learning. Humans are known to gain a memorial advantage from exposure to strings drawn from a FSM over ones that would be constructed randomly (17, 18, 23, 24), as though they are extracting abstract knowledge of the grammar itself from exposure to strings drawn randomly from the FSM. A more stringent test of knowledge of a grammar

2 Although controversial, language acquisition must surely involve the exposure of children to valid sentences in their language. Chomsky (3) and other linguists have stressed the importance of the a priori embodiment of the possible grammars in some form more generic than the exact target grammar. Although not the main point of the present report, it must surely be true that, of the distribution of possible grammars, some learning bias must exist that helps guide the acquisition and selection of one grammar over another in the presence of data. What the nature of this learning bias is might be a more profitable avenue of research in language acquisition than the recent polarizations inherent in the nativist/empiricist dichotomy (5, 16, 20, 22).

3 Neural networks consist of simple analogue computing elements that operate in parallel over an input and output field. Recurrent Neural Networks are networks that have recurrent connections to their intermediate or ‘‘hidden’’ layers. Such recurrent connections implement a local memory of recent input/output and processing states of the network. Feedforward networks have only unidirectional connections and hence no mechanism for examining past inputs.

4 That is, after each training sentence, the network was tested with 1000 randomly generated sentences, and the training session was completed only when the network yielded output node activity below the low threshold when sentences could not end and above the high threshold when they could end. Thresholds were initialized to 0.20 (high threshold) and 0.17 (low threshold) and were adapted using output values while the network was processing test sentences. During the test, the high threshold was modified to the minimum value yielded at the end of test sentences minus a margin (0.01), and the low threshold was modified to the high threshold minus another margin (0.02); these thresholds were then used for the next test and training sentences.

5 Each word was represented as an activation value of 1.0 of a unique node in the input layer, while all other node activations were set to 0.0. The task for the network was to predict whether the next word was END (in which case the output layer node activation was trained to become 1.0) or not (output should be 0.0). Note that when the FSM is at the end state, a sentence can either end or continue. Therefore at this state the network is sometimes taught to predict the end of a sentence and sometimes not. However, the network eventually learns to yield higher output node activation when the sentence can end.


would be to expose the subjects to a FSM with one external symbol set and to see if the subjects transfer knowledge to a novel external symbol set. In principle, in this type of task it is impossible for the subjects to use the symbol set as a basis for generalization without noting the patterns that are commensurate with the properties of the FSM6. A version of this type of transfer is shown in Fig. 2. In this task new symbols are assigned randomly to the arcs, such that the external symbols are completely new. This task, which we call the Vocabulary Transfer Task, was used in the first simulation to train recurrent neural networks and to examine their ability to transfer over novel, unknown symbol sets.

[Figure: two three-state transition diagrams, the first labeled with symbols A, B, C and the second with D, E, F.]

Fig. 2. The SYMBOL transfer task. The figure shows two finite state machine representations, each of which has 3 states (1, 2, 3) with transitions indicated by arrows and legal transition symbols (A, B, ..., F) for each state. Note that this task involves no possible generalization from the transition symbol. Rather, all that is available are the state configuration geometries. The task explicitly forces the network to process the symbol set independently from the transition rule.
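A machine of this kind is easy to specify. The transition table below is illustrative only (the exact arcs of the chapter's Fig. 2 are not fully recoverable from the text), but it shows how the same grammar can be re-labeled with a fresh vocabulary, exactly as the Vocabulary Transfer Task requires:

```python
import random

def make_fsm(vocab):
    """Hypothetical 3-state grammar in the style of Fig. 2; the arcs are
    our guess, not the chapter's.  state -> [(symbol, next_state)]."""
    a, b, c = vocab
    return {1: [(a, 2), (b, 3)],
            2: [(b, 2), (c, 3)],
            3: [(a, 1), (c, 2)]}

def sample_sentence(fsm, end_state=3, max_len=20):
    """Draw one valid string; at the end state a sentence may stop or go on."""
    state, sent = 1, []
    while len(sent) < max_len:
        if state == end_state and random.random() < 0.5:
            break
        sym, state = random.choice(fsm[state])
        sent.append(sym)
    return sent

source = make_fsm("ABC")   # source vocabulary
target = make_fsm("DEF")   # same grammar, novel vocabulary
print(sample_sentence(source))
```

Because `source` and `target` share the same state geometry, strings from the second machine carry no symbol-level overlap with the first; only the abstract transition structure can support transfer.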

In this task, the network was trained on three regular grammars (the source grammars) which have the same syntactic structure (Fig. 2) defined on three unique sets of words, and the effect of this prior training on the training of the target grammar was measured as the network was trained with yet another new set of words. One of the indicators of such an effect is the savings in terms of the number of trials needed to meet the completion criterion. Fig. 3 shows the number of trials for both the source grammar trainings (vocabulary switching = 1, 2, ..., 9 in the figure) and the target grammar training (vocabulary switching = 10), averaged over 20 networks with different initial random weights. The result of vocabulary switching in the first 9 cycles is a complete accommodation of the new symbol sets, with near 100% savings. This accommodation represents the network's ability to create a data structure that is consistent with a number of independent vocabularies. More critically, however, there was a 63% reduction in the number of required trainings for the new, unseen vocabulary. This result is remarkable given the required independence of syntax and vocabulary. Apparently the RNN is able to partially factor the transition rules from its constituent symbol assignments after exposure to a diversity of vocabularies. One obvious question that arises is whether the source of the novel transfer is due to a network memory. Our initial studies in this area showed that in fact local memory is

6 Reber (24) showed that humans would significantly transfer in such a task; however, his symbol sets allowed subjects to use similarity as a basis for transfer, as they were composed of contiguous letters from the alphabet. Nonetheless, recent reviews of the literature indicate that this type of transfer is common even across modalities (16).


important. We showed in a series of similar tasks that there was no significant savings in learning for feedforward networks that were exposed to rule-learning contexts (e.g., the ‘‘Penzias Problem’’) with subsequent permutation transfer7. At least from these preliminary studies, it would seem that memory in the network is an important component of the ability of a neural network to transfer its syntactic knowledge.

Fig. 3. The learning savings from subsequent relearning of the symbol transfer task. Each data point represents the average of 20 networks trained to criterion on the same grammar. The relearning cycle shows an immediate transfer to the novel symbol set, which continues to improve to near perfect transfer through the ninth cycle (3 cycles of the 3 symbol sets), until the 10th cycle, where a completely novel set is used with the same grammar. Over 60% of the original learning on the grammar, independent of symbol set, is saved.
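The savings measure behind Fig. 3 is a simple ratio of trials-to-criterion; a minimal sketch, where the trial counts are hypothetical numbers of our own, not the chapter's data:

```python
def percent_savings(trials_first, trials_later):
    """Savings relative to first-time learning; 100% means no extra trials."""
    return 100.0 * (1.0 - trials_later / trials_first)

# Hypothetical trials-to-criterion per vocabulary-switch cycle;
# cycle 10 is the completely novel vocabulary.
trials = [800, 400, 250, 60, 20, 10, 5, 4, 3, 300]
savings = [round(percent_savings(trials[0], t), 1) for t in trials[1:]]
print(savings)   # rises toward ~100% over cycles 2-9, then drops for cycle 10
```

With these invented counts the novel-vocabulary cycle shows 62.5% savings, i.e., the shape of the reported result; the real curve comes from the averaged networks, not this arithmetic.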

How is the neural network accomplishing these abstractions? Note that in the vocabulary transfer task the Network, as with a human subject, has no possible way to transfer based on the external symbol set. It follows that the network must abstract away from the input encoding. In effect the network must find a way to buffer or recode the input in order to defer symbol binding until enough string sequences have been observed. If the network extracted the common syntactic structure, the hidden layer activities would be expected to represent the corresponding FSM states, regardless of vocabularies. This is, in fact, shown by linear discriminant analysis (LDA)8. After the network learned the first vocabulary, activity of the hidden nodes was shown to be sensitive to FSM states (Fig. 4A).

7 Feedforward networks were trained on the Penzias task, a boolean counting task studied previously by Denker et al. (4). A permutation task was defined which was similar to the vocabulary transfer task, but the feed-forward network showed only interference effects, even with significant increases in the capacity of the network.

8 LDA of the hidden unit states allows for a complete search of linear projections that are optimally consistent with organizations based on state from the FSM or by vocabulary. LDA was applied to the hidden unit activations over 20 Networks to find a stable result for the preferred encoding of the input space. Evidence from the LDA for state representations would indicate that the RNN found a solution to the multiple vocabularies by referencing them hierarchically within each state, based on context sensitivity within each vocabulary cluster.

In this figure, different FSM states are


represented by different clusters, while the different symbol sets are plotted with different graphic symbols. Note that these clusters represent attractors for the states in the FSM. Moreover, if one starts a trajectory nearby one of the clusters, it proceeds to a location nearby the cluster representing the appropriate state transition. Hence this space possesses context sensitivity, in that coordinate positions encode both state and trajectory information.

Fig. 4A. Linear Discriminant Analysis of hidden activities of networks that learned a single FSM/symbol set. Note that the different clusters represent different states while the "+" sign codes for the single symbol set.
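The LDA contrast can be sketched as follows. The "hidden activations" here are synthetic stand-ins of our own (clustered by state, identically distributed across vocabularies), and scikit-learn's LinearDiscriminantAnalysis substitutes for whatever implementation the authors used:

```python
import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

rng = np.random.default_rng(1)
n_hidden = 10

# Synthetic hidden-unit activations: three FSM-state clusters, each visited
# under three vocabularies.  This mimics the organization claimed in the text.
centers = rng.normal(size=(3, n_hidden))
X, states, vocabs = [], [], []
for state in range(3):
    for vocab in range(3):
        pts = centers[state] + 0.3 * rng.normal(size=(50, n_hidden))
        X.append(pts)
        states += [state] * 50
        vocabs += [vocab] * 50
X = np.vstack(X)

# Discriminating by FSM state succeeds; discriminating by vocabulary does not,
# mirroring the state-versus-vocabulary contrast reported in the text.
state_acc = LinearDiscriminantAnalysis().fit(X, states).score(X, states)
vocab_acc = LinearDiscriminantAnalysis().fit(X, vocabs).score(X, vocabs)
print(state_acc > vocab_acc)
```

The asymmetry is the point: if hidden activity is organized by state, a linear discriminant on state labels finds clean projections, while one on vocabulary labels hovers near chance.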

After each of the three vocabularies was learned in three cycles, LDA of the hidden layer node activities with respect to FSM states (Fig. 4B) was contrasted with that with respect to vocabulary sets (Fig. 4C). The correct rate of discrimination clearly shows that the state space is organized by FSM states, since FSM states could be correctly classified by the former linear discriminants with an accuracy of 80% (SD=16, n=20), whereas vocabulary set could be classified correctly only 45% of the time (SD=9.7, n=20). Notice in both Figures 4B and 4C, relative to 4A, that the symbol sets have spread out and occupy more of the hidden unit space, with significant gaps between clusters of the same and different symbol sets. Moreover, from Fig. 4B one can also see that the vocabularies are hierarchically organized into states corresponding to the FSM. This hierarchical structure provides a super-structure for the accommodation of the already learned vocabularies and any new ones the RNN is asked to learn. It can also be seen from Fig. 4C that the hidden layer activations are also sensitive to, but not linearly separable by, vocabularies. LDA after the test vocabulary is learned once also shows that the network state is predominantly organized by FSM states (Fig. 5), although the linear separation by FSM states of a small fraction of activities is compromised. This interference by the new vocabulary is not surprising considering that old vocabularies were not re-learned after the new vocabulary was learned. What is more interesting is the spatial location


Fig. 4B. Linear discriminant analysis of the hidden units of the RNN that have learned three independent FSMs with three different symbol sets, using state as the discriminant variable. Notice how the hidden unit space spreads out compared to Fig. 4A. Notice further that the space is organized by clusters corresponding to states, which are internally differentiated by symbol sets (represented by different graphic symbols: +=ABC, triangle=DEF, square=GHI).

Fig. 4C. Linear Discriminant Analysis of hidden unit activities from networks trained on three independent FSMs and three different symbol sets. The LDA used the symbol set as the discriminant. Note the symbol sets are coded by 1 of 3 graphic codes, the same as in Figs. 4A and 4B. In this case note that the discriminant function based on symbol set produces no simple spatial classification, unlike Fig. 4B, which shows the same activations classified by the state of the FSM.

of the new vocabulary ("stars"). The hidden unit activity, again, clearly shows that the state discriminant structure is dominant and organizes the symbol sets; the fourth vocabulary, or unseen symbol set, that the networks are exposed to simply finds empty spots in the hidden unit space to code its location relative to the existing state structure, hence indicating the strength of the present abstraction to encourage


observance of the hierarchical state representation and its existing context sensitivity. In effect the network bootstraps from existing nearby vocabularies, which allows generalization to the same FSM that all symbol sets are using.

Fig. 5. Linear Discriminant analysis of the hidden state space after training on 3 independent symbol sets for 3 cycles and then transfer to a new untrained symbol set (coded by stars). Note how the new symbol set slides into the gaps between symbol sets previously learned. Presumably this provides initial context sensitivity for the new symbol set, creating the 60% savings.

It has been argued that neural networks are incapable of representing, processing, and generalizing symbolic information (3, 16, 20, 21, 22). Pinker, for one, argues there must be some distinction drawn between what the brain can do with "mere statistical information" and the sorts of symbol processing that must be required to understand that a "whale is not a fish or Tina Turner is a grandmother, overriding our statistical information about what fish or grandmothers look like" (20). The alternative, as demonstrated by the present experiments, is that neural networks that incorporate associative mechanisms can be sensitive to the statistical substrate of the world and yet create data structures that have the property of following a deterministic rule, which, once learned, can be used to override even large amounts of statistical evidence (e.g. from another FSM). Quite conveniently, as demonstrated, these data structures can arise even when that rule has only been expressed implicitly by examples and learned by mere exposure to regularities in the data. The next section focuses on Neuroimaging techniques and discusses some of the problems and some of the promise. This will help develop the idea of combining RNNs for extraction of FSM properties of real-time cognition.


3 Taking Neuroimaging Seriously: New Tools for Neuroimaging

Is Neuroimaging Just the 21st Century Phrenology?

Neuroimaging is an important new technology for the analysis and representation of cognitive processes and the neural tissue that supports them. Nonetheless, these technologies run the danger of becoming the new Phrenology. Unfortunately, when it comes to studying cognitive processes, data analytic techniques commonly employed in neuroimaging limit and distort the hypotheses researchers can consider. For example, several statistical problems exist within the assumptions of these analyses:

• Independence of voxels in space and time; clearly there is dependence structure in both time and space.

• The Gaussian assumption is unlikely to hold; hence statistical tests will be inefficient and miss structure in low S/N environments.

• Contrastive testing is subject to linearity and minimal-components assumptions.

• The modularity metaphor--looking for local focal areas of computation--face and teapot areas!

Moreover, the statistical thresholds that are chosen are enormously high (10^-5, 10^-20!), implying that either the underlying distribution is non-Gaussian or that locality of signal is preferred.
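The independence concern is easy to demonstrate. A quick simulation of our own (not from the chapter) shows that a t-test assuming temporally independent samples grossly inflates false positives once the noise is autocorrelated, as fMRI noise is:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)
n_voxels, T, alpha = 2000, 100, 0.05
phi = 0.8                                # AR(1) autocorrelation of the noise

box = np.tile([1] * 10 + [0] * 10, T // 20).astype(float)  # boxcar regressor

false_pos = 0
for _ in range(n_voxels):
    e = rng.normal(size=T)
    y = np.empty(T)
    y[0] = e[0]
    for t in range(1, T):                # AR(1) noise, no true activation
        y[t] = phi * y[t - 1] + e[t]
    _, p = stats.ttest_ind(y[box == 1], y[box == 0])
    false_pos += p < alpha

print(false_pos / n_voxels)              # well above the nominal 0.05
```

The inflation arises because autocorrelated noise has excess power at the boxcar frequency, so the t-test's variance estimate is too small; this is one reason thresholds in practice are pushed so high.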

We have looked at a number of these assumptions, including recently showing that BOLD susceptibility in brain is non-Gaussian (Hanson & Bly (32)). Recently we have looked specifically at the notion that the fMRI time series has a specific temporal dependence (Murthy, Lange, Bly & Hanson (33)). We used two kinds of simple sensory tasks. The first task was a finger tapping task in a boxcar paradigm with 4 seconds of tapping and 4 seconds of no finger tapping. A GE 1.5T scanner was used to collect the data from a single subject. We also used an auditory task where a single tone was presented, again to a single subject, also in a boxcar presentation. Autoregressive time series models were used to model every voxel in the brain over time steps, and the goodness of fit was collected. All goodness of fit above a criterion value (>95%) was re-projected back into the brain slice without further thresholding. In Fig. 6 below we show two slices, one showing a standard SPM99 analysis of the boxcar for the finger tapping task (leftmost graphic) and the other showing the projection of an AR(3) model (rightmost graphic), which required no reference to the contrastive boxcar design; rather, it created sensitivity in the brain slice due only to the temporal structure of the fMRI signal itself. We have also used the time delay coefficients derived from the AR analysis, which would indicate the sensitivity of the indicated tissue to the time dependence. Recently we have also shown that the time series is generally not stationary, especially in areas that seem “activated”. In these cases, and perhaps more generally, it would be appropriate to consider ARIMA-style models, which can explicitly model nonstationarity. We note in passing that a more general ARIMA model is in fact a Recurrent Neural Network, which as we have shown previously has the useful property of extracting an unknown FSM from a time series.
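A per-voxel AR(p) fit of this kind reduces to ordinary least squares on lagged copies of the series. The sketch below is ours, not the authors' code, and the synthetic series merely stands in for one voxel's time course:

```python
import numpy as np

def fit_ar(y, p=3):
    """Ordinary least-squares AR(p) fit of one time series.
    Returns (coefficients incl. intercept, R^2 goodness of fit)."""
    T = len(y)
    # Column k holds the lag-k values aligned with targets y[p:].
    X = np.column_stack([y[p - k: T - k] for k in range(1, p + 1)])
    X = np.column_stack([X, np.ones(T - p)])          # intercept term
    target = y[p:]
    coef, *_ = np.linalg.lstsq(X, target, rcond=None)
    resid = target - X @ coef
    r2 = 1.0 - resid.var() / target.var()
    return coef, r2

# Synthetic stand-in for a voxel: an AR(3) process driven by noise.
rng = np.random.default_rng(3)
y = np.zeros(200)
for t in range(3, 200):
    y[t] = 0.9 * y[t - 1] - 0.2 * y[t - 2] + 0.05 * y[t - 3] + 0.1 * rng.normal()

coef, r2 = fit_ar(y, p=3)
print(r2 > 0.4)   # temporal structure alone yields a substantial fit
```

Running `fit_ar` over every voxel and re-projecting the R² values into the slice gives an "activation" map built from temporal dependence alone, with no boxcar contrast, which is the spirit of the AR(3) panel in Fig. 6.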


[Figure: two brain slices, labeled SPM99 (left) and AR(3) (right).]

Fig. 6. Brain slices showing the “active” areas of the brain during a finger tapping task. The leftmost graphic shows a standard SPM99 analysis, while the rightmost graphic shows the AR model.


4 Dynamics of Cognition: The Case of Event Perception and Signal Fusion (EEG, fMRI)

4.1 Event Perception: Perceiving and Encoding Events

Day-to-day experience is characterized, remembered, and communicated as a series of events. We think about driving to work, we remember having an argument with our spouse, and we tell a friend about our plans to attend the theatre next Saturday. Abbreviated phrases such as driving to work act as a type of shorthand notation for describing complex action sequences. Thus, our ability to communicate successfully with others using such labels as driving to work reflects a certain level of familiarity with the referenced activities that we share, or presume to share, with our intended audience. How common is our knowledge about common events? Empirical work suggests that there is considerable consensus concerning the constituent actions of familiar events. For example, Bower, Black, and Turner (1) asked subjects to describe the typical actions involved in going to a restaurant, attending a lecture, getting up, grocery shopping, and visiting a doctor. They found that subjects showed


considerable agreement about the composition of common events, many responses being offered by more than 70% of their subjects and very few being unique. Familiarity with events may provide the basis for understanding and encoding new information. In a second experiment of their 1979 study, Bower et al. (1) asked subjects to parse prose stories centered on events such as visiting a doctor into smaller "parts." They found that subjects tended to choose similar points in the story as constituent boundaries. Agreement about event boundaries extends to online measures of parsing as well (e.g., Newtson (19); Hanson & Hirst (12)). For example, Hanson & Hirst (12) asked subjects to indicate the boundaries of events while viewing videotapes of common activities (e.g., playing a game) under various orientation instructions and found that subjects had little difficulty agreeing about the boundaries of such events.

4.2 Recurrent Nets and Schemata

Neisser’s Perceptual Cycle

Neisser has suggested that perception is a cyclical activity in which
(1) memory in the form of schemata guides the exploration of the environment,
(2) exploration yields samples of available information, and
(3) data collected from the exploration process modifies the prevailing schema.

According to Neisser: “The schema assures the continuity of perception over time in two different ways. Because schemata are anticipations, they are the medium by which the past affects the future; information already acquired determines what will be picked up next. (This is the underlying mechanism of memory, though that term is best restricted to cases in which time and a change of situation intervene between the formation of the schema and its use.) In addition, however, some schemata are temporal in their very nature. One can anticipate temporal patterns as well as spatial ones.” (pp. 22-23)

By focusing on the interaction of perception and memory, Neisser's "perceptual cycle" model offers a particularly fertile context for studying the processing of event information. However, because this is a processing model rather than a model of knowledge representation, little emphasis is placed on the structure of schematized knowledge. Thus, it is not clear how turning the ignition might be related to driving home, or even what role the decomposition of events might play in generating the expectations purportedly used to guide sampling of available information. Germane to this issue is another that arises in relation to the proposed modification process. How does the prevailing schema change in response to the sampling process? In particular, what is the basis for the similarity between the ongoing situation and the schemata that are subsequently activated? These questions lead to computational considerations of how one might implement a system which can represent schemata, their similarity, and their dynamic properties in the presence of the ongoing stimulus situation.
We discuss two possibilities, the first of which has a classical status in the event-perception literature. This hypothesis concerning event structure was first introduced by Schank & Abelson (28) and by Minsky, and is referred to as "Scripts" or "Frames". It basically asserts that the world can be clustered into simple categories that predict and organize stimulus situations according to their goals and expectancies.


The second approach is a competing account, which we introduce here for the first time in the context of a connectionist hypothesis concerning temporal processing of information. Temporal control in connectionist networks was introduced by Jordan (15; also see 26, 27), as well as by Elman (6, 7), who first discussed the relationship between introducing recurrence into connectionist networks and temporal processing. Both Jordan and Elman were interested in psychological constraints arising from phenomena involving the recognition or production of serial order. Jordan's models focused on Lashley's general challenge to associationism, which requires the addition of a memory to the connectionist network, allowing context-sensitive behavioral sequences. Similarly, Elman introduced a memory into the standard connectionist network in order to recognize simple grammars.
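The adjunct memory that Jordan and Elman added can be sketched concretely. Below is a minimal Elman-style simple recurrent network (forward pass only), in which the context layer is a copy of the previous hidden state; all names, dimensions, and the input sequence are our own illustrative choices, not taken from the models cited.

```python
import numpy as np

rng = np.random.default_rng(0)

class ElmanNet:
    """Minimal Elman-style simple recurrent network (forward pass only).

    The context layer is a copy of the previous hidden state, providing
    the adjunct memory that makes context-sensitive sequences possible.
    """

    def __init__(self, n_in, n_hid, n_out):
        self.W_in = rng.normal(0, 0.1, (n_hid, n_in))    # input -> hidden
        self.W_ctx = rng.normal(0, 0.1, (n_hid, n_hid))  # context -> hidden
        self.W_out = rng.normal(0, 0.1, (n_out, n_hid))  # hidden -> output
        self.context = np.zeros(n_hid)                   # previous hidden state

    def step(self, x):
        # Hidden state depends on the current input AND the stored context.
        h = np.tanh(self.W_in @ x + self.W_ctx @ self.context)
        self.context = h                                 # copy-back memory
        return self.W_out @ h

net = ElmanNet(n_in=3, n_hid=5, n_out=2)
seq = [np.eye(3)[i] for i in [0, 1, 2, 1]]               # one-hot input sequence
outputs = [net.step(x) for x in seq]
print(len(outputs), outputs[0].shape)                    # 4 (2,)
```

Note that the two occurrences of input 1 in the sequence produce different outputs, because the context layer carries the history forward; this is exactly the property a pure feedforward associator lacks.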

4.3 Categories and Temporal Control

In the present paper we introduce and focus on another important property of recurrent connectionist networks: the evolution of, and dynamic control over, the adjunct memory, and the construction of recognition categories ("schemata") from the stimulus situation. The difference between our focus and the Jordan-Elman focus underscores our interest in the interaction between perception and memory. Our experiments and simulations are meant to elucidate the types and nature of interactions between a "memory" in the neural network and its dependence on evolving schemata (the hidden layer) and present features of the situation. There is actually a closer relation of the present work to the general framework of Arbib, who has used the term schema as we use it here in this constructive sense, as well as to Hanson & Burr (11), who have emphasized the importance of representational and learning interactions in connectionist networks.

Twenty-two subjects were asked to provide judgments of event change while watching a videotape of actors engaging in everyday events (eating in a restaurant, driving a car, working in an office). One videotape showed two people playing a game of Monopoly, and the other showed a woman in a restaurant drinking coffee and reading a newspaper. Subjects watched the videotapes under various orientations and pressed a response button whenever they believed a new event was beginning. In the present study, we used responses made when subjects had been oriented toward "small" events while viewing the tapes. This orientation produced the greatest number of perceived event boundaries.
There was high agreement among the 22 subjects on event boundaries, and we chose cases with at least 75% agreement on the location of the event second for an event boundary, which produced 15 event boundaries for averaging. Both EEG and fMRI were recorded simultaneously from a single subject making event boundary judgments on the Restaurant tape, which was scripted each second with an actor drinking coffee and reading a newspaper in a restaurant and had been used previously in the behavioral study. ERP and fMRI were averaged before and after the 15 event boundaries and are time aligned (fMRI was sampled every 4 seconds; ERPs were taken every 1 ms and then down-sampled to one sample per second). They are shown below from 20 seconds before to 8 seconds after an event change.

The ERP shows positive active areas in visual areas and the temporal lobe prior to an event change, evolving to a large negative wave that starts in prefrontal areas and moves back towards temporal and visual areas. Simultaneously with the EEG, we recorded fMRI from the same subject; this is also shown in figure 7, again time synchronized with the ERP averages. In the fMRI case, an arbitrary baseline was created by averaging the first 4 scans and contrasting it against the other 41 scans to produce a standard t-map. These t-maps were then treated as the ERPs were: averaged within a window before and after an event change. Prior to an event change there is a large amount of distributed activity early on (at t values where p < .1, the low-threshold case), which "thins out" before the event change and seems also to lateralize towards the right hemisphere after the event-change boundary.

Areas that were significantly active in what we termed the low-threshold case (t values where p < 0.1) numbered 10-12 and are shown in figure 8. These areas include dorsolateral prefrontal cortex, anterior cingulate, precuneus, and temporal lobe areas similar to those seen in the ERP coherence maps. Also shown in figure 8 is a "meta-analysis" of labels taken from the literature where these areas have been implicated in different tasks. At least 3 labels are shown that have been used to characterize these areas across different studies or tasks in the neuroimaging literature.
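The event-locked averaging described above can be sketched as follows, using synthetic stand-ins for the recordings (array sizes, channel counts, and boundary times are invented for illustration): the ERP is down-sampled from 1 ms to 1 s resolution, and fixed windows from 20 s before to 8 s after each event boundary are averaged for both modalities.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-ins for the recordings described in the text.
erp_ms = rng.normal(size=(400_000, 8))   # 400 s of ERP at 1 kHz, 8 channels
fmri_t = rng.normal(size=(100, 64))      # one t-map every 4 s, 64 voxels (flattened)
boundaries_s = [60, 120, 200, 260, 340]  # event-boundary times in seconds (made up)

# Down-sample the ERP from 1 ms to 1 s resolution by averaging each second.
erp_s = erp_ms.reshape(-1, 1000, erp_ms.shape[1]).mean(axis=1)   # (400, 8)

def event_locked_average(data, event_idx, before, after):
    """Average fixed windows of `data` around each event index."""
    wins = [data[i - before:i + after] for i in event_idx
            if i - before >= 0 and i + after <= len(data)]
    return np.mean(wins, axis=0)

# ERP: 20 s before to 8 s after each boundary, at 1 s resolution.
erp_avg = event_locked_average(erp_s, boundaries_s, before=20, after=8)

# fMRI: same window, but indices count 4 s scans (20 s = 5 scans, 8 s = 2 scans).
scan_idx = [t // 4 for t in boundaries_s]
fmri_avg = event_locked_average(fmri_t, scan_idx, before=5, after=2)

print(erp_avg.shape, fmri_avg.shape)     # (28, 8) (7, 64)
```

The same windowing function serves both modalities; only the index granularity differs, which is what makes the two averages time-alignable as in figure 7.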


Fig. 7. ERP and fMRI showing concurrently collected and averaged activity 20 seconds before and 8 seconds after 15 judged event changes.

It is clear from the putative functions of these areas that it would be possible to characterize the event boundary judgments as a dynamical system that includes attentional, spatial-orienting, and detection functions (anterior cingulate, anterior parietal), encompassing areas that have been implicated in so-called "attentional networks" (e.g. Posner). Such an event detection system would also necessarily seem to include planning for schema processing and some sort of short-term buffer for comparison to well-known schemata, which would involve areas implicated in memory and/or language comprehension (temporal lobe). Finally, in terms of comparison and schema processing, functions such as imagery and spatial modeling (parietal lobule, precuneus) would be critical to such a dynamic schema-detection circuit. The most highly significant areas as indexed by the t value were the anterior cingulate and dorsolateral prefrontal cortex. The subject also responded with a thumb movement to indicate a behavioral response to an event change, which was most likely associated with SMA and primary motor cortex.

This potential circuit would not be apparent from the normative neuroimaging method, which stresses single-area functions rather than interactivity, distributed processing, and systemic function. Key to the proposal in this chapter is the concept that neuroimaging data are subject to dynamical systems analysis, and in particular to the type of sequential structure inherent in the simple FSMs discussed earlier in this chapter. Without this kind of view, neuroimaging data are far too limited to reveal the likely complexity of commerce between brain areas. Neural networks have considerable power and generalization scope that could be useful in analyzing and modeling neuroimaging time series; in particular, we end with a proposal to fuse time-rich signals (such as EEG) and space-rich signals (such as fMRI) in order to create spatio-temporal signals that are commensurate with, and sensitive to, cognitive interactivity and system-level brain function.

Fig. 8. SPM99 analysis of the event perception task, using low-threshold values.

[Figure 8 area labels: Temporal Lobe, Anterior parietal; Left Occipital Lobe; Dorsal Lateral Prefrontal; Anterior cingulate; Supplementary Motor Area; Frontal areas; Primary Motor Area; thalamus; Parietal lobule, precuneus. Functional labels: Attention: spotlight shunting towards frontal areas; Planning: short-term buffer, choice; Long-term storage, language; Visual pattern recognition, encoding; Attention: "detection"; Decisions: maintain attentional focus; Motor planning: initiation, modulation of movement; Imagery: spatial modeling; Attention: orienting in space, stimulus disengagement.]

4.4 Fusing EEG and fMRI for “Real Time” Cognitive Measurement

Event perception tasks are productive in a cognitive sense and evoke a great deal of cognition at once: perception, memory, sequencing, grammar, and related processes. System-level, real-time processing is therefore informative and potentially revealing, and we claim it is important to understand the system of brain areas that supports cognitive function as an interactive, distributed, real-time, dynamically structured neural network. Specifically, it is well known that the reconstruction problem using ERPs is intractable due to the lack of constraints on the position and the number of dipoles giving rise to scalp voltage potentials (the problem is ill posed). We would constrain the ERP inverse equations with the number and locations of fMRI activations in a low-threshold approach, as described earlier. Next, one would solve the equations and iterate at the sampling rate of the fMRI image acquisition. The now-solved-for dipoles would then initialize the location estimation procedures from before (for example, in the Gaussian kernel method, the mean might be placed at the now-identified dipole), and the locations of neural activity would be reestimated. These new estimates would then reseed the ERP inverse, and the process would iterate. This kind of approach would allow the location-stable measures in fMRI to be interpolated with the ERP estimators between fMRI sample points; in effect, the fMRI would be augmented with millisecond estimation of position between every sampled image. Further imposing a temporal regularizer would ensure that the ERP-fMRI estimator remains smooth and stable between fMRI samples. The successful application of this new method would constitute the first demonstration of real-time brain imaging.
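A highly simplified sketch of one iteration of such a fusion loop is given below, under assumptions entirely our own: a known leadfield (forward) matrix, a ridge-regularized linear inverse, and a power-reweighting step standing in for the reseed-and-reestimate procedure. This is an illustration of the general idea, not the authors' implementation; all names and sizes are invented.

```python
import numpy as np

rng = np.random.default_rng(0)

n_sensors, n_sources, n_times = 32, 500, 200
leadfield = rng.normal(size=(n_sensors, n_sources))   # forward model (assumed known)
erp = rng.normal(size=(n_sensors, n_times))           # scalp potentials, one per ms

# Step 1: constrain the ill-posed inverse with fMRI -- keep only sources at
# voxels that were active in the low-threshold t-maps (indices made up here).
fmri_active = rng.choice(n_sources, size=40, replace=False)
L = leadfield[:, fmri_active]                         # (32, 40): fewer unknowns

# Step 2: ridge-regularized inverse for the constrained source set.
def solve_sources(L, erp, lam=1.0):
    G = L.T @ L + lam * np.eye(L.shape[1])
    return np.linalg.solve(G, L.T @ erp)              # (n_active, n_times)

# Step 3: iterate -- reweight sources by estimated power and re-solve, a
# crude stand-in for reseeding the location estimates from the last solution.
S = solve_sources(L, erp)
for _ in range(3):
    w = np.sqrt((S ** 2).mean(axis=1)) + 1e-8         # per-source power
    Lw = L * w                                        # favor strong sources
    S = w[:, None] * solve_sources(Lw, erp)

# Step 4: temporal regularizer -- smooth each source time course so the
# estimate stays stable between fMRI sample points.
kernel = np.ones(5) / 5.0
S_smooth = np.apply_along_axis(lambda r: np.convolve(r, kernel, "same"), 1, S)

print(S.shape, S_smooth.shape)                        # (40, 200) (40, 200)
```

The key move is in step 1: restricting the candidate dipoles to fMRI-identified locations turns an underdetermined system (32 sensors, 500 sources) into a well-conditioned one (32 sensors, 40 sources), which is precisely the constraint argued for in the text.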

5 Some Conclusions: CONNECTIONIST Neuroimaging

We need to look for emergent properties of networks that might guide measurement in neuroscience. As in the grammar transfer task, the kinds of computations found were metric and similarity based. Neuroimaging data can help constrain our modeling and provide insights into the complex spatio-temporal dynamical system of the brain.

References

1. Bower, G.H., Black, J.B., & Turner, T.J. (1979). Scripts in memory for text. Cognitive Psychology, 11, 177-220.

2. Casey, M. (1996). The dynamics of discrete-time computation, with applications to recurrent neural networks and finite state machine extraction. Neural Computation, 8(6), 1135-1178.

3. Chomsky, N. (1957). Syntactic Structures. The Hague: Mouton.

4. Denker, J., Schwartz, D., Wittner, B., Solla, S., Howard, R., Jackel, L., & Hopfield, J. (1987). Automatic learning, rule extraction and generalization. Complex Systems, 1(5), 877-922.

5. Elman, J.L., Bates, E., Johnson, M., Karmiloff-Smith, A., Parisi, D., & Plunkett, K. (1996). Rethinking Innateness. Cambridge, MA: MIT Press.

6. Elman, J.L. (1990). Finding structure in time. Cognitive Science, 14, 179-211.

7. Elman, J.L. (1988). Finding structure in time. CRL Technical Report 8801. Center for Research in Language, UCSD.

8. Fodor, J.A. & Pylyshyn, Z.W. (1988). Connectionism and cognitive architecture: A critical analysis. In S. Pinker & J. Mehler (Eds.), Connections and Symbols. Cambridge, MA: MIT Press.

9. Giles, C.L., Horne, B.G., & Lin, T. (1995). Learning a class of large finite state machines with a recurrent neural network. Neural Networks, 8(9), 1359-1365.

10. Giles, C.L., Miller, C.B., Chen, D., Chen, H.H., Sun, G.Z., & Lee, Y.C. (1992). Learning and extracting finite state automata with second-order recurrent neural networks. Neural Computation, 4(3), 393-405.

11. Hanson, S.J. & Burr, D.J. (1990). What connectionist models learn: Learning and representation in connectionist models. Behavioral and Brain Sciences, 13(3), 471.

12. Hanson, C. & Hirst, W. (1989). On the representation of events: A study of orientation, recall, and recognition. Journal of Experimental Psychology: General, 118, 124-150.

13. Hanson, S.J. & Burr, D.J. (1990). What connectionist models learn: Learning and representation in connectionist networks. Behavioral and Brain Sciences, 13(3), 477-518.

14. Harnad, S. (1990). The symbol grounding problem. Physica D, 42, 335-346.

15. Jordan, M.I. (1986). Serial order: A parallel distributed processing approach. ICS Technical Report, UCSD.

16. Marcus, G.F., Vijayan, S., Bandi Rao, S., & Vishton, P.M. (1999). Science, 283.

17. Medin, D.L. & Schaffer, M.M. (1978). Context theory of classification learning. Psychological Review, 85, 207-238.

18. Miller, G.A. & Stein, M. (1963). Grammarama I: Preliminary studies and analysis of protocols. Technical Report No. CS-2. Cambridge: Harvard University, CCS.

19. Newtson, D. (1973). Attribution and the unit of perception of ongoing behavior. Journal of Personality and Social Psychology, 28, 28-38.

20. Pinker, S. (1999). Enhanced: Out of the minds of babes. Science, 283(1), 40-41.

21. Pinker, S. (1997). How the Mind Works. New York: W.W. Norton & Co.

22. Pinker, S. (1994). The Language Instinct. New York: Morrow & Co.

23. Reber, A. (1967). Implicit learning of artificial grammars. Journal of Verbal Learning and Verbal Behavior, 6, 855-863.

24. Reber, A. (1969). Transfer of syntactic structure in synthetic languages. Journal of Experimental Psychology, 81, 115-119.

25. Redington, J. & Chater, N. (1996). Transfer in artificial grammar learning: A reevaluation. Journal of Experimental Psychology: General, 125(2), 123-138.

26. Rumelhart, D.E., Hinton, G.E., & Williams, R.J. (1986). Learning representations by back-propagating errors. Nature, 323, 533-536.

27. Rumelhart, D.E., Hinton, G.E., & Williams, R.J. (1986). Learning internal representations by error propagation. In D.E. Rumelhart & J.L. McClelland (Eds.), Parallel Distributed Processing I: Foundations. Cambridge, MA: MIT Press.

28. Schank, R.C. (1982). Dynamic Memory: A Theory of Reminding and Learning in Computers and People. Cambridge: Cambridge University Press.

29. Servan-Schreiber, D., Cleeremans, A., & McClelland, J. (1988). Encoding sequential structure in simple recurrent networks. CMU Technical Report CS-88-183.

30. Special issue on cognitive neuroscience. (1997). Science, 275, 1580-1608.

31. Watrous, R. & Kuhn, G. (1992). Induction of finite-state languages using second-order recurrent networks. Neural Computation, 4, 406-414.

32. Hanson, S.J. & Bly, B.M. (2000). The distribution of BOLD susceptibility in the brain is non-Gaussian.

33. Murthy, Bly, & Hanson (1999). Identification of the fMRI signal. Cognitive Neuroscience Society.