NOTE TO USERS

The original manuscript received by UMI contains pages with slanted print. Pages were microfilmed as received.

This reproduction is the best copy available.

UMI


INTUITIVE REASONING AND THE ENHANCED NOVELTY FILTER

David Yeo

A thesis submitted in conformity with the requirements for the degree of Doctor of Philosophy

Graduate Department of Human Development and Applied Psychology
Ontario Institute for Studies in Education of the
University of Toronto

© Copyright by David Yeo (1997)

The author has granted a non-exclusive licence allowing the National Library of Canada to reproduce, loan, distribute or sell copies of this thesis in microform, paper or electronic formats.

The author retains ownership of the copyright in this thesis. Neither the thesis nor substantial extracts from it may be printed or otherwise reproduced without the author's permission.


INTUITIVE REASONING AND THE ENHANCED NOVELTY FILTER

David Yeo
Doctor of Philosophy

Department of Human Development and Applied Psychology
Ontario Institute for Studies in Education of the
University of Toronto (1997)

Functional enhancement makes it possible to introduce nonlinearity into a single-layer connectionist network. In this thesis I derive a fundamentally new class of learning model by integrating functional enhancement with a variant of Teuvo Kohonen's seminal novelty filter learning algorithm. It is argued that the properties which define the resulting enhanced novelty filter learning model (i.e. habituation, feedback, interaction), in and of themselves, prove sufficient to serve as the foundation for a plausible model of cognition. A number of tests of this contention are offered, the examples forming a continuum: schemata arising from the interplay of habituation, feedback, and interaction ground concepts which, through recursion, spawn rational inference.

I wish to thank Dr. Peter Lindsay, my supervisor, and more importantly my friend, who with patience, wisdom, tolerance, and that perfect blend of freedom and control, shepherded me through both my masters and doctorate degrees. I also would like to thank Dr. Keith Oatley, who first made me believe that I had created something of worth, and Dr. Geoff Hinton, who made Keith's belief in me a reality by turning a sow's ear into a silk purse. And to my dear friends Dr. Donald Hutcheon and Dr. Rhona Charron: all I can say is that your encouragement and, in particular, your own success provided the inspiration I needed to keep me from giving up. Finally, to Claudia, my wife. Here words truly fail me. Not only do you cushion the blow of my failures, you enhance the joy of my successes. Only once have I heard anything vaguely approaching what I would like to say to you. So with apologies to W. H. Auden:

[You are] my North, my South, my East and West,
My working week and my Sunday rest ...

Table of Contents

Chapter 1: Introduction
    1.1 Constraint Satisfaction
        1.1.1 Dynamic Schemata
        1.1.2 Intuitive Inference
        1.1.3 Rational Inference
    1.2 Main Thesis
    1.3 Complex Systems
        1.3.1 The Purpose of Feedback
    1.4 Intuitive Reasoning
    1.5 Overview

Chapter 2: Philosophical Foundations
    2.1 The Ghost in the Machine
        2.1.1 The Wittgensteinian Paradox
        2.1.2 Meaning from Meaninglessness
    2.2 Associationism
        2.2.1 The Riddle of Induction
            2.2.1.1 The Paradox of the Raven
        2.2.2 The New Riddle of Induction

Chapter 3: Psychological Foundations
    3.1 Behaviourism
        3.1.1 The Reflexological Model
        3.1.2 Interbehaviourism
        3.1.3 Unifying Operant and Classical Conditioning
        3.1.4 Purpose in Behaviour Theory
        3.1.5 Is Behaviourism Dead?
    3.2 Gestalt Psychology
    3.3 The Janus Principle
    3.4 Neurophysiological Psychology
        3.4.1 Cell Assemblies
        3.4.2 The Neurophysiology of Memory

Chapter 4: Mathematical Foundations
    4.1 Pattern Mathematics
    4.2 Measuring Pattern Similarity
    4.3 The Mathematics of Pattern Association
    4.4 Orthogonal Projections
    4.5 Phi Functions
        4.5.1 Optimal Phi Solutions

Chapter 5: The Learning Model
    5.1 The McCulloch-Pitts Neuron
    5.2 The Perceptron Convergence Theorem
    5.3 Correlation Matrix Memories
        5.3.1 Complete Correlation Matrix Memory
        5.3.2 Incomplete Correlation Matrix Memory
        5.3.3 Autoassociative Correlation Matrix Memory
    5.4 The Novelty Detector
    5.5 The Novelty Filter
    5.6 The Enhanced Novelty Filter
        5.6.1 The Capacity of the ENF
        5.6.2 The ENF's Ability to Generalise
            5.6.2.1 The Leave-K-Out Method of Cross-Validation
            5.6.2.2 The Jets and Sharks

Chapter 6: The Learning Tasks
    6.1 The Parity Two Problem
        6.1.1 The Training Set
        6.1.2 The Test Set
        6.1.3 Results
    6.2 Blocking
        6.2.1 The Training Set
        6.2.2 The Test Set
        6.2.3 Results
    6.3 Schemata
        6.3.1 The Room Schema Problem
            6.3.1.1 The Training Set
            6.3.1.2 The Test Set
                6.3.1.2.1 Room Type from Attributes
                6.3.1.2.2 Attributes from Room Type
            6.3.1.3 Results
                6.3.1.3.1 Room Type from Attributes
                6.3.1.3.2 Attributes from Room Type
    6.4 Concept Formation
        6.4.1 The T/C Problem
            6.4.1.1 The Training Set
            6.4.1.2 The Test Set
            6.4.1.3 Results
        6.4.2 The Complex Discrimination Problem
            6.4.2.1 The Training Set
            6.4.2.2 The Test Set
            6.4.2.3 Results
    6.5 Intuitive Reasoning
        6.5.1 The Composite Symbol Structure Problem
            6.5.1.1 BoltzCONS
                6.5.1.1.1 The Training Set
                6.5.1.1.2 The Test Set
                6.5.1.1.3 Results
        6.5.2 The Tic-Tac-Toe Problem
            6.5.2.1 The Training Set
            6.5.2.2 The Test Set
            6.5.2.3 Results

Chapter 7: Conclusions
    7.1 Main Thesis Revisited
    7.2 Generating Temporal Sequences
    7.3 The Problem of Scaling
    7.4 Consciousness
    7.5 Contributions

Appendices
    A. Training set used in the tic-tac-toe problem
    B. Estimated Tic-Tac-Toe Responses
    C. ENF Source Code Listing
    D. The ENF as a Model of Habituation
    E. Correspondence with Dr. Kohonen

Understanding the mind may not be as intricate as our vanity hoped or our intellect feared. (R. R. Llinás)

1. Introduction

In this thesis I propose a new connectionist learning model which I call the enhanced novelty filter (ENF), after Teuvo Kohonen's (1984) elegant novelty filter algorithm, upon which it is based. The enhanced novelty filter is, quite literally, a mathematical hologram. Like its optical kin, the proposed learning model can store many patterns in the same physical representation. And like an optical hologram, once a pattern has been imprinted (i.e. learned), it can be recalled from a mere fragment of the original.
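The storage and novelty properties that motivate the model can be illustrated with a small sketch. This is not the ENF itself (which is formalised in chapter 5); it is a hedged, pure-Python illustration of the orthogonal-projection idea behind Kohonen-style novelty filtering, with all vector values and helper names invented for the example.

```python
# Sketch: the stored patterns span a subspace; the "novelty" of an input
# is its component orthogonal to that subspace. Vectors are plain lists.

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def scale(v, s):
    return [x * s for x in v]

def sub(a, b):
    return [x - y for x, y in zip(a, b)]

def orthonormal_basis(patterns):
    """Gram-Schmidt over the stored (learned) patterns."""
    basis = []
    for p in patterns:
        r = p[:]
        for e in basis:
            r = sub(r, scale(e, dot(r, e)))
        n = dot(r, r) ** 0.5
        if n > 1e-9:
            basis.append(scale(r, 1.0 / n))
    return basis

def novelty(x, basis):
    """Residual of x after projecting onto the learned subspace."""
    r = x[:]
    for e in basis:
        r = sub(r, scale(e, dot(r, e)))
    return r

stored = [[1.0, 0.0, 1.0, 0.0], [0.0, 1.0, 0.0, 1.0]]
basis = orthonormal_basis(stored)

familiar = novelty([1.0, 0.0, 1.0, 0.0], basis)  # ~zero: fully habituated
novel = novelty([1.0, 1.0, 0.0, 0.0], basis)     # nonzero residual
```

Subtracting the novelty from the input leaves the projection onto the learned subspace, which is the sense in which a stored pattern can be regenerated from partial input.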

1.1 Constraint Satisfaction

The enhanced novelty filter is a member of a large and important class of connectionist learning models known as constraint satisfaction (CS) networks (Hinton, 1977). Under the constraint satisfaction interpretation, each neuron is a hypothesis, and each connection a constraint among hypotheses:

Thus, for example, if whenever hypothesis A is true, hypothesis B is usually true, we would have a positive connection from unit A to unit B. If, on the other hand, hypothesis A provides evidence against hypothesis B, we would have a negative connection from unit A to unit B. ... The goal¹ is to find a solution in which as many of the most important² constraints are satisfied as possible. (Rumelhart and McClelland, 1989, p. 50)

¹ Notice that constraint satisfaction comes complete with its own "sense of purpose" (i.e. teleology) in that its goals are to: 1) find a solution and 2) maximise the number of satisfied constraints.
² The importance of a constraint is reflected in the strength of the connection representing the constraint. If a constraint is very important, then its connection strength parameter (i.e. weight value) is large. Less important constraints involve smaller weights.

Unlike the hard constraints of symbolic logic, in constraint satisfaction the constraints are 'soft'.³ This enables a constraint satisfaction network to select from conflicting information in finding plausible interpretations of a situation. Because the addition of new constraints may force the repeal of formerly valid inferences, constraint satisfaction networks are inherently nonmonotonic. And nonmonotonicity is the rule in human common sense reasoning (Reiter, 1987).

1.1.1 Dynamic Schemata

Soft constraints generate schemata which "emerge at the moment they are needed from the interaction of large numbers of much simpler elements" (McClelland and Rumelhart, 1986, p. 20). Thus schemata are not things, as such. Rather they, and the knowledge they embody, are dynamically created by the very environment they try to interpret (McClelland and Rumelhart, 1986). In short, knowledge is seen to emerge from the interplay of impinging constraints. Moreover, schemata can embed. That is, at a structural level, a schema can be seen as a kind of tree structure in which subschemata (small configurations of units which cohere and which may be part of many different stable patterns) correspond to subtrees (McClelland and Rumelhart, 1986).

1.1.2 Intuitive Inference

To process a schema is to determine which model of the world best fits the current information (McClelland and Rumelhart, 1986). First, the state of a subset of the units is fixed to represent the initial premises. The larger the activation value arriving at a given input junction, the greater the evidence for or against its associated premise. The system is then allowed to iterate until a conclusion (given by another subset of units) stabilises.⁴

³ Whereas hard constraints must be satisfied, soft constraints are desiderata that ought to be satisfied.
⁴ To distinguish it from the more familiar process of 'rational inference', Hinton (1990) calls this method of performing inferences by a single settling of a constraint satisfaction network 'intuitive inference'.
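The clamp-and-settle procedure just described can be sketched in miniature. The following is a hypothetical four-unit example, not the ENF or any network from this thesis: symmetric weights encode soft constraints (positive for mutual support, negative for conflict), premise units are clamped, and the remaining units iterate to a stable conclusion.

```python
# Hypothetical constraint network: units 0 and 1 each support unit 2;
# unit 2 conflicts with unit 3. Weights are symmetric and arbitrary.
W = [
    [0.0, 0.0, 1.0, 0.0],
    [0.0, 0.0, 1.0, 0.0],
    [1.0, 1.0, 0.0, -2.0],
    [0.0, 0.0, -2.0, 0.0],
]

def settle(state, clamped, steps=20):
    """Clamp the premise units; update the rest until nothing changes."""
    state = state[:]
    for _ in range(steps):
        changed = False
        for i in range(len(state)):
            if i in clamped:
                continue
            net = sum(W[i][j] * state[j] for j in range(len(state)))
            new = 1 if net > 0 else 0
            if new != state[i]:
                state[i] = new
                changed = True
        if not changed:          # a stable state is the "conclusion"
            break
    return state

# Clamp the premises (units 0 and 1 on); units 2 and 3 settle.
conclusion = settle([1, 1, 0, 1], clamped={0, 1})
print(conclusion)  # [1, 1, 1, 0]
```

The settled state turns the supported hypothesis on and the conflicting one off, which is the "single settling" that Hinton (1990) calls an intuitive inference.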

In essence, the input (stimulus) patterns serve as 'keys' which 'unlock' behaviour (response) patterns. This is, of course, by no means a new idea. Not only was it a defining characteristic of Aristotle's associationist doctrine, when couched in terms of 'releaser stimuli' and 'fixed action patterns' it also reveals itself to be the central tenet of modern ethology (e.g. Lorenz, 1970).

1.1.3 Rational Inference

In its most popular guise, rational inference is a series of axioms chained together by rules of inference into coherent propositions. Inherent in this view is the unspoken premise that a significant amount of thought is deductive (McDermott, 1985). However, this model of rational thought has become increasingly suspect. Important questions continue to arise, questions which the predicate calculus based model of mind increasingly appears incapable of answering. To give but three examples:

- How is satisfiability⁵ ensured when new axioms are added?

- How are 'appropriate' axioms selected?
- Who does the selecting?

The last question is, perhaps, the most fundamental of all. For if rationality is inherent in the agent doing the selecting, what, then, is the basis of the agent's rationality? Presumably the selection is also rationally determined, thereby forcing an infinite regress of homunculi.⁶ Fortunately, Hinton (1990) has suggested a way out:

⁵ Poundstone (1988) writes: "If one holds beliefs that are contradictory, then one cannot have justification for at least some of those beliefs. Without justification there is no knowledge. Therefore, understanding a set of beliefs entails (at a minimum) being able to detect contradiction in those beliefs. For that reason, the problem of detecting paradox, SATISFIABILITY, is a delimiter of knowledge." (p. 181).
⁶ Dennett (1978) suggested that a way out of this paradox is for each homunculus to be more stupid than the last. Ultimately only simple switching decisions are required. But Dennett failed to draw the seemingly obvious conclusion that his binary homunculi are functionally indistinguishable from (simplified) neurons.

More complex inferences require a more serial approach in which parts of the network are used for performing several different intuitive inferences in sequence. This will be called 'rational inference'. (p. 50)

This view of rational inference as a series of intuitive inferences offers one of the first real alternatives to the traditional rationalist doctrine. Unlike the rigid system of deductions which define the standard view, Hinton's model is founded on the more fluid process of induction. Thus, not only does the model eliminate much of the 'brittleness' typically associated with rule-based models, it smoothly integrates learning into the reasoning process.

1.2 Main Thesis

Following Sir Karl Popper's (1963) famous dictum that one should always state one's hypothesis in its boldest form, my thesis asserts that:

The ENF embodies three principles which, in and of themselves, prove sufficient to serve as the foundation for a plausible model of cognition.

An important corollary to this thesis is that:

Although cognition may appear to be based on rules and rational inference, it is actually based on constraint satisfaction and intuitive inference.

This thesis must be defended on two fronts. First, the folk-psychological⁷ notion of rule-based rationality must be shown to be inadequate. And second, it must be demonstrated that the proposed intuitive inference based model of cognition offers a more plausible alternative.

⁷ The somewhat pejorative term 'folk psychology' refers to the set of concepts, generalisations, and rules of thumb organisms use to explain and predict behaviour. Typically these explanations view behaviour as the outcome of the organism's beliefs, desires, sensations, and so forth.


It is with respect to this second front that the present work hopes to make its key contribution: for it is argued that the ENF provides the basic formalism upon which a plausible theory of intuitive inference based cognition is inexorably founded. In particular, it is contended that the principles which serve to define the enhanced novelty filter learning model: novelty detection (i.e. habituation), feedback, and interaction, constitute the fundamental mechanisms which underlie cognition.⁸ As will presently be demonstrated, these three mechanisms alone prove sufficient to generate key psychological phenomena selected from various points in the cognitive spectrum. That is, phenomena as diverse as positional invariance and rational thought are seen to emerge from the interplay of these three key factors.

1.3 Complex Systems

Systems thinking requires that one consider not only parts and processes in isolation but also the order and organisation resulting from the interaction of those parts. This view contrasts sharply with scientific reductionism, which maintains that an entity can be resolved into and, hence, reconstituted from its parts. The problem, systems theorist Ludwig von Bertalanffy (1968) argues, lies in our continued acceptance of two increasingly suspect assumptions:⁹

- that interactions between a system's parts are weak enough to be neglected
- that the relations describing the behaviour of the parts are linear.

In short, the whole is almost always more than the sum of its parts. Put less mystically, the characteristics of the complex, as compared to those of its elements, appear as 'new' or 'emergent' (von Bertalanffy, 1968).

⁸ This is not to say that these three principles are the only mechanisms underlying cognition. It is entirely possible, indeed probable, that a more complete model of cognition will extend these core principles.
⁹ With the discovery of Heisenberg's two famous uncertainty principles even physics, the acknowledged bastion of reductionism, found it difficult to resolve phenomena into local events (von Bertalanffy, 1968).

Take water, for example. Although the behaviour of a single water molecule is governed by the generally well-understood equations of atomic physics, when billions of water molecules are put together they collectively acquire a property, liquidity, that none of them alone possess. Liquidity is an emergent property. And emergent properties typically produce emergent behaviour (Waldrop, 1992).

Perhaps the most obvious example is life itself: "Neither nucleotides nor amino acids nor any other carbon-chain molecule is alive - yet put them together in the right way, and the dynamic behaviour that emerges out of their interactions is what we call life." (Langton, 1989, p. 41, emphasis added).

1.3.1 The Purpose of Feedback

In a closed system¹⁰ the final state of the system is unequivocally determined by its initial conditions (von Bertalanffy, 1968). Thus, the system's final state changes if its initial conditions are altered. But organisms are open systems, which enables them to maintain steady states distinct from their natural state of thermodynamic equilibrium. By monitoring deviations from the state to be obtained or maintained, the same final state can often be achieved in many different ways, and from different initial conditions. It is feedback which gives behaviour its teleological appearance (Wiener, 1948). But although behaviour clearly demonstrates purpose both in its persistence in the face of obstacles, and in its flexibility in assuming a variety of means to obtain particular ends, it need not follow that, in order for an organism to perform intelligibly or adequately, it must consider the factors of a situation in relation to the goals it seeks (Zuriff, 1985). Organisms behave as they do, not because of the consequences which are to follow, but because of the consequences which have followed similar behaviour in the past (Skinner, 1953).¹¹ In short, purpose derives from behaviour, not the other way round!

¹⁰ In a closed system, no material (or information) enters or leaves.
¹¹ On this view reinforcement and punishment are instruments, rather than outcomes, of teleology.
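The point that feedback yields the same final state from different initial conditions (von Bertalanffy's equifinality of open systems) can be sketched with a minimal error-correcting loop; the target, gain, and step count here are arbitrary illustrative values, not anything from the thesis.

```python
# Minimal negative-feedback sketch: at each step the system acts on the
# deviation from a target state, so the deviation decays geometrically.

def settle_to(target, initial, gain=0.5, steps=50):
    state = initial
    for _ in range(steps):
        state += gain * (target - state)  # feedback: correct the deviation
    return round(state, 6)

# Different starting points, same end state ("equifinality").
print(settle_to(10.0, 0.0))   # 10.0
print(settle_to(10.0, 37.0))  # 10.0
```

A closed system with no such monitoring would instead carry its initial conditions all the way to its final state, which is the contrast the passage above draws.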


1.4 Intuitive Reasoning

Teleological explanations often do appear to lead to useful simplifications which enable us to predict the behaviour of others (Dennett, 1978). Consider, for instance, predicting moves in chess. In chess there are many perfectly legal moves, only a few of which have merit with respect to the goals of winning pieces, establishing a strong defence and ultimately checkmate. By adopting a teleological perspective, one can greatly reduce the number of options which must be considered (Dennett, 1978). But is this how chess is actually played? Numerous experiments (e.g. de Groot, 1965; Simon and Chase, 1973a) suggest it is not. Indeed, Simon and Chase (1973b) are quite specific on this point:

Chess protocols are filled with statements like, "If I take him, then he takes that piece, then I go there ..." and so on. De Groot showed that the structure of a player's thought processes while he is doing this are the same for all levels of chess skill. It is the content of thought, not the structure of thought, that really makes the difference in quality of outcome. (p. 420)

And the contents of thought, Simon and Chase continue, are governed by the ability to perceive familiar patterns quickly [emphasis added]:

Why, as has often been observed, does the master so frequently hit upon good moves before he has even analysed the consequences of various alternatives? Because, we conjecture, when he stares at the chessboard, the familiar perceptual structures that are evoked from long-term memory by the patterns on the board act as move generators.¹² ... It is this organisation of stored information that permits the master to come up with good moves almost instantaneously, seemingly by instinct and intuition. (p. 421)

¹² Each familiar pattern serves as the condition part of a production. When this condition is satisfied by recognition of the pattern, the resulting action is to evoke a move associated with this pattern.

To generate a forward search of the problem space the implied move is made 'in the mind's eye', triggering an iteration of the pattern perception system. That is, the internal representation of a position is updated, and the result passed back to the pattern perception system which, in turn, suggests further moves. As the attentive reader will no doubt have noticed, this view of search through problem space as the iteration of patterns is indistinguishable from the process of serial intuitive inference described in 1.1.2. For each 'deduced' move is merely the product of an intuitive inference in which the current state acts as a key, and a sequence of moves becomes simply a series of intuitive inferences in which each regenerated state keys its successor. Thus under this interpretation deduction is functionally indistinguishable from serial induction.¹³

1.5 Overview

Like cognitive science itself, this thesis endeavours to integrate often diverse theoretical perspectives into a coherent model of cognition. In particular, it seeks to weave together threads drawn from philosophy (chapter 2), psychology (chapter 3), and mathematics (chapter 4) into a plausible account of the fabric of thought. From philosophy come the seminal tenets of associationism. Not only does associationism give rise to modern connectionism (Dennett, 1996)¹⁴, it is in the fertile gulf between recall and recollection that reasoning by association is spawned. From psychology comes an awareness of the limitations of simple association. And from mathematics derives the formalism which makes possible dynamic exploration of the implications of these perceived strengths and weaknesses.

¹³ Not only are the individual 'engrams' which are encoded in a constraint satisfaction network spawned through induction (by simple enumeration), they are also recollected through induction.
¹⁴ As Dennett (1996) put it: "Hume's associationism was, however, a direct inspiration for Pavlov's famous experiments in the conditioning of animal behaviour, which led in turn to the somewhat different theories of E. L. Thorndike, Skinner, and the other behaviourists in psychology. Some of these researchers - Donald Hebb, in particular - attempted to link their behaviourism more closely to what was then known about the brain. ... These mechanisms - now called Hebbian learning rules - and their descendants are the engines of change in connectionism, the latest manifestation of this tradition." (p. 87)

Chapter 5 formalises the proposed ENF learning model. Although the ENF actually describes an entire class of learning models, this work explores the set of nonlinear mappings generated through synaptic (i.e. input) interaction.

The experiments (chapter 6) were selected to form a progression of cognitive tasks. The exclusive-or and parity-three problems reflect the ENF's nonlinear mapping power and, in particular, its synaptic selectivity. These experiments provide important tests of the model's core learning capabilities. However it is the remaining tasks which most vividly highlight the proposed learning model's potential as a foundation of higher cognition. Specifically, the room schema problem investigates the model's ability to generate and reason with coherent structures (i.e. schemata), the T/C and complex discrimination problems explore the model's concept formation ability, and the composite symbol structure and tic-tac-toe problems explore its plausibility as a basis of rational thought.¹⁵

¹⁵ Although some may wish to argue that tic-tac-toe represents a somewhat trivial challenge for rational thought, as McClelland and Rumelhart (1986) point out: "Other more complex games, such as checkers or chess, require more effort, of course, but can, in principle, be dealt with in the same way." (p. 40).
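The exclusive-or task's role as a test of nonlinear mapping power can be made concrete. The sketch below is not part of the thesis's experiments: it brute-forces an arbitrary coarse grid of weights and biases (chosen only for illustration) and finds no single linear threshold unit that computes XOR, which is why a single-layer network needs some form of enhancement to solve it.

```python
import itertools

# Truth table for exclusive-or (parity of two inputs).
XOR = {(0, 0): 0, (0, 1): 1, (1, 0): 1, (1, 1): 0}

def computes_xor(w1, w2, b):
    """Does the threshold unit (w1*x + w2*y + b > 0) reproduce XOR?"""
    return all((w1 * x + w2 * y + b > 0) == bool(t)
               for (x, y), t in XOR.items())

# Sweep a coarse grid; XOR is not linearly separable, so nothing works.
grid = [i / 2 for i in range(-8, 9)]
found = any(computes_xor(w1, w2, b)
            for w1, w2, b in itertools.product(grid, repeat=3))
print(found)  # False
```

The failure is not an artefact of the grid: no real-valued weights exist, since the four XOR points cannot be split by any single line.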

2. Philosophical Foundations

2.1 The Ghost in the Machine

Before I advance the philosophical arguments which lend support to the proposed intuitive inference based model of cognition, I will begin with a brief examination of the fallacies of functionalism¹ - its main contender.

Implicit in the functionalist view is the unspoken premise that thought is deductive (McDermott, 1987). Indeed, deduction lies at the very heart of the functionalist's central tenet - intentionality:

The assumption that something is an intentional system is the assumption that it is rational; that is, one gets nowhere with the assumption that entity x has beliefs p, q, r, ... unless one also believes what follows from p, q, r, ...; otherwise there is no way of ruling out the prediction that x will, in the face of its beliefs p, q, r, ... do something utterly stupid, and, if we cannot rule out that prediction, we will have acquired no predictive power at all. So whether or not [an organism] is said to believe the truths of logic, it must be supposed to follow the rules of logic. (Dennett, 1978, pp. 10-11)

In fact, this assumption of rationality is so firmly entrenched in our inference habits that, when our predictions of another's behaviour prove false, we tend to assume that the individual simply did not understand, or that they just don't know any better (Dennett, 1978).

¹ Simply put, functionalism (Putnam, 1964) is the notion that the computer is an apt model of the mind. Its central tenet is that thinking is a matter of 'software' rather than 'hardware'. Functionalism maintains that the mental structures of folk psychology: beliefs, desires, and so forth, are formalised in a language of thought (Fodor, 1975) and operated on by Boole's (1854) laws of thought.


Herein lies the paradox of functionalism - one must assume rationality in order to explain it. Put another way, if whenever an agent does anything intelligent the act must be preceded by another act of considering a regulative proposition appropriate to the task, what makes one consider the appropriate maxim rather than any of the thousands which are not (Ryle, 1949)?

The crucial objection to the intellectualist legend is this. The consideration of propositions is itself an operation the execution of which can be more or less intelligent, less or more stupid. But if, for any operation to be intelligently executed, a prior theoretical operation had first to be performed and performed intelligently, it would be a logical impossibility for anyone to ever break into the circle. (Ryle, 1949, p. 31)

That is, if a performance inherits its title to intelligence from some anterior act of planning what to do, one is led to the perilous precipice of infinite regress. A shrewd plan must itself be a product of astute planning. The intellectualist legend mistakenly defines intelligence in terms of the apprehension of truths rather than the apprehension of truths in terms of intelligence (Ryle, 1949).

The essence of this mistake is found in the distinction between 'knowing how' and 'knowing that'. Stupidity is not the same thing as ignorance!

2.1.1 The Wittgensteinian Paradox

Not only is there no ghost selecting appropriate rules and monitoring their execution, the notion of rule following, itself, ultimately leads to paradox.

Known as the Wittgensteinian paradox, this argument makes the astonishing claim that no course of action could be determined by a rule, because every course of action can be made out to accord with the rule (Wittgenstein, 1958).

Although Ryle (1949) exposed the paradox in the functionalist contention that in order for one to act intelligently a prior theoretical operation has first to be performed, and performed intelligently, surprisingly he did not reject the concomitant tenet that intelligent behaviour is a function of the observance of rules.


As an example, suppose someone is asked to calculate the sum 68 + 57. If they respond with '5', we would naturally assume that they had erred. But, Wittgenstein argues, it is also possible they are right, and that we are in error: for if all of our past additions involved numbers smaller than (in this case) 57, it may be that when we thought we were using 'plus' (symbolised by '+'), we were in fact using 'quus' (symbolised by '⊕'), defined as:

x ⊕ y = x + y, if x and y are both less than 57
      = 5, otherwise.
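Written as code (a direct transcription of the definition above, with the name `quus` standing in for the circled-plus symbol), the skeptic's point is that 'plus' and 'quus' agree on every past case yet diverge on the new one:

```python
# Kripke's 'quus' function, transcribed from the definition above.

def plus(x, y):
    return x + y

def quus(x, y):
    # x + y when both arguments are below 57; 5 otherwise.
    if x < 57 and y < 57:
        return x + y
    return 5

# Every addition the learner has ever performed involved arguments
# below 57, so 'plus' and 'quus' agree on all past cases...
past_cases = [(2, 3), (10, 45), (56, 1)]
print(all(plus(x, y) == quus(x, y) for x, y in past_cases))  # True

# ...yet they diverge on the new problem 68 + 57:
print(plus(68, 57))   # 125
print(quus(68, 57))   # 5
```

No amount of past behaviour settles which of the two functions the learner was following all along.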

Of course, most people would answer '125' when asked to sum 68 and 57 without any serious thought to the possibility that the 'quus' rule might be appropriate. If asked why, they would likely say they added 8 and 7 to get 15, put down the 5, carried the 1, and so on. But this justification, too, is open to the skeptic's challenge: for the skeptic might contend that when they used the term 'carry' they actually meant 'quarry', etc. Nor is the issue resolved even by counting individual items. For the skeptic might claim that when we use the term 'count' we really mean 'quount', when we use 'individual' we mean 'quindividual', and so on. 'The entire point of the skeptical argument', Kripke (1982) notes, 'is that ultimately we reach a level where we act without any reason in terms of which we can justify our action.' (p. 87). Moreover, since thinking that we are obeying a rule is not the same thing as obeying it, Wittgenstein (1953) maintains that it is impossible to obey a rule 'privately'. It is only when we introduce the normative standards of others in the community that we have the necessary conditions for attributing correct or incorrect rule following to an individual (Kripke, 1982). Once again the implicit assumption of rationality plays a key role. For if the deviant responses of an individual do not accord with those of the community in enough cases, the individual may even be judged to be mad, i.e. following no coherent rule at all.


Like a phoenix, the ghost in the machine keeps rising from its ashes. In its most recent incarnation the ghost has materialised inside a Chinese room:

To me Chinese writing looks like so many meaningless squiggles. Now suppose I am placed in a room containing baskets full of Chinese symbols. Suppose also that I am given a rule book in English for matching Chinese symbols with other Chinese symbols. ... Imagine that people outside the room who understand Chinese hand in small bunches of symbols and that in response I manipulate the symbols according to the rule book and hand back more small bunches of symbols. ... I satisfy the Turing test for understanding Chinese. All the same, I am totally ignorant of Chinese. ... What goes for Chinese goes for other forms of cognition as well. Just manipulating the symbols is not by itself enough to guarantee cognition, perception, understanding, thinking, and so forth. (Searle, 1990, p. 26)
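The rule-book mechanism of the parable can be sketched as a bare lookup table. This is an illustrative toy, not Searle's own formulation; the symbol strings and replies below are invented placeholders:

```python
# Searle's rule book as pure shape-matching: incoming symbol strings
# map to outgoing ones, and no step consults what the symbols mean.
# The entries are hypothetical placeholders.

RULE_BOOK = {
    '你好吗': '我很好',
    '你是谁': '我是人',
}

def chinese_room(symbols):
    # The occupant matches shapes against the book; nothing more.
    return RULE_BOOK[symbols]

print(chinese_room('你好吗'))  # 我很好
```

To the observers outside, the exchange looks like understanding; inside, it is dictionary lookup, which is exactly the gap Searle's argument trades on.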

This 'Chinese room' argument, as Searle's parable is known, has become the torchbearer of the anti-formalist movement. However, in order for the parable to be valid, the degree of detail in a model must be deemed irrelevant.[3] That is, Searle is forced to the untenable position that even if a model perfectly mimics brain function, it would still fail to yield 'true' meaning (i.e. cognition). If simply adding more detail to a model is sufficient to generate cognition, Searle's argument is immediately defeated. In this way, Searle's Chinese room parable reveals itself to be just another incarnation of 'substance dualism'.[4]

[3] This assumption was exposed by Pylyshyn when he asked Searle if the following statement accurately characterised his position: 'If more and more of the cells in your brain were to be replaced by integrated circuit chips, programmed in such a way as to keep the input-output function of each unit identical to that of the unit being replaced, you would in all likelihood just keep on speaking exactly as you are now doing except that you would just stop meaning anything by it.' (in Hofstadter and Dennett, 1981, p. 374)

[4] Substance (or Cartesian) dualism is the doctrine that there exist two kinds of substance: material-stuff and mind-stuff. Material-stuff occupies space, is observable, and is governed by mechanical laws. Mind-stuff is the negation of these qualities (i.e. does not occupy space, is not observable, etc.).


Searle's parable does, however, raise an important question - how is symbol meaning grounded in something other than just more meaningless symbols? A key tenet of the functionalist doctrine is that the semantic content of a representation is a function of the semantic content of its syntactic parts in combination with its constituent structure (Fodor and Pylyshyn, 1988). Yet despite this reliance on symbols, it is apparent that the standard functionalist account of how symbols obtain is woefully inadequate:

The standard reply of the symbolist (e.g. Fodor) is that the meaning of the symbols comes from connecting the symbol system to the world 'in the right way'. But it seems apparent that the problem of connecting up with the world in the right way is virtually coextensive with the problem of cognition itself. (Harnad, 1991, p. 340)

To argue that symbol meaning is grounded in yet more symbols again leads to infinite regress. It follows, therefore, that symbols are grounded in either non-symbolic or subsymbolic representations. In short, a symbol obtains its meaning by reason of its relationship, association, or even its accidental resemblance to something else (Hume, 1748). More precisely, symbol meaning is grounded in discrimination (Hofstadter, 1985; Harnad, 1991), in that it involves both the ability to tell things apart (i.e. iconic representations) and to discern their relative degree of similarity (i.e. categorical representations).[5]

[5] Neither iconic nor categorical representations can yet be interpreted as meaning anything; for to overcome the systematicity objection it must further be possible to combine these representations into propositions which can be semantically interpreted (Harnad, 1991). That is, once one has a grounded set of elementary symbols, the rest of the symbol strings of a natural language can be generated by symbol composition alone (Harnad, 1991). The iconic and categorical representations merely give content to the objects they identify, content which is inherited by the composite symbol strings. To borrow Harnad's example, if the name 'horse' is grounded by iconic and categorical representations that reliably discriminate and identify horses on the basis of their sensory projections, and if the term 'stripes' is similarly grounded, the category 'zebra' can be constituted out of these elementary categories by means of the symbolic description: 'zebra' = 'horse' & 'stripes'. Armed only with this representation, one could, in principle, identify a zebra on first acquaintance. In this way the semantic interpretation is fixed 'by the behavioural capacity of the dedicated symbol system, as exercised on the objects and states of the world to which its symbols refer; the symbol meanings are accordingly not just parasitic on the meanings in the head of the interpreter, but intrinsic to the dedicated symbol system itself.' (Harnad, 1991, p. 345)
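Harnad's zebra example can be sketched as symbol composition over grounded elementary categories. The detector functions below are illustrative stand-ins for the iconic and categorical representations; nothing here is from Harnad's own formalism:

```python
# 'zebra' = 'horse' & 'stripes': a composed category, never directly
# trained. The feature-set detectors stand in for grounded sensory
# discrimination and are purely hypothetical.

def is_horse(features):      # assumed grounded by sensory projections
    return 'horse-shaped' in features

def has_stripes(features):   # assumed grounded by sensory projections
    return 'striped' in features

def is_zebra(features):
    # Symbolic composition of the two grounded elementary categories.
    return is_horse(features) and has_stripes(features)

# Identifying a zebra on first acquaintance:
first_encounter = {'horse-shaped', 'striped', 'four-legged'}
print(is_zebra(first_encounter))  # True
```

The composite inherits its content from the grounded parts, which is the sense in which its meaning is not merely parasitic on an external interpreter.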


While we often use the terms remembering and recollection interchangeably in modern parlance, to Aristotle (ca. 400 B.C.) they were far from equivalent. Whereas remembering refers to the direct recall of sense images, recollection was seen to be, in essence, reasoning by association:[6]

Acts of recollection occur when one impulse naturally succeeds another. ... This is why we follow the trail in order, starting in thought from the present, or some other concept, and from something similar or contrary to, or closely connected with, what we seek. This is how recollection takes place; for the impulses from these experiences are sometimes identical and sometimes simultaneous with those of what we seek, and sometimes form a part of them; so that the remaining portion which we experienced after that is relatively small. (Aristotle, On Memory and Recollection, pp. 301-303)

In short, Aristotle realised that recollection is neither the acquisition nor the recovery of memory. Rather, recollection was seen to be a dynamic process - a synthesis. Aristotle's point was reiterated by William James (1890):

And it is an assumption made by many writers that the revival of an image is all that is needed to constitute the memory of the original occurrence. But such a revival is obviously not a memory, whatever else it may be; it is simply a duplicate, a second event having absolutely no connection with the first event except that it happens to resemble it. ... What memory goes with is, on the contrary, a very complex representation, that of the fact to be recalled plus its associates, the whole forming one 'object' ... . (pp. 610-612)

[6] Associationism (Aristotle), which maintains that the mind is made up of simple ideas arising from sensory experience held together by contiguity and similarity, is often confused with connectionism (Thorndike), the doctrine that neural bonds are functional mediators between stimulus and response.


Aristotle went on to describe the mental 'glue' that unites sense images into a coherent whole, identifying succession (i.e. temporal contact), simultaneity (i.e. spatial contact), and similarity[7] as the basic psychological forces that bind associations together. And although Aristotle's model is somewhat lacking in detail by today's standards, as Anderson et al. (1990) note, it continues to be a perfectly viable computational theory of memory:

If one was to summarise Aristotle's discussion from a contemporary neural network point of view, it might look something like this: The elementary units of memory are closely related to sense images (state vectors?). These elementary units are linked together by a number of mechanisms (connection matrices?). It is possible to systematically use these associative structures to perform reasoning and memory access by forming chains and more complex structures built from elementary associations (semantic networks?). (p. 3)
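The bracketed suggestions in Anderson et al.'s summary (state vectors, connection matrices) can be made concrete with a minimal linear associator. This is a standard textbook construction offered purely as an illustration, not a model from the thesis; the 'sense images' are arbitrary bipolar vectors chosen to be orthogonal:

```python
# Sense images as state vectors, associations as an outer-product
# (Hebbian) connection matrix, recall as matrix-vector multiplication
# followed by a sign threshold.

def outer_product(f, g):
    """Weight matrix linking input vector f to output vector g."""
    return [[gi * fj for fj in f] for gi in g]

def add_matrices(a, b):
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]

def recall(w, f):
    raw = [sum(wij * fj for wij, fj in zip(row, f)) for row in w]
    return [1 if x > 0 else -1 for x in raw]

# Two associated pairs of bipolar 'sense images' (orthogonal inputs).
bell  = [1, -1, 1, -1]
food  = [1, 1, -1, -1]
light = [1, 1, 1, 1]
play  = [-1, 1, 1, -1]

w = add_matrices(outer_product(bell, food), outer_product(light, play))

print(recall(w, bell))   # recovers the food vector
print(recall(w, light))  # recovers the play vector
```

Because the two input vectors are orthogonal, a single connection matrix stores both associations without interference, which is the sense in which the 'glue' is distributed across the matrix rather than held in any one link.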

But as is often the case, the devil is in the details. It was David Hume who first discovered this devil: for it was Hume who first challenged the seemingly sacrosanct notion of causality and with it exposed philosophy to the riddle of induction, a riddle it has yet to adequately answer (Russell, 1945).

2.2.1 The Riddle of Induction

Reasoning from cause to effect is fundamental to action, to belief, in fact to all forms of knowledge (Hendel, 1955). The paradigmatic case is that in which a billiard ball in motion strikes another at rest and the second ball acquires motion. At first one only notices these two circumstances, but on witnessing further demonstrations in which the same sequence always occurs, thereafter if we see one ball moving toward another we immediately conclude the first will impel the second. It is this inference which Hume (1748) brings into question:

[7] Aristotle distinguishes between being similar and being opposite.


At least, it must be acknowledged that there is here a consequence drawn by the mind; that there is a certain step taken, a process of thought, and an inference which wants to be explained. These two propositions are far from being the same: I have found that such an object has always been attended with such an effect, and I foresee that other objects which are in appearance similar will be attended with similar effects. I shall allow, if you please, that the one proposition may justly be inferred from the other; I know, in fact, that it is always inferred. But if you insist that the inference is made by a chain of reasoning, I desire you to produce that reasoning. (p. 48)

In short, Hume contends that the supposition the future resembles the past is not founded on argument but, rather, is derived entirely from habit. And since to anticipate an effect is the same as believing the effect will follow, belief, too, is seen to arise from custom.[8] That is, it is not reason which is the guide to life but custom (Hume, 1748). As John Maynard Keynes (1948) points out, at issue is the validity of reasoning by induction:

Hume's skeptical criticisms are usually associated with causality; but arguments by induction - inference from past particulars to future generalisations - was the real object of his attack. Hume showed, not that inductive methods were false, but that their validity had never been established and that all lines of proof seemed equally unpromising. (p. 82)

Hume is not alone in questioning the validity of induction. But what, exactly, is induction? Before examining what it is, it is helpful to first clarify what it is not: for one of the most widespread misconceptions of logic is that deductive arguments proceed from the general to the specific whereas inductive arguments run from the specific to the general. This is simply not the case.

[8] Belief does differ somewhat, in that belief typically also carries with it a peculiar feeling or sentiment which reflects the uniformity of the conjunction in past experience (Hume, 1748).


There are deductive arguments that go from the general to the general:

All gorillas are apes.
All apes are mammals.
----------
All gorillas are mammals.

from the particular to the particular:

Akela is a wolf.
Akela has a tail.
----------
Akela's tail is the tail of a wolf.

and from the particular to the general:

Brian is a politician.
Brian is a thief.
----------
Anyone who knows all politicians knows a thief.

Conversely, a number of inductive arguments proceed from general premises to a general conclusion:

All students in this class are highly intelligent.
All students in this class are motivated to do well.
----------
All students in this class will do well.

and others which go from particular premises to a particular conclusion:


Car A is a Porsche and car B is a Porsche.
Both cars have the same type of engine.
Car A's top speed is over 180 kilometres per hour.
----------
Car B's top speed is over 180 kilometres per hour.

There are even those with general premises and a particular conclusion:

All emeralds previously examined have been green.
----------
The next emerald examined will be green.

Thus the difference between inductive and deductive arguments is not in the generality or particularity of their premises and conclusions (Skyrms, 1986).

Nor is it to be found in the Holland et al. (1987) definition of induction, which they take to encompass '... all inferential processes that expand knowledge in the face of uncertainty.' (p. 1). For not only might this serve reasonably well as a definition of deduction, as Hempel's paradox of the raven illustrates, characterising induction as 'an inferential process' is, itself, contentious.[9]

2.2.1.1 The Paradox of the Raven

Hempel (1945) imagined a scientist attempting to logically justify (confirm) the hypothesis that 'All ravens are black'. The conventional method employed is induction by simple enumeration. That is, the scientist seeks out numerous instances of ravens and checks their colour. Every black raven supports the hypothesis, and a single non-black raven refutes it.

[9] 'Primitive' induction involves the individual assuming that similar consequences will follow similar perceptual events (Quine, 1995). Characterising this primitive induction as inference reflects a confusion between multiple instances and multiple premises. On the other hand, interpreting 'complex' induction as the product of inference is also suspect because of Hempel's paradox; for any given set of premises imply a potentially infinite number of directly contradictory 'inductions' (see section 2.2.1.1).

However, since the assertion that 'All non-black things are non-ravens' is logically equivalent to the assertion 'All ravens are black', and given that there are more non-black things in the world than black things, the contrapositive form is easier to test. Unfortunately, it also leads to the following paradox:

For just as each black raven tends to confirm the law that all ravens are black, so each green leaf, being a non-black non-raven, should tend to confirm the law that all non-black things are non-ravens, that is, again, that all ravens are black. (Quine, 1969, p. 159)

Clearly this is an unacceptable state of affairs. For while it may be tempting to argue that each non-black non-raven does, in fact, confirm the hypothesis to a limited degree, it is important to remember that instances of non-black non-ravens also serve to confirm a number of contradictory hypotheses such as 'All ravens are white' (if it is a non-white non-raven).
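The paradoxical confirmation can be made explicit with a toy check. This is an illustrative sketch, not from the thesis: under the contrapositive reading, a single green leaf 'confirms' both 'All ravens are black' and the contradictory 'All ravens are white':

```python
# An instance confirms 'All ravens are <colour>' either directly
# (a <colour> raven) or by contraposition (a non-<colour> non-raven).

def confirms(item, colour):
    kind, c = item
    if kind == 'raven':
        return c == colour   # direct confirmation or refutation
    return c != colour       # contrapositive confirmation

green_leaf = ('leaf', 'green')

print(confirms(green_leaf, 'black'))  # True
print(confirms(green_leaf, 'white'))  # True
```

The same observation raises the 'confirmation count' of two hypotheses that cannot both be true, which is Hempel's point in miniature.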

2.2.2 The New Riddle of Induction

The guiding principle of induction is the expectation that similar things will behave similarly. But what constitutes similarity? The traditional view is to claim that to be similar is to possess many or most of the same properties. More precisely, to say that x is more similar to y than to z means that x shares more properties with y than with z. But this reduces to the unpromising task of determining what counts as a property (Quine, 1969). Nor can similarity be defined as jointly belonging to more sets: for any two things are joint members of any number of sets (Quine, 1969). The magnitude of the problem is aptly illustrated by Goodman's (1979) famous 'grue' paradox in which he asks the basis upon which one can refute the claim that emeralds, if examined at some future point in time, will no longer be green but blue.


Of course Goodman is not seriously contending that we should expect the next emerald that we examine to be blue. He merely wants to force us to consider why we expect it to be green. Like Hempel's paradox, we are again left in the intolerable situation where anything confirms anything. Goodman's answer is that some traits are projectible,[10] others are not. However, as Quine and Ullian (1970) observe, to call a trait 'projectible' is only to say that it is suited to induction. It does not tell you why. Like Hume, Quine (1969) argues that the larger answer lies in the hoary notion of similarity:

Why do we expect the next one to be green rather than grue? The intuitive answer lies in similarity, however subjective. ... Green things, or at least green emeralds, are a kind. A projectible predicate is one that is true of all and only the things of a kind. What makes Goodman's example a puzzle, however, is the dubious scientific standing of a general notion of similarity, or of kind. The dubiousness of this notion is itself a remarkable fact. For surely there is nothing more basic to thought and language than our sense of similarity; our sorting of things into kinds. (p. 160)
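Goodman's predicate can be written out directly. The cutoff year T below is an arbitrary illustrative choice (Goodman's formulation leaves the future time unspecified); the point is that every emerald observed so far satisfies both 'green' and 'grue':

```python
# 'Grue': first examined before T and green, or examined at or
# after T and blue. T is a hypothetical cutoff.

T = 2030

def is_grue(colour, year_examined):
    if year_examined < T:
        return colour == 'green'
    return colour == 'blue'

# All past observations of emeralds: green, and examined before T.
observations = [('green', y) for y in (1990, 2005, 2020)]

# The same evidence 'confirms' two hypotheses that diverge after T:
print(all(c == 'green' for c, y in observations))   # True: all green
print(all(is_grue(c, y) for c, y in observations))  # True: all grue
```

Nothing in the evidence itself favours projecting 'green' over 'grue'; only our similarity standards do.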

Simply put, the question of what traits are projectible is really a question of what counts as similarity (Quine and Ullian, 1970). Thus we come full circle. It may be that we are ultimately forced to conclude, with Quine (1974), that similarity standards are simply a function of natural selection: 'Otherwise any response, if reinforced, would be conditioned equally and indiscriminately to any and every future episode, all these being equally similar' (p. 19). Yet while some similarity standards may be innate, similarity judgments are also known to change with experience. As an example, after pairing different stimuli with food Pavlov (1927) found that when one of the stimuli in the conditioned set was inhibited, the inhibition generalised to all the other members of the set. Thus while at the level of sensory impingements the stimuli remain dissimilar, they eventually come to acquire a second-order similarity based on salience.

'O By 'projedible" Goodman simply means that induction is abie to project the trait into the future.


This appeal to salience should not, however, be interpreted as an invitation to mentalism. As is evident in the following example (Quine, 1974), for the most part salience is a function of reinforcement history [emphasis added]:

The learning of an observation sentence amounts to determining, as we may say, its similarity basis. By this I mean the distinctive trait shared by the episodes appropriate to that observation sentence; the shared trait in which the perceptual similarity consists. In learning the sentence the child may approximate its similarity basis little by little. In learning 'red' he has to learn that it is a question of sight, not some other sense. He has to find the proper direction in the scene, and how much to count: how big a patch. He has to learn what aspect of the patch to count: he might think that what mattered in his first red patch was the shape and not the colour. ... Among the myriad of forms of overall impingement, those that are irrelevant to 'red' would in the long run cease to compete. Times when the sound 'red' was reinforced would show their common features ever more clearly as their irrelevant features continued to vary at random ... . (pp. 43-44)

Thus if 'red' is reinforced in the presence of a red ball and penalised in the presence of a yellow rose, a red rose may not elicit the correct response.[11] But if the response is also reinforced in the presence of a red shawl, the red rose will more likely elicit the appropriate response. It is not that the red rose is more similar to a red ball than to a yellow rose; it is more similar jointly to the red ball and the red shawl than to the yellow rose. Red has become salient.
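Quine's example can be restated with toy feature sets. The features and the salience rule below are illustrative assumptions, not Quine's own formalism: a feature common to the reinforced episodes ('red') acquires extra weight, after which the red rose outscores the yellow rose even though the two roses share more raw features with each other:

```python
# Hypothetical feature sets for the four objects in Quine's example.
red_ball    = {'red', 'round', 'toy'}
red_shawl   = {'red', 'cloth', 'soft'}
yellow_rose = {'yellow', 'flower', 'stem', 'soft'}
red_rose    = {'red', 'flower', 'stem', 'soft'}

# Raw pairwise overlap: the red rose looks closer to the yellow rose.
print(len(red_rose & red_ball))      # 1
print(len(red_rose & yellow_rose))   # 3

# Salience: a feature's weight is how many reinforced episodes share it.
reinforced = [red_ball, red_shawl]

def salience(feature):
    return sum(1 for episode in reinforced if feature in episode)

def weighted_match(obj):
    return sum(salience(f) for f in obj)

print(weighted_match(red_rose))      # 3  ('red' counts twice, 'soft' once)
print(weighted_match(yellow_rose))   # 1  ('soft' once)
```

With reinforcement across varied episodes, the only surviving common feature is 'red', and the weighted similarity ordering flips accordingly.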

[11] Given its favourable colour and its unfavourable shape.

One cannot logically be a determinist in physics and chemistry and biology, and a mystic in psychology. (D. O. Hebb)

3. Psychological Foundations

3.1 Behaviourism

Once thought to be the salvation of psychology, behaviourism has become a pariah. Seen as synonymous with a mechanical model of mind, behaviourism is deemed to embody the dehumanisation mechanisation typically implies. However, as Zuriff (1985) points out:

Clearly, the term 'mechanical' has manifold meanings, only a few of which apply to behaviourism. ... All too often when the term is used to attack behaviourism, no clear meaning is attached to it, and the criticism is impossible to evaluate. In many cases, no precise meaning is intended, and the term serves merely as a blunt weapon to disparage behaviourism, not so much for specific features of its programme but for the attitudes it is alleged to foster. Many critics believe that the behaviourist image engenders a disrespect for human beings and a willingness to control and even to coerce them. (p. 188)

The central tenet of behaviourism was first outlined by the controversial founder of American behaviourism, J. B. Watson, in his seminal 1913 paper entitled 'Psychology as the Behaviourist Views It' [emphasis added]:

Psychology as the behaviourist views it is a purely objective branch of natural science. Its theoretical goal is the prediction and control of behaviour. Introspection forms no essential part of its method ... . (p. 158)

With these few words, Watson not only gave birth to a new direction in psychology, he also provoked a controversy which to this day shows little sign of abating[1] - for implicit in his definition is the assumption that psychology, typically classed as a social science, is more appropriately defined as a natural science. Concomitant to this view is the contention that empiricism, not rationalism,[2] is the proper basis of psychological theory. This contention is manifest in two ways: first, the traditional tools of rationalism were rejected, most notably introspection; and second, there was a rejection of the object of rationalism (i.e. consciousness). Both manifestations have been the subject of considerable debate: for both abandon our familiar first-person perspective:

The standard perspective adopted by phenomenologists is Descartes's first-person perspective, in which I describe in a monologue (which I let you overhear) what I find in my conscious experience, counting on us to agree. ... The cozy complicity of the resulting first-person plural perspective is a treacherous incubator of errors. In the history of psychology, in fact, it was the growing recognition of this methodological problem that led to the downfall of Introspectionism and the rise of Behaviourism. The Behaviourists were meticulous about avoiding speculation about what was going on in my mind or your mind or his or her or its mind. In effect, they championed the third-person perspective, in which only facts garnered 'from the outside' count as data. (Dennett, 1991, p. 70)

[1] Behaviour theory is not a theory at all; rather, it is a set of theories which share what Wittgenstein (1958) referred to as a family resemblance. That is, just as family members often display a resemblance even though no one feature characterises each and every member, there is no set of necessary and sufficient properties which serve to definitively identify the members of the behaviourist family. Behaviourism's family members include: methodological behaviourism, radical behaviourism, analytic behaviourism, logical behaviourism, and purposive behaviourism, to name but a few. At best they share an epistemology with close links to the intellectual traditions of positivism and pragmatism (Zuriff, 1985).

[2] Rationalism is characterised by the belief that it is possible to obtain, by reason alone, knowledge of the nature of what exists. But as Peirce (1898), the founder of pragmatism, long ago pointed out: 'Men many times fancy that they act from reason when, in point of fact, the reasons they attribute to themselves are nothing but excuses which unconscious instinct invents to satisfy the teasing "whys" of the ego. The extent of this self delusion is such as to render philosophical rationalism a farce.' (p. 111)

3.1.1 The Reflexological Model

The most often cited objection to behaviourism is the mistaken claim that behaviourists see behaviour as the product of simple stimulus-response links.[3] Both the prevalence of this view and the tenacity with which it is asserted lend credence to Skinner's (1988) provocative assertion that 'the skepticism of psychologists and philosophers about the adequacy of behaviourism is an inverse function of the extent to which they understand it.' (p. 472). In fact, no important behaviourist theory actually conforms to the simple reflexological model (Zuriff, 1985). Instead, behaviour is seen to be the product of many reflexes modifying, facilitating, and inhibiting one another. As such, behaviour frequently exhibits properties not possessed by any individual reflex. That is, although behavioural laws are typically expressed in the form 'Given stimuli a, b, c then responses x, y, z', responses are seldom uniquely determined by environmental conditions alone. As an example, an animal's response to food placed before it depends on how hungry it is. In other words, an organism's response depends on its state. States restore lawfulness when a one-to-one correspondence between stimulus and response is lacking. Put more formally, stimulus S impinging on organism A in state T_j generates response R_j:

S(A) ⊃ (T_j(A) ≡ R_j(A))

Carnap called this formalism a 'bilateral reduction sentence'. It has important implications for behaviourist psychology (Zuriff, 1985). In particular, the state (i.e. disposition) is defined only if the condition obtains.

[3] In its most basic form, the reflexological model asserts that behaviour can be analysed into discrete movements elicited by an immediately preceding impinging of energy on a sensory receptor.

As in ethology, the state of the organism primes a specific set of behaviours for release on the occasion of encountering appropriate triggering stimuli.
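The state-mediated formulation can be sketched as a lookup keyed on (stimulus, state) pairs rather than on the stimulus alone. The stimuli, states, and responses below are invented placeholders:

```python
# The same stimulus maps to different responses depending on the
# organism's current state, restoring a lawful mapping where a bare
# stimulus -> response table would fail.

RESPONSES = {
    # (stimulus, state) -> response
    ('food', 'hungry'):   'eat',
    ('food', 'sated'):    'ignore',
    ('threat', 'hungry'): 'flee',
    ('threat', 'sated'):  'flee',
}

def respond(stimulus, state):
    return RESPONSES[(stimulus, state)]

print(respond('food', 'hungry'))  # eat
print(respond('food', 'sated'))   # ignore
```

A one-to-one law reappears once the state is included in the key, which is the sense in which states 'restore lawfulness' without requiring an internal representation of the world.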

In effect, states provide a context. Thus while behaviour is typically not in one-to-one correspondence with the environment, the mediating role played by the state of the organism means that it no longer follows, as cognitivists contend, that there must be something else to which behaviour corresponds, i.e. an internal representation of the world (Zuriff, 1985). Nor is it necessary to posit internal operations by which an organism transforms input from the external world into internal representations. This is not to say, as often maintained, that behaviourism rejects the concept of internal representation: for if the concept is meant to refer to the fact that the environment causes enduring changes which affect behaviour, then few behaviourists deny internal representations exist. Similarly, if the concept is meant to reflect the truism that behaviour is not solely a function of impinging stimulus energy, again few behaviourists would find much reason to quarrel. What is rejected, however, is the notion that representations somehow symbolise or act as a substitute for the environment. As G. E. Zuriff (1985) put the point:

It is as if in addition to the external environment, the internal representation is yet another object which the organism can react to, know, or scan. ... The problems of how the organism comes to know and adapt to the world are transferred to the internal problems of how the organism comes to know and adapt to its internal representations. (p. 163)

3.1.2 Interbehaviourism

With the inclusion of learning and mediation, all that remains of S-R reflex theory is the assertion that behaviour is a dependent variable, functionally or causally related to environmental variables. However, even this assertion has come into question. For it implies that the relationship is unidirectional, running from the environment to behaviour. And as Bandura (1974) observed:

Environments have causes as do behaviours. For the most part the environment is only a potentiality until actualised and fashioned by appropriate actions ... . By their actions people play an active role in producing the reinforcing contingencies that impinge upon them. Thus, behaviour partly creates the environment, and the environment influences behaviour in a reciprocal fashion. (p. 866)

Although Bandura offered his argument as an attack on operant behaviourism, in fact the kind of reciprocal determinism he describes is recognised on many levels by operant theories (Zuriff, 1985). Skinner readily acknowledges that people often alter the variables which control their environment as, for example, when people remove cigarettes from their visual environment so they will not serve as discriminative stimuli for an undesired response of smoking.[4] Behaviour is often a function of variables which are themselves a function of behaviour. In short, the interaction between stimulus (S) and response (R) is S<->R, not S->R (Kantor, 1970). Thus, despite the fact that S-R psychology is typically characterised by contrast with field theories, especially by Gestalt psychologists, in truth the picture emerging is of behaviour effected by a group of interacting psychological forces dependent, in part, on the organism itself. This portrait captures the essence of field theory (Zuriff, 1985).

[4] This is most readily apparent in Skinner's (1953) concept of 'precurrent behaviour'.

3.1.3 Unifying Classical and Operant Conditioning

While studying the salivation reflex, Russian physiologist Ivan Pavlov (1927) discovered that systematic changes in the dogs' behaviour were linked to his own behaviour. Specifically, as the dogs became familiar with the experimental situation (in which food powder was presented), they would begin salivating as soon as Pavlov walked into the room. Pavlov wondered whether the premature salivation was triggered only by the sight of food, or whether any stimulus might come to elicit the response. The product of his seminal investigations, for which he received the Nobel prize, is the classical conditioning paradigm. What resulted from Pavlov's work was an associationist psychology whose unit of behaviour was the acquired reflex rather than the idea (Terrace, 1973). His work on acquired reflexes also considerably broadened Darwinian theory.

Operant conditioning further extended Darwinian theory by demonstrating that responses, too, are subject to selection. "Just as genetic characteristics which arise as mutations are selected or discarded by their consequences, so novel forms of behaviour are selected or discarded through reinforcement." (Skinner, 1953, p. 430). Although the eminent ethologist Konrad Lorenz had reached a similar conclusion in Evolution and Modification of Behaviour, it was another ethologist, Richard Dawkins (1988), who perhaps put it best:

Individual organisms are not replicators: They are highly integrated bundles of consequences. ... At Skinner's level two⁵ the replicators are habits in the animal's repertoire, originally spontaneously produced (the equivalent of mutation). The consequences are reinforcement, positive or negative. The habits can be seen as replicators because their frequency of emergence from the animal's motor system increases, or decreases, as a result of their reinforcement consequences. (p. 33)

⁵ In Selection by Consequences, the paper to which Dawkins is referring, Skinner (1981) argued that behaviour is the product of three levels of contingency: 1) contingencies of survival, 2) contingencies of reinforcement, and 3) special contingencies maintained by an evolved social environment.


Darwinian theory is not the only point of commonality between classical and operant conditioning. Indeed, it has become apparent that the ability to distinguish between classical and operant conditioning on even an operational level is not nearly as decisive as some have hoped. As Terrace (1973) notes:

Ironically, as our knowledge of instrumental [i.e. operant] conditioning increases, it appears that the domain of classical conditioning decreases; and the notion of classical conditioning as a fundamental and independent type of conditioning becomes more and more doubtful. (p. 94)

The principal, in fact the only, significant non-procedural difference between the two conditioning paradigms is in the contingency which prevails between the organism and its environment (see Figure 3.1):

[Figure 3.1 appears here in the original.]

Figure 3.1: Equivalence of (a) classical and (b) operant learning paradigms. Functionally the roles of the conditioned (CS) and discriminative stimuli (SD) are identical, as are the roles of the unconditioned stimulus (US) and reinforcing stimulus (SR). R is a response.

Thus the classical conditioning paradigm, when properly viewed, is simply a subset of the operant paradigm. Although the paradigms may operationally be distinguished, in that classical conditioning involves the relationship between unconditioned and conditioned stimuli while operant conditioning focuses on the relation between an arbitrary response and a reinforcing stimulus, the main difference between the two methods lies solely in the interaction between the organism and the conditions in its environment. That is, in the classical paradigm reinforcement (i.e. the unconditioned stimulus) is always present,⁶ whereas in the operant paradigm it is contingent. Therefore, the classical and operant paradigms "might best be regarded as simply different procedures for studying behavioural change, procedures that are potentially understandable in terms of a common reinforcement principle." (Donahoe, 1988, p. 38).⁷
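The operational contrast just described - reinforcement delivered on every trial in the classical paradigm, but only contingent on a response in the operant paradigm - can be sketched as a toy simulation. This is an illustrative sketch only; the trial count and response probability are invented for the example and carry no empirical weight:

```python
import random

random.seed(1)

def run_trials(paradigm, n_trials=1000, p_response=0.5):
    """Count how often reinforcement (US/SR) is delivered under each paradigm.

    In the classical paradigm the US follows the CS on every trial,
    regardless of what the organism does; in the operant paradigm the
    SR is delivered only when the organism emits the response R.
    """
    reinforced = 0
    for _ in range(n_trials):
        responded = random.random() < p_response  # organism may or may not respond
        if paradigm == "classical":
            reinforced += 1            # US always present: response-independent
        elif paradigm == "operant" and responded:
            reinforced += 1            # SR contingent on R
    return reinforced

print(run_trials("classical"))  # always 1000: reinforcement is response-independent
print(run_trials("operant"))    # roughly 500: reinforcement tracks the response rate
```

The two branches differ only in the contingency, mirroring the claim above that the paradigms share a common reinforcement principle and differ solely in how reinforcement relates to the organism's behaviour.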

3.1.4 Purpose in Behaviour Theory

The teleological objection to behaviourism contends that because behaviour theory makes no reference to purpose, it provides only a superficial account of the organisation of behaviour. On this view, behaviour cannot intelligibly or adequately be described without also implying that organisms consider the factors of a situation in relation to the goals they seek (McDougall, 1926). But while most behaviour theorists find teleological explanation 'empirically empty' (Taylor, 1964), it does not necessarily follow that they also deny that behaviour is goal-directed.⁸ It is apparent that behaviour typically demonstrates both flexibility in assuming a variety of means to achieve its particular ends and persistence in the face of obstacles (Zuriff, 1985). Rather, what is at issue in this debate is the causal sequencing of purpose and behaviour.

⁶ In operant conditioning terms, classical conditioning is a continuous (or fixed ratio one) schedule.
⁷ Terrace (1973) has proposed that the common principle (i.e. the basic unit of analysis) is found in the sequence S1-R-S2, where S1 is a cue, R a response, and S2 a reinforcing stimulus.
⁸ Not all behaviourists find teleological explanation empirically empty. Tolman (1932), for example, argues that purpose is an emergent phenomenon which may or may not eventually be reducible to physiology.


Behaviourists maintain that purpose derives from behaviour; cognitivists contend the reverse. Thus, for example, teleology suggests the hand evolved as it has for the explicit purpose of holding (grasping, etc.) objects, whereas the behavioural (and ethological) account argues that the hand evolved as it has because organisms sporting similar structural adaptations in the past had greater reproductive success as a result of being able to hold (grasp, etc.) objects. Despite their obvious similarity, these interpretations are not equivalent: for in assuming that goals cause the changes which bring them about, teleological explanations generate the paradox of a consequent causing its antecedent (Taylor, 1964). Behaviour theory offers a way out of this paradox: instead of claiming that a person behaves because of consequences which are to follow, behaviourists say that they behave as they do because of the consequences that have followed similar behaviour in the past (Skinner, 1953). This is, of course, Thorndike's Law of Effect. Under this interpretation, reinforcement and punishment are instruments rather than outcomes of teleology. That is, reinforcing and punishing stimuli are, in essence, corrective signals which serve to guide the organism toward its goals. If these signals arise entirely from within the system the feedback loop is said to describe a simple reflex arc. But when the feedback comes from outside, behaviour takes on a teleological flavour.

Ironically, in some ways the problem with teleological explanations is not that they explain too little but rather that they explain too much! It is often too easy to accommodate teleological explanations to the data (Skinner, 1964). Moreover, many of these explanations involve hopelessly circular lines of reasoning (Quine, 1960). For instance, a person is said to seek shelter during a thunderstorm in order to stay dry; and the reason we know that their goal was to stay dry is because they sought shelter during a thunderstorm. Thus behaviourists object to teleological explanation, not because it inaccurately portrays the perceived causes of behaviour, but because it merely adds another layer of interpretation and subjectivity (Zuriff, 1985).


3.1.5 Is Behaviourism Dead?

If the received wisdom of today is to be believed, behaviourism is refuted: its methods have failed, and it has little to offer modern psychology (Zuriff, 1985). In short, it is dead. Some (e.g. Grof, 1994) go so far as to suggest that it was never alive. Unfortunately it is beyond the scope of the present work to detail a proper reply to this question.⁹ Perhaps ultimately one is forced to conclude, with Zuriff (1985), that the question is best answered by applying the stringent standards of behaviourist positivism to itself:

The behavioural science furthermore predicts that behaviourist methodology will be adopted by most, if not all, scientists because scientists find prediction and control particularly rewarding. Thus, in the competition among various scientific methodologies, behaviourism is expected to succeed for two reasons. First, it provides greater rewards in the form of prediction and control than the others and therefore will be chosen by scientists. Second, a community which uses a methodology that is the most effective for its adaptation will, by definition, have the greatest probability of adapting and surviving. Other communities, not possessing equally effective methodologies, will not survive as well in this cultural form of competition and natural selection. Thus, behaviourist standards of scientific acceptability can be viewed as a scientifically based explanation and prediction of what will eventually become scientific practice rather than a prescription of what it ought to be. (p. 278)

⁹ The interested reader is referred to an earlier monograph (Yeo, 1994) in which I examined this question. Specifically, the five most frequently cited objections to behaviour theory were reviewed: i.e. the first person objection, the reflexological objection, the teleological objection, the black-box objection, and the Gestalt objection, and each was found to be wanting. Suffice it to say that behaviourism continues to play a key role in both theory (e.g. the use of conditioning to determine the neurophysiology of learning) and practice (e.g. to treat phobias and autism or teach life skills to the mentally challenged). Sadly, as Zuriff (1985) has observed, "Polemics, intemperate invective, ad hominem argument, and caricature pervade discussions of behaviourism by those who seek its demise. ... Factors other than effectiveness hold sway, and the search for truth is lost in the battle between movements." (p. 278).

3.2 The Gestalt Objection

Probably the most famous tenet of the Gestalt school of psychology is its claim that "the whole is greater than the sum of its parts".¹⁰ This adage not only reflects the Gestalt emphasis on the unity of percepts, it also stands in stark contrast to behaviourism's model of isolable stimuli (i.e. atomism). However the differences between Gestalt psychology and behaviourism may not be as vivid as some of behaviourism's critics would wish to suggest,¹¹ in that holism is seen to emerge from atomism. That is, gestalt phenomena emerge through the interaction of discrete properties. Or as von Bertalanffy (1968) put it:

The meaning of the somewhat mystical expression, 'the whole is greater than the sum of its parts' is simply that constitutive characteristics are not explainable from the characteristics of isolated parts. The characteristics of the complex, therefore, as compared to those of the elements, appear as 'new' or 'emergent'. (p. 55)

The Gestaltist's objection spans more than the atomism-holism dimension. The core of the controversy is embodied in the distinction between 'conditioning' and 'insight' (Koestler, 1964). For if the various theories of learning are plotted on a continuum, behaviourism's classical conditioning paradigm would be at one end and Gestalt's spontaneous problem solving would be at the other. This polarisation of views led Bertrand Russell (1959) to quip, "Animals observed by Americans rush about frantically until they hit upon the solution by chance. Animals observed by Germans sit still and scratch their heads until they evolve the solution out of their inner consciousness." (p. 96).

¹⁰ Although this famous adage actually originated with Aristotle, by providing an empirical foundation for Aristotle's claim, the Gestalt school gave it new life and new significance.
¹¹ In fact, some behaviour theorists (e.g. Tolman) incorporate Gestalt principles in their learning models.

While many people have enlisted Russell's quotation to support arguments ranging from psychological prejudice to attribution of national characteristics, few note its continuation: "I believe both sets of observations to be entirely reliable, and that what an animal will do depends upon the kind of problem you set before it." (ibid. p. 96). Behaviourists devised learning tasks for which the animals were biologically ill-fitted (e.g. bar pressing). It is not surprising, therefore, that they found the acquisition of new skills could only be obtained by means of the slow process of conditioning. In contrast, Gestaltists set their animals tasks for which the animals were naturally suited (Koestler, 1964). Invariably, mastery of the task was very rapid, leading Gestalt psychologists to (mistakenly) conclude that all learning is based on insight whereas, in fact, the most frequently cited insightful performances are more appropriately seen as rare limit cases (Koestler, 1964). For instance, in an attempt to replicate Köhler's famous experiment in which a chimpanzee insightfully learned to extend its reach by joining two sticks together, Koestler (1964) found that only chimpanzees which had previous experience playing with sticks were able to make this spontaneous discovery. Chimpanzees without previous experience, although of fully equal intelligence, simply failed to 'see the light'. Yet when the experiment was repeated after the naive animals had had a chance to play with the sticks, the sticks turned to rakes instantly (Koestler, 1964).

Köhler (1957), himself, reported that it took a young chimpanzee named Koko 19 days to learn to push a box under a suspended banana and to climb atop. Moreover, when the banana was moved a few metres from its former position, Koko was unable to repeat the 'insight'. These observations would seem to imply that insight requires that the component behaviours be well-established. In other words, the suddenness of an insightful solution is due to the fact that the organism already has the requisite base skills in its repertoire - all that is needed is a link (i.e. association) to combine them.

Despite Gestalt psychology's contention that insight necessarily excludes all trial-and-error learning, a moment's reflection shows that the history of science abounds with examples of brilliant insights which were preceded by more-or-less fumbling tries, in half-understood situations (Koestler, 1967). The crucial mistake made by the Gestalt school was to identify trial-and-error learning with the behaviourist's rejected reflexological model. Trial-and-error need not imply blind or random behaviour: the amount of 'stamping-in' decreases in proportion to an organism's biological preparedness for the task. This is not to say, as Gestaltists (and other nativists) contend, that learning is merely the unfolding of a priori neurological processes.¹² For as Polanyi (1958) observed, nativism ultimately forces one to untenable conclusions:

From this principle (isomorphism) it would follow that the whole of mathematics - whether known or yet to be discovered - is latent in the neural traces arising in a man's brain when he looks at the axioms of Principia Mathematica ... . (Polanyi, 1958, p. 341)

Rather, as Donald Hebb (1949) once remarked, an insightful act is an excellent example of something that is not learned, but still depends on learning. It is not learned since it is adequately performed on its first occurrence, appearing all at once in a recognisable form. On the other hand, the animal must have had prior experience with the component parts of the situation or with other situations bearing some similarity: the situation cannot be completely strange. All evidence thus points to the conclusion that any new insight consists of a recombination of preexistent mediating processes, not the sudden appearance of a wholly new process (Hebb, 1949).

¹² The Gestalt principle of 'isomorphism' maintains the existence of an a priori correspondence between processes in the nervous system and events in the outside world. Thus an organism need not construct a model of reality - the model is prefigured and need only be activated by spontaneous insight.

3.3 The Janus Principle

All life is hierarchically organised; and relativity of the terms part and whole is a universal characteristic of hierarchies (Koestler, 1967). Parts and wholes, in an absolute sense, do not exist anywhere. We find, instead, intermediary structures - sub-wholes - that display, according to the way you look at them, some characteristics commonly attributed to wholes, and some characteristics commonly attributed to parts (Koestler, 1967). These Janus-faced structures, which Koestler calls holons,¹³ represent the missing link between the atomistic approach of the behaviourist, and the holistic approach of Gestalt psychology. Moreover it is holism without mysticism, and atomism without reductionism. For while it is possible to dissect a complex whole into its composite parts, it is seldom possible to predict the properties of the whole from those of its parts. Langton (1989), perhaps, has put the point most eloquently:

The distinction between linear and nonlinear systems is fundamental, and provides excellent insight into why the mechanisms of life should be so hard to find. The simplest way to state the distinction is to say that linear systems are those for which the behaviour of the whole is just the sum of the behaviour of its parts, while for nonlinear systems, the behaviour is more than the sum of its parts. ... We can break up complicated linear systems into simpler constituent parts, and analyse these parts independently. Once we have reached an understanding of the parts in isolation, we can achieve a full understanding of the whole system by composing our understanding of the isolated parts. This is not possible for nonlinear systems ... . The key feature of nonlinear systems is that their primary behaviours of interest are properties of the interactions between parts, rather than being properties of the parts themselves, and these interaction-based properties necessarily disappear when the parts are studied independently. (p. 41)

¹³ From the Greek holos meaning 'whole' with the suffix 'on' to suggest a particle, a holon is "a ... of relations which is represented on the next higher level as a unit, i.e. a relatum." (Koestler, 1967, p. 72).
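Langton's linear/nonlinear distinction can be made concrete with a superposition test (an illustrative sketch, not drawn from the thesis): a system is linear exactly when f(x + y) = f(x) + f(y), so that the whole is literally the sum of its parts, while a nonlinear system violates this identity.

```python
def linear(x):
    return 3.0 * x   # a linear system: output strictly proportional to input

def nonlinear(x):
    return x * x     # a nonlinear system: squaring introduces an interaction term

x, y = 2.0, 5.0

# Superposition holds for the linear system ...
print(linear(x + y) == linear(x) + linear(y))           # True: 21.0 == 6.0 + 15.0
# ... but fails for the nonlinear one, since (x + y)**2 = x**2 + y**2 + 2*x*y:
print(nonlinear(x + y) == nonlinear(x) + nonlinear(y))  # False: 49.0 != 4.0 + 25.0
```

The cross term 2xy is precisely the "interaction between parts" Langton describes: it exists only when x and y enter the system together, and it disappears when each part is studied in isolation.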


The nervous system is a hierarchic affair in which functions at the higher levels do not deal directly with the ultimate structural units, but rather operate by activating lower patterns that have their own relatively autonomous structural unity (Koestler, 1964; Minsky, 1985; Brooks, 1987).¹⁴ Feedback necessarily operates within the fixed limits imposed by a holon's encoded canon. In this way, the 'rules' embodied in each holon represent positive precepts to be followed, not merely constraints imposed on actions. This is most readily apparent in the output hierarchy where, at each step of its journey to the periphery, signals release preset action patterns. Simple skills are thus seen to be prerequisite to the emergence of more complex skills, including the stick manipulation of Gestalt fame. That is to say, the probability of an 'insight' is greater the more firmly established each of the separate skills. However skilful at carrying a stick, a dog will likely never learn to use the stick to get a piece of meat placed outside its reach, whereas Köhler's primates were 'ripe' for this discovery as they already possessed the necessary manual dexterity and oculo-motor coordination to enable them to develop the playful habit of pushing objects about with sticks (Koestler, 1964). And it is in the notion of 'ripeness' that the bridge between behaviourism and Gestalt is to be found:

The embittered controversies between different schools in experimental psychology about the nature of learning and understanding can be shown to derive to a large extent from a refusal to take the factor of ripeness seriously. The propounders of Behaviouristic psychology were wont to set their animals tasks for which they were biologically ill-fitted, and thus to prove that new skills could only be acquired through conditioning, chaining of reflexes, learning by rote. Köhler and the Gestalt school, on the other hand, set their chimpanzees tasks for which they were ripe or almost ripe, to prove that all learning was based on insight. (Koestler, 1964, p. 109)

¹⁴ To borrow Koestler's (1967) example, the command "light cigarette" need not specify what each finger muscle must do to strike the match; it merely prods the appropriate centres to action.

A frequent criticism of connectionist theory is that it lacks neural realism. Even within the connectionist community, there is a growing sense that much modern work in neural networks has moved far away from its roots in the study of the brain (Anderson and Rosenfeld, 1988). This is cause for concern. Not only is the field losing contact with its foundations, it risks losing a source of valuable ideas. Indeed, biological plausibility implies familiarity with the neurophysiological roots of cognition. When models are too removed from biological reality they become, to quote Cowan (1988), "so difficult to assess and evaluate that we really are in danger ... of recapitulating the history of psychology, which treated the nervous system as irrelevant as long as you were understanding the phenomena" (in Johnson, 1991, p. 200).¹⁵

3.4.1 Cell Assemblies

For biologists and connectionists alike, one of the most important models of cortical organisation is Donald Hebb's (1949) cell assembly theory:

Any frequently repeated, particular stimulation will lead to the slow development of a "cell-assembly", a diffuse structure ... capable of acting as a closed system, delivering facilitation to other such systems and usually having a specific motor facilitation. A series of such events constitutes a "phase sequence" - the thought process. (p. 48)

Hebb's cell assembly theory suggests how order can arise from chaos. Perhaps most significant, at least from a neural modelling perspective, Hebb went on to speculate precisely how the new circuitry might come about:

¹⁵ Cognitive science, as distinct from artificial intelligence, seeks realistic methods of achieving cognition.

When an axon of cell A is near enough to excite a cell B and repeatedly or persistently takes part in firing it, some growth process or metabolic change takes place in one or both cells such that A's efficiency, as one of the cells firing B, is increased. (ibid. p. 50)

Simply put, two neurons which fire together will form a link (i.e. associate). The resulting constellations of neurons become symbols which stand for the things and ideas that make up our world (Johnson, 1991).
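Hebb's postulate is commonly formalised in connectionist models as a weight change proportional to the product of pre- and postsynaptic activity, Δw = η·a_pre·a_post. The following is a minimal illustrative sketch, not Hebb's own formulation; the learning rate η = 0.1 is an arbitrary choice:

```python
def hebbian_update(w, pre, post, eta=0.1):
    """Hebb's rule: strengthen w only when cells A (pre) and B (post) fire together."""
    return w + eta * pre * post

w = 0.0
# Cells A and B repeatedly fire together: the connection grows (they "associate").
for _ in range(10):
    w = hebbian_update(w, pre=1.0, post=1.0)
print(round(w, 2))  # 1.0

# If cell B stays silent, the product pre*post is zero and w is left unchanged,
# capturing the requirement that A take part in firing B.
print(round(hebbian_update(w, pre=1.0, post=0.0), 2))  # 1.0
```

Because the update depends on the conjunction of activity, repeated co-firing builds exactly the kind of linked constellation described above.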

3.4.2 The Neurophysiology of Memory

Hebb (1949) believed that the most probable way in which one cell could become capable of firing another is either for new synaptic knobs to develop or for existing boutons to enlarge, increasing the area of contact between the afferent axon and efferent soma.¹⁶ Although there is little evidence to support enlargement as a means of increasing contact area, substantial literature has accumulated (e.g. Edelman, 1978; Changeux, 1985; Young, 1986; Purves, 1988) giving credence to the notion that development generates a set of neural connections which are ultimately reduced by the selection of some neural circuits and the regression or rearrangement of others. Popularly known as neural Darwinism, "In this view of memory the processes of perception, motor response, and associative recollection are intimately tied together by the process of global mapping." (Edelman, 1989, p. 56).¹⁷

¹⁶ To Hebb, the term 'soma' referred both to the cell body (perikaryon) and its dendrites.
¹⁷ A global mapping is a dynamic, high-order structure containing multiple reentrant local cortical maps, motor and sensory, which interact with non-mapped parts of the brain (e.g. the hippocampus). It is the end product of a three-stage process. First, as a consequence of elaborate chemical control loops, genetic and epigenetic forces direct cell movement and process extension, leading to the formation of primary repertoires or variant neuronal groups (local circuits) within a given anatomical region of the cortex. After the anatomical connections of the primary repertoires have been more or less fixed, secondary repertoires are formed as the result of the synaptic alteration caused by ongoing sensory stimulation. Finally, these secondary repertoires are linked by reentrant signals which act to coordinate inputs and resolve conflicts between the different functionally segregated maps (Edelman, 1989).

As new contexts and associations occur, the process of global mapping, with its accompanying patterns of neuronal selection and synaptic change, creates spatiotemporally continuous representations of objects and events. By extending the process of global mapping to the mapping of types of maps, the brain is thereby able to represent its own activities (Edelman, 1992). In this way topologically connected cortical maps make it possible to both classify and correlate happenings in the world. For until organisms devise a meaningful (i.e. adaptive) criterion to partition the world, novelty predominates.

As its name implies, neural Darwinism is founded on the tenets of natural selection. But whereas natural selection requires differential reproduction, in synaptic selection differential amplification over time is required. Recently the biological foundation of differential amplification has slowly begun to emerge. Much, however, remains in dispute. Ironically, the key point of contention between the camps is not over the biophysics of amplification, but rather, over its site. For while it is generally conceded that synaptic activity triggers an influx of calcium ions initiating a complex series of biochemical events leading to amplification, theorists disagree on the site of the calcium's action. One of the main proponents of the postsynaptic model, Daniel Alkon (1989), argues that memory is effected by 'plugging holes' in the postsynaptic membrane:

In rabbits my colleagues and I have looked at neurons called CA1 pyramidal cells in the hippocampus of the brain; in snails we have looked at neurons known as type B photoreceptors, which detect light. In both the rabbit and the snail the repeated temporal association of stimuli over the course of Pavlovian conditioning causes a persistent change in these target neurons: the flow of potassium ions through channels in the membranes is reduced. ... Ordinarily, potassium-ion flow is responsible for keeping the charge on the cell membrane well below the threshold potential at which propagating signals are triggered. When the flow of potassium ions is reduced, impulses can be triggered more readily. (p. 44)


Alkon's model runs counter to Hebb's requirement of joint presynaptic and postsynaptic activity, in that the spread of signals from one postsynaptic site to another does not require that the sites be active (Alkon, 1987). There need only be a converging pathway so that the temporally associated stimuli meet. And since activity is not specific to synapses (learning is a postsynaptic event), specificity depends on which combination of cells is most easily excited by subsequent presentations of a conditioned stimulus (Alkon, 1987). That is, the representation of a stimulus pairing requires that an entire set of cortical cells have their dendritic elements altered by the conditioning process. Herein lies the problem. As Alkon (1987), himself, points out:

Thus, there must be in our model some specificity in the pattern of cortical cells' firing for the genetically specified unconditioned stimulus effects. Specificity can be provided if we design the model so that any particular unconditioned stimulus can interact (i.e. converge) with a whole range of conditioned stimuli - but on the same cell - within the dendritic tree of one neuron. (p. 149)

Unfortunately, topological convergence of this magnitude is just not found in the cortex. The cortex is, for the most part, a collection of components all of one kind (pyramidal cells), each with 5 to 10 thousand synaptic contacts through which it is affected, and about as many through which it affects other neurons (Braitenberg, 1989). Investigations into the convergence/divergence of these connections reveal that the most likely connection between a pair of neurons is just one synaptic contact (Braitenberg, 1989). Multiple contacts do happen, but rarely. The overall picture of cortical organisation is a minority of neurons whose action is inhibitory, narrow in range, and very strong, interspersed among a majority of excitatory neurons connected by a vast number of very weak, very widespread, synaptic contacts (Braitenberg, 1989).

This is not to say, however, that complex synaptic interactions are rare. In fact, as the following schematic (Figure 3.2) of Hermissenda's neural circuitry clearly illustrates, gated neurons would appear to be the rule rather than the exception:¹⁸

[Figure 3.2 appears here in the original.]

Figure 3.2: Schematic diagram of the visual pathway and its convergence with the statocyst pathway. The type B photoreceptor inhibits the type A photoreceptor which, in conjunction with excitation from the hair cells (HC), excites the ipsilateral interneuron (I). The ipsilateral interneuron, in turn, excites motor neuron (M), triggering a turn toward the light. (from Alkon, 1987)

¹⁸ In fact, there is growing evidence (e.g. Hawkins et al., 1983) that coincident activity in presynaptic and postsynaptic neurons (i.e. Hebb's model) is not required for strengthening the neuronal connections. The connection can also be strengthened without activity of the postsynaptic cell when a third neuron, conjoined at the synapse of the presynaptic neuron, is active at the same time as the presynaptic neuron. This suggests that the ability to detect associations may simply reflect the intrinsic capability of certain cellular interactions (Kandel and Hawkins, 1992).

The presynaptic forces are championed by one of the most influential people in neuroscience - Eric Kandel. Based on his extensive investigations of the simplified nervous system of the sea snail Aplysia, Kandel has concluded that learning alters the amount of neurotransmitter that a neuron releases. Specifically, in habituation the flow of neurotransmitter is turned down, in sensitisation it is turned up.¹⁹ Kandel's detractors have argued that because sensitisation and habituation are non-associative, they do not really qualify as learning. The point is moot, however, as results from both Drosophila and Aplysia studies indicate that a key mechanism of classical conditioning, an associative form of learning, is merely an elaboration of a mechanism of sensitisation (Hawkins, 1989). That is to say, as in sensitisation, in classical conditioning calcium-induced facilitation is the main mechanism of learning. Further evidence is found in the fact that facilitation requires the same timing on the cellular level as classical conditioning does on the behavioural level (Hawkins and Kandel, 1984). These findings suggest that there might be a 'cellular alphabet' for learning, where the mechanisms of the more complex forms of learning might be generated from combinations of the mechanisms found in habituation and sensitisation (Hawkins and Kandel, 1984).²⁰
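The opposing adjustments Kandel describes - release turned down in habituation, up in sensitisation - can be caricatured as multiplicative changes to a transmitter-release level. This is a toy model only; the decay and gain factors below are invented for illustration and carry no physiological meaning:

```python
def habituate(release, factor=0.7):
    """Repeated presentation of the same stimulus turns transmitter release down."""
    return release * factor

def sensitise(release, factor=1.5):
    """A salient (e.g. noxious) stimulus turns transmitter release up."""
    return min(release * factor, 1.0)   # cap at a maximal release level

release = 1.0
for _ in range(5):                      # five presentations of the same stimulus
    release = habituate(release)
print(round(release, 3))                # 0.168: the response has habituated

release = sensitise(release)
print(round(release, 3))                # 0.252: one strong stimulus partly restores it
```

The point of the sketch is structural: both processes modulate a single presynaptic quantity, which is what makes it plausible that associative conditioning could be built from combinations of the same elementary adjustments.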

¹⁹ The molecular mechanism is similar to that proposed by Alkon, i.e. calcium-ion-induced variation in potassium flow. In habituation, repeated stimulation produces a prolonged inactivation of the calcium channels, leading to a decrease in calcium ion influx. Conversely, in sensitisation the neurotransmitter serotonin triggers an enzyme that catalyses the synthesis of cyclic AMP, increasing the influx of calcium.
²⁰ The higher-order features of classical conditioning (e.g. blocking) would seem to provide an attractive arena in which to investigate this provocative hypothesis. As Hawkins (1989) points out, "... these [higher-order] features of conditioning have a cognitive flavour (in the sense that the animal's behaviour is thought to depend on a comparison of current sensory input with an internal representation of the world) and they may therefore provide a bridge between basic conditioning and more advanced forms of learning (Kamin, 1969; Rescorla, 1978; Wagner, 1978; Mackintosh, 1983; Dickinson, 1980)." (p. 67).

Equations are more important to me, because politics is for the present, but an equation is something for eternity. (Albert Einstein)

4. Mathematical Foundations

Neural networks are definitely not for the mathematically faint of heart! In many respects, contemporary connectionism closely follows the rigorous mathematical tradition of physics. It is no coincidence that some of the most important learning models (e.g. Hopfield networks, the Boltzmann machine) are applications of principles originally derived in statistical mechanics. Yet, while it is true that mathematical notation is often so terse as to be cryptic, it is also important to remember that, as Whitehead and Russell (1910) long ago pointed out, mathematics is a form of logic.1 Thus, adopting the mathematical formalism promotes a rigour often lacking in psychological theory.

4.1 Pattern Mathematics

Many of the properties of neural network models are succinctly described by the mathematics of linear algebra. A key concept is the notion of vector space. In its most general form, related numerical values can be viewed as coordinates in an n-dimensional space. As an example, lines of longitude and latitude on a map can be thought of as points in two-dimensional vector space. And just as we can calculate distances between cities based on their map coordinates, points in vector space can be mathematically manipulated and compared.

1 This is not to say that mathematics is reducible to logic. Rather, as Quine (1953) put it, "The formulas which are wanted as theorems are of course just those which are valid under the intended interpretations of the primitive signs; valid in the sense of being either true statements or open sentences which are true for all values of free variables. Inasmuch as all logic and mathematics is expressible in this primitive language, the valid formulas embrace in translation all valid sentences of logic and mathematics. Gödel has shown that this totality of principles can never be exactly reproduced by the theorems of a formal system, in the sense of 'formal system' just now described. ... A fair standard is afforded by Principia; for the bulk of Principia is presumably adequate to the derivation of all codified mathematical theory, except for a fringe requiring the axiom of infinity and the axiom of choice as additional assumptions." (p. 89).

One of the most important of these manipulations is the use of matrices2 as operators to map between vector spaces. For instance, the mapping between the n-dimensional vector space in which vector x resides, and the m-dimensional space of vector y, is effected by the equation:

y = Wx  (4.01)

where the matrix operator W is an array of real numbers consisting of m rows and n columns. However, the mapping capability of a linear system is limited (Minsky and Papert, 1969). Typically multiple mappings must be applied in order to achieve more complex vector transformations. Unfortunately, it is not simply a matter of propagating vectors through a series of matrix operators; for it is a mathematical truism that:

W2(W1 x) = (W2 W1) x  (4.02)

That is to say, when an input vector is cascaded through a series of matrices, unless a nonlinearity is introduced between matrices, the effect is the same as if the vector had been applied to a single matrix formed by their product!
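This collapse of cascaded linear mappings is easy to verify numerically. A minimal sketch (the matrices and input vector below are arbitrary illustrative values, not drawn from the text):

```python
import numpy as np

# Two successive linear mappings: R^2 -> R^3 -> R^2.
W1 = np.array([[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]])  # maps R^2 to R^3
W2 = np.array([[1.0, 0.0, -1.0], [2.0, 1.0, 0.0]])   # maps R^3 to R^2
x = np.array([1.0, -2.0])

cascaded = W2 @ (W1 @ x)    # propagate x through both operators in turn
collapsed = (W2 @ W1) @ x   # single matrix formed by their product

# Without an intervening nonlinearity, the two are identical.
assert np.allclose(cascaded, collapsed)
```

Introducing any nonlinearity (e.g. a threshold) between the two multiplications breaks this equality, which is precisely why multilayer networks need nonlinear units.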

4.2 Measuring Pattern Similarity

In many ways the determination of similarity is at the heart of the riddle of induction. Given this importance, it should come as no surprise that there are a variety of ways to measure pattern similarity. One of the most popular, largely due to its computational simplicity, is the (unnormalised) correlation:

s = (w . x) = Σi wi xi  (4.03)

2 Typically matrices are represented by capital letters and vectors by lower case letters.

However, sometimes the relevant information is contained in the relative magnitudes of the components. In this case 4.03 may tend to exaggerate the importance of one or other of the patterns. In order to establish a 'level playing field', similarity is often better measured in terms of the direction cosine:

cos(w, x) = (w . x) / (||w|| ||x||)  (4.04)

where, if w and x are Euclidean vectors, then (w . x) is their inner product. The functions ||w|| and ||x|| give the length of vectors w and x respectively. Formally known as the 'Euclidean norm', the length of vector v in Rn is given by:3

||v|| = sqrt(vT v) = sqrt(Σi vi^2)
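The contrast between the correlation (4.03) and the direction cosine (4.04) can be sketched as follows; the vectors are illustrative assumptions, chosen so that one pattern differs from the other only in magnitude:

```python
import numpy as np

def correlation(w, x):
    """Unnormalised correlation (4.03): the raw inner product."""
    return float(np.dot(w, x))

def direction_cosine(w, x):
    """Direction cosine (4.04): inner product scaled by the Euclidean norms."""
    return float(np.dot(w, x) / (np.linalg.norm(w) * np.linalg.norm(x)))

w = np.array([1.0, 1.0])
x = np.array([1.0, 1.0])
x_big = 10.0 * x          # same direction, ten times the magnitude

# The correlation is dominated by magnitude; the cosine sees only direction.
assert correlation(w, x_big) == 10.0 * correlation(w, x)
assert abs(direction_cosine(w, x) - direction_cosine(w, x_big)) < 1e-12
```

Note how normalising the vectors to unit length (footnote 3) would make the two measures coincide.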

Finally,4 logical equivalence can also be used to measure pattern similarity. Because continuous-valued logic facilitates calculation, Zadeh's (1973) 'fuzzy' equivalence measure (≡) seems particularly well suited to the task:

Once the element equivalences have been calculated, they must be combined into a measure of overall pattern similarity with the aid of a function that is both monotonic and symmetrical with respect to its arguments, e.g.:

3 Notice that if the lengths of the vectors in equation 4.04 are standardised to unity (i.e. ||w|| = ||x|| = 1) then the cosine similarity measure is identical to the unnormalised correlation given by 4.03. 4 Measures of symbol string similarity (e.g. Hamming distance; see Hamming, 1950) will not be discussed, as they are beyond the scope of the present work. The interested reader is referred to Kohonen (1984).


4.3 The Mathematics of Pattern Association

Linear systems5 offer both advantages and disadvantages. A major disadvantage is their rather limited input-output mapping capabilities. This point was forcefully brought home by Minsky and Papert in 1969. But linear systems also allow a precise mathematical solution to the thorny problem of determining weight values to effect a desired set of input-output mappings. This solution (known as the outer product) has important implications for Hebbian learning in that, when used in combination with the aforementioned inner product similarity measure, it is often possible to recall one member of an associated vector pair using the other as a retrieval key. As an example, if matrix W embodies the outer product association of vectors x and y (i.e. W = y xT), then to trigger the recall of y simply take the inner product:6

W x = y (xT x) = y

Moreover, it is usually possible to store several vector associations in the same matrix W by summing the outer products across the patterns p to be memorised:

W = Σp yp xpT

5 Formally, a linear system is defined by the equations: f(βx) = βf(x) and f(x1 + x2) = f(x1) + f(x2). The first equation merely states that multiplying the input by a constant (β) yields the same result as multiplying the output by the same constant. The second equation, however, is rather more interesting; for it implies that knowing how a system responds to each input separately is all that is needed to predict how the system will respond to the sum of its inputs (i.e. the whole is equal to the sum of its parts). 6 Perfect recall (shown here) requires that the input vectors be normalised.

However, if the set of input vectors is not orthogonal,7 the outer product (i.e. Hebb) rule does not perfectly associate the input and output vectors (McClelland and Rumelhart, 1986). In this event, the optimal associative mapping is given by the pseudo-inverse:8

X+ = (XT X)-1 XT  (4.12)

Thus, if matrices X and Y represent the set of input and output patterns to be associated, the optimal9 correlation matrix is obtained by the Penrose method:

W = Y X+
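The difference between the Hebb rule and the Penrose solution can be sketched with a pair of deliberately non-orthogonal input patterns (the matrices below are illustrative assumptions):

```python
import numpy as np

# Columns of X are non-orthogonal input patterns; columns of Y their targets.
X = np.array([[1.0, 1.0],
              [0.0, 1.0]])
Y = np.array([[1.0, 0.0],
              [0.0, 1.0]])

W_hebb = Y @ X.T               # sum of outer products (the Hebb rule)
W_opt = Y @ np.linalg.pinv(X)  # Penrose method: W = Y X+

# Non-orthogonal inputs produce crosstalk under the Hebb rule...
assert not np.allclose(W_hebb @ X, Y)
# ...while the pseudoinverse mapping associates the pairs exactly.
assert np.allclose(W_opt @ X, Y)
```

When the patterns cannot be mapped exactly, the pseudoinverse solution is still optimal in the least-squares sense noted in footnote 9.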

There is also an important relationship between the pseudoinverse and orthogonal projection. Specifically, if X is a (possibly rectangular) matrix with x1, x2, ..., xk its columns, and if X+ is its pseudoinverse, then the orthogonal projection operator on the space spanned by the columns of X is given by:

P = X X+

Any vector (r) can be uniquely decomposed into two vectors, one (p) spanning a specified n-dimensional plane and the other (q) orthogonal to that plane, such that: r = p + q (see Figure 4.1). Moreover, it is always possible to calculate these component vectors by means of the Penrose solution:

p = X X+ r,  q = (I - X X+) r

7 Two vectors, x and y, are orthogonal (i.e. at right angles) if their inner product is zero (i.e. xTy = 0). 8 This equation assumes the columns of X are linearly independent. More generally, for any m x n matrix X with m >= n, the pseudo-inverse (or generalised inverse, or Moore-Penrose inverse) can be defined as shown in equation 4.12, provided that XTX is nonsingular (see Rao and Mitra [1971] for details). 9 In the least-squares sense (i.e. minimises the sum of squared error across the memorised patterns).

Figure 4.1: Orthogonal projections in three-dimensional space.
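The decomposition r = p + q can be sketched directly from the projection operator XX+ described above (the plane and vector below are illustrative assumptions):

```python
import numpy as np

# Columns of X span a two-dimensional plane in R^3.
X = np.array([[1.0, 0.0],
              [0.0, 1.0],
              [1.0, 1.0]])
P = X @ np.linalg.pinv(X)   # orthogonal projection operator P = X X+

r = np.array([2.0, 3.0, 1.0])
p = P @ r                   # component lying in the plane spanned by X
q = r - p                   # component orthogonal to that plane

assert np.allclose(p + q, r)        # the unique decomposition r = p + q
assert np.allclose(X.T @ q, 0.0)    # q is orthogonal to every spanning vector
```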

Since determining the pseudoinverse of a matrix is typically computationally expensive, it is fortunate that there is a more efficient method of calculating orthogonal projections. Known as the Gram-Schmidt orthogonalisation process, its original purpose is to find a set of mutually orthogonal basis vectors which span any n-dimensional linear space Rn (Kohonen, 1984).

The direction of the first vector can be chosen freely (Kohonen, 1984). Therefore the first basis vector, y1, is simply r1. Thereafter, each new vector is determined by recursively applying the rule:

yi = ri - Σ(j<i) [(ri . yj) / ||yj||^2] yj

where (ri . yj) is the inner product of ri and yj, and ||yj|| is the length of yj. Vector yi is the residual that is left over when the best linear combination of the old patterns (rj for j < i) is fit to the input data. In other words, the vector yi represents the 'novelty' in the current input vector (Kohonen, 1984). It should therefore come as no surprise that the Gram-Schmidt process plays a key role in the proposed enhanced novelty filter learning model. In fact, as will be seen, the enhanced novelty filter employs a variant of the Gram-Schmidt process to dynamically extract novel information from its input patterns.
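The residual rule can be sketched as follows; the input vectors are illustrative, and the function is a plain Gram-Schmidt step rather than the enhanced novelty filter's variant:

```python
import numpy as np

def novelty(r, basis):
    """Gram-Schmidt step: subtract the best linear combination of the
    existing (mutually orthogonal) basis vectors; the residual is the
    novelty in r."""
    y = r.copy()
    for b in basis:
        y = y - (np.dot(r, b) / np.dot(b, b)) * b
    return y

r1 = np.array([1.0, 0.0, 0.0])
r2 = np.array([1.0, 1.0, 0.0])

basis = [r1]                  # the first basis vector is simply r1
y2 = novelty(r2, basis)       # residual of r2, orthogonal to r1
assert np.allclose(y2, [0.0, 1.0, 0.0])

# A repeat of a known pattern carries no novelty at all.
assert np.allclose(novelty(r1, [r1, y2]), 0.0)
```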

4.5 Phi Functions

Phi functions (Nilsson, 1965) are the heart of the proposed learning model. A phi function with parameters (weights) w1, w2, ..., wd, wd+1 is written as:

φ(x) = w1 f1(x) + w2 f2(x) + ... + wd fd(x) + wd+1

where the fi(x), i = 1, 2, ..., d, are "linearly independent, real, single-valued functions independent of the weights" (Nilsson, 1965, p. 30).10 Although there

are literally an infinite number of phi functions, because input interaction is central to the thesis advanced in the present work, of prime importance is the phi family known as rth-order polynomials, defined as:11

fi(x) = xk1^m1 xk2^m2 ... xkr^mr

where k1, k2, ..., kr = 1, 2, ..., n and m1, m2, ..., mr = 0 or 1. Note that

despite the potential nonlinearity within each of the component functions, fi(x), the phi function itself is a linear composite of its parameters. That is to say, although a phi function may not be linear with respect to its input vector, x, it is linear with respect to its component terms. This allows a number of useful simplifications. For instance, although a given phi function family member may implement a complex decision surface, collectively the loci of points in the pattern space which satisfy the equation:

w1 f1(x) + w2 f2(x) + ... + wd fd(x) + wd+1 = 0  (4.19)

merely constitute families of decision surfaces (Nilsson, 1965). The phi-space separations these surfaces effect are called phi dichotomies.

10 The final (d+1) weight vector element embodies the neuron's threshold value. 11 Which includes the family of linear (r = 1) functions.


4.5.1 Optimal Phi Solutions

Sobajic (1988) has carried out an extensive theoretical analysis of the phi function methodology, and has proved that phi functions are always capable of yielding a flat-net (i.e. no hidden layers) solution to learning problems! It is fairly straightforward to demonstrate this important conclusion.

As noted previously, in the standard (linear) learning problem the goal is to optimally solve the following matrix equation:

Xw = y

where X is a matrix of p input patterns each d elements in length, w is the corresponding d-element weight vector, and y is a p-element output vector. If p is equal to d then the optimal solution is directly given by the equation:

w = [XT X]-1 XT y

where the expression [XT X]-1 XT is the pseudoinverse of X.12 Similarly, if p is less than d, we can partition X to obtain a square (p by p) matrix, Z, and then (provided the determinant of Z is not equal to zero) optimally solve for w using:

w = Z-1 y

(the remaining d - p weights may be set to zero). The more interesting case is when p is greater than d. Because a phi function can potentially generate an infinite number of orthonormal values (Pao, 1989), d can always be made to equal p. That is, if matrix F is a p by (p - d) dimensional set of functionally enhanced columns of X, the optimal solution is given by:

w = [X F]-1 y

12 This form of the pseudoinverse assumes that the columns of X are linearly independent. If the patterns (i.e. rows) are linearly independent, but the columns are not, use XT(X XT)-1 instead.

Thus it can be seen that functional expansion will always yield a flat-net solution if a sufficiently large number of additional orthonormal functions are used (Pao, 1989). Moreover, because the nonlinear mapping is effected by a single layer, the functional enhancement methodology permits powerful hybrid solutions (e.g. combining supervised and unsupervised paradigms).13

13 It is, of course, also possible to functionally enhance the inputs to a multilayer system. One particularly interesting example is the 'Finite Impulse Response' model proposed by Wan (1994). Wan's model, which ultimately proved the winning entry at the 1990 Santa Fe Time Series Prediction and Analysis Competition, used a tapped delay line to functionally enhance the input arriving at each unit (including hidden units). Training was by means of the generalised delta rule algorithm.
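The flat-net argument can be sketched with the classic parity (XOR) problem, which has no first-order linear solution; enhancing each pattern with the single second-order term x1x2 is an illustrative choice of phi function:

```python
import numpy as np

# XOR: the four input patterns and their target outputs.
X = np.array([[0.0, 0.0], [0.0, 1.0], [1.0, 0.0], [1.0, 1.0]])
y = np.array([0.0, 1.0, 1.0, 0.0])

# Functionally enhance each pattern with the second-order term x1*x2,
# making the number of components (d) equal to a solvable size.
F = np.column_stack([X, X[:, 0] * X[:, 1]])

# Flat-net (no hidden layer) solution via the pseudoinverse.
w = np.linalg.pinv(F) @ y

assert np.allclose(F @ w, y)              # exact mapping: 0, 1, 1, 0
assert np.allclose(w, [1.0, 1.0, -2.0])   # phi(x) = x1 + x2 - 2*x1*x2
```

The same four patterns admit no exact solution over the raw components x1 and x2 alone; the added interaction term carries the whole burden of the nonlinearity.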

Your brain ... has literally millions of sentries that continuously gaze at a portion of the external world, ready to sound the alarm and draw your attention to anything novel and relevant happening in the world. (D. C. Dennett)

5. The Learning Model

5.1 The McCulloch-Pitts Neuron

Although the names 'McCulloch' and 'Pitts' are usually associated with the birth of connectionism, in fact they laid the foundation for both the symbolic and subsymbolic paradigms (Boden, 1990).1 In their seminal paper, with the foreboding title "A logical calculus of the ideas immanent in nervous activity", McCulloch and Pitts (1943) demonstrated that any finite logical expression can be realised by neurons adhering to the following five basic assumptions:2

1) The activity of the neuron is an 'all-or-none' process.

2) A certain number of synapses must be excited within the period of latent addition in order to excite a neuron at any time, and this number is independent of previous activity and position on the neuron.

3) The only significant delay within the nervous system is synaptic delay.

4) Any active inhibitory synapse absolutely prevents excitation of the neuron at that time.

5) The structure of the network does not change with time.

1 Under McCulloch and Pitts' (1943) interpretation, theoretical psychology becomes the design of 'nets' capable of the computations carried out by minds. If these nets are thought of as real neural connections, the result is connectionism. If, however, the focus is on their logic-processing properties, we have the information-processing approach typical of traditional artificial intelligence (Boden, 1990). 2 That is to say, any Turing machine can be constructed from these simple neurons.
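The five assumptions can be sketched as a simple threshold unit; the AND example and threshold value are illustrative, and this is a sketch of the formal neuron rather than McCulloch and Pitts' own notation:

```python
def mcp_neuron(excitatory, inhibitory, threshold):
    """McCulloch-Pitts unit. Output is all-or-none (assumption 1); any
    active inhibitory synapse absolutely prevents firing (assumption 4);
    otherwise the unit fires iff the number of active excitatory synapses
    reaches the fixed threshold (assumption 2)."""
    if any(inhibitory):
        return 0
    return 1 if sum(excitatory) >= threshold else 0

# Logical AND: threshold 2 over two excitatory inputs.
assert mcp_neuron([1, 1], [], 2) == 1
assert mcp_neuron([1, 0], [], 2) == 0
# Absolute inhibition vetoes an otherwise supra-threshold input.
assert mcp_neuron([1, 1], [1], 2) == 0
```

Lowering the threshold to 1 turns the same unit into a logical OR, which is the sense in which networks of such units realise arbitrary finite logical expressions.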

In terms of neuroanatomy, the specifics are now believed wrong. However, the paper proved that a brain-like machine was capable of extracting many of the regularities of the world, opening the door to a number of exciting possibilities. For as Johnson (1991) pointed out:

[A McCulloch-Pitts network] could talk with other networks, examining their output for regularities. It could explore its inner world, looking for patterns among its own ideas. It could know what it knows. This, McCulloch wrote, would let the neural machinery have "an idea of ideas, which is what Spinoza calls consciousness ...." (p. 136)

5.2 The Perceptron Convergence Theorem

After McCulloch and Pitts (1943) successfully showed that neural networks could implement any system of logical predicates, the main problem facing researchers in the area was to understand how such networks could learn. Although (ironically) Minsky was first on the scene with a learning machine, the real beginning of meaningful neural network learning models can probably be traced to the work of Frank Rosenblatt (Rumelhart and McClelland, 1986). It was Rosenblatt who devised the famous perceptron convergence procedure which, as Block (1962) pointed out, is behaviourist in design:

The [perceptron] error correction procedure is as follows. A stimulus Si is shown and the perceptron gives a response. If the response is correct then no reinforcement is made. If the response is incorrect then the [weight] for each active associator is incremented by ηρi [where ρi is the target (-1, +1) response class]. The inactive associators are left alone. (p. 144)

This procedure represented an important advance over the original Hebb rule; for unlike the Hebb rule, which only considers the strengthening of synaptic bonds, in Rosenblatt's procedure synapses also 'atrophy'.
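Block's description of the procedure can be sketched as follows; the training set, learning rate, and epoch limit are illustrative assumptions:

```python
import numpy as np

def train_perceptron(patterns, targets, eta=1.0, epochs=20):
    """Perceptron error correction: if the response is wrong, increment
    the weights by eta * target * input. Inactive associators (zero
    inputs) are left alone automatically; correct responses receive no
    reinforcement."""
    w = np.zeros(patterns.shape[1])
    for _ in range(epochs):
        for x, rho in zip(patterns, targets):           # rho in {-1, +1}
            response = 1.0 if np.dot(w, x) > 0 else -1.0
            if response != rho:                         # error correction only
                w += eta * rho * x
    return w

# A linearly separable problem (bias folded in as a constant input of 1).
X = np.array([[1.0, 2.0, 1.0], [2.0, 1.0, 1.0],
              [-1.0, -2.0, 1.0], [-2.0, -1.0, 1.0]])
t = np.array([1.0, 1.0, -1.0, -1.0])

w = train_perceptron(X, t)
assert all((1.0 if np.dot(w, x) > 0 else -1.0) == rho for x, rho in zip(X, t))
```

Note that the loop terminates with a correct classifier only because a solution exists, which is exactly Rosenblatt's proviso discussed below.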

The term 'perceptron' refers to a class of self-organising systems designed to shed light on the problem of explaining brain function in terms of brain structure. At the heart of these brain models is the perceptron learning theorem (Rosenblatt, 1962) which boldly asserts:

Given an elementary alpha-perceptron, a stimulus world W, and any classification C(W) for which a solution exists; let all stimuli in W occur in any sequence, provided that each stimulus must reoccur in finite time; then beginning from an arbitrary initial state, an error correction procedure will always yield a solution to C(W) in finite time. ... (p. 596)

Simply put, the perceptron learning theorem states that the perceptron convergence procedure will always terminate after a finite number of iterations. That is not to say that it will always terminate with success, as indicated by Rosenblatt's oft-ignored proviso "for which a solution exists". In fact, the single-layer alpha-perceptron (the simplest of Rosenblatt's learning models) is limited to the capture of second-order structure,3 a limitation exploited to the fullest in an (in)famous book called Perceptrons (Minsky and Papert, 1969). A mathematical tour-de-force, Perceptrons argued that parallel machines are beset by many of the same problems of scale as serial machines. In particular, it demonstrated that the alpha-perceptron is unable to calculate basic functions such as parity and connectedness,4 without using an absurdly large number of 'predicates' (i.e. binary threshold units). But while Minsky and Papert proved that an alpha-perceptron is limited with respect to the set of functions it could compute, their theorems simply do not apply to multilayer systems nor to systems that allow feedback loops (Rumelhart and McClelland, 1986).

3 That is to say, the patterns to be classified must be linearly separable (Rumelhart and McClelland, 1986). 4 As Minsky and Papert use the term, the topological predicate (function) 'connectedness' asks whether a figure can be drawn without lifting pencil from paper. That is, are all points that are 'on' connected to all other points that are 'on', either directly or via other points that are 'on' (Rumelhart and McClelland, 1986).

5.3 Correlation Matrix Memories

The correlation matrix model of associative memory is an extension of the anatomy seen in Rosenblatt's 'perceptrons'. Not only did it reflect the growing body of experimental evidence that neurons are more accurately described as analogue (rather than binary) devices,5 it also incorporated neurophysiological principles such as reentrant connectivity (Edelman, 1989), which had been discovered subsequent to the publication of Rosenblatt's seminal work. The basic neural element thus became an analogue integrator with continuous-valued output, allowing it to be modelled by simple matrix multiplication.


Figure 5.1: Correlation matrix model of associative memory (Kohonen, 1972). The input vector consists of two parts: a key field and a data field.

5 Unlike earlier models (e.g. Rosenblatt, 1958) which viewed neural activity as an all-or-none (i.e. binary) event, the correlation matrix memory model assumes that the activity of a neuron is proportional to the sum of the synaptic weights times the activity of the innervating neurons.


As depicted in Figure 5.1, in a correlation matrix memory the input pattern is divided into two parts: a key field and a data field. All signals of the key field, taken together, form a key vector denoted by xp, where the subscript p is a discrete-time index labelling a particular input pattern. Similarly, when taken together, the signals comprising the data field form a datum vector, yp.

The memory elements (associators), labelled by the index pair (i, j) reflecting the ith element of the key field and the jth element of the data field, are determined by a generalisation of the Hebb rule in which synaptic strength changes in proportion (μ) to the product of pre- and post-synaptic activity:

wij = μ xpi ypj  (5.01)

5.3.1 Complete Correlation Matrix Memory

If connections exist for all possible key-data pairs the memory is called a complete correlation matrix memory (Kohonen, 1972). In this case the contents of the associators are described by the Hebb rule taken across all pairs p:

wij = μ Σp xpi ypj  (5.02)

The recall of a particular datum y(r) is effected by the transformation:

ŷ(r)j = Σi wij x(r)i  (5.03)

If all of the key vectors are orthogonal then recall will be perfect (i.e. ŷ(r) = y(r)). However, if the keys are not orthogonal they give rise to crosstalk.6
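Both behaviours can be sketched in a few lines; the key and data patterns below are illustrative assumptions:

```python
import numpy as np

def memorise(keys, data, mu=1.0):
    """Hebb rule taken across all key-datum pairs (5.02), in matrix form."""
    return mu * sum(np.outer(y, x) for x, y in zip(keys, data))

data = [np.array([1.0, 2.0]), np.array([3.0, 4.0])]

# Orthonormal keys: recall (5.03) is perfect.
keys = [np.array([1.0, 0.0]), np.array([0.0, 1.0])]
W = memorise(keys, data)
assert np.allclose(W @ keys[0], data[0])
assert np.allclose(W @ keys[1], data[1])

# Non-orthogonal keys give rise to crosstalk: recall is contaminated
# by a contribution from the other stored pattern.
keys2 = [np.array([1.0, 0.0]), np.array([1.0, 1.0])]
W2 = memorise(keys2, data)
assert not np.allclose(W2 @ keys2[0], data[0])
```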

5.3.2 Incomplete Correlation Matrix Memory

Since the number of elements in association matrix W is the product of the dimensionality of its key and data vectors, the size of the association matrix may grow impractical as the dimensions of the key and data vectors increase. It would be helpful, therefore, to be able to use a subset of the elements in W to represent the complete correlation matrix. Fortunately, this is possible. Following the method originally outlined by Kohonen (1972), one needs first to define a set of sampling coefficients sij which take on the value of one at all (randomly) sampled elements of the correlation matrix, and zero elsewhere. Thus, the incomplete correlation matrix is defined by the expression:

w'ij = sij wij

Analogous to 5.03, the recalled pattern is given by the equation:7

In order to determine the noise (statistical error) introduced by the use of a randomly sampled incomplete matrix instead of the complete correlation matrix model, it is necessary to derive expressions for the expected value (E) and variance (var) of the patterns recalled from memory (Kohonen, 1972). The fidelity of recall can then be described by the relative standard deviation.

7 Here m is the number of elements in the key vector.

It is here stated, without further proof,8 that the relative standard deviation (rsd) of the recalled data, for all j, is an expression in which 0 < ξ <= 1 is the probability that sij = 1. To see precisely what this rather complicated expression means, consider the case in which there is only one memorised pattern, and the key vector consists exclusively of elements from the set {-1, +1}. If, as previously, we define m as the number of elements in the key vector, n as the number of elements in the data vector, and s as the number of sampled matrix elements, then the relative standard deviation is given by the much more manageable expression:

rsd = sqrt[(m - s/n) / (m (s/n))]  (5.08)

If ξ << 1 and m >> 1, as is usually the case, and if the maximum acceptable rsd is α, then the minimum number of sampled elements (s) is given by the equation:

s >= n / α^2  (5.09)

As an example, if α = 0.1 then s >= 100n. In general, the number of elements sampled is directly proportional to the number of elements in the data vector (Kohonen, 1984).9

8 The interested reader is referred to the proof offered in Kohonen (1972). 9 Note that the dimensionality of the key vector plays no role in the determination of the minimum number of sampled elements, a fact which takes on particular relevance if the key vector has many elements.

5.3.3 Autoassociative Correlation Matrix Memory

Because the information in a correlation matrix is encoded in redundant form, 'stored' items can often be retrieved with an incomplete retrieval key (Kohonen, 1972). This fact is particularly crucial when implementing an autoassociative memory; for in an autoassociative memory any part of the input (i.e. stimulus) pattern might be used as the retrieval key.

To implement an autoassociative memory, simply substitute the input (key) vector for the output (data) vector during memorisation. In other words, the contents of the associators are described by the equation:

wij = μ Σp xpi xpj

where, as before, μ is the normalising constant (i.e. learning rate). In this way every memory element becomes connected to two different elements in the input field, in effect mixing up the key and data fields. Pattern recall is then elicited by the equation:

x̂j = Σi zi wij xi

where zi is one for known elements in the key pattern, and zero otherwise. The incomplete correlation matrix memory recall equation is similarly derived:
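Recall from a partial key can be sketched as follows; the stored patterns and the known-element mask are illustrative assumptions, and the recalled vector is compared with the stored pattern by direction:

```python
import numpy as np

# Autoassociative storage: the input vector plays both key and data roles.
x1 = np.array([1.0, 0.0, 1.0, 0.0]) / np.sqrt(2.0)   # normalised pattern
x2 = np.array([0.0, 1.0, 0.0, -1.0]) / np.sqrt(2.0)  # orthogonal to x1
W = np.outer(x1, x1) + np.outer(x2, x2)

# Retrieve x1 from an incomplete key: only the first two elements known.
z = np.array([1.0, 1.0, 0.0, 0.0])   # z_i = 1 for known elements
recalled = W @ (z * x1)              # unknown elements zeroed out

# The recalled pattern points in the direction of the stored x1.
cos = np.dot(recalled, x1) / (np.linalg.norm(recalled) * np.linalg.norm(x1))
assert cos > 0.99
```

The redundancy of the correlation encoding is what makes this work: every stored element participates in reconstructing every other.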


5.4 The Novelty Detector

The novelty detector is an instance of the lowest-order, nontrivial model of learning and memory (Kohonen, 1984). These equations take the general form:

y = wT x
dw/dt = α(.) x - β(.) w

where α(.) and β(.) are (possibly nonlinear) scalar functions of w, x, and y.10 In the case of the novelty detector, this general equation is manifest as:

dw/dt = μ y x - δ w  (5.14)

where w is the weight vector, x is the input vector, and y is the output vector. Unfortunately, 5.14 typically either 'blows up' (i.e. diverges) or converges to the zero vector, making it of little practical value as a learning model. However, if the forgetting term is ignored altogether (by setting δ = 0), and if the learning constant, μ, is set to a small negative value, something remarkable happens: the weight vector converges to a finite non-zero vector of the form [P w(0)], the matrix P being an orthogonal projection operator (Kohonen, 1984). Thus, the system equations which define the novelty detector are given by:

y = wT x  (5.15)
dw/dt = μ y x  (5.16)

10 The term 'β(.)w' represents forgetting. As Kohonen (1984) observes, "Pure forgetting or decay effects in natural phenomena are usually proportional to the prevailing values of the variables, although, again, the rate may be a general function of all system parameters which includes nonlinear effects." (p. 92)

This simplified form permits the following exact solution:

w(t) = [I - m(t) x xT] w(t0)  (5.17)

where m(t) = ||x||^-2 (1 - e^-h) and h = -μ ||x||^2 (t - t0). This can be generalised to the case in which a finite set of input patterns, with an arbitrary frequency of occurrence and order, are integrated over the half-open intervals [tk-1, tk). If it is assumed that t0 is replaced by tk-1, equation 5.17 becomes:

w(tk) = [I - mk xk xkT] w(tk-1)  (5.18)

which, when applied recursively to the set of input vectors, yields:

w(tk) = [I - mk xk xkT] ... [I - m1 x1 x1T] w(t0)  (5.19)

Provided the learning rate (μ) falls within certain limits,11 equation 5.19 always converges to a projection matrix P, such that the converged weight vector P w(t0) is orthogonal to all of the vectors in the input set (Kohonen, 1984). Thereafter, if one of the known vectors (or an arbitrary linear combination) is input, the corresponding output will be zero. If, however, a new (i.e. novel) input pattern is chosen, the output will be non-zero. This is how the novelty detector comes by its name. As Kohonen (1984) observes, "If this phenomenon were discussed within the context of experimental psychology, it would be termed habituation." (p. 101).

11 The learning rate must conform to the following limits:

0 < mk ||xk||^2 < 2
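The asymptotic behaviour can be sketched by applying the limiting form of equation 5.17 (m -> ||x||^-2) once per input; the initial weight vector is illustrative, and the two inputs are chosen orthogonal so that a single pass suffices:

```python
import numpy as np

def habituate(w, x):
    """Limiting update of equation 5.17: remove from w its component
    along input x, i.e. w <- [I - x xT / ||x||^2] w."""
    return w - (np.dot(w, x) / np.dot(x, x)) * x

w = np.array([0.5, -1.0, 2.0, 0.7])          # arbitrary initial weights w(0)
known = [np.array([1.0, 0.0, 0.0, 0.0]),
         np.array([0.0, 1.0, 1.0, 0.0])]     # orthogonal input set

for x in known:
    w = habituate(w, x)

# Known inputs (and their linear combinations) now yield zero output...
assert abs(np.dot(w, known[0] + 2.0 * known[1])) < 1e-9
# ...while a novel input still elicits a non-zero response.
assert abs(np.dot(w, np.array([0.0, 0.0, 0.0, 1.0]))) > 0.5
```

With non-orthogonal inputs the same state is only approached over many presentations, which is the slowness the novelty filter is designed to overcome.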

5.5 The Novelty Filter

Although the simplicity of the novelty detector makes it the preferred choice (by Occam's razor) as a model of habituation, unfortunately it is a rather slow learner. Kohonen (1984) has determined that the speed of convergence is roughly inversely proportional to the dimensionality of the input patterns, with a radical reduction in the rate of convergence occurring as the number of patterns approaches their dimensionality. Fortunately, convergence can be greatly accelerated by the inclusion of negative feedback information:

Acceleration of convergence is achieved by the inclusion of negative feedback in the system. ... [T]here exists one particular type of feedback which guarantees extremely prompt convergence: the asymptotic state of the optimal novelty filter can be achieved in one single cycle of application of the input patterns. (Kohonen, 1984, pp. 109-110)

In other words, instead of taking independent novelty detectors, Kohonen (1976) constructs a faster learning model (which Kohonen calls a novelty filter) by assuming mutual interaction between units:12

y = x + Wy  (5.22)
dW/dt = -α y yT  (5.23)

The first equation states that the output signals are linear combinations of the input, x, and feedback signals. That is, every element in the output vector is assumed to receive feedback from all the other elements through a set of variable (i.e. adaptive) weights. In pursuance of the system principles outlined for the novelty detector, these feedback connections are assumed to adapt in proportion to the outer product of the output vector.

12 Note that this model is based on collective phenomena (Kohonen, 1984).

While the inclusion of feedback information greatly improves the rate of convergence, it also considerably complicates a theoretical analysis of the learning model. The overall transfer operator is described by the square matrix, Φ, solved from the implicit feedback equations:

y = Φ x,  Φ = (I - W)^-1

And the differential equation for Φ is the matrix Bernoulli equation, which has a stable asymptotic solution when α >= 0 (Kohonen, 1984).

From a linear dynamics perspective, Kohonen's novelty filter can be seen as

an ongoing competition between externally applied forces (the input vectors xi) which endeavour to dislodge the system from its current state, and a set of internal dynamics (embodied in matrix W). A stable solution is obtained when a state of equilibrium is reached such that the forces acting upon the system are precisely counterbalanced by the internal forces. More precisely, the weight matrix W can be written as the sum of the identity matrix, which tries to keep the system where it is, and a second matrix (U) which tries to move it about:

W = I + U  (5.26)

Rearranging 5.26, one can see that equilibrium is only achieved when:

U x = 0 (i.e. W x = x)

Kohonen's learning rule (5.23) admits a similar dynamic interpretation. For if y defines the state of system U, then subtracting the covariance matrix yyT from U shrinks the distribution of y's along the first principal component and expands the second principal component, making the distribution of the y's more spherical (Hinton, 1996, personal communication). In effect this shrinks the system state so that any novel force acting upon the system generates an exaggerated response (Hinton, 1996, personal communication).

5.6 The Enhanced Novelty Filter

Although derived from Kohonen's seminal model, the ENF introduces an important innovation which not only greatly simplifies the system equations, it also substantially accelerates learning. For like Kohonen's novelty filter the ENF converges in a single epoch. However, the ENF goes the novelty filter one better, achieving its asymptotic state in a single iteration!

The basic argument is that learning minimally involves extracting the novelty in any given situation. The ENF is a device which models this process. Formally, the proposed ENF learning model is defined by the system equations:

yk = Wk-1 Φ[xk]  (5.29)
Wk = Wk-1 + μ yk ykT  (5.30)

where Φ[x] is a mapping of input vector x onto a vector in n-dimensional phi space, W is the connection strength (i.e. weight) matrix,13 and μ is the learning rate which is dynamically determined by means of the equation:

μ = -1 / (ykT yk)

[13] The (square) weight matrix W is initialised to the identity matrix (I_n).

In order to get a sense of the ENF's learning dynamics, it is helpful to interpret the system geometrically. First note that, when equations 5.29 and 5.30 are combined, the ENF weight update rule is given by:

W_k = W_{k-1} - (y_k y_k^T) / (y_k^T y_k)    (5.31)

Since the weight matrix is initialised to the identity matrix (i.e. W_0 = I), on the initial pass equation 5.31 can also be written as:

W_1 = I - (x x^T) / (x^T x)    (5.32)

This is the standard matrix operator for rotating subspaces (Kohonen, 1995).

That is to say, if y is an arbitrary vector in R^n, equation 5.32 will generate a matrix which projects any vector in a given subspace S onto a new subspace S' in which all of the vectors are orthogonal to y (Kohonen, 1995). The projection of y onto S' would be a zero vector. That 5.31 is a general means of generating orthogonal projection operators can be derived from Greville's theorem for computing the pseudoinverse (M+) of a general matrix (M):

p_k = (I - M_{k-1} M+_{k-1}) m_k / ||(I - M_{k-1} M+_{k-1}) m_k||^2    if the numerator is not 0    (5.33)

where M+_1 = m_1^T (m_1^T m_1)^-1 if m_1 is a nonzero vector and M+_1 = 0^T if m_1 is the zero vector. Specifically, if X_k is a matrix with x_1, x_2, ..., x_k its columns, and it is partitioned as [X_{k-1} | x_k], it follows from Greville's theorem that:[14]

X+_k = [ X+_{k-1} - X+_{k-1} x_k p_k^T
         p_k^T ]    (5.34)

where p_k is as defined for equation 5.33 (above), with the m-vectors replaced by corresponding y-vectors. It further follows that:

(I - X_k X+_k) = (I - X_{k-1} X+_{k-1}) - (y_k y_k^T) / (y_k^T y_k)    (5.35)

where (I - X_{k-1} X+_{k-1}) is the projection operator on the space orthogonal to the space spanned by x_1 ... x_{k-1}, and (I - X_k X+_k) is the corresponding projection operator (with x_1 ... x_k its spanning vectors). Assuming that the recursion starts with W_0 = I, then equation 5.35 can be simplified into the form:

W_k = W_{k-1} - (y_k y_k^T) / (y_k^T y_k)

where y_k = W_{k-1} x_k (Kohonen, 1984). This is, of course, the equation set which we sought to derive, i.e. the equations which define the ENF.

[14] A more detailed discussion of the preceding derivation (equations 5.34 and 5.35) is beyond the scope of the present work. Interested readers are referred to the excellent review in Kohonen (1984).
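The derivation can be checked numerically. The sketch below (Python/NumPy; mine, not the thesis's) runs the recursion W_k = W_{k-1} - (y_k y_k^T)/(y_k^T y_k) over a few random columns and confirms that it coincides with the orthogonal projection operator I - X X+ obtained directly from the pseudoinverse:

```python
import numpy as np

rng = np.random.default_rng(0)
n, k = 6, 3
X = rng.standard_normal((n, k))         # columns x_1 ... x_k

# Recursive form: W_k = W_{k-1} - (y_k y_k^T)/(y_k^T y_k), with y_k = W_{k-1} x_k
W = np.eye(n)
for i in range(k):
    y = W @ X[:, i]
    W = W - np.outer(y, y) / (y @ y)

# Closed form: the projector onto the space orthogonal to span{x_1 ... x_k}
P = np.eye(n) - X @ np.linalg.pinv(X)

print(np.allclose(W, P))                # the two operators coincide
print(np.allclose(W @ X, 0))            # memorised inputs map to the zero vector
```

In other words, the one-step-per-pattern recursion builds exactly the projection operator that Greville's theorem prescribes.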


There is an interesting relationship between the ENF and the Widrow-Hoff (1960) delta rule.[15] The goal of the learning procedure is to generate an output value of zero for any known input. This goal can be formalised as follows:

y_j = Sum_i x_i W_ij = 0    (5.36)

where x_i is the ith input element, W_ij is the weight connecting inputs i and j, and delta-W_ij is the desired change in this weight given by the delta rule:[16]

delta-W_ij = -p x_i y_j    (5.37)

Since in the ENF the following relationship holds:

W x = W y = y    (5.38)

with respect to a given W, output vector y is effectively equivalent to the input vector x. This means that equations 5.36 and 5.37 could have been written:

y_j = Sum_i y_i W_ij = 0    (5.39)

and

delta-W_ij = -p y_i y_j    (5.40)

[15] I wish to thank Dr. Hinton for pointing out this relationship. [16] Note that the target value in the ENF variant of the delta rule is constant (i.e. zero).

which is, of course, Kohonen's (1984) novelty filter learning rule. Substituting 5.40 into 5.39 and distributing y_i across the bracketed terms we get:

Sum_i y_i (W_ij - p y_i y_j) = 0    (5.41)

or equivalently:

Sum_i y_i W_ij - p y_j Sum_i y_i^2 = 0    (5.42)

Since Sum_i W_ij y_i = y_j, equation 5.42 can be further simplified to:

y_j - p y_j Sum_i y_i^2 = 0    (5.43)

This leads directly to the learning rate definition:

p = 1 / Sum_i y_i^2    (5.44)

which is, of course, the proposed learning rate rule (equation 5.30): p = 1 / (y^T y).

It must be admitted that there is a small sleight of hand involved in the last derivation. For while it is true that the x and y vectors are "effectively equivalent" in that both x and y generate exactly the same output vector when multiplied by the connection strength (i.e. weight) matrix, there is one important sense in which they are not equivalent. By passing an input vector through the weight matrix (i.e. by taking the inner product of the input vector and weight matrix) the resulting output vectors are guaranteed to be orthogonal to each other! That is to say (in vector notation):

y_p^T y_q = 0    for all p not equal to q

Thus, if all of the weights in matrix W are changed in proportion to a vector y_q that is orthogonal to y_p, then Wx_p = 0 is preserved. Moreover, since y_q is orthogonal to all previous input vectors, y_q is guaranteed not to corrupt any previous engrams. In essence, the ENF is just the delta rule using the novel part of each input.
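Both claims in this paragraph are easy to verify with a minimal sketch (assuming the update rule as stated above; the code is illustrative, not the thesis's). Three random vectors are imprinted one at a time; the successive novelty vectors y come out mutually orthogonal, and no later update corrupts an earlier engram:

```python
import numpy as np

rng = np.random.default_rng(1)
W = np.eye(5)
xs = [rng.standard_normal(5) for _ in range(3)]

ys = []
for x in xs:
    y = W @ x                           # the novel part of the input
    W = W - np.outer(y, y) / (y @ y)    # delta rule driven by the novelty alone
    ys.append(y)

# successive novelty vectors are mutually orthogonal ...
print(abs(ys[0] @ ys[1]) < 1e-9, abs(ys[1] @ ys[2]) < 1e-9)
# ... so no update corrupts a previously stored engram
print(all(np.allclose(W @ x, 0) for x in xs))
```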

5.6.1 The Capacity of the ENF

As Nilsson (1965) noted, one measure of the effectiveness of a discriminant function family is the total number of dichotomies of n patterns its members can effect. That is, suppose we are given a phi function with d+1 adjustable weights and a set of n patterns in phi general position[17] in the pattern space. Its capacity can be defined in terms of the probability that a randomly selected dichotomy can be implemented (for some setting of the weights) by the function. However capacity, so defined, is static. A more dynamic sense of capacity is provoked by the question: "How many patterns can the phi function learn?".

[17] Phi general position simply means that the points in phi space are not collinear. More formally, for n > d we say that a set of n points is in general position in a d-dimensional space iff no subset of d+1 points lies on a (d-1)-dimensional hyperplane. When n <= d, a set of n points is in general position if no (n-2)-dimensional hyperplane contains the set (Nilsson, 1965).


One way to approach this question is to examine the underlying dynamics of the connection matrix (in particular, its eigenvalues) as learning progresses. Consider the familiar set of simultaneous linear equations:

W x = y

As noted previously, this equation defines matrix W as a linear mapping from vector subspace x to vector subspace y. If W is singular, some subspace of x (its null space) maps to zero (i.e. Wx = 0). The number of linearly independent vectors x which can be found in the null space is called the nullity of W. Conversely, the rank of W is the dimensionality of the subspace of y that can be reached by W.

Now it is a fundamental theorem of matrix algebra that the rank of matrix W plus its nullity must always equal its dimensionality.[18] That is to say, if W is an n by n matrix then its rank plus its nullity equals n. What does all this have to do with the ENF? Well, the ENF increases the nullity of W. Specifically, in order to memorise each input vector x, the ENF must adjust the weighting coefficients such that Wx = 0. And as the aforementioned theorem states, in doing so it simultaneously decreases the rank of matrix W. In effect, the ENF "uses up" dimensions as it learns (Hinton, 1996, personal communication).[19] This is reflected by the changes in W's eigenvalues as each pattern is mastered. For instance, given the randomly (-1 < x_i < 1) generated training set:

Pattern 1: 0.0277, -0.6566, -0.8322, 0.0707, 0.6459 (the four remaining five-element patterns are likewise drawn at random)

[18] If the matrix is non-singular then its range will be all of the vectors in y, so its rank is n. [19] Less colloquially, on each training example the system adds dimensions to the null space.

and assuming a linear phi function is used, subsequent to training on the first input pattern the weight matrix is:

The corresponding eigenvalues and eigenvectors are:


Following training on the second input pattern the weight matrix is:


with the corresponding eigenvalues and eigenvectors:

After all five patterns have been memorised the weight matrix is now:

generating the eigenvalue and eigenvector set:



Note that the system's capacity has been reached: all of the eigenvalues but one are now (effectively) zero. In fact, if asked to memorise another pattern its weight values all go to zero (erasing all stored engrams). Thus, the capacity of the linear ENF is limited to the number of elements in its input vector or, equivalently, the dimensionality of its weight matrix minus one.
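The depletion of dimensions can be watched directly by tracking the rank of W. In this hypothetical sketch the six-element patterns (five inputs plus a threshold slot, here simply drawn at random) each cost exactly one dimension:

```python
import numpy as np

rng = np.random.default_rng(2)
n = 6                                    # five inputs plus a threshold element
W = np.eye(n)
ranks = []
for _ in range(5):                       # memorise five random patterns
    x = rng.uniform(-1, 1, n)
    y = W @ x
    W -= np.outer(y, y) / (y @ y)
    ranks.append(np.linalg.matrix_rank(W))

print(ranks)  # each imprinted pattern costs one dimension: [5, 4, 3, 2, 1]
```

After the fifth pattern a single dimension remains, mirroring the single non-zero eigenvalue noted above.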

With this limit in mind, in order to calculate the ENF's capacity for other members of the phi family, it is necessary to first determine precisely how the dimensionality of a pattern changes as a result of applying the phi function. This, of course, depends on which phi family member is being applied. Since the family which is of primary interest in the present work is the family of rth-order polynomials, let me illustrate by determining how the dimensionality, hence capacity, of a pattern changes as a result of subjecting it to polynomial expansion. Fortunately, Nilsson (1965) has already worked this out:

d = (n + r)! / (n! r!) - 1    (5.45)

Equation 5.45 states that the dimensionality of an n-element vector subjected to an rth-order polynomial expansion is a function of the binomial coefficient of the number of elements in the vector and its degree of tensor expansion. This can, perhaps, best be seen when equation 5.45 is reformulated as:

d = Sum over i = 1 ... r of binomial-coefficient(n + i - 1, i)    (5.46)

where the function binomial-coefficient(n + i - 1, i) is defined as:

(n + i - 1)! / ((n - 1)! i!)    (5.47)


As Table 5.1 shows, a network's capacity grows quite rapidly:

#inputs | degree of expansion: 1  2  3  4  5

Table 5.1: Partial listing of the capacity of an rth-order polynomial ENF.
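Equation 5.45 is straightforward to evaluate. The helper below is a sketch (the function name is mine, not the thesis's); it reproduces, for example, the linear case in which capacity equals the number of inputs:

```python
from math import comb

def phi_capacity(n: int, r: int) -> int:
    """Dimensionality, hence capacity, of an rth-order polynomial expansion
    of an n-element input (equation 5.45): (n + r)!/(n! r!) - 1."""
    return comb(n + r, r) - 1

print(phi_capacity(5, 1))  # linear: capacity equals the number of inputs -> 5
print(phi_capacity(2, 2))  # two inputs, second order (the parity-two case) -> 5
print(phi_capacity(5, 3))  # grows rapidly with the degree of expansion -> 55
```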

Unfortunately so, too, does the number of elements in the weight matrix. In fact, the weight matrix grows in proportion to the square of the dimensionality of the input vector.[20] It may, however, be possible to reduce the computational load imposed by this exponential growth in the number of connections using a variant of Kohonen's incomplete correlation matrix paradigm (see section 5.4.2). If so, then the number of updates (i.e. elements sampled) is merely proportional to, rather than the square of, the number of elements in the data vector.

[20] Actually, the number of input vector elements plus one (i.e. the threshold), squared.


5.6.2 The ENF's Ability to Generalise

If a learning model has too many free parameters relative to the number of cases in the training set, rather than learning the basic structures in the data (thereby enabling it to generalise well), the model learns irrelevant details of individual cases (Masters, 1993). This is known as overfitting (see Figure 5.2).

Figure 5.2: (a) A good fit to noisy data. (b) Overfitting of the same data. The fit is good on the "training set" (represented by the Xs) but likely to be poor on a "test set" (represented by the circle).

A useful rule of thumb, directly following from Occam's razor, is that if two models fit the data equally well, the simpler model probably generalises better (Hinton, 1991, lecture notes). More concretely, a model's cost function should take both error (i.e. data misfit) and complexity into account:

cost = error + lambda * complexity    (5.48)


The proportionality constant lambda can be determined empirically using a simple yet powerful statistical technique known as cross-validation. Cross-validation uses the model's performance on a validation set[21] in order to determine the appropriate proportionality constant to employ in equation 5.48. Somewhat surprisingly, however, it turns out that when cross-validation is used, the complexity measure can be discarded altogether (Hinton, 1991, lecture notes)! The model's performance on the validation set alone serves as an adequate guide as to when to stop training. Of course, care must be taken to ensure the training and validation sets are representative of the same population.

5.6.2.1 The Leave-K-Out Method of Cross-Validation

A particularly useful technique for determining whether or not a learning model will generalise properly is known as the leave-k-out method. Often used in discriminant analysis, the leave-k-out method assumes that the researcher has a collection of samples from several known categories. A small fraction of the known cases, often just one, is held back for testing. The remaining cases are used for training. Masters (1993) picks up the story from here:

This is a fair, unbiased test of that particular discriminator's capability. Unfortunately, since just one or at most a few cases are tested, the random error inherent in the sampling procedure guarantees us a high probability of significant error in the performance estimate. So we do it again, this time holding back a different small subset. By repeating this as many times as are needed to test every known case, we have made optimal use of our data, yet we still have a performance estimate that is not biased by ever using the same case for both training and testing - very clever indeed. (p. 13)

[21] The validation (test) set is a collection of representative input patterns to which the model has not previously been exposed (i.e. patterns which were not used in training the model).
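The procedure Masters describes can be sketched in a few lines. Both the helper and the nearest-class-mean "discriminator" below are illustrative stand-ins (none of these names come from the thesis); any train/predict pair could be substituted:

```python
import numpy as np

def leave_k_out(cases, labels, train_fn, predict_fn, k=1):
    """Hold out k cases at a time, train on the rest, test the held-out cases."""
    correct = 0
    for start in range(0, len(cases), k):
        held = range(start, min(start + k, len(cases)))
        rest = [i for i in range(len(cases)) if i not in held]
        model = train_fn([cases[i] for i in rest], [labels[i] for i in rest])
        correct += sum(predict_fn(model, cases[i]) == labels[i] for i in held)
    return correct / len(cases)

def train_fn(xs, ys):
    """Toy discriminator: remember each class's mean vector."""
    xs, ys = np.asarray(xs, float), np.asarray(ys)
    return {c: xs[ys == c].mean(axis=0) for c in np.unique(ys)}

def predict_fn(model, x):
    """Classify by the nearest class mean."""
    return min(model, key=lambda c: np.linalg.norm(np.asarray(x) - model[c]))

cases = [[0, 0], [0, 1], [1, 0], [5, 5], [5, 6], [6, 5]]
labels = [0, 0, 0, 1, 1, 1]
print(leave_k_out(cases, labels, train_fn, predict_fn))  # -> 1.0
```

Every case is tested exactly once, and never by a model that saw it during training.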


5.6.2.2 The Jets and Sharks

All that remains is to determine the input pattern set to be used to evaluate the ENF's generalisation capabilities. A rather interesting pattern set is J. L. McClelland's (1981) Jets and Sharks learning task (see Table 5.2).

Name     Education      Marital Status   "Occupation"   Gang
Art      Jr. High       Single           Pusher         Jets
Al       Jr. High       Married          Burglar        Jets
Sam      College        Single           Bookie         Jets
Clyde    Jr. High       Single           Bookie         Jets
Mike     Jr. High       Single           Bookie         Jets
Jim      Jr. High       Divorced         Burglar        Jets
Greg     High School    Married          Pusher         Jets
John     Jr. High       Married          Burglar        Jets
Doug     High School    Single           Bookie         Jets
Lance    Jr. High       Married          Burglar        Jets
George   Jr. High       Divorced         Burglar        Jets
Pete     High School    Single           Bookie         Jets
Fred     High School    Single           Pusher         Jets
Gene     College        Single           Pusher         Jets
Ralph    Jr. High       Single           Pusher         Jets
Phil     College        Married          Pusher         Sharks
Ike      Jr. High       Single           Bookie         Sharks
Nick     High School    Single           Pusher         Sharks
Don      College        Married          Burglar        Sharks
Ned      College        Married          Bookie         Sharks
Karl     High School    Married          Bookie         Sharks
Ken      High School    Single           Burglar        Sharks
Earl     High School    Married          Burglar        Sharks
Rick     High School    Divorced         Burglar        Sharks
Ol       College        Married          Pusher         Sharks
Neal     High School    Single           Bookie         Sharks
Dave     High School    Divorced         Pusher         Sharks

Table 5.2: The characteristics of a number of individuals belonging to two street gangs - the Jets and the Sharks.[22]

[22] From "Retrieving General and Specific Knowledge From Stored Knowledge of Specifics" by J. L. McClelland, 1981, Proceedings of the Third Annual Conference of the Cognitive Science Society.

As Table 5.2 notes, the Jets and Sharks are two hypothetical street gangs. Each gang member is described by a set of characteristics: first name, age range, educational level, marital status, "occupation" (criminal specialisation), and gang affiliation. To generate the bipolar patterns used to train and test the ENF the following coding scheme was employed:[23]

1. The 27 gang members were simply encoded by the 27-element sequence:

Art = +1 -1 -1 ... -1

Al = -1 +1 -1 ... -1

2. The three age range categories were given as:

20's = +1 -1 -1

30's = -1 +1 -1

40's = -1 -1 +1

3. The educational levels were similarly coded as:

Jr. High School = +1 -1 -1

High School = -1 +1 -1

College = -1 -1 +1

4. The marital status categories were assigned the values:

Single = +1 -1 -1

Married = -1 +1 -1

Divorced = -1 -1 +1

[23] More sophisticated coding schemes are, of course, possible. Many of these schemes reduce the dimensionality of the input vectors. However, since the patterns are orthogonal (the name field is unique), which means that a linear phi function can be used, these alternate schemes are unnecessary.

5. The "occupation" (criminal specialisation) categories were coded as:

Bookie = +1 -1 -1

Burglar = -1 +1 -1

Pusher = -1 -1 +1

6. Finally, the gang affiliation uses the single bipolar coding:

Jet = +1

Shark = -1
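The scheme above maps each gang member onto a 40-element bipolar vector (27 + 3 + 3 + 3 + 3 + 1). A minimal sketch (the function names are mine; Art's age slot is chosen arbitrarily here, since the age listing is not reproduced above):

```python
def bipolar(index: int, size: int) -> list:
    """One-of-N bipolar code: +1 in the named slot, -1 everywhere else."""
    return [1 if i == index else -1 for i in range(size)]

def encode(name_i, age_i, edu_i, marital_i, occ_i, is_jet):
    """27 + 3 + 3 + 3 + 3 + 1 = 40 bipolar elements, per items 1-6 above."""
    return (bipolar(name_i, 27) + bipolar(age_i, 3) + bipolar(edu_i, 3) +
            bipolar(marital_i, 3) + bipolar(occ_i, 3) + [1 if is_jet else -1])

# Art: name slot 0, Jr. High (0), Single (0), Pusher (2), Jet; age slot arbitrary
art = encode(0, 0, 0, 0, 2, True)
print(len(art), art[0], art[-1])  # -> 40 1 1
```

Because each name slot is unique, any two such vectors differ in their name fields, which is what keeps the pattern set orthogonal enough for a linear phi function (footnote 23).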

Since the resulting (40-element) pattern set is linearly separable, a linear phi function will suffice to test the ENF's generalisation capability. Using the leave-k-out procedure outlined in section 5.6.2.1, each candidate gang member is classified as either a Jet or a Shark based on their reported characteristics. The results of this experiment are shown below:

name     actual   estimated             name     actual   estimated
Art      Jets     Jets   (-0.19)        Ralph    Jets     Jets   (-0.13)
Al       Jets     Sharks ( 0.12)        Phil     Sharks   Sharks ( 0.14)
Sam      Jets     Jets   (-0.12)        Ike      Sharks   Jets   (-0.32)
Clyde    Jets     Jets   (-0.17)        Nick     Sharks   Sharks ( 0.06)
Mike     Jets     Jets   (-0.11)        Don      Sharks   Sharks ( 0.28)
Jim      Jets     Jets   (-0.26)        Ned      Sharks   Sharks ( 0.15)
Greg     Jets     Jets   (-0.02)        Karl     Sharks   Sharks ( 0.05)
John     Jets     Jets   (-0.27)        Ken      Sharks   Jets   (-0.27)
Doug     Jets     Sharks ( 0.27)        Earl     Sharks   Sharks ( 0.20)
Lance    Jets     Jets   (-0.27)        Rick     Sharks   Sharks ( 0.31)
George   Jets     Jets   (-0.26)        Ol       Sharks   Sharks ( 0.14)
Pete     Jets     Jets   (-0.13)        Neal     Sharks   Sharks ( 0.09)
Fred     Jets     Jets   (-0.15)        Dave     Sharks   Sharks ( 0.14)
Gene     Jets     Jets   (-0.15)

Table 5.3: Results of leave-k-out evaluation of the ENF's generalisation ability. The bracketed values are the ENF's classification estimate.


As Table 5.3 indicates, the ENF correctly identified the novel gang members in 23 of the 27 cases (i.e. 85% accuracy). It is interesting to examine the cases on which the model erred. Based on the distribution of attributes in Table 5.4, using Bayes' theorem one can calculate the probability of gang membership for each misclassified member.

Attribute                Jets    Sharks

Education:
 - Jr. High School       0.60    0.08
 - High School           0.27    0.58
 - College               0.13    0.33

Marital Status:
 - Single                0.60    0.33
 - Married               0.27    0.50
 - Divorced              0.13    0.17

Occupation:
 - Bookie                0.33    0.33
 - Burglar               0.33    0.33
 - Pusher                0.33    0.33

Table 5.4: Percentage distribution of attributes for each of the gangs.

Since the misclassified members are Al, Doug, Ike, and Ken, the probability comparisons of interest are:[24]

Al:   Prob(Jet | 30's, JH, M, Bu)  vs.  Prob(Shark | 30's, JH, M, Bu)
Doug: Prob(Jet | 30's, HS, S, Bo)  vs.  Prob(Shark | 30's, HS, S, Bo)
Ike:  Prob(Jet | 30's, JH, S, Bo)  vs.  Prob(Shark | 30's, JH, S, Bo)
Ken:  Prob(Jet | 40's, HS, S, Bu)  vs.  Prob(Shark | 40's, HS, S, Bu)

[24] Abbreviations used: JH = jr. high, HS = high school, S = single, M = married, Bo = bookie, Bu = burglar.

The result of the Bayesian analysis is shown below (Table 5.5):

        Prob(Jet | attributes)    Prob(Shark | attributes)
Al              0.64                      0.36
Doug            0.17                      0.83
Ike             0.85                      0.15
Ken             0.87                      0.13

Table 5.5: Probability of gang affiliation for the misclassified members.[25]

Apart from Al (the misclassification of whom is truly a mystery), we see that the attributes which define Doug, Ike, and Ken make them much more typical of the alternative gang than of the gang to which they belong. That is to say, Doug is more representative of a Shark than a Jet, and Ike and Ken are more typically Jets than Sharks. Thus it is hardly surprising that the ENF classified Doug, Ike, and Ken as it did. Indeed, it would seem to be the only reasonable decision, given the information at hand.[26] It is difficult, therefore, to consider the misclassification of Doug, Ike, and Ken as a failure of the ENF's ability to generalise. Anthropomorphising, the ENF simply "felt" that these individuals were better suited to the rival gang. And based solely on typicality, the ENF is correct. In short, Doug, Ike, and Ken may have joined the wrong gang!

[25] Bayes' theorem makes no assumptions about the conditional independence of the features given the classes. However, for computational convenience, it is often assumed that the features provide independent evidence for the class. That is, it assumes the probability of the joint occurrence of A and B given H is just the product of the individual probabilities, i.e. P(A,B|H) = P(A|H)P(B|H). When one considers the factors underlying the Jets-Sharks situation, there may well be interdependencies among the factors (e.g. education level and "occupation"). But as Schmitt (1969) has observed: "In a very tenuous sense all things on earth are related. As you lean over and turn the page of this book, the axis of rotation of the earth is shifted and the telescope at Mount Palomar goes a bit off focus." (p. 50). In short, independence is always a matter of degree. And in relative terms, an assumption of independence for the Jets-Sharks factors would seem warranted.

[26] Of course the ENF can (and does) memorise these aberrant cases when they are part of the training set.
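The Bayesian comparison can be sketched as a naive-Bayes computation over the Table 5.4 proportions. Note that this sketch omits the age factor used in the analysis above (the age distribution is not reproduced in Table 5.4), so its posteriors will not match Table 5.5 exactly; the attribute abbreviations follow footnote 24:

```python
from math import prod

# Class-conditional proportions from Table 5.4; priors are 15/27 Jets, 12/27 Sharks
P = {"Jets":   {"JH": 0.60, "HS": 0.27, "Col": 0.13,
                "S": 0.60, "M": 0.27, "D": 0.13,
                "Bo": 0.33, "Bu": 0.33, "Pu": 0.33},
     "Sharks": {"JH": 0.08, "HS": 0.58, "Col": 0.33,
                "S": 0.33, "M": 0.50, "D": 0.17,
                "Bo": 0.33, "Bu": 0.33, "Pu": 0.33}}
PRIOR = {"Jets": 15 / 27, "Sharks": 12 / 27}

def posterior(attrs):
    """P(gang | attrs), assuming conditionally independent attributes."""
    score = {g: PRIOR[g] * prod(P[g][a] for a in attrs) for g in P}
    total = sum(score.values())
    return {g: s / total for g, s in score.items()}

print(posterior(["JH", "M", "Bu"]))   # Al's education/marital/occupation profile
```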


'Concept' is a vague concept (Ludwig Wittgenstein)

6. The Learning Tasks

There is little agreement on precisely how non-associative learning, which includes habituation and sensitisation, and associative learning, which includes classical and operant conditioning, relate to one another. There is even less agreement on how conditioning relates to higher forms of thought. As noted previously, a growing body of neurophysiological evidence suggests that the mechanisms of the complex forms of learning might be generated from combinations of the mechanisms found in habituation and sensitisation. That is, that more complex forms of learning are generated from an "alphabet" of the mechanisms of lower forms of learning (Hawkins and Kandel, 1984). In this chapter I present a progression of experiments which demonstrate how a habituation-based "alphabet" might come to "spell out" higher-order cognitive processes such as concept formation and even rational thought.[1] Although the details differ somewhat from experiment to experiment, the learning tasks all follow essentially the same methodology. First, the stimulus pattern is functionally enhanced using rth-order polynomial expansion. The enhanced stimulus vector is then augmented with its associated response vector (if any),

and imprinted onto the network.[2] When all of the patterns in the training set have been successfully mastered, the network is again presented with the enhanced stimulus segment (or a variant), only this time with the response elements set to zero. The challenge facing the learning model is to regenerate the missing values with only the (possibly degraded) pattern as a retrieval key.

[1] A critical evaluation of the ENF as a model of habituation is provided in Appendix D. Although the ENF does display the defining feature of habituation (i.e. decrease in responding with repeated stimulation), it was found severely wanting with respect to its ability to reproduce several other prominent features of habituation observed to occur in nature (e.g. dishabituation). [2] It is important whether the functional enhancement precedes or follows concatenation; for if it follows concatenation then the amount of missing information grows as a function of the degree of expansion.

6.1 The Parity Two Problem

Ever since Minsky and Papert's (1969) scathing review of Frank Rosenblatt's seminal "perceptron" learning models, the parity two (i.e. exclusive-or) problem has been a classic test of a learning model's basic mapping capabilities. Its attraction largely stems from the fact that it is both simple and nonlinear, i.e. although a line partitioning true and false is readily found for the inclusive-or truth table, no line can be drawn in the exclusive-or paradigm (see Figure 6.1).

Figure 6.1: An illustration of the inability to partition the exclusive-or (xor) truth values by means of a linear discriminant function.

To effect the exclusive-or truth table, one must consider input interactions. These interactions can manifest either in the form of a multilayer network or, as in the experiment to follow, by functionally enhancing the input pattern. In this case, a simple conjunction of the inputs (i.e. a second-order polynomial phi function enhancement of the input vector) changes the similarity mapping[3] sufficiently to allow the solution to be learned.

[3] The essential character of a linear network is that it maps similar input patterns to similar output patterns. As Rumelhart et al. (1986) note: "This is what allows these networks to make reasonable generalisations and perform reasonably on patterns that they have never before been presented." (p. 318). Rumelhart et al. further suggest that, because the input patterns are mapped directly to a set of output patterns, there is no internal representation (ibid, p. 318). But this does not necessarily follow; to wit, the ability of (linear) constraint satisfaction networks to regenerate missing segments from degenerate input.

6.1.1 The Training Set

Pattern    X1    X2    Response

6.1.2 The Test Set

Pattern    X1    X2    Response (unknown)

6.1.3 Results

As can be seen in Table 6.1, the learning model reproduces (in negative) the correct response classification in each and every case.

Pattern | Elements: Threshold  x1  x2  x1^2  x1x2  x2^2  Response

Table 6.1: The parity two (exclusive-or) output matrix.
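The whole experiment fits in a short sketch (NumPy; illustrative, not the thesis's code). Each truth-table case is enhanced with the second-order phi function, augmented with its bipolar response (+1 when the inputs differ), imprinted in a single pass, and then probed with the response element set to zero. As described above, the regenerated response appears in negative:

```python
import numpy as np

def phi(x1, x2):
    """Second-order polynomial enhancement: threshold, inputs, squares, product."""
    return [1.0, x1, x2, x1 * x1, x1 * x2, x2 * x2]

cases = [(-1, -1), (-1, 1), (1, -1), (1, 1)]                # bipolar truth table
train = [np.array(phi(a, b) + [-a * b]) for a, b in cases]  # xor response = -x1*x2

W = np.eye(7)
for p in train:                        # single-iteration imprinting
    y = W @ p
    W -= np.outer(y, y) / (y @ y)

for (a, b), p in zip(cases, train):
    probe = p.copy()
    probe[-1] = 0.0                    # response element unknown
    out = (W @ probe)[-1]
    print(a, b, round(out, 3))         # sign is the negative of the true response
```

The conjunction term x1x2 is what makes the four enhanced patterns linearly independent, which is why the linear update suffices here.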

Notice that only the element corresponding to a conjunction of the inputs (i.e. x1x2) generates a non-zero output value. This suggests that a junction's degree of activation reflects its overall contribution to the response regeneration: at zero the junction contributes no useful information at all. In principle it should, therefore, be possible to selectively prune these junctions and still reliably regenerate the associated response. Although speculative, this notion accords rather well with a growing body of theory that draws an explicit analogy between the development of neural connections and natural selection:

A substantial literature has accumulated on the establishment of adult connectivity by a process of selection from a more extensive early repertoire of neural connections (Changeux et al., 1973; Changeux and Danchin, 1976; Edelman, 1978, 1981, 1982, 1985, 1987; Edelman and Finkel, 1984; Young, 1979; Changeux et al., 1984; Ebbesson, 1984; Changeux, 1985; Toulouse et al., 1989). Evaluation of this body of work is particularly important, because much of it has been written for a wider audience of biologists and non-biologists which, by and large, has received this idea with enthusiasm. These theories, put forward to explain the brain or mind in manifestations as diverse as perception, memory, consciousness, and free will, are necessarily different in their particulars. The theories share, however, the notion that development generates an initial set of neural connections that is ultimately reduced, anatomically or functionally, to a more restricted, permanent set by the selection of some neural circuits and the regression of others.[4] ... Theories of selection from an initial repertoire hold that a large number of connections are made initially, but that only the useful ones are retained. (Purves, 1988, pp. 169-170)

[4] A major point of contention is the period over which this "neural Darwinism" occurs. Dale Purves' (1988) trophic theory of neural connections maintains that development involves the ongoing creation of connections, not only for a period of early life (as all the aforementioned theories contend) but, in many instances, on into maturity. To quote Dale Purves (1988): "From the trophic perspective the repertoire of connections is never complete, and the regression of some connections is an inevitable consequence of the way in which neural adjustment occurs, connections being continually made and broken." (p. 170)

6.2 The Blocking Effect

The discovery of the blocking effect was of major import, for it showed that stimuli must contain information (novelty) in order to associate. As Hawkins (1989) put it:

The discovery of blocking was very influential in the history of thinking about conditioning, because blocking demonstrates that animals may not acquire a conditioned response despite many pairings of the CS and US. This result suggests that conditioning is not simply an outcome of stimulus pairing but may instead involve cognitive processes. For example, Kamin (1969) proposed that an animal forms expectations about the world, compares current input to those expectations, and learns only when something unexpected occurs. (p. 93, emphasis added)

The blocking experiment has two stages. The first stage involves the pairing of a conditioned stimulus (CS1) with an unconditioned stimulus (US) as per Pavlov's classic paradigm. In the second stage, a new conditioned stimulus (CS2) is added to CS1, and the compound stimulus is then paired with the US. This generally produces little or no conditioning to the new stimulus, although good conditioning occurs to CS2 if CS1 is omitted in stage two (Kamin, 1969).

Although the precise mechanism underlying blocking is a matter of some considerable debate, Rescorla and Wagner (1972) have suggested that it implies that the associative strength of a CS, in effect, is subtracted from the strength of the US with which it is paired. More accurately, the Rescorla-Wagner rule states that, on each trial, the change in the associative strength of the expectancy of a particular CS for a given US is proportional to the difference between the actual occurrence of the US on the trial and the total expectation of the US on that trial.[5]

[5] The Rescorla-Wagner rule is formally equivalent to the Widrow-Hoff delta rule (Sutton and Barto, 1981).


6.2.1 The Training Set

It is relatively straightforward to replicate the conditions necessary to elicit the blocking effect. All one requires are two patterns which, when presented in isolation, reliably regenerate their associated response class. These patterns are then concatenated into a compound stimulus, and the response appended. In the first stage, when only one stimulus pattern is paired, the compound stimulus is presented with the unpaired pattern segment set inactive. In stage two the complete compound stimulus is paired. This yields the following training set:

Pattern    CS1    CS2    Response

6.2.2 The Test Set

Assuming the above training set, the appropriate test set would be:

Pattern    CS1    CS2    Response

That is to say, the test set is simply the CS2 portion of the compound stimulus CS1CS2 presented in stage one. If blocking is successfully modelled then this test set should generate no output at all (i.e. the response value will be zero).

6.2.3 Results

The results displayed in Table 6.2 confirm that a (linear) ENF successfully replicates the blocking phenomenon.

Pattern    Estimated Response

Table 6.2: Results of the blocking experiment.

Had the segment corresponding to CS1 been set to zero (i.e. unknown) in stage two, this would not be the case (see Table 6.3).

Table 6.3: Response to test patterns when CS1 is set to zero in stage two.
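Both results can be reproduced with a short sketch (NumPy; the four-element CS vectors are arbitrary stand-ins). Stage one imprints CS1 with the response and CS2 inactive; stage two imprints the full compound; probing with CS2 alone then yields no output (blocking). Zeroing CS1 in stage two instead lets CS2 acquire the response:

```python
import numpy as np

rng = np.random.default_rng(3)
cs1 = rng.choice([-1.0, 1.0], 4)           # arbitrary bipolar stimulus patterns
cs2 = rng.choice([-1.0, 1.0], 4)
r = 1.0                                    # the response element

def imprint(W, p):
    """One ENF learning step: remove the novel part of pattern p."""
    y = W @ p
    return W - np.outer(y, y) / (y @ y)

stage1 = np.concatenate([cs1, np.zeros(4), [r]])   # CS1 paired, CS2 inactive
stage2 = np.concatenate([cs1, cs2, [r]])           # compound stimulus paired
W = imprint(imprint(np.eye(9), stage1), stage2)

test = np.concatenate([np.zeros(4), cs2, [0.0]])   # CS2 alone, response unknown
print(round((W @ test)[-1], 10))                   # zero: CS2 was blocked

# control: if CS1 is zeroed in stage two, CS2 does acquire the response
W2 = imprint(imprint(np.eye(9), stage1),
             np.concatenate([np.zeros(4), cs2, [r]]))
print(round((W2 @ test)[-1], 3))                   # non-zero: conditioning occurred
```

Blocking falls out for free here: the stage-two compound differs from the stage-one pattern only by CS2, so CS2 carries no novelty with respect to the response.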

6.3 Schemata

The notion of schemata has a long history, originating with Kant in 1787. However the concept has traditionally been shrouded in mystery. As a result, until relatively recently the term was largely shunned. In fact, it was not until a number of artificial intelligence researchers (e.g. Minsky, 1975) began to find the notion of practical value that schemata again returned to prominence. And return to prominence it has. To quote Rumelhart et al. (1986):

There are many important concepts from modern cognitive science which must be explicated in our framework. Perhaps the most important, however, is the concept of the schema or related concepts such as scripts, frames, and so on. Indeed, ... the schema has, for many theorists, become the basic building block of our understanding of cognition. (p. 7)

The central idea is that schemata are structures for representing the generic concepts (i.e. situations, objects, events, and actions) stored in our memories. Under the connectionist interpretation, to process information with a schema is to determine which schema best fits the current situation.[6] Paradoxically, schemata are dynamically created by the very environment that they seek to interpret (Rumelhart et al., 1986). Thus, under the connectionist interpretation, schemata are not "things" per se; rather they emerge at the moment they are needed from the interaction of their constituent elements. This affords a degree of flexibility not found in the traditional (i.e. predicate calculus) representations. For instance, the instantiation of a variable can change the default associated with other variables in the schema (Rumelhart and Ortony, 1977). This makes the resulting "structures" extremely malleable, enabling them to meet a wide range of representational challenges (McClelland and Rumelhart, 1986).[7]

[6] McClelland's (1981) "Jets and Sharks" problem (Chapter 5) nicely illustrates this property of schemata. [7] It is a frequent challenge to models to try to see if they are able to handle almost any input imaginable. While this imposes interesting and, for the most part, important tests of a model, one must be cautious not to dismiss a model simply because it may not be readily apparent how it would handle a specific instance.


6.3.1 The Room Schema Problem

In the room schema problem different kinds of rooms are represented by the attributes which typically define them (see Table 6.4). The challenge facing the ENF is to identify the room it is in given a set of detected attributes.

The 40 attributes of Table 6.4 are (the per-room check marks of the original table are not reproduced):

ceiling, walls, door, windows, very large, large, medium, small, very small, desk, telephone, bed, typewriter, bookshelf, carpet, books, desk chair, clock, picture, floor lamp, sofa, easy chair, coffee cup, ashtray, fireplace, drapes, stove, coffeepot, refrigerator, toaster, cupboard, sink, dresser, television, bathtub, toilet, scale, oven, computer, hangers

Table 6.4: Attribute set used in the room schema problem (A = kitchen, B = bathroom, C = living room, D = bedroom, E = office).

As an example, since bedrooms have beds but not bathtubs, if a bathtub is detected the conclusion that it is a bedroom is substantially reduced, simultaneously increasing the likelihood the room is a bathroom. Taken from "Schemata and Sequential Thought Processes in PDP Models" (Rumelhart et al., 1986).


6.3.1.1 The Training Set

The training set is a direct translation of the attribute listing shown above (Table 6.4). For each of the 40 attributes, one indicates the feature is typically associated with the room type; negative one indicates it is not.

[Training-set table: patterns with 40-element stimulus vectors (attributes 1-40) and 5-element response vectors; values not reproduced.]

The 40 attributes are listed left to right, top to bottom in the "stimuli" section of the following training set. The coding scheme used for "room type" is: kitchen = {1,-1,-1,-1,-1}, bathroom = {-1,1,-1,-1,-1}, living room = {-1,-1,1,-1,-1}, bedroom = {-1,-1,-1,1,-1}, and office = {-1,-1,-1,-1,1}.
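The construction of these training vectors can be sketched as follows. The +/-1 stimulus convention and the one-per-room response codes follow the text, but the abbreviated attribute list and the per-room attribute assignments are illustrative stand-ins for Table 6.4's 40 attributes, not the thesis data:

```python
# Sketch of the room-schema encoding: +1 marks an attribute typically
# present, -1 one typically absent; the response is a one-per-room code.
# The attribute subset and per-room assignments are illustrative only.
ATTRIBUTES = ["ceiling", "walls", "oven", "bathtub", "sofa", "bed", "computer"]
ROOMS = ["kitchen", "bathroom", "living room", "bedroom", "office"]

TYPICAL = {  # hypothetical stand-in for Table 6.4's check marks
    "kitchen":     {"ceiling", "walls", "oven"},
    "bathroom":    {"ceiling", "walls", "bathtub"},
    "living room": {"ceiling", "walls", "sofa"},
    "bedroom":     {"ceiling", "walls", "bed"},
    "office":      {"ceiling", "walls", "computer"},
}

def training_vector(room):
    """Concatenate the 7-element stimulus and 5-element response codes."""
    stimuli = [1 if a in TYPICAL[room] else -1 for a in ATTRIBUTES]
    response = [1 if r == room else -1 for r in ROOMS]
    return stimuli + response

training_set = [training_vector(r) for r in ROOMS]
```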

6.3.1.2 The Test Set

The first experiment asks "Does the ENF know what room it is in?". Note that all of the attributes associated with a room are specified. For the purpose of illustrating the dynamics of room recognition, this will suffice. However a more realistic example would probably involve only a subset of the attributes.

[Test-set table: stimuli with zeroed response fields; values not reproduced.]

Of course, the subset of attributes must, either individually or in combination, serve to adequately define the given room type. A higher order polynomial phi is required when attribute combinations are involved.

6.3.1.2.2 Attributes from Room

The second experiment introduces another important property of constraint satisfaction based schemata: intuitive inference is bidirectional. That is to say, it is often possible to regenerate the triggering stimuli from the elicited response. In the present context this means that the ENF is able to determine the set of attributes which (typically) define each of the specified room types. In short, the ENF "retrieves" (regenerates) each room's prototype.

[Test-set table: response (room type) fields presented with zeroed stimuli; values not reproduced.]

As Table 6.5 indicates, the ENF correctly identifies all room types. The high degree of certainty in its type classification (i.e. the large response magnitude) is a reflection of the fact that all attributes of a room are used to trigger recall.

Table 6.5: Estimated type (in negative) generated by room attributes. [Response-segment values not reproduced.]
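The bidirectional recall behind these results can be illustrated with a deliberately simplified sketch. The ENF itself is a higher-order constraint satisfaction network; here a first-order Hopfield-style associator with three invented attribute/room patterns stands in for it. Clamping either segment and letting the network settle recovers the other segment:

```python
# Toy illustration of bidirectional intuitive inference: store patterns
# made of a 5-bit attribute segment plus a 3-bit room code, then settle
# the unknown (zeroed) segment while the known one is clamped. This is a
# first-order Hopfield-style stand-in for the ENF, with invented data.
PATTERNS = [
    [ 1,  1,  1, -1, -1,   1, -1, -1],   # room A
    [ 1, -1, -1,  1, -1,  -1,  1, -1],   # room B
    [-1,  1, -1, -1,  1,  -1, -1,  1],   # room C
]
N = 8
# Hebbian outer-product weights, zero diagonal.
W = [[0 if i == j else sum(p[i] * p[j] for p in PATTERNS)
      for j in range(N)] for i in range(N)]

def settle(state, clamped, steps=10):
    """Repeatedly update the unclamped units to the sign of their net input."""
    x = list(state)
    for _ in range(steps):
        for i in range(N):
            if i not in clamped:
                net = sum(W[i][j] * x[j] for j in range(N))
                x[i] = 1 if net >= 0 else -1
    return x

# Room from attributes: clamp the first five units, zero the room code.
probe = PATTERNS[0][:5] + [0, 0, 0]
print(settle(probe, clamped=set(range(5)))[5:])   # -> [1, -1, -1]
```

Running the same network with the room code clamped and the attribute segment zeroed regenerates the attributes, i.e. the stored prototype.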

In fact, even a single unique attribute often suffices to key room recognition, as when in Table 6.6 the ENF correctly identifies the room as a kitchen given only the presence of an oven, knows it is in a bathroom when it detects a bathtub, correctly identifies a living room when a sofa is specified, a bedroom if a bed is specified, and an office given only that the room has a computer.

Table 6.6: Estimated type (in negative) derived from a single unique attribute. [Response-segment values not reproduced.]

Of course the ENF's degree of confidence in its conclusion is substantially reduced when only a single attribute must serve as the basis of the classification.


6.3.1.3.2 Attributes from Room

As was noted previously, the ENF's ability to form coherent structures from interacting discrete properties enables it to generate prototypes. Evidence of this is suggested by the data listed in Table 6.7. However because the attribute set is provided by a single instance of each class, in this case the ability to form generalisations masquerades as (simple) learning by rote.

                       Room Type
Attribute       A      B      C      D      E
ceiling       -0.21  -0.21  -0.20  -0.17  -0.23
walls         -0.13  -0.11  -0.14  -0.11   0.04
door           0.08  -0.12   0.09  -0.12  -0.14
windows       -0.13  -0.11  -0.14  -0.11   0.04
very large     0.17   0.16  -0.04   0.17   0.19
large          0.19   0.27   0.22  -0.08   0.22
medium        -0.04   0.18   0.16   0.13   0.17
small          0.13   0.11   0.14   0.11  -0.04
very small     0.16  -0.07   0.13   0.19   0.13
desk           0.13   0.11   0.14   0.11  -0.04
telephone     -0.15   0.01  -0.15   0.06  -0.12
bed            0.19   0.27   0.22  -0.08   0.22
typewriter     0.13   0.11   0.14   0.11  -0.04
bookshelf      0.09   0.05  -0.11   0.10  -0.07
carpet        -0.15   0.01  -0.15   0.06  -0.12
books          0.09   0.05  -0.11   0.10  -0.07
desk chair     0.13   0.11   0.14   0.11  -0.04
clock         -0.16   0.07  -0.13  -0.19  -0.13
picture       -0.16   0.07  -0.13  -0.19  -0.13
floor lamp     0.17   0.16  -0.04   0.17   0.19
sofa           0.17   0.16  -0.04   0.17   0.19
easy chair     0.17   0.16  -0.04   0.17   0.19
coffee cup    -0.11   0.07   0.10   0.07  -0.09
ashtray        0.09   0.05  -0.11   0.10  -0.07
fireplace      0.17   0.16  -0.04   0.17   0.19
drapes        -0.13  -0.11  -0.14  -0.11   0.04
stove         -0.04   0.18   0.16   0.13   0.17
coffeepot     -0.11   0.07   0.10   0.07  -0.09
refrigerator  -0.04   0.18   0.16   0.13   0.17
toaster       -0.04   0.18   0.16   0.13   0.17
cupboard      -0.04   0.18   0.16   0.13   0.17
sink          -0.08  -0.10   0.09   0.15   0.08
dresser        0.19   0.27   0.22  -0.08   0.22
television     0.16   0.21  -0.03  -0.09   0.18
bathtub        0.16  -0.07   0.13   0.19   0.13
toilet         0.16  -0.07   0.13   0.19   0.13
scale          0.16  -0.07   0.13   0.19   0.13
oven          -0.04   0.18   0.16   0.13   0.17
computer       0.13   0.11   0.14   0.11  -0.04
hangers        0.12   0.16   0.16  -0.14  -0.04

Table 6.7: Estimated attributes (in negative) associated with room types.

What is a concept? Is a lamp one concept or many (e.g. cord, base, shade)? As Quine's (1969) famous example illustrates, even ostension often fails to adequately clarify the matter:

Thus consider the problem of deciding between "rabbit" and "undetached rabbit part" as translation of "gavagai". No word of the native language is known, except that we have settled on a working hypothesis as to what native words or gestures to construe as assent and dissent in response to our pointings and queryings. Now the trouble is that whenever we point to different parts of the rabbit, we are pointing also each time to the rabbit. When, conversely, we indicate the whole rabbit with a sweeping gesture, we are still pointing to a multitude of rabbit parts. (p. 32)

Learning a concept amounts to determining its similarity basis (Quine, 1974), i.e. the distinctive trait shared by the episodes assented to. Consider, for instance, a child learning the concept (one word observation sentence) "red":

Among the myriad features of episodes of overall impingement, those features that are irrelevant to "red" would in the long run cease to compete. Times when the sound "red" was reinforced would show their common features ever more clearly as their irrelevant features continued to vary at random, until at last the child ... would get to using the word "red" at just the right times. (Quine, 1974, p. 44)

Of course an actual field linguist would equate "gavagai" with "rabbit", dismissing "undetached rabbit part" out of hand. The implicit maxim guiding the choice is that a relatively homogeneous object moving as a whole against a contrasting background is a likely reference for a short expression (Quine, 1974). In practice, of course, things move faster, thanks to salience. No multiple inductive steps are needed to eliminate irrelevant features if the relevant patch is set off in various ways, as, for example, if it is focally situated, brightly lighted, garishly coloured, or moving against a background.

Although concepts are a function of their similarity basis, it need not, indeed does not, follow that categories are determined by common properties. Rosch's (1973b) discovery that all members of a category are not created equal long ago forced the abandonment of this 2,000 year old folk psychology theory (Lakoff, 1987). For if a category was defined by the properties that all members share, then no member of a category should be a better example of the category than any other. But experiment shows that some members are better: to wit, a robin is usually seen as a better example than a penguin of the category bird. Categories, therefore, are not defined solely by the properties members share.

So what, then, are concepts founded upon? Unless one wishes to contend that properties are totally irrelevant, a view refuted both by common sense and by the discovery that average properties abstracted from exemplars play a key role in determining typicality (Smith and Medin, 1981), one is forced to conclude that a concept emerges either from the interaction of its properties, or the interaction of its properties with context information (see Figure 6.2).

Figure 6.2: The proposed architecture of a concept. The properties which define a concept interact with its surrounding context such that either reoccurrence of its defining properties or the context in which it typically arises (or a subset of both) act to revive it.

This conclusion necessarily follows (by modus tollens) from the stated premises.

The TC problem sounds simple enough: just learn to discriminate between a T and C independent of its orientation. But as Koestler (1967) has observed, the problem is far from trivial [emphasis added]:

How does one recognise a face, a landscape, a printed word, at a glance? Even the identification of a single letter, written by various hands, in various sizes, and appearing on various positions on the retina, and hence on the occipital cortex, presents an almost intractable problem for the psychologist ... Some very complex scanning process must be involved which first identifies characteristic simpler features in the complex whole (visual holons like loops, triangles, etc.); then abstracts the relations between these features; and then the relations between the relations. (p. 81)

In the example to be discussed, the problem is further complicated by the fact that each character consists of only five bits (i.e. pixels) of information. This is precious little information upon which to base a decision. In fact, as McClelland and Rumelhart (1986) note, in order "to see the difference between the sets of patterns one must look, at least, at configurations of triplets of squares" (p. 348). In mathematical terminology it is a problem of order three. Simply put, TC categorisation emerges from feature interaction.

It should be noted that this problem, and the complex discrimination problem which follows, can both be trivially (and efficiently) solved by the application of standard nearest neighbour methods. It should also be noted, however, that the ENF does not preclude the possibility of incorporating these methods as functional enhancements. In cases such as the TC problem, for example, it may be more feasible to use the resonance of a set of radial basis functions as input to the learning model. As noted previously, the phi family of higher order polynomial functional enhancements are considered in this work because of their biological plausibility (i.e. they can potentially be implemented through synaptic interaction). And while it may be possible for specific problems to employ problem solving algorithms that are more efficient, this must be balanced against the value of having a more general purpose problem solving model which can address a wide range of learning tasks, albeit less efficiently.


6.4.1.1 The Training Set

The TC training set consists of all possible T and C characters in an n-by-m "retina". Since each character has four possible orientations, in the simplest case (a three-by-three retina) this yields eight patterns (see Figure 6.3):

Figure 6.3: The TC training set.

Under the coordinate mapping ((1,1), (1,2), (1,3), (2,1), (2,2), (2,3), (3,1), (3,2), (3,3)), the training vectors corresponding to the states shown in Figure 6.3 are:

[Training-vector table: patterns 1-8; values not reproduced.]

Four other patterns are possible (i.e. the C translating against the border). These patterns have not been included in the TC training set as they generate a disproportionate number of C patterns. A reasonable case can be made that the brain performs similar sensory transformations as, for example, when a cross-section of the neural activity in the optic nerve is interpreted as a linear transformation of two-dimensional retinal input (see discussion in Churchland and Sejnowski, 1992).
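The generation of these eight vectors can be sketched as follows. The five-square T and C shapes below are the conventional ones from the classic TC problem; the thesis' exact pixel layouts are assumed, not quoted:

```python
# Sketch: generating the 3x3 TC training patterns as +/-1 vectors under
# the row-major coordinate mapping (1,1), (1,2), ..., (3,3). The T and C
# shapes are the conventional five-square ones; an assumption, not the
# thesis' own figure.

def rotate90(cells, n=3):
    """Rotate a set of (row, col) cells 90 degrees in an n-by-n grid."""
    return {(c, n + 1 - r) for (r, c) in cells}

def to_vector(cells, n=3):
    """Flatten a cell set to a +/-1 vector in row-major order."""
    return [1 if (r, c) in cells else -1
            for r in range(1, n + 1) for c in range(1, n + 1)]

T = {(1, 1), (1, 2), (1, 3), (2, 2), (3, 2)}   # top bar plus stem
C = {(1, 1), (1, 2), (2, 1), (3, 1), (3, 2)}   # open square-bracket shape

def orientations(shape):
    out, s = [], shape
    for _ in range(4):
        out.append(to_vector(s))
        s = rotate90(s)
    return out

training_set = orientations(T) + orientations(C)   # eight patterns
```

Note that under these shapes every T orientation has the centre pixel on and every C orientation has it off, which is consistent with the role the centre bit is later observed to play in the discrimination.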


6.4.1.2 The Test Set

To test the ENF's mastery of rotational invariance, the response field is set to zero (unknown), and the network is permitted to settle to its best estimate of the response class. Thus the TC test set becomes:

6.4.1.3 Results

Within the context of the TC discrimination task, the ENF learns to master the notion of rotational invariance with relative ease (see Table 6.8).

Table 6.8: Third-order polynomial solution to the TC discrimination task.


Because the TC weight matrix has 221² (or 48,841) elements, it is extremely difficult to precisely determine the method by which the learning model effects rotational invariance. As before, however, the essential discriminants can be inferred from the elements contributing information to response restoration. When this analysis is applied the interactions shown in Figures 6.4 and 6.5 are found to play a crucial role.

Figure 6.4: Input interactions contributing to a T classification. Locations marked by an X work in combination to generate the response.

Figure 6.5: Input interactions contributing to a C classification. Locations marked by an X work in combination to generate the response.

Specifically, the Figure 6.4 interactions assume positive values when the response class is T whereas those in Figure 6.5 are negative. When the class is C the signs reverse.


The fact that all of the input interactions contributing to the classification decision are active at output would seem to imply that all of the discriminants are simultaneously involved in the class determination. That is, rotational invariance is achieved when all orientations cohere into a unified description of the embodied concept. It is not a matter of the externally activated interaction eliciting its associated response class which, in turn, triggers the affiliated indicators; for the class is retrieved in negative. Thus, if anything, regeneration of the response segment should serve to dissuade the allied indicators from emerging. Rather, evidence of the presence of any discriminant interaction is immediately disseminated to all key discriminants (friend and foe). In essence, by clustering (cohering) the interactions into opposing camps, the ENF forms two high-level detectors: one attending to a T in any orientation and the other to a C in any orientation. This is, of course, precisely what it was asked to do.

6.4.2 The Complex Discrimination Problem

The complex discrimination task is quite different from most learning tasks. In the typical learning situation the network is first trained on noise-free instances of each class to be discriminated. Only after the network masters the noise-free instances will it attempt to classify (possibly) noisy examples. However, in the complex discrimination problem the situation is reversed. The network must first learn the basis of the discrimination from noisy instances. Concept formation is then tested on (possibly) noise-free examples. In short, the system must induce the correct discriminant from corrupted instances.

In the present context "noise" is defined as the addition or deletion of information (bits) from the target pattern set. This definition will, no doubt, differ from some in that even organised extraneous information is here viewed as noise, relative to the specified discriminant. For while the extraneous information may be coherent in-and-of-itself, it either obscures the discriminant stimulus or it represents an alternative which must be inductively eliminated through competition. In other words, even when some portion of the visual field can be meaningfully interpreted as, for instance, a tree, the information comprising the tree is merely noise when the task is to discriminate between cars and trucks.


In the classic (paradigm) version of the complex discrimination problem, the subject must learn to selectively respond to complex elements in photographs of real-life settings. Given that each photograph contains numerous elements upon which the discrimination could be founded, it's remarkable that subjects learn to identify the required criterion. It is all the more remarkable when one considers that the subjects are pigeons! Rachlin (1970) describes the task:

The critical point of the experiment was that for some pigeons pecks were reinforced only if the picture contained a truck or part of a truck. For other pigeons pecks were reinforced only when the picture contained cars or parts of cars. ... The two groups of photographs were equally light, equally colourful, and equally complex. Yet within a few weeks of daily exposure to the photographs the pigeons came to peck rapidly when exposed to a picture containing a truck (if that was the SD) or a car (if that was the SD) and little or not at all to the picture containing the SΔ vehicle. (p. 175)

Some may wish to dismiss this experiment as trivial, largely by virtue of the fact that its subjects were pigeons. To dispense with this line of reasoning it should be pointed out that the experiment has also been repeated with human subjects (e.g. Wason and Johnson-Laird, 1972). Nor can it be dismissed on the ground that it fails to capture the complexity of human concept formation. Indeed, Wason and Johnson-Laird (1972) refer to their implementation of the complex discrimination problem as the "concept attainment task":

The point of the concept attainment task ... is that the subject is exposed to positive and negative instances of the concept (or "idea") which the experimenter has in mind. His task is to discover this concept, and demonstrate he has discovered it, by anticipating correctly whether further instances are positive, or negative .... (p. 67)

6.4.2.1 The Training Set

Although in theory it might be possible to use a digitised photograph in the following experiment, in practice the resulting combinatorial explosion of inputs makes consideration of complete photographs untenable. The higher order polynomial variant of the ENF only exacerbates the situation. Fortunately, however, the complex discriminant can be any unique stimulus. Thus we can conveniently use the TC patterns (outlined above) to represent the concepts of truck and car. That is, "photographs" which contain a T or part of a T can be arbitrarily defined as the SD, those containing a C or part of a C as the SΔ.

Although noise may randomly be injected anywhere into a T or C pattern, not all locations generate valid distortions. In particular, since the resulting character could legitimately be interpreted as either a T or a C, a bit cannot be added to any of the shaded locations (or their rotations):

Figure 6.6: Invalid locations (shaded) for the addition of a bit.

Similarly, removal of a bit from any of the shaded locations below will yield an ambiguous pattern (i.e. a pattern equally likely to be a T or C):

Figure 6.7: Invalid locations (shaded) for the removal of a bit.


Thus the valid training set becomes:

Figure 6.8: Noisy TC patterns used in the complex discrimination experiment. The corresponding 90°, 180°, and 270° rotations are not shown.

When this representation is converted into input vector format, the following training set is generated:

 n   Stimuli                       R
21  -1  1  1  1  1 -1 -1  1  1   -1
22  -1  1  1  1  1  1  1  1  1   -1
23   1  1 -1 -1  1  1  1  1 -1   -1
24   1 -1  1  1  1  1 -1  1 -1   -1
25  -1  1  1 -1  1  1 -1  1  1   -1
26  -1 -1 -1  1  1  1  1  1  1   -1
27   1  1 -1  1  1  1  1  1 -1   -1
28   1  1  1  1  1  1 -1 -1 -1   -1
29  -1 -1  1 -1  1 -1 -1  1  1   -1
30  -1 -1 -1  1  1 -1  1 -1  1   -1
31   1  1 -1 -1  1 -1  1 -1 -1   -1
32   1 -1  1 -1  1  1 -1 -1 -1   -1
33  -1  1  1 -1 -1 -1 -1  1  1   -1
34  -1 -1 -1  1 -1  1  1 -1  1   -1
35   1  1 -1 -1 -1 -1  1  1 -1   -1
36   1 -1  1  1 -1  1 -1 -1 -1   -1
37  -1  1  1 -1  1 -1 -1 -1  1   -1
38  -1 -1 -1 -1  1  1  1 -1  1   -1
39   1 -1 -1 -1  1 -1  1  1 -1   -1
40   1 -1  1  1  1 -1 -1 -1 -1   -1


6.4.2.2 The Test Set

Since the central question at issue is whether or not the proposed learning model is able to correctly form the concepts of T and C from noisy instances, the (undistorted) TC character set will be used to test "concept" recognition. However it would first seem prudent to verify the ENF has, indeed, mastered the training set, by asking it to regenerate the (zeroed) response class.

Surprisingly, the ENF fails to learn the complex discrimination pattern set! In fact, the response estimate is zero across all input patterns.

In order to isolate the problem state(s) each member of the training set was removed (one at a time) and the remaining patterns imprinted. The matrix was then tested for its ability to generate the requisite discrimination. In all, eight problem states were isolated using this method (see Figure 6.9).

Figure 6.9: The eight "deviant" complex discrimination patterns.

These observations suggest that capacity has been reached. However this is not the case (n = 219). Why these particular patterns? Although it can't be stated with certainty that the corruption spawned by these states is due to any single feature, one thing that distinguishes these states from their siblings is that they are the only ones in which the centre bit is "off" (-1). This would seem to suggest that the centre bit plays a key role in encoding the discrimination criteria. Precisely what role it plays is not clear. It may simply mean that a still higher order phi function would be required to effect the discrimination.
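The leave-one-out isolation procedure can be sketched as follows. The ENF and its imprinting are stood in for by a toy one-layer Hebbian associator, and the three-pattern set below is an invented illustration (two of its patterns contradict each other), not the thesis data; the procedure itself (remove one pattern, re-imprint the remainder, re-test) is the one described above:

```python
# Sketch of the isolation procedure: remove each training pattern in
# turn, imprint the remainder, and check whether the requisite
# discrimination is restored. A Hebbian stimulus-response correlator
# stands in for the ENF; the pattern set is illustrative only.

def imprint(patterns):
    """Hebbian stimulus-response correlation weights."""
    n = len(patterns[0][0])
    w = [0.0] * n
    for stim, resp in patterns:
        for j in range(n):
            w[j] += resp * stim[j]
    return w

def respond(w, stim):
    s = sum(wj * xj for wj, xj in zip(w, stim))
    return 0 if s == 0 else (1 if s > 0 else -1)

def isolate_problem_states(patterns):
    """Leave-one-out: indices whose removal lets the rest be mastered."""
    problems = []
    for i in range(len(patterns)):
        rest = patterns[:i] + patterns[i + 1:]
        w = imprint(rest)
        if all(respond(w, s) == r for s, r in rest):
            problems.append(i)
    return problems

# Patterns 0 and 1 contradict each other; removing either one lets the
# toy associator master the remainder, so both are flagged.
patterns = [([1, 1], 1), ([1, 1], -1), ([1, -1], 1)]
print(isolate_problem_states(patterns))   # -> [0, 1]
```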


In any event, if these deviant patterns are removed from the training set the ENF readily learns to discriminate T from C (Table 6.9) even though it has never seen the noise-free patterns before. This is really rather remarkable, suggesting that the ENF has learned to discern each concept's essential properties.

Table 6.9: Estimated response to the TC test set following training on noisy instances of the target response classes.

Moreover, it seems to accord rather well with W. V. Quine's (1969) account of early language acquisition:

In one's earliest phase of word learning, terms like "mama" and "water" were learned which may be viewed retrospectively as names each of an observed spatiotemporal object. Each such term was learned by a process of reinforcement and extinction, whereby the spatiotemporal range of application of the term was gradually perfected. ... The second phase, marked by the advent of individuative terms, is where a proper notion of object emerges. ... [T]hese individuative terms, e.g. "apple", are learned still by the old method of reinforcement and extinction; .... (pp. 11-12)

In fact, if any of the patterns is eliminated from the training set the ENF masters the discrimination (M. 6).


6.5 Intuitive Reasoning

6.5.1 The Composite Symbol Structure Problem

Arguably the most frequently cited critique of connectionism is Fodor and Pylyshyn's (1988) "compositionality objection":

In the Classical [von Neumann] machine, the objects to which the content A & B is ascribed (viz., tokens of the expression "A & B") literally contain, as proper parts, objects to which the content A is ascribed .... Moreover, the semantics (e.g. the satisfaction conditions) of the expression "A & B" is determined in a uniform way by the semantics of its constituents. By contrast, in the Connectionist machine none of this is true: the object to which the content A & B is ascribed ... is causally connected to the object to which the content A is ascribed ...; but there is no structural (e.g. no part/whole) relation that holds between them. (p. 16)

The compositionality objection critically hinges on two implicit premises:

1. That nesting is the only way to explain "the productivity of thought".

2. That connectionist systems cannot generate nested representations.

Neither is valid. For while one might agree that recursion (nesting) does indeed offer an elegant explanation of the productivity of thought, it is hardly the only plausible account. Considered in combination, our 10^14 synapses alone would seem to have the potential to house a virtual infinity of thoughts. Moreover, not only are connectionist systems capable of recursive structures, as the following experiment vividly illustrates, they are inherently recursive.

The phrase "productivity of thought" (Fodor and Pylyshyn, 1988) is intended to reflect the fact that the representational capability of a cognitive system is unbounded while its resources are bounded.


6.5.1.1 BoltzCONS

A number of important connectionist models have been offered to refute the compositionality objection. One of the first and, as it established the genre, one of the most important was Touretzky's (1990) "BoltzCONS".

BoltzCONS is a recursive procedure which dynamically creates composite symbol structures (e.g. stacks, trees) using a linked-list formalism. Adopting a methodology first employed by John McCarthy's list processing language LISP, BoltzCONS represents a list as a series of addressed cells linked together by their contents. Each cell is encoded as a three-tuple, i.e. (TAG, CAR, CDR), where the TAG field is an address label defining the three-tuple, and the CAR and CDR fields refer either to the tag of another cell or to a symbol primitive.

Thus the sentence "John kissed Mary", which in LISP can be represented by

(Event (Agent John) (Action kiss (Patient Mary)) past)

would be encoded in BoltzCONS by the following three-tuple set:

TAG   CAR       CDR
p     Event     q
q     r         t
r     Agent     s
s     John      nil
t     u         z
u     Action    v
v     kiss      w
w     x         nil
x     Patient   y
y     Mary      nil
z     past      nil

Table 6.10: Three-tuples corresponding to the sentence "John kissed Mary".

For an overview of this work see Hinton's (1990) "Connectionist Symbol Processing".


List traversal is by associative retrieval. That is to say, once the three-tuple set has been successfully imprinted onto the constraint satisfaction network, in this case using the ENF, each cell in the list can be successively processed by copying the contents of the CDR field into the TAG field and allowing the network to settle to a stable state. This "fetches" the three-tuple corresponding to the copied field from tuple-memory, retrieving its associated CAR and CDR values (by intuitive inference).

LISP's "car" function is similarly implemented except that the contents of the CAR field is used to induce retrieval. Moreover, associative retrieval is not limited to following the pointers in a forward direction. As Touretzky observed, it is also possible to define un-car and un-cdr which, as their names suggest, undo the effect of the CAR and CDR operators. Together these four operators permit movement anywhere within the list's implicit hierarchy.
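The access pattern can be sketched as follows. A dictionary stands in for the ENF's associative tuple-memory, so settling to a stable state is replaced by exact lookup, but the traversal scheme (copy the CDR field into the TAG field, retrieve, repeat) is the one described above. The cell tags and contents follow the (TAG, CAR, CDR) encoding of "John kissed Mary"; the particular tag letters are an assumption:

```python
# Sketch of BoltzCONS-style list traversal. A dict stands in for the
# ENF's tuple-memory; the tags (p..z) encoding "John kissed Mary" are
# assumed, not quoted from the thesis.
tuple_memory = {
    'p': ('Event', 'q'), 'q': ('r', 't'), 'r': ('Agent', 's'),
    's': ('John', None), 't': ('u', 'z'), 'u': ('Action', 'v'),
    'v': ('kiss', 'w'), 'w': ('x', None), 'x': ('Patient', 'y'),
    'y': ('Mary', None), 'z': ('past', None),
}

def fetch(tag):
    """Retrieve the (CAR, CDR) fields keyed by TAG (intuitive inference)."""
    return tuple_memory[tag]

def cdr(tag):
    """Copy the CDR field into the TAG field and 'settle' on the next cell."""
    return fetch(tag)[1]

def un_cdr(tag):
    """Follow a CDR pointer backwards: the cell whose CDR names `tag`."""
    return next(t for t, (_, d) in tuple_memory.items() if d == tag)

# Walk the top-level list spine from cell 'p', collecting CAR fields.
spine, tag = [], 'p'
while tag is not None:
    spine.append(fetch(tag)[0])
    tag = cdr(tag)
print(spine)   # -> ['Event', 'r', 'u', 'past']
```

The spine mixes symbol primitives (Event, past) with tags of embedded sublists (r, u), reflecting the list's implicit hierarchy.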

6.5.1.1.1 The Training Set

Since there are 19 distinct terms to be coded, in total ceil(log2(19)) = 5 bits will be required to represent these terms (see Table 6.11).

[Table 6.11 lists the 5-bit codes for the 19 terms, including the symbols Event, Agent, John, Action, kiss, Patient, Mary, and past; the bit assignments are not reproduced.]

Table 6.11: Coding scheme used in the composite symbol structure problem.

An embedded list (nested parentheses) is an alternate way to represent a tree structure (Knuth, 1973).
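The bit requirement and one possible code assignment can be sketched as follows. The actual bit assignments of Table 6.11 are not reproduced above, so the enumeration order below (and the assumption that the 19 terms comprise 11 cell tags plus 8 symbols) is illustrative only:

```python
# Sketch: ceil(log2(19)) = 5 bits suffice for 19 distinct terms, coded
# here as +/-1 vectors. The term list and enumeration order are
# assumptions, not Table 6.11's actual assignments.
from math import ceil, log2

TERMS = ['p', 'q', 'r', 's', 't', 'u', 'v', 'w', 'x', 'y', 'z',
         'Event', 'Agent', 'John', 'Action', 'kiss', 'Patient', 'Mary', 'past']

BITS = ceil(log2(len(TERMS)))   # 5

def code(term):
    """Binary index of the term, expressed as a +/-1 vector of width BITS."""
    i = TERMS.index(term)
    return [1 if (i >> b) & 1 else -1 for b in reversed(range(BITS))]
```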


Under this coding scheme, the training set corresponding to the tuple set listed in Table 6.10 would be:

[Training-set table: patterns 1-11; TAG stimuli with CAR and CDR responses; values not reproduced.]

6.5.1.1.2 The Test Set

The corresponding test set thus becomes:

[Test-set table: TAG stimuli with zeroed CAR and CDR response fields; values not reproduced.]


Although an order three polynomial is required to capture the essence of the three-tuple patterns, the ENF is able to master the complete three-tuple set.

[Estimated CAR and CDR field values not reproduced.]

Table 6.12: Recall of CAR and CDR portions of imprinted three-tuples.

Note that all of the fields are regenerated from the same set of weights. In the obtuse jargon of Fodor and Pylyshyn (1988), the network forms a structurally molecular representation possessing syntactic constituents, themselves either structurally molecular or atomic (i.e. symbols). But is the implementation of recursion really all that is required to answer the compositionality objection?

Fodor and Pylyshyn ( 1988) suggest that it is not:

Because Classical mental representations have combinatorial structure, it is possible for Classical mental operations to apply to them by reference to their form. The result is that a paradigmatic Classical mental process operates upon any mental representation that satisfies a given structural description, and transforms it into a mental representation that satisfies another structural description. (p. 13)

Consider, for instance, modus ponens (i.e. given any true proposition "p" and the true implication "p implies q", one may validly conclude "q"). The terms "p" and "q" are completely arbitrary. But in BoltzCONS, in fact in connectionist models in general, terms are invariably bound when they are created. Indeed they must be so bound to key recall. Ironically, it turns out that the essence of the cognitivist objection is not that terms must represent consistently at both the atomic and molecular levels, but rather that they must be able to represent anything at all. However this objection loses its force when it is realised that, in a very real sense, variables too are bound at creation. For variables must also possess a uniform identity (within a context) in order for their instantiation to result in uniform substitution across all instances. Under this interpretation variable binding is thus viewed as an arbitrary association between two bound terms: the variable's name and its instantiating value (see Smolensky, 1990b). In any event, this issue only arises if one adopts the functionalist's predicate calculus model of mind in which "the laws of thought" are required to knit propositions together. And (to reiterate) as Hinton (1990) observed, rational inference need not necessarily imply the application of laws of thought:

More complex inferences require a more serial approach in which parts of the network are used for performing several different intuitive inferences in sequence. This will be called "rational inference". (p. 50)

Or put another way, complex thought is largely (perhaps wholly) a reflection of each holon's implicit hierarchy. To again quote Hinton (déjà vu Koestler):

It appears that whenever people have to deal with complexity they impose part-whole hierarchies in which objects at one level are composed of inter-related objects at the next level down. (ibid., pp. 47-48)


6.5.2 The Tic-Tac-Toe Problem

In tic-tac-toe two players alternately put crosses and ciphers in the cells of a figure formed by two vertical lines crossing two horizontal lines. The object of the game is to obtain a vertical, horizontal, or diagonal row of three crosses or three ciphers before your opponent does. Although the game of tic-tac-toe has been enthusiastically played, at one time or another, by nearly every child, it is generally known that if each player plays flawlessly, the best outcome that can be achieved is draw after draw. Thus few adults express much interest in the game. It does, however, present a considerable challenge for the learning model. In essence, it is analogous to teaching an insect to play. Moreover, as McClelland and Rumelhart (1986) point out: "Other more complex games, such as checkers or chess require more effort, of course, but can, in principle, be dealt with in the same way." (p. 40)

Although, given sufficient time, in principle the ENF should be able to learn to play a reasonable game of tic-tac-toe by simply being exposed to instances drawn from interaction with players at different competence levels, as noted above, to accelerate learning, a more directed set of training instances was generated. In order to describe the generation of these training instances, it will first be necessary to introduce a useful notational convention.

28 Since there are 9 elements in each board state, and since a third-order polynomial is required to capture the underlying relationships, in total 220 elements result from tensor expansion. When this vector is augmented with the 9 element response vector, in total the network contains 229 inputs and, therefore, only 229 "neurons" (i.e. the weight matrix must be square).

29 In order to minimise the computational load, and yet still illustrate the basic principles, the scope of the learning task has been intentionally simplified. That is to say, the number of plausible game continuations from any given position has been minimised. However, a detailed account of how these strategies might plausibly evolve has been provided by Crowley and Siegler (Cognitive Science, 1993). Unfortunately, detailing how novices come to acquire the rules of the game is beyond the scope of the present work. The interested reader is, therefore, referred to the Crowley and Siegler paper.


If we are shown the following state (Figure 6.10) we have no way of knowing which of X's moves came first.

Figure 6.10: An illustration of potential move sequence ambiguity.

Did X first go to the centre and then to the corner position, or vice versa? The move sequence is lost. Fortunately, there is a simple way to retain the move sequence: instead of just recording a nought or cross, record the move number. Thus the two distinct methods of achieving the state shown in Figure 6.10 are:

sequence A sequence B

Figure 6.11: Two move sequences which generate the same state.

The following vectors represent these two sequence possibilities:

board coordinates (row, column)

             (1,1)  (1,2)  (1,3)  (2,1)  (2,2)  (2,3)  (3,1)  (3,2)  (3,3)
sequence A     3      0      0      0      1      2      0      0      0
sequence B     1      0      0      0      3      2      0      0      0

Table 6.13: Vector representation of Figure 6.11 move sequences.30

30 The actual vector representation employed in the following experiments uses X = 1 and O = -1 rather than move sequence. Thus both sequences would be encoded as: (1, 0, 0, 0, 1, -1, 0, 0, 0).
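The two encodings can be made concrete with a short sketch. The helper names below are hypothetical (they do not appear in the thesis); cells are indexed 0-8 in row-major order, so board coordinate (r, c) maps to cell 3(r-1) + (c-1), matching Table 6.13.

```python
# Hypothetical helpers illustrating the two board encodings discussed above.

def encode_moves(moves):
    """Move-number encoding: cell k holds the move number (1-based) or 0."""
    v = [0] * 9
    for n, cell in enumerate(moves, start=1):
        v[cell] = n
    return v

def encode_state(moves):
    """Final-state encoding: X (odd-numbered moves) = 1, O (even) = -1."""
    v = [0] * 9
    for n, cell in enumerate(moves, start=1):
        v[cell] = 1 if n % 2 == 1 else -1
    return v

# Sequence A: X plays centre then corner; sequence B: corner then centre.
seq_a = [4, 5, 0]   # cells (2,2), (2,3), (1,1)
seq_b = [0, 5, 4]   # cells (1,1), (2,3), (2,2)
```

The move-number vectors differ (preserving order of play), while the final-state vectors coincide, reproducing the ambiguity of Figure 6.10.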

This simple scheme not only facilitates the representation of an entire game in a single state, it also suggests a method of generating training instances. If one starts with a completely blank board (i.e. the zero vector), there are nine possible locations to place the first move, eight locations to place the second move, seven for the third, and so on. This means that there are potentially 9! or 362,880 different games in the tic-tac-toe problem space. Fortunately, not all of these configurations are "legal" under the rules of the game. For instance, the 362,880 figure includes games which continue after a win has occurred. In fact, when all invalid games are pruned only 55,648 remain. The breakdown of these games is shown below:

Player 1 wins ............... 26,208 (47%)
Player 2 wins ............... 13,376 (24%)
Draw ........................ 16,064 (29%)

Table 6.14: Breakdown, by outcome, of tic-tac-toe games.
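The pruning idea can be sketched with a small recursive enumerator. This is an illustrative reconstruction rather than the generator used in the thesis: it simply halts a game the moment a win occurs, so no game "continues after a win". The exact totals depend on which further continuations are ruled out, so no specific figure from Table 6.14 is assumed here.

```python
# An illustrative enumerator (not the thesis's own generator): a game is
# complete as soon as a win occurs or the board fills, which prunes every
# sequence that would continue after a win has occurred.

LINES = [(0, 1, 2), (3, 4, 5), (6, 7, 8),      # rows
         (0, 3, 6), (1, 4, 7), (2, 5, 8),      # columns
         (0, 4, 8), (2, 4, 6)]                 # diagonals

def winner(board):
    for a, b, c in LINES:
        if board[a] != 0 and board[a] == board[b] == board[c]:
            return board[a]
    return 0

def count_games(board=None, player=1):
    """Return (player 1 wins, player 2 wins, draws) over all complete games."""
    if board is None:
        board = [0] * 9
    w = winner(board)
    if w != 0:
        return (1, 0, 0) if w == 1 else (0, 1, 0)
    empty = [i for i in range(9) if board[i] == 0]
    if not empty:
        return (0, 0, 1)
    totals = [0, 0, 0]
    for cell in empty:
        board[cell] = player
        for k, n in enumerate(count_games(board, -player)):
            totals[k] += n
        board[cell] = 0
    return tuple(totals)
```

Halting at wins already prunes the 9! = 362,880 raw move sequences substantially; the tighter total of 55,648 in Table 6.14 evidently reflects additional validity constraints beyond this simple halting rule.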

Even ignoring the fact that these games must eventually be expanded into board states, the learning task is already well beyond the capacity of the ENF.

Fortunately, it's possible to make substantial reductions in the number of training candidates: for by placing one's first mark at the board centre, one maximises the number of rows, columns, and diagonals that can be completed. In short, occupying the centre position maximises the opportunity for a win.31

It is reasonable, therefore, to make the simplifying assumption that the first player (X) places their mark at the board centre. Moreover, if it is assumed that the first player occupies the centre position, further opportunities to prune the number of games emerge. In particular, it can be observed that if the opponent responds to the initial move with anything other than a move to a corner position, the first player is able to force a win.

" This heu ristic has mbeen empirically verified.

These winning games are:

Figure 6.12: Centre-side forced win sequences (rotations not shown).

Assuming that player two selects a corner in response to player one's centre selection, the ENF's next move will be to the corner diagonally opposite the corner location selected by the opponent. This continuation offers a number of advantages: not only does it ensure that the opponent cannot force a win, if the opponent responds with anything other than another corner on their next move, the ENF can once again force a win (see Figure 6.13).

Figure 6.13: Forced continuations of centre-corner-corner-side sequence (rotations not shown).


With the side option ruled out, only two possibilities remain for player two (i.e. the open corner positions). Selection of either corner forces the ENF to block on the next move. It is interesting that the block move not only keeps the opponent from winning, it also eliminates the opponent's last opportunity for a win. Moreover, it simultaneously threatens to win on the next move. Assuming the opponent plays a rational game (i.e. they prevent a loss when no opportunity to win avails itself), the complete training set thus becomes:

Figure 6.14: Complete tic-tac-toe training set (rotations not shown).

This is not to say that there are only 12 game states. It must be remembered that each is shorthand for a series of board states. As an example:


Under the proposed coding scheme there are (at least) two ways to reflect a progression of moves. Both use a variant of the BoltzCONS methodology.

Perhaps the most intuitive approach is to simply imprint pairs of states, such that the second state in the pair is a plausible continuation of play should the first state be detected (i.e. instantiated). In short, the first state triggers the release of its successor. Thus the concatenated state pair:

stimulus (state 1):  ( 1 -1 -1  0  1  0  1  0 -1 )
response (state 2):  ( 1 -1 -1  1  1  0  1  0 -1 )

would correspond to X's winning move:

Figure 6.15: Move sequence implied by the state pair vector.

Precisely the same result can be achieved, however, if instead of repeating the entire board, each successor state simply notes the recommended move. Thus the win shown above (Figure 6.15) could have also been encoded as:

stimulus (state 1):  ( 1 -1 -1  0  1  0  1  0 -1 )
response (move):     ( 0  0  0  1  0  0  0  0  0 )

This variant greatly facilitates learning, in that the network need not restore the entire state (only the recommended move must be regenerated).
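The stimulus-to-move variant can be illustrated with a toy single-layer associator. The plain Hebbian outer-product store below is an illustrative stand-in for the ENF (whose learning equation differs); the vectors are those of Figure 6.15, with cells indexed 0-8 in row-major order.

```python
# A toy single-layer associator (not the ENF's own learning equation): each
# 9-element board state is bound to its recommended move through one layer
# of connections, built as a Hebbian outer-product store.

def imprint(pairs, n=9):
    """Accumulate outer products of (move, state) pairs into one n x n matrix."""
    W = [[0.0] * n for _ in range(n)]
    for state, move in pairs:
        for i in range(n):
            for j in range(n):
                W[i][j] += move[i] * state[j]
    return W

def recall_move(W, state):
    """Recommend the cell whose row responds most strongly to the state."""
    responses = [sum(w * s for w, s in zip(row, state)) for row in W]
    return responses.index(max(responses))

# The position of Figure 6.15 and its recommended move (cell index 3,
# i.e. board coordinate (2,1), completing the first column).
state = [1, -1, -1, 0, 1, 0, 1, 0, -1]
move  = [0, 0, 0, 1, 0, 0, 0, 0, 0]

W = imprint([(state, move)])
```

Presenting the stimulus state to the trained matrix releases the stored move, illustrating the trigger-release character of the scheme.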


In total, the 48 game sequences shown in Figure 6.14 (plus their rotations) generate 152 training patterns. However many of the sequences yield identical board states. If these duplicate states are removed the training set reduces to only 73 patterns, well within the capacity of a third-order polynomial phi ENF. These patterns are listed in Appendix A.

6.5.2.2 The Test Set

The test set is identical to the training set listed in Appendix A except that the response segment is set to unknown (i.e. zeroed).

6.5.2.3 Results

When a third-order polynomial phi ENF is used to train the network, all 73 response vectors are successfully regenerated (see Appendix B). Note that it was unnecessary to introduce such esoteric concepts as diagonals, occupancy, or blocking.32 Nor was it necessary to interpolate teleology. It is important to underscore what this means - for it suggests that many complex strategies can be effected by a series (chain) of simple stimulus-response links.33 It does not, however, imply a return to the mythical reflexological model. As per Koestler, each holon (i.e. connection matrix) evaluates its triggering stimuli within the context imposed by its embodied canons (associations).

32 Blocking, in this context, refers to the occupation of a cell to prevent a loss on the next move.
33 This is, of course, just a restatement of Hinton's powerful reconceptualisation of rational inference.


We are deceived at every level by our introspection. (F. Crick)

7. Conclusions

The range of learning phenomena that can be accounted for by the 'simple' processes of interaction, habituation, and feedback is really quite impressive. Given a high enough degree of interaction, an rth-order polynomial ENF can generate any input-output mapping that is monotonic of its inputs.1 Moreover, if complexity theorists are right, the impact of interaction extends far beyond the introduction of higher-order terms into input-output mapping equations:

The distinction between linear and nonlinear systems is fundamental, and provides excellent insight into why the mechanisms of life should be so hard to find. ... Neither nucleotides nor amino acids nor any other carbon-chain molecule is alive - yet put them together in the right way, and the dynamic behaviour that emerges out of their interactions is what we call life. (Langton, 1989, p. 41)

Interaction is the ghost in the machine. Essential properties emerge through interaction. This should not, however, be taken as an invitation to mysticism: for while it may be tempting to leap from the observation that things are not invariably defined by their shared properties, to the conclusion that they are independent of their shared properties, the conclusion does not necessarily follow. Although it is true that the whole is greater than the sum of its parts, the whole is precisely equal to the sum of its parts plus their interactions.

1 This conclusion is based on the Rumelhart and McClelland (1986) observation that their 'sigma-pi' units, which similarly consider synaptic interactions, are sufficient to mimic any function monotonic of its inputs. It should be pointed out, however, that adjustable basis functions (e.g. the sigmoids of backpropagation) offer a striking advantage when compared to the ENF's fixed basis functions. Specifically, sigmoidal nets achieve a smaller integrated squared error than that achieved using a series expansion (Barron, 1993). Moreover, they achieve this improved performance with a relatively small number of parameters compared to the exponential number required by polynomial, spline, and trigonometric expansions.


In this final chapter I briefly revisit my thesis with a view to summarising the key evidence from which it draws support. Of course much of the support is empirical, specifically the extensive range of cognitive phenomena which the ENF is able to reliably reproduce. The selected experiments form a continuum: schemata arising from the interplay of interaction, habituation, and feedback ground concepts which, through sequenced recursion, provide a basis for serial reasoning. Thus the model, itself, largely serves to establish its own sufficiency.2

7.1 Main Thesis Revisited

Since p is a sufficient condition for q if and only if q is true whenever p is true, my main thesis can be restated as follows:

If habituation, feedback, and interaction are present in a model of cognition then a wide range of cognitive phenomena must occur.

This is not to say, however, that habituation, feedback, and interaction are necessary for the emergence of these phenomena. For it is possible that a plausible model will be advanced which employs none of these properties. Nor is it meant to imply that all conceivable cognitive phenomena stem solely from the interplay of these three properties. Rather, it is hoped that this thesis provides a basis for the development of more elaborate models which will include, but are not limited to, the principles explored. In fact, much of the groundwork has already been laid (cf. Selfridge, 1958; Koestler, 1964, 1967; Margolis, 1987).

Where the present study strives to make its contribution is with respect to the identification of the core structures upon which cognition is founded. For it is maintained that the enhanced novelty filter generates 'holons'. That is to say, it is maintained that the ENF generates structures possessing all the essential properties Koestler attributed to holons.

2 Simulation is a third form of science, standing halfway between theory and experiment (Waldrop, 1992).


Perhaps the strongest evidence in support of this contention is found at the functional level:

Output hierarchies generally operate on the trigger release principle, where a relatively simple, implicit or coded signal releases complex, preset mechanisms. ... In the performance of learnt skills, including verbal skills, a generalised implicit command is spelled out in explicit terms on successive lower echelons which, once triggered into action, activate their sub-units in the appropriate strategic order, guided by feedbacks. ... Input hierarchies operate on the reverse principle: instead of triggers they are equipped with 'filter'-type devices (scanners, 'resonators', classifiers) which strip the input of noise, abstract and digest its relevant contents, according to that particular hierarchy's criteria of relevance. ... In perceptual hierarchies, filtering devices range from habituation and the efferent control of receptors, through the constancy phenomena, to pattern-recognition in space and time, and the decoding of linguistic and other forms of meaning. Output hierarchies spell, concretize, particularise; input hierarchies digest, abstract, generalise. (Koestler, 1967, pp. 344-345)

But support is also found at the structural level. In particular, both the ENF and holons use negative feedback to achieve self-regulation.3 And both employ association by similarity4 to trigger appropriate actions. Just as the parts are restrained by the whole, the whole is driven by the self-assertions of its parts. Together they form the two faces of Janus: exogenous and endogenous control.

3 In order to prevent the overloading of information channels and overshooting of responses, the controls which represent the interests of the whole vis-à-vis the part are largely inhibitory (Koestler, 1964).
4 Koestler (1964) defines 'association by similarity' as a form of intuitive inference: "Association by similarity of perceptions would accordingly mean that an input-pattern A at some stage of its ascent in the nervous system initiates the recall of some past experience B which is equipotential to A with respect to the scanning process of that particular stage, but not in other respects. We might say that A and B have one 'partial' in common which causes B to 'resonate'." (p. 538)

Under the proposed holon-based model of cognition, perception and memory cannot be unscrambled. Even elementary perceptions have the character of inferential constructions (see Bartlett, 1958). In a series of relaying operations input is gradually stripped of irrelevancies according to the criteria determined by the holon which occupies that particular tributary of the input stream. The incoming patterns are subjected to generalisation and discrimination at the same time, the two being complementary aspects of a single process. That is, each holon discriminates by extracting subtle similarities to its past. To again quote Koestler (1964):

The degree of originality which a subject will display depends, ceteris paribus, on the nature of the challenge - that is, the novelty and unexpectedness of the situation. Familiar situations are dealt with by habitual methods: they can be recognised, at a glance, as analogous in some essential respect to past experiences which provide a ready-made rule to cope with them. The more new features a task contains, the more difficult it will be to find the relevant analogy, and thereby the appropriate code to apply to it. (p. 653)

The tendency to reduce troublesome feedback to a minimum is the very essence of habit-formation (Koestler, 1964). Moreover, the transformation into habit is typically accompanied by a dimming of awareness. Consciousness decreases in proportion to habit formation. This reduces the demands placed on cognition, freeing the organism to concentrate on less routine matters. As an example, skilled drivers on familiar roads are generally able to manoeuvre on 'autopilot', leaving them free to think about something else. Operating as closed feedback loops these automatised routines are self-regulating, in that there is no need to refer decisions to higher levels. This structural constraint has profound functional ramifications (as will later be detailed). In particular, it completely eliminates the need for a central representation (Brooks, 1991)!

7.2 Generating Temporal Sequences

To underscore his philosophical objection to the conceptualisation of brain as mind, D. O. Hebb once remarked:

If mind is a brain process ... we could not hear the clock strike twelve: the brain gets the same message twelve times, so, if that is all there is, what one would hear is the clock striking one over and over again ... . (quoted in Amit, 1989, p. 219)

Hebb was not taking a metaphysical stand: he was pointing out that the state of the brain is modified by its encounter with the first stimulus in such a way as to make the impact of the second stimulus markedly different from the first. William James (1890) had made essentially the same point many years earlier:

If recently the brain tract a was vividly excited, and then b, and now vividly c, the total present consciousness is not produced simply by c's excitement, but also by the dying vibrations of a and b as well. If we want to represent the brain-process we must write it [as] three different processes coexisting, and correlated with them a thought which is no one of the three thoughts which they would have produced had each of them occurred alone. (p. 187)

What both James and Hebb address is the importance of temporal sequence. As was the case with spatial interactions, essential properties can emerge from temporal interactions. In the simplest case these temporal sequences can be effected by a 'tapped delay line' which converts the temporal pattern (sequence) into a spatial pattern. More formally, the resulting lag vector can be defined:

x(t) = ( xt, xt-τ, xt-2τ, ..., xt-(n-1)τ )

where xt is an element selected from the input stream and τ is the delay.
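The tapped delay line can be sketched in a few lines; the window length n and the delay tau below are free parameters for illustration, not values fixed by the text:

```python
# A minimal sketch of a tapped delay line: each time step yields the lag
# vector (x_t, x_{t-tau}, ..., x_{t-(n-1)tau}), converting the temporal
# pattern into a spatial one.

def lag_vectors(stream, n, tau=1):
    """Return the lag vector at every t for which a full window exists."""
    start = (n - 1) * tau
    return [[stream[t - k * tau] for k in range(n)]
            for t in range(start, len(stream))]
```

For the stream 0, 1, 2, 3, 4 with n = 3 and tau = 1 this yields the lag vectors (2, 1, 0), (3, 2, 1), and (4, 3, 2).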


As Gershenfeld and Weigend (1993) point out, time-lagged vectors are more than just a way to introduce temporal flow [emphasis added]:

Delay vectors of sufficient length are not just a representation of the state of a linear system - it turns out that delay vectors can recover the full geometrical structure of a nonlinear system. (p. 20)

Unfortunately, although the tapped delay line model fits exceedingly well with the ENF's notion of functionally enhancing the input stream, there are several drawbacks to this general approach to sequence recognition (Mozer, 1989). For not only must the delay line length be chosen in advance to accommodate the longest possible sequence, the resulting number of input units potentially may become large, leading to slow computation. A more promising alternative is to implement the sequence using delayed feedback (Figure 7.1):

Figure 7.1: Basic sequential circuit (Kohonen, 1984).

Because each block in Figure 7.1 is a combinatorial circuit possessing an arbitrary internal state, in order to generate an effective output two circuits are required (Kohonen, 1984). However if the internal states are not arbitrary, i.e. when each block contains replicas of the patterns it has encountered, this circuit can be further simplified (Figure 7.2):


Figure 7.2: Autoassociative memory for structured sequences.5

Known as a recurrent network, "the state of the whole network at a particular time depends on an aggregate of previous states as well as on the current input." (Hertz et al., 1991, p. 179). That is to say, assuming that a sequence of external patterns S = { p1, p2, ..., pn } has been received and stored, when the first input pattern arrives the effective input sequence becomes:

{ (p1, φ), (p2, p1), (p3, p2), ..., (pn, pn-1) }

where φ denotes an input with no signal. Associative recall of the memorised sequence is achieved by applying one member of set S, thereby keying retrieval of the remainder of the sequence. Thus, for example, applying the key (p1, φ) retrieves the entire temporal sequence, i.e. no further keys, for k = 2 to n, will be necessary. Specifically, the initial key (p1, φ) retrieves the couplet containing p2, producing the output p2 which can then be used to retrieve p3 and so on.6 Of course it is always possible that a retrieved couplet may match several couplets in memory, making it unclear what the successor state should be. A number of possibilities for handling this situation exist (e.g. using a similarity measure with a higher resolution). The interested reader is referred to Kohonen (1980).

5 Note that this network is indistinguishable from the ENF system model (Figure 5.2)!
6 In essence, this is the methodology used to traverse a list in the composite symbol structure problem.
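The key-driven recall described above can be sketched with a toy couplet memory. Exact-match lookup stands in for the associative matching (similarity-based retrieval, ambiguity handling, and the ENF's actual machinery are all omitted), and PHI plays the role of the no-signal input φ:

```python
# A toy couplet memory illustrating sequence recall: each pattern is stored
# paired with its predecessor, so presenting the first pattern chains
# through the whole memorised sequence.

PHI = None  # "an input with no signal"

def store_sequence(patterns):
    """Store couplets (predecessor, pattern) for an external pattern sequence."""
    memory = []
    prev = PHI
    for p in patterns:
        memory.append((prev, p))
        prev = p
    return memory

def recall_sequence(memory, first):
    """Key retrieval with the first pattern and chain through the couplets."""
    out = [first]
    current = first
    while True:
        successors = [p for prev, p in memory if prev == current]
        if not successors:
            return out
        current = successors[0]   # ambiguous matches are ignored here
        out.append(current)
```

Storing the sequence p1, p2, p3 and presenting p1 retrieves the entire temporal sequence; presenting a later pattern retrieves only the remainder.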

7.3 The Problems of Scaling and Generalisation

When considering a learning task like the complex discrimination problem, the realities of the world soon begin to impose themselves. Even if a 'retina' could be represented by a 25 by 25 pixel array (which, of course, it cannot), the total number of inputs would be 25² or 625. Assuming that this 625 element input vector is even minimally (i.e. second-order phi) expanded, it immediately grows to 196,250 elements in length. And since the number of elements in the connection matrix is the square of this number, a complete correlation matrix will contain 38,514,062,500 elements! As is true of connectionist models in general (Minsky and Papert, 1988), the ENF does not scale up particularly well.
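The arithmetic can be checked directly. The particular combination assumed here - the 625 linear terms plus all pairwise products x_i x_j with i <= j - is one way to arrive at the 196,250-element figure quoted in the text:

```python
# A quick check of the scaling arithmetic for the 25-by-25 "retina". The
# second-order phi expansion is assumed to keep the raw inputs plus all
# pairwise products x_i * x_j with i <= j.

n = 25 * 25                           # 625 raw inputs
second_order = n + n * (n + 1) // 2   # 625 linear terms + 195,625 products
matrix_elems = second_order ** 2      # complete (square) correlation matrix

print(n, second_order, matrix_elems)  # prints: 625 196250 38514062500
```

The complete correlation matrix thus contains 196,250 squared, or 38,514,062,500, elements.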

This also has important implications with respect to the proposed model's ability to form generalisations. In particular, Barron (1993) has proven that, relative to adaptive basis function models (e.g. the generalised delta rule), any model which employs fixed basis functions is severely limited with respect to its ability to form generalisations. Or as Masters (1993) put it:

One of the beauties of networks that send raw inputs directly to a complex hidden-neuron layer is that the user is relieved of the responsibility of choosing meaningful representations or pattern matchers. The functional link network is, in one sense, a step backward. ... [O]verfitting can easily lead to a network that performs well on the training set but fails miserably when applied to the general population. (p. 225)

Thus, while it is true that the use of a polynomial expansion allows the system to model arbitrary input-output mappings, it does so only at great expense.7

7 In the TC problem, for example, 48,841 parameters are used to achieve the correct answer for only 8 training cases! However, it should be pointed out that the proposed (rth-order) variant was chosen from the infinite number of potential enhancement functions, not because of its computational efficiency in forming generalisations, but because of its biological plausibility as a means of doing so. Undoubtedly other functions may be more powerful in specific problem situations.

But does this mean that it is therefore of limited practical value, forever restricted to the shadowy world of toy problems? There are two ways to answer this question. The first is to consider the introduction of techniques which help scale the magnitude of the problem. In particular, using the method outlined previously (section 5.3.2), it may be possible for the ENF to form reliable associations by simply sampling elements from its feedback matrix.8 In this case, the system equations would become:

An alternative is to recognise that biological systems, too, are "restricted to the shadowy world of toy problems". Ironically, it may have been Minsky and Papert (1988) who put the point best [emphasis added]:

Perhaps the scale of toy problems is that on which, in physiological actuality, much of the functioning of intelligence operates. ... We have used the phrase 'society of mind' to refer to the idea that mind is made up of a large number of components, or 'agents', each of which would operate on the scale of what, if taken in isolation, would be little more than a toy problem. ... In many situations, humans clearly show abilities far in excess of what could be learned by simple, uniform networks. But when we take those skills apart, or try to find out how they were learned, we expect to find that they were made by processes that somehow combined the work (already done in the past) of many smaller agencies, none of which, separately, need to work on scales much larger than do those in PDP. (p. 591)

8 Although the recollection is only an approximation, the incomplete correlation matrix model offers the advantage that the number of elements in the weight matrix need only be directly proportional to the number of input elements (rather than the square of this number). Kohonen (1984) has determined that a sample size ratio of about 40 times the number of inputs is sufficient for statistical accuracy.


7.4 Consciousness

Variations on this theme have been echoed by many theorists. Ornstein (1991), for example, contends that [emphasis added]:

The mind is a squadron of simpletons. It is not unified, it is not rational, it is not well designed - or designed at all. ... Like the rest of biological evolution, the human mind is a collage of adaptations (the propensity to do the right thing) to different situations. Our thought is a pack of fixed routines ... . (p. 2)

And once recruited for a purpose, the 'mind in place' performs as if it had been there forever, then steps aside to be replaced by another actor, one with different memories, priorities, and plans (Ornstein, 1991). We succumb to the illusion of a unified consciousness because the swapping of reactions leaves us unaware that a new and different 'mind' is determining our reactions. In his recent book, provocatively entitled Consciousness Explained, Dennett (1991) underscores and elaborates on Ornstein's point:

In our brains there is a cobbled-together collection of specialist brain circuits, which, thanks to a family of habits inculcated partly by culture and partly by individual self-exploration, conspire together to produce a more or less orderly, more or less effective, more or less well-designed virtual machine, the Joycean machine. By yoking these independently evolved specialist organs together in a common cause, and thereby giving their union vastly enhanced powers, this virtual machine, this software of the brain, performs a sort of internal political miracle: It creates a virtual captain of the crew, without elevating any one of them to long-term dictatorial power. (p. 228)

Both Ornstein and Dennett are challenging Descartes' famous inference that, although dreams, hallucinations, and madness all demonstrate that one can validly doubt the report of the senses, one thing beyond doubt is that there is someone doing the doubting. To quote Bertrand Russell (1945):

"I think" is [Descartes'] ultimate premise. Here the word "I" is really illegitimate; he ought to state his ultimate premise in the form "there are thoughts". The word "I" is grammatically convenient, but it does not describe a datum. ... He nowhere proves that thoughts need a thinker, nor is there reason to believe this except in a grammatical sense. (p. 567)

Borrowing Dennett's (1991) metaphor, there is no Cartesian theatre. That is, there is no site where everything comes together. Nor is there an audience! For while, as William James (1890) observed, consciousness is a process not a thing, it does not follow, however, "... that our conscious minds are located at the termination of all the inbound processes, just before the initiation of all the outbound processes that implement our actions." (Dennett, 1991, p. 108).

Consciousness does not flow in a single stream; rather it flows in many streams, each one a specialist:

Instead of such a single stream (however wide), there are multiple channels in which specialist circuits try, in parallel pandemoniums,9 to do their various things, creating Multiple Drafts as they go. Most of these fragmentary drafts of 'narrative' play short-lived roles in the modulation of current activity but some get promoted to further functional roles, in swift succession, by the activity of a virtual machine in the brain. ... The basic specialists are part of our animal heritage. They were not developed to perform peculiarly human actions, such as reading and writing, but ducking, predator-avoiding, face-recognising, grasping, throwing, berry-picking, and other essential tasks. (Dennett, 1991, pp. 253-254)

9 This is a reference to Selfridge's (1959) seminal 'Pandemonium' model of cognition.

Rodney Brooks (1991) has successfully put Dennett's model into practice! Like Dennett, Brooks contends that functionalism's decomposition of systems into central and peripheral modules has become increasingly suspect. But unlike Dennett, Brooks contends that everything is both central and peripheral. The result is a layering of systems, operating in parallel, each complete in-and-of-itself, and each having its own implicit purpose, the preconditions of which are continuously matched against events in the real world. In this way the world serves as its own model, negating the need for explicit representation. Brooks refers to his implementation as 'subsumption architecture' because the layers are combined through (input) suppression and (output) inhibition. Each layer (i.e. behaviour) competes for control, the winner being determined by what the robot's sensors detect at any particular moment. As an example, when a sensor indicates that there is an obstacle en route, it triggers the LIFT FRONT LEG behaviour. If the leg doesn't quite clear the obstacle a sensor in the knee triggers a LIFT HIGHER routine. If LIFT HIGHER succeeds then the MOVE FORWARD behaviour takes over and the robot starts to pull itself up. Climbing behaviour emerges from the interaction of these simple behaviours. There was no need to plan a sequence of moves to effect the climbing action. The perception of a purposeful locus of control, Brooks (1991) suggests, derives not from the organism, but rather from the observer: "It is only the observer of the Creature who imputes a central representation or central control. The creature itself has none; it is a collection of competing behaviours. Out of the chaos of their local interactions there emerges, in the eye of an observer, a coherent pattern of behaviour." (pp. 148-149). Like the elaborate meanderings of Simon's ant,10 complex behaviour may thus more reflect the complexity of the environment than the complexity of the organism.

10 Simon (1969) made the point that apparent complexity may stem more from a problem than from the system that solves it by describing the path of an ant making its way along a pebbled beach. Although the path seems complicated as the ant probes, doubles back, circumnavigates, and zigzags, close scrutiny reveals that these actions are the product of control decisions that are both simple and few in number, not deep and mysterious manifestations of intellectual power (Nilsson, 19n).
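The leg-lifting example can be sketched as a toy priority scheme. This is an illustrative reduction of subsumption (real subsumption layers run concurrently and interact through suppression and inhibition links), and the sensor names here are assumptions drawn from the example in the text:

```python
# A toy reduction of the subsumption idea: layered behaviours compete, and
# the winner is determined by what the sensors report at the moment. Higher
# layers subsume lower ones simply by being checked first.

BEHAVIOURS = [
    ("lift higher",    lambda s: s.get("knee_hit")),        # knee sensor fires
    ("lift front leg", lambda s: s.get("obstacle_ahead")),  # obstacle en route
    ("move forward",   lambda s: True),                     # default layer
]

def act(sensors):
    """Return the action of the highest-priority behaviour whose trigger fires."""
    for action, trigger in BEHAVIOURS:
        if trigger(sensors):
            return action
```

With no sensor readings the robot simply moves forward; an obstacle triggers the leg lift, and a knee hit subsumes that with the higher lift, so climbing emerges from the interplay of the layers rather than from any planned sequence.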


7.5 Contributions

The proposed learning model makes a number of contributions. Apart from the increase in representational power which functional enhancement implies, the fact that the ENF only uses a single layer of interconnections greatly simplifies and, thereby, accelerates learning relative to multilayer approaches. In fact, each engram is encoded in a single iteration! Add to this the fact that, as in Kohonen's original novelty filter, the asymptotic state of the proposed learning model is achieved in one cycle of application of its input patterns, and the ENF promises to be among the fastest of all of the learning paradigms.

Unfortunately the speed with which the ENF learns also raises a problem. For while the ENF learns in a single trial, organisms (generally) do not. And since the ENF purports to model biological learning, this would seem to imply that the model is either in some way wrong or, less harshly, incomplete.

That the model is incomplete is readily conceded. Perhaps its most glaring omission is that the ENF does not forget.11 Fortunately, it is a relatively straightforward matter to introduce forgetting into the ENF's learning equation (5.29). In the simplest case, each element decays at a rate proportional to its value:

ΔW = −βyyᵀ − λW     (7.01)

Here λ is a constant positive scalar in the range 0 < λ < 1. More elaborate schemes can, of course, be devised. In general these equations take the form:

ΔW = −α(x, y, W)yyᵀ − ρ(x, y, W)W     (7.02)

where α(·) and ρ(·) are (possibly nonlinear) functions of x, y, and W, returning scalar values in the range 0 to 1.
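In the same C++ idiom as the Appendix C listing, the simple proportional-decay scheme amounts to a few lines. This is an illustrative sketch rather than code from the thesis: the fixed network size N and the decay rate supplied by the caller are assumptions of the example.

```cpp
#include <cassert>

const int N = 3;   // illustrative network size (assumption)

// One step of forgetting: each weight decays at a rate proportional
// to its current value, w(t+1) = w(t) - lambda * w(t), with
// 0 < lambda < 1 as in the text.
void forget(double w[N][N], double lambda)
{
    for (int i = 0; i < N; ++i) {
        for (int j = 0; j < N; ++j) {
            w[i][j] -= lambda * w[i][j];
        }
    }
}
```

Applied once with lambda = 0.1, a weight of 1.0 decays to 0.9; repeated application drives every engram geometrically back toward the blank (zero) state.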

The proposed learning model also makes a contribution with respect to the method by which its learning rate is determined. Most connectionist models either assume a constant learning rate, or they decrease their rate over time.12

However, the ENF dynamically sets its rate in accordance with the equation:

α = 1 / ‖f(x)‖²     (7.03)

where f(x) is the functionally enhanced input pattern. This affords the ENF a considerable advantage over the constant-rate models which, in order to avoid overshooting the error surface minimum, are usually forced to severely limit the magnitude of their learning rate (thereby drastically slowing convergence).
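The dynamic rate reduces to a small helper of the kind the Appendix C listing computes inline (there, alpha is formed from the summed squared activations). The function name and the zero-norm guard below are assumptions of this sketch.

```cpp
#include <cassert>
#include <cmath>

// Dynamic learning rate, alpha = 1 / ||v||^2, for a pattern vector v
// of length n.  A zero-norm vector yields a rate of zero: nothing
// novel remains to be learned.
double learning_rate(const double *v, int n)
{
    double norm2 = 0.0;
    for (int i = 0; i < n; ++i) {
        norm2 += v[i] * v[i];
    }
    return norm2 > 0.0 ? 1.0 / norm2 : 0.0;
}
```

Because the rate is renormalised per pattern, a single update W ← W − α·yyᵀ cancels the pattern's residual activation exactly; no small fixed step size, and hence no slow convergence, is involved.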

Finally, the observation that the final magnitude of activation at a junction appears to reflect the degree to which that junction assists in the regeneration effort suggests a further contribution. For it implies a method which may be used to determine the minimal parameter set (and thereby equation) needed to effect a given stimulus-response mapping. Simply put, the closer a junction's final activation value is to the coding interval midpoint, the less information the junction has contributed. It would stand to reason, therefore, that at the midpoint activation value (e.g. zero, when bipolar coding is used) the junction affords no useful information at all and can be pruned with absolute impunity.
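The pruning criterion can be sketched as a pass over the final activation values. The interface and the tolerance `tol` are assumptions of the example (the text strictly licenses pruning only at the midpoint itself; a small tolerance band is the natural practical relaxation).

```cpp
#include <cassert>
#include <cmath>

// Mark for removal every junction whose final activation lies within
// `tol` of the coding-interval midpoint (zero under bipolar coding):
// such junctions contribute (essentially) no information to the
// regeneration effort.  Returns the number of junctions retained.
int prune(const double *activation, bool *keep, int n,
          double midpoint, double tol)
{
    int retained = 0;
    for (int i = 0; i < n; ++i) {
        keep[i] = std::fabs(activation[i] - midpoint) > tol;
        if (keep[i]) {
            ++retained;
        }
    }
    return retained;
}
```

The surviving junctions then index the stimulus terms needed to effect the mapping.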

Although further research is needed to confirm this insight, if true, then the minimal regenerating equation (under the specified method of functional enhancement) is given by a simple linear combination of the surviving stimulus terms.

12 As an example, in the "delta-delta rule" (Jacobs, 1988) the learning rate stored for each weight is adjusted on the basis of the equation:

ΔM(t) = γ (∂J/∂w(t)) (∂J/∂w(t−1))

where M(t) is a diagonal matrix of learning rate values at time t, J is the error function to be minimised by the weight update rule, and w(t) is the value of a single weight at time t. It still requires a step size parameter (γ) which is subject to precisely the same problems the original learning rate (μ) encountered.

But it is perhaps the scope of my undertaking which is its key contribution. For the situation in "the cognitive sciences" is, at present, much as Hermann Hesse once described in his utopian classic The Glass Bead Game:

The astronomers, the classicists, the scholastics, the music students all played their Games according to their ingenious rules, but the Game had a special language and set of rules for every discipline and subdiscipline. ... They longed for philosophy, for synthesis. The erstwhile happiness of pure withdrawal each into his own discipline was now felt to be inadequate. Here and there a scholar broke through the barriers of his speciality and tried to advance into the terrain of universality. Some dreamed of a new alphabet, a new language of symbols through which they could formulate and exchange their new intellectual experiences. (pp. 26-27)


Appendix A: Tic-Tac-Toe Training Set

Stimulus Response

0 0 0 0 0 0 0 0 1 0 0 0 0 0 1 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 1 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 7 1 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 1 0 1 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 1 0 0 0 0 1 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 1 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 1 0 0 0 0 0 1 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 1 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 1 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 1 0 0 1 0 0 0 0 0 0 0 0

Appendix B: Estimated Tic-Tac-Toe Responses

What follows is the recommended response (underlined) to the board states

given in the stimulus portion of the training set (see Appendix A).


Appendix C: ENF Source Code Listing

The following is the complete source code listing (in Symantec C++) for the proposed learning model. Only the second-order polynomial variant is shown. The matrix class header file (matrix.h) provides all of the object definitions.

(a) matrix.h

const int DEFAULT = 2;    // default matrix dimensions

class Cvector {
private:
    int m_size;
    double *m_vector;
public:
    Cvector(int n = DEFAULT);            // constructor
    Cvector(const Cvector&);             // copy constructor
    ~Cvector();                          // destructor
    int getSize() { return m_size; }
    double& operator()(int i);
    Cvector& operator=(const Cvector&);
};

class Cmatrix {
private:
    int m_rows;
    int m_columns;
    double **m_matrix;
public:
    Cmatrix(int r = DEFAULT, int c = DEFAULT);  // constructor
    Cmatrix(const Cmatrix&);                    // copy constructor
    ~Cmatrix();                                 // destructor
    int getRows() { return m_rows; }
    int getColumns() { return m_columns; }
    double& operator()(int r, int c);
    Cmatrix& operator=(const Cmatrix&);
};

// -------------------------------------------------------------
// name        : Cvector(int)
// description : Cvector class constructor
// -------------------------------------------------------------
Cvector::Cvector(int n) : m_size(n)
{
    int i;

    m_vector = new double[m_size];
    assert(m_vector != 0);
    for (i = 0; i < m_size; i++) {
        m_vector[i] = 0.0;
    }
}

// -------------------------------------------------------------
// name        : Cvector(const Cvector&)
// description : vector class copy constructor
// -------------------------------------------------------------
Cvector::Cvector(const Cvector& v) : m_size(v.m_size)
{
    int i;

    m_vector = new double[m_size];
    assert(m_vector != 0);
    for (i = 0; i < m_size; i++) {
        m_vector[i] = v.m_vector[i];
    }
}

// -------------------------------------------------------------
// name        : ~Cvector()
// description : vector class destructor
// -------------------------------------------------------------
Cvector::~Cvector()
{
    delete[] m_vector;
}

// -------------------------------------------------------------
// name        : double& operator()(int)
// description : overload round brackets operator
// -------------------------------------------------------------
inline double& Cvector::operator()(int i)
{
    return m_vector[i];
}

// -------------------------------------------------------------
// name        : Cvector& operator=(const Cvector&)
// description : overload assignment operator
// -------------------------------------------------------------
Cvector& Cvector::operator=(const Cvector& v)
{
    int i;

    if (this != &v) {
        delete[] m_vector;
        m_vector = new double[m_size = v.m_size];
        assert(m_vector != 0);
        for (i = 0; i < m_size; i++) {
            m_vector[i] = v.m_vector[i];
        }
    }
    return *this;
}

// -------------------------------------------------------------
// name        : Cmatrix(int, int)
// description : matrix class constructor
// -------------------------------------------------------------
Cmatrix::Cmatrix(int r, int c) : m_rows(r), m_columns(c)
{
    int i, j;

    m_matrix = new double*[m_rows];
    assert(m_matrix != 0);
    for (i = 0; i < m_rows; i++) {
        m_matrix[i] = new double[m_columns];
        assert(m_matrix[i] != 0);
    }
    for (i = 0; i < m_rows; i++) {
        for (j = 0; j < m_columns; j++) {
            m_matrix[i][j] = 0.0;
        }
    }
}

// -------------------------------------------------------------
// name        : Cmatrix(const Cmatrix&)
// description : matrix class copy constructor
// -------------------------------------------------------------
Cmatrix::Cmatrix(const Cmatrix& m) : m_rows(m.m_rows), m_columns(m.m_columns)
{
    int i, j;

    m_matrix = new double*[m_rows];
    assert(m_matrix != 0);
    for (i = 0; i < m_rows; i++) {
        m_matrix[i] = new double[m_columns];
        assert(m_matrix[i] != 0);
    }
    for (i = 0; i < m.m_rows; i++) {
        for (j = 0; j < m.m_columns; j++) {
            m_matrix[i][j] = m.m_matrix[i][j];
        }
    }
}

// -------------------------------------------------------------
// name        : ~Cmatrix()
// description : matrix class destructor
// -------------------------------------------------------------
Cmatrix::~Cmatrix()
{
    int i;

    for (i = 0; i < m_rows; i++) {
        delete[] m_matrix[i];
    }
    delete[] m_matrix;
}

// -------------------------------------------------------------
// name        : double& operator()(int, int)
// description : overload round brackets operator
// -------------------------------------------------------------
inline double& Cmatrix::operator()(int r, int c)
{
    return m_matrix[r][c];
}

// -------------------------------------------------------------
// name        : Cmatrix& operator=(const Cmatrix&)
// description : overload assignment operator
// -------------------------------------------------------------
Cmatrix& Cmatrix::operator=(const Cmatrix& m)
{
    int i, j;

    if (this != &m) {
        for (i = 0; i < m_rows; i++) {
            delete[] m_matrix[i];
        }
        delete[] m_matrix;
        m_matrix = new double*[m_rows = m.m_rows];
        assert(m_matrix != 0);
        for (i = 0; i < m_rows; i++) {
            m_matrix[i] = new double[m_columns = m.m_columns];
            assert(m_matrix[i] != 0);
        }
        for (i = 0; i < m_rows; i++) {
            for (j = 0; j < m_columns; j++) {
                m_matrix[i][j] = m.m_matrix[i][j];
            }
        }
    }
    return *this;
}

#include <stdio.h>     // for FILE declaration
#include <math.h>      // for fabs function
#include <assert.h>    // for assert macro
#include <iostream.h>  // for cin and cout
#include "matrix.h"    // matrix class

const int DEGREE = 2;              // degree of tensor expansion
const int MAXLENGTH = 10;          // maximum output elements per line
const double PRECISION = 0.0001;   // minimum output value

static int ninputs;      // number of inputs (original)
static int nexpanded;    // number of inputs (enhanced)
static int noutputs;     // number of outputs
static int npatterns;    // number of patterns

// -------------------------------------------------------------
// name        : factorial(int)
// description : iteratively calculate n factorial
// -------------------------------------------------------------
double factorial(int n)
{
    double count, m;

    for (m = count = 1; count <= n; ++count) {
        m *= count;
    }
    return m;
}

// -------------------------------------------------------------
// name        : binomial_coefficient(int, int)
// description : calculate the binomial coefficient (x over y)
// -------------------------------------------------------------
int binomial_coefficient(int x, int y)
{
    return (int)(factorial(x) / (factorial(x - y) * factorial(y)));
}

// -------------------------------------------------------------
// name        : degree_two_tensor(Cmatrix&)
// description : tensor expand input patterns (degree 2)
// -------------------------------------------------------------
void degree_two_tensor(Cmatrix& input)
{
    int i, j, n, p;
    int m = ninputs + 1;    // include threshold

    for (p = 0; p < npatterns; ++p) {
        n = m;
        for (i = 1; i < m; ++i) {
            for (j = i; j < m; ++j) {
                input(p, n++) = input(p, i) * input(p, j);
            }
        }
    }
}

// -------------------------------------------------------------
// name        : load_input(char*)
// description : load input matrix from "filename"
// -------------------------------------------------------------
Cmatrix load_input(char *filename)
{
    FILE *fp;
    int i, j, n;

    fp = fopen(filename, "r");
    assert(fp != NULL);

    fscanf(fp, "%d %d %d", &npatterns, &ninputs, &noutputs);

    nexpanded = binomial_coefficient(ninputs + DEGREE, DEGREE);
    n = nexpanded + noutputs;

    Cmatrix input(npatterns, n);

    for (i = 0; i < npatterns; ++i) {
        input(i, 0) = 1;                      // threshold element
        for (j = 1; j < ninputs + 1; ++j) {
            fscanf(fp, "%lf", &input(i, j));  // stimulus portion
        }
        for (j = nexpanded; j < n; ++j) {
            fscanf(fp, "%lf", &input(i, j));  // response portion
        }
    }

    fclose(fp);
    return input;
}


// -------------------------------------------------------------
// name        : output_matrix(Cmatrix&, char*)
// description : output matrix information to "filename"
// -------------------------------------------------------------
void output_matrix(Cmatrix& data, char *filename)
{
    FILE *fp;
    int i, j, m, n;

    fp = fopen(filename, "w");
    assert(fp != NULL);

    n = data.getRows();
    m = data.getColumns();

    fprintf(fp, " %d\n", m);
    fprintf(fp, " %d\n", n);

    for (i = 0; i < n; ++i) {
        for (j = 0; j < m; ++j) {
            if (j % MAXLENGTH == 0) {
                fputc('\n', fp);
            }
            fprintf(fp, " %+.4f", data(i, j));
        }
    }

    fputc('\n', fp);
    fclose(fp);
}

// -------------------------------------------------------------
// name        : learningrule(Cmatrix&)
// description : learning model
// -------------------------------------------------------------
void learningrule(Cmatrix& input)
{
    FILE *fp;
    double alpha;
    int i, j, n, m, p;

    fp = fopen("training.out", "w");
    assert(fp != NULL);

    m = input.getRows();       // number of patterns
    n = input.getColumns();    // number of elements

    Cvector output(n);         // allocate and initialise to zero vector
    Cmatrix weight(n, n);      // allocate and initialise to zero matrix

    for (i = 0; i < n; ++i) {  // initialise weights to identity matrix
        weight(i, i) = 1;
    }

    for (p = 0; p < m; ++p) {

        printf("processing pattern: %3d\n", p);

        for (i = 0; i < n; ++i) {
            output(i) = 0;
            for (j = 0; j < n; ++j) {
                output(i) += weight(i, j) * input(p, j);
            }
            if (fabs(output(i)) < PRECISION) {   // threshold activation
                output(i) = 0;
            }
        }

        alpha = 0;
        for (i = 0; i < n; ++i) {
            alpha += output(i) * output(i);
        }
        if (alpha) {
            alpha = 1.0 / alpha;
        }

        for (i = 0; i < n; ++i) {
            for (j = 0; j < n; ++j) {
                weight(i, j) -= alpha * output(i) * output(j);
            }
        }

        fprintf(fp, "%3d: alpha = %f\n", p, alpha);
    }

    fclose(fp);

    printf("\nwriting matrix to 'weight.dat' ...\n");
    output_matrix(weight, "weight.dat");
}

// -------------------------------------------------------------
// name        : enhanced_novelty_filter(Cmatrix&)
// description : enhanced novelty filter algorithm (degree 2)
// -------------------------------------------------------------
void enhanced_novelty_filter(Cmatrix& input)
{
    degree_two_tensor(input);
    learningrule(input);
}

void main()
{
    char filename[80];

    ios::sync_with_stdio();

    cout << "input file? ";
    cin >> filename;
    cout << endl;

    Cmatrix input = load_input(filename);
    enhanced_novelty_filter(input);
}


Appendix D: The ENF as a Model of Habituation

In an influential paper published in 1966, Thompson and Spencer identified nine features of habituation widely observed to occur in nature:

a decrease in responding with repeated stimulation
recovery with rest
more rapid rehabituation
faster habituation with weaker stimuli
generalisation of habituation
dishabituation by presentation of a strong stimulus to another site
habituation of dishabituation
faster habituation with shorter inter-stimulus intervals
subzero habituation

Given that the ENF purports to model habituation, it would seem prudent to consider whether or not the learning algorithm displays these features.

(a) Decreased responding with repeated stimulation

A decrease in responding with repeated stimulation is the definitive feature of habituation. It is also the definitive feature of the proposed learning model. As Kohonen (1984) noted of its progenitor, the novelty filter:

The central phenomena that takes place in this system is that the 'filter', by and by, becomes 'opaque' to the presented pattern: the output will gradually fade out. As this kind of adaptation occurs for a number of patterns, the operation is very much similar to various aftereffects that are met in the visual system: the model will become 'habituated' to the inputs. (p. 118)
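The 'opacity' Kohonen describes can be checked with a toy calculation. The sketch below assumes the plain (unenhanced) novelty filter with identity-initialised weights and the one-shot update W ← W − yyᵀ/‖y‖²; the dimensionality and test pattern are illustrative.

```cpp
#include <cassert>
#include <cmath>

const int N = 3;   // illustrative dimensionality (assumption)

// Present pattern x once: compute the novelty output y = W x, then
// habituate with the one-shot update W -= (y y^T) / ||y||^2.
// Returns ||y||, the magnitude of the novelty response.
double present(double w[N][N], const double *x)
{
    double y[N];
    double norm2 = 0.0;
    for (int i = 0; i < N; ++i) {
        y[i] = 0.0;
        for (int j = 0; j < N; ++j) {
            y[i] += w[i][j] * x[j];
        }
        norm2 += y[i] * y[i];
    }
    if (norm2 > 0.0) {
        for (int i = 0; i < N; ++i) {
            for (int j = 0; j < N; ++j) {
                w[i][j] -= y[i] * y[j] / norm2;
            }
        }
    }
    return std::sqrt(norm2);
}
```

Starting from the identity matrix, the first presentation of a pattern produces a full-strength response; the second produces none: the filter has become opaque (habituated) to that pattern.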


(b) Recovery with rest

In neural preparations Kandel (1979) found that although a single training session of 10 to 15 tactile stimuli is usually sufficient to produce habituation of the Aplysia's gill withdrawal reflex, partial recovery can be detected within an hour, and almost complete recovery occurs within a day.1 "Recovery in this type of learning is equivalent to forgetting." (ibid, p. 32). Unfortunately the learning model explored in the present work does not forget. However, as was previously noted, in principle it should be possible to introduce forgetting into the ENF's learning equation. In the simplest case this will take the form:

ΔW = −βyyᵀ − λW

where λ is a scalar representing the rate at which the model forgets (recovers).

Under this variant spontaneous recovery would indeed be observed, suggesting that a key role of forgetting may be to enable recovery from habituation.

(c) More rapid rehabituation and faster habituation with weaker stimuli

Rehabituation is typically more rapid than habituation (Carew et al., 1972). In other words, if a system is again habituated following spontaneous recovery, habituation typically requires fewer stimulus presentations than was originally required. Moreover, contrary to what common sense might suggest, Pinsker et al. (1970) also discovered that weaker stimuli produce more rapid habituation. In fact, strong stimuli actually impede habituation. In any event, the fact that the ENF can learn any pattern (if it is learnable under the specified enhancement) in a single iteration means that it neither displays more rapid rehabituation nor faster habituation to weaker stimuli.

1 As few as four training sessions produces a memory for the stimulus that lasts for weeks (Kandel, 1979).


(d) Generalisation of habituation

Generalisation of habituation is said to occur when, following habituation to one stimulus, there is a decrease in responding to similar stimuli. As was illustrated by the 'Jets' and 'Sharks' example (5.6.2.2), the ENF successfully forms generalisations. Thus it can be said that the algorithm does display this feature, at least to the degree to which it can be said to embody habituation.

(e) Dishabituation by presentation of a strong stimulus to another site

An organism's response to a habituated stimulus can be quickly restored by the presentation of a potent second stimulus to another site. This effect, known as 'dishabituation' (Pavlov, 1927), was originally thought to represent the removal of habituation. However recent studies (e.g. Carew et al., 1971) have shown that the two processes are not reciprocal. In fact, dishabituation appears to be a special case of sensitisation (Hawkins, 1989) and, as such, is more related to operant and classical conditioning than it is to habituation. In particular, dishabituation appears to involve a new and distinct form of learning known as activity-dependent facilitation (Kandel et al., 1983):

Figure 1: Two mechanisms of learning: the Hebb synapse and activity-dependent facilitation. Each panel shows a presynaptic and a postsynaptic neuron (with a modulatory neuron in the activity-dependent case); shading denotes where coincident activity must occur in order to produce an associative change.


The interactive nature of the proposed (rth-order polynomial) variant of the ENF would seem to accord rather well with this activity-dependent facilitation based model of dishabituation.2 Indeed, it would seem to have anticipated it! To this point, however, no suitable training-test suite has been devised. Thus the status of the ENF with respect to this feature is not known.

(f) Habituation of dishabituation

A dishabituating stimulus tends to become less effective with repeated use (Pinsker et al., 1970). That is, a dishabituatory stimulus habituates. Since the ENF does not dishabituate, it cannot (with certainty) be said that it would exhibit habituation of dishabituation. However, given the ENF's past success in habituating, habituation of dishabituation is anticipated.

(g) Impact of the inter-stimulus interval and subzero habituation

Unfortunately the ENF fails to display either of the remaining features. Since the ENF always learns in a single iteration, shorter intervals have no impact on acquisition rate. As to the issue of subzero habituation, not only does the ENF fail to exhibit subzero habituation, it is not at all clear exactly what changes would be needed in order to make it do so.3 For although the ENF will reduce network activity generated by an input pattern to zero, at zero all learning stops. This would seem to imply that habituation cannot solely be a function of changes in response to external feedback (neural output), as it is in the ENF. Internal (intracellular) feedback, too, must play a part.

2 And also of sensitisation and (classical) conditioning (Hawkins, 1989).

3 A possible mechanism is offered by Hawkins (1989): "This effect can be simulated with the additional assumption that there is a discrete threshold in the function relating excitatory postsynaptic potential (EPSP) amplitude to behavioural output such that the depression of the sensory neuron-motor neuron EPSP continues after the behavioural response has reached zero." (p. 80).


Appendix E: Correspondence with Dr. Kohonen

Dear Dr. Kohonen,

I am a Ph.D. student studying Cognitive Science at the University of Toronto. I have been a great admirer of your work for some time now, having spent many interesting hours exploring both your self-organising feature detector and, more recently, LVQ learning algorithms.

Recently, however, during my investigations into extensions of one of your earliest learning models (i.e. the novelty filter) I have run into some conceptual difficulties. As my academic training is in neurophysiology and psychology, not mathematics, at first I thought that I was merely misunderstanding the seemingly straightforward equations. Since we are fortunate enough to have Professor Geoffrey Hinton on staff at the University of Toronto, I took the problem to him. Incredibly, he, too, was puzzled! So I am writing this letter to you with the hope that you can explain where Dr. Hinton and I have erred.

The basic problem is that according to your activation equation: y = x + Wy, with x the input vector, y the output vector and W the weight matrix, on the iteration after the output vector goes to zero the output vector again takes on a value (i.e. y = x). This means that the weight adjustment equation: ΔW = −βyyᵀ will further modify the weight matrix values, ultimately leading to divergence. Even reducing the learning rate (i.e. β) over time, as you suggest, only serves to delay the inevitable. What is it that I am not understanding? Is the activation equation applied on each time step or only on the first? Is the output vector never allowed to reach the zero value (by very rapid degeneration of the learning rate or some similar mechanism)? Do you only iterate the residual output through the activation equation after the first time step?


Professor Kohonen, I would greatly appreciate your guidance in clearing up my confusion in this matter. I look forward to hearing from you, and I thank you in advance for your assistance.

Sincerely,

March 10, 1992

Dear Mr. Yeo,

The problem you presented on the Novelty Filter (NF) is more formal than real. You and Professor Hinton are right that in the ideal form of the NF, the weights diverge; but then you have to take into account the following facts.

1. Do you ever need the NF in ideal form? In practice you can stop learning when the output still has a few percent of "novelty" left. Notice that the convergence is very slow at the end, proportional to t^(-1/3) (and this law has been found in biology!).

2. You can always set saturation limits to the weights, or let them be "forgotten" like on pp. 116-117 of my book. This will eliminate the divergence, and the filter is still a very good approximation. You would be surprised how well the NF works even with simple saturation limits, if you continue learning.


3. In biology, if this principle exists, there may be thousands of inputs to every neuron. Then the sum of the synaptic effects may be appreciable, although the individual effects remain small. In artificial networks people usually want to use small dimensions; but I am aware, e.g., that Professor Dana Anderson from Denver has worked out the NF optically, where the effective dimensionality is very large, and no problems then exist.

So you see where you are going to end if you take the mathematical formalisms too seriously. I think that we could prove only a few theorems, if we are not allowed to use the concept of infinity (that can never be reached)! Achilles and the turtle already demonstrated it.

Would you please show my answer to Professor Hinton, too.

Sincerely,

References

Ackley, D. H., Hinton, G. E. and Sejnowski, T. J. (1985). "A learning algorithm for Boltzmann machines". Cognitive Science, 9, 147-169.

Alcock, J. (1975). Animal Behaviour: An Evolutionary Approach. Sinauer Associates, Inc., Sunderland.

Alkon, D. L. (1987). Memory Traces in the Brain. Cambridge University Press, London.

Amari, S. (1977). "Neural Theory of Association and Concept Formation". Biological Cybernetics, Vol. 26, No. 3, 175-185.

Amari, S. (1980). "Topographic Organisation of Nerve Fields". Bulletin of Mathematical Biology, Vol. 42, No. 3, 339-346.

Amit, D. J. (1989). Modeling Brain Function: The world of attractor neural networks. Cambridge University Press, Cambridge.

Anderson, J., Pellionisz, A. and Rosenfeld, E. (eds.) (1990). Neurocomputing 2: Directions for Research. The MIT Press, Cambridge.

Arbib, M. A. (1989). "Schemas and Neural Networks for Sixth Generation Computing". Journal of Parallel and Distributed Computing, Vol. 6, No. 2, 185-216.

Bandura, A. (1974). "Behaviour theory and the models of man". American Psychologist, 29: 859-869.


Barron, A. R. (1993). "Universal Approximation Bounds for Superpositions of a Sigmoid Function". IEEE Transactions on Information Theory, 39, 3.

Barto, A. G., Sutton, R. S. and Brouwer, P. S. (1981). "Associative Search Network: A Reinforcement Learning Associative Memory". Biological Cybernetics, 40, 3, 201-211.

Bartlett, F. (1958). Thinking. G. Allen and Unwin, London.

Bartlett, F. (1961). Remembering. Cambridge University Press, Cambridge.

Bernstein, J. (1981). "Profiles AI: Marvin Minsky". The New Yorker, December 14th, 50-126.

Block, H. D. (1962). "Perceptrons: a model for brain functioning". In J. A. Anderson and E. Rosenfeld (Eds.) Neurocomputing: Foundations of Research. The MIT Press, Cambridge, 1988, 138-150.

Bobrowski, L. (1982). "Rules of Forming Receptive Fields of Formal Neurons During Unsupervised Learning Processes". Biological Cybernetics, 43, 1, 23-8.

Boden, M. A. (ed.) (1990). The Philosophy of Artificial Intelligence. Oxford University Press, Oxford.

Boole, G. (1854). The Laws of Thought. Dover Publications Inc., New York.

Brause, R. W. (1996). "Sensor Encoding Using Lateral Inhibited Self-Organising Cellular Neural Networks". Neural Networks, 9, 1, 99-120.

Brooks, R. A. (1987). "Intelligence without representation". Artificial Intelligence, 47 (1991), 139-159.


Brown, B. B. (1974). New Mind, New Body. Bantam Books Inc., Toronto.

Brown, P. L. and Jenkins, H. M. (1967). "Conditioned inhibition and excitation in operant discrimination learning". Journal of Experimental Psychology, 75(2), 255-266.

Brown, T., Kairiss, E. and Keenan, C. (1990). "Hebbian Synapses: Biophysical Mechanisms and Algorithms". Annual Review of Neuroscience, 13, 475-511.

Carnap, R. (1936). "Testability and Meaning". Philosophy of Science, 3: 419-71.

Changeux, J. (1985). Neuronal Man: The Biology of Mind. Oxford University Press, Oxford.

Charniak, E. and McDermott, D. (1985). Introduction to Artificial Intelligence. Addison-Wesley Publishing Co., Reading.

Churchland, P. M. (1989). A Neurocomputational Perspective. A Bradford Book: The MIT Press, Cambridge.

Churchland, P. S. and Sejnowski, T. J. (1992). The Computational Brain. A Bradford Book: The MIT Press, Cambridge.

Cohen, P. and Feigenbaum, E. (1982). The Handbook of Artificial Intelligence (Volume 3). William Kaufmann, Inc., Los Altos.

Cooper, L., Liberman, F. and Oja, E. (1979). "A Theory for the Acquisition and Loss of Neuron Specificity in the Visual Cortex". Biological Cybernetics, 33, 1, 9-28.


Copleston, F. (1959). A History of Philosophy (Vol. 5). An Image Book: Doubleday, New York.

Cottrell, M. (1988). "Stability and Attractivity in Associative Memory Networks". Biological Cybernetics, Vol. 58, No. 2, 129-139.

Daunicht, W. J. (1991). "Autoassociation and Novelty Detection by Neuromechanics". Science, Vol. 253, No. 5025, 1289-1291.

Dawkins, R. (1986). The Blind Watchmaker. Penguin Books Ltd., Middlesex.

de Groot, A. D. (1965). Thought and Choice in Chess. The Hague: Mouton.

Dennett, D. C. (1969). Content and Consciousness. Routledge and Kegan Paul, Boston.

Dennett, D. C. (1978). Brainstorms. A Bradford Book: The MIT Press, Cambridge.

Dennett, D. C. (1987). The Intentional Stance. A Bradford Book: The MIT Press, Cambridge.

Dennett, D. C. (1991). Consciousness Explained. Little, Brown and Co. Ltd., Boston.

Dennett, D. C. (1995). Darwin's Dangerous Idea: Evolution and the Meanings of Life. Simon and Schuster, New York.

Dennett, D. C. (1996). Kinds of Minds. BasicBooks, New York.


Deno, C., Keller, E. L. and Crandall, W. F. (1989). "Dynamical Neural Network Organisation of the Visual Pursuit System". IEEE Transactions on Biomedical Engineering, Vol. 36, No. 1, 85-92.

Donahoe, J. W. (1988). "Skinner: The Darwin of ontogeny?". In A. Charles Catania and Stevan Harnad (Eds.) The Selection of Behaviour, Cambridge University Press, Cambridge, 36-39.

Easton, P. and Gordon, P. E. (1984). "Stabilisation of Hebbian Neural Nets by Inhibitory Learning". Biological Cybernetics, Vol. 51, No. 1, 1-9.

Edelman, G. M. (1978). "Group selection and phasic reentrant signalling: A theory of higher brain function". In The Mindful Brain, G. M. Edelman and V. B. Mountcastle (eds.), The MIT Press, Cambridge.

Edelman, G. M. (1987). Neural Darwinism: The Theory of Neuronal Group Selection. Basic Books Inc., New York.

Edelman, G. M. (1989). The Remembered Present: A Biological Theory of Consciousness. Basic Books Inc., New York.

Edelman, G. M. (1992). Bright Air, Brilliant Fire: On the Matter of the Mind. Basic Books Inc., New York.

Erickson, R. P. (1984). "On the neural bases of behaviour". American Scientist, 72, 233-241.


Feldman, J. (1989). "Neural Representation of Conceptual Knowledge". In Nadel, Cooper, Culicover, and Harnish (Eds.), Neural Connections, Mental Computation. A Bradford Book: The MIT Press, Cambridge, 69-103.

Flew, A. (1979). A Dictionary of Philosophy. Pan Books Ltd., London.

Fodor, J. A. (1975). The Language of Thought. Crowell, New York.

Fodor, J. A. (1983). The Modularity of Mind. A Bradford Book: The MIT Press, Cambridge.

Fodor, J. A. and Pylyshyn, Z. W. (1988). "Connectionism and cognitive architecture: A critical analysis". Cognition, 28, 3-71.

Freeman, W. J. (1991). "The Physiology of Perception". Scientific American, Feb. 1991, 78-84.

Fukushima, K. and Miyake, S. (1978). "A Self-Organising Neural Network with a Function of Associative Memory: Feedback Type Cognitron". Biological Cybernetics, Vol. 28, No. 4, 201-208.

Genis, C. T. (1989). "Relaxation and Neural Learning: Points of Convergence and Divergence". Journal of Parallel and Distributed Computing, Vol. 6, No. 2, 217-244.

Goodman, N. (1979). Fact, Fiction and Forecast. Harvard University Press, Cambridge.

Graf, P. and Schacter, D. L. (1985). "Implicit and explicit memory for new associations in normal and amnesic subjects". Journal of Experimental Psychology: Learning, Memory, and Cognition, 11, 501-518.


Gregory, R. L. (1981). Mind in Science. Penguin Books, London.

Groff, P. (1994). "What Goes Up Must Come Down: Behaviourism, Empirical Generalisation, and Science". In Cognoscenti, Bulletin of the Toronto Cognitive Science Society, Toronto, 1994, No. 2, 25-27.

Guttman, N. and Kalish, H. I. (1956). "Discriminability and stimulus generalisation". Journal of Experimental Psychology, 51, 79-88.

Gynther, M. D. (1957). "Differential eyelid conditioning as a function of stimulus similarity and strength of response to the CS". Journal of Experimental Psychology, 53, 408-416.

Hamming, R. W. (1950). "Error detecting and error correcting codes". Bell System Technical Journal, 29, 147-160.

Harnad, S. (1991). "The symbol grounding problem". In Stephanie Forrest (ed.) Emergent Computation. MIT Press, Cambridge, 335-346.

Hasselmo, M. E., Wilson, M. A., Anderson, B. P. and Bower, J. M. (1990). "Associative Memory Function in Piriform (Olfactory) Cortex: Computational Modeling and Neuropharmacology". Cold Spring Harbor Symposia on Quantitative Biology, Vol. 55, 599-610.

Hawkins, R. D. (1989). "A biologically based computational model for several simple forms of learning". In R. D. Hawkins and G. H. Bower (Eds.), 1989, Computational Models of Learning in Simple Neural Systems. Academic Press, Inc., San Diego.

Hawkins, R. D. and Kandel, E. R. (1984). "Is there a cell biological alphabet for simple forms of learning?". Psychological Review, 91, 375-391.


Hawkins, R. D.. Abrams. T. W.. Carew. T. J.. and Kandel. E. R. (1983). -A

Cellular Mechanism of Classical Conditionhg in Aplysia: ActMty Dependent Arnplitication of Presynaptic Faciiitation". Science. Vol. 1. No. 2. 97- 103.

Hebb. D. 0. ( 1949). The first stage of perception: growth of the assembly" in J. A. Anderson and E. Rcsenfeld (Eds.) N e z u ~ c a n p w : Foundatiom of Research, The MlT Press. Cambridge. 1988.45-56.

Hesse. H. ( 1943). Magfster Ludk 'Ihe Glass Sead Gume . Bantam Books. New York.

Hinton. G. E (1977). RelaxPtion and its d e in utsion Unpublished Doctoral dissertation. UniversiSr of Edinburgh. Edinburgh.

Hinton. G. E ( 1990). 'Mapping part-whole hierarchies into connectionist networks.". in G. E. Hinton (Ed.). Cmertionist SymboI Rmessing, A Bradford

Book: The ha Press. Cambridge. 1990.47-75.

Hirai, Y. (1983). "A Model of Human Associative Processor (HASP)". IEEE Transactions on Systems, Man, and Cybernetics, Vol. 13, No. 5, 851-857.

Hume, D. (1739/1962). A Treatise of Human Nature. William Collins Sons and Company Limited, Glasgow.

Hume, D. (1748/1955). An Inquiry Concerning Human Understanding. Macmillan Publishing Company, New York.

Ikeda, N. and Torioka, T. (1990). "A Model of Associative Memory Based on Adaptive Feature-Detecting Cells". IEEE Transactions on Systems, Man and Cybernetics, Vol. 20, No. 2, 436-443.

Jacobs, R. (1988). "Increased Rates of Convergence Through Learning Rate Adaptation". Neural Networks, 1, 4, 295-308.

James, W. (1890). The Principles of Psychology. Harvard University Press, Cambridge.

Jenkins, H. M. (1961). "The effects of discrimination training on extinction". Journal of Experimental Psychology, 61, 111-121.

Kamin, L. J. (1969). "Predictability, surprise, attention and conditioning", in B. A. Campbell and R. M. Church (Eds.), Punishment and Aversive Behaviour, Appleton-Century-Crofts, New York, 279-296.

Kandel, E. R. (1979). "Small Systems of Neurons", in Scientific American's The Brain, W. H. Freeman and Company, New York.

Kantor, J. R. (1970). "An analysis of the experimental analysis of behaviour". Journal of the Experimental Analysis of Behavior, 13, 101-108.

Kimble, G. A. (1961). Hilgard and Marquis' Conditioning and Learning. Appleton-Century-Crofts, New York, 78-98.

Keynes, J. M. (1948). A Treatise on Probability. Macmillan, London.

Kirkpatrick, S., Gelatt Jr., C. D. and Vecchi, M. P. (1983). "Optimisation by simulated annealing". Science, 220, 671-680.

Knuth, D. E. (1973). The Art of Computer Programming: Fundamental Algorithms (Volume 1). Addison-Wesley Publishing Co., Reading.

Koestler, A. (1964). The Act of Creation. Arkana: The Penguin Group, London.

Koestler, A. (1967). The Ghost in the Machine. Arkana: The Penguin Group, London.

Kohonen, T. (1972). "Correlation matrix memories". IEEE Transactions on Computers, C-21, 353-359.

Kohonen, T. and Oja, E. (1976). "Fast Adaptive Formation of Orthogonalising Filters and Associative Memory in Recurrent Networks of Neuron-Like Elements". Biological Cybernetics, Vol. 21, 85-95.

Kohonen, T., Reuhkala, E., Makisara, K. and Vainio, L. (1976). "Associative Recall of Images". Biological Cybernetics, Vol. 24, No. 4, 181-198.

Kohonen, T., Lehtio, P., Rovamo, J., Hyvarinen, J., Bry, K., and Vainio, L. (1977). "A Principle of Neural Associative Memory". Neuroscience, Vol. 2, No. 6, 1065-1076.

Kohonen, T. and Oja, E. (1987). "Computing with Neural Networks". Science, Vol. 235, No. 4793, p. 1227.

Kosslyn, S. M. and Koenig, O. (1992). Wet Mind: The New Cognitive Neuroscience. The Free Press: Macmillan Inc., New York.

Kripke, S. A. (1982). Wittgenstein on Rules and Private Language. Harvard University Press, Cambridge.

Kuhn, T. S. (1977). The Essential Tension. The University of Chicago Press, Chicago.

Lakoff, G. (1987). Women, Fire, and Dangerous Things: What Categories Reveal about the Mind. The University of Chicago Press, Chicago.

Langton, C. G. (ed.) (1989). Artificial Life. Addison-Wesley Publishing Co., Reading.

Lorenz, K. (1965). Evolution and Modification of Behaviour. University of Chicago Press, Chicago.

Lorenz, K. (1970). Studies on Animal and Human Behaviour, Volumes 1 and 2. Harvard University Press, Cambridge.

Lounasmaa, O. V., Hari, R., Joutsiniemi, S. L. and Hamalainen, M. (1989). "Multi-SQUID Recordings of Human Cerebral Magnetic Fields May Give Information about Memory Processes". Europhysics Letters, 9, 6, 603-608.

Margolis, H. (1987). Patterns, Thinking, and Cognition: A Theory of Judgment. The University of Chicago Press, Chicago.

Margulis, L. (1981). Symbiosis in Cell Evolution. W. H. Freeman, San Francisco.

Marshall, J. A. (1990). "Self-Organising Neural Networks for Perception of Visual Motion". Neural Networks, Vol. 3, No. 1, 45-74.

Martinelli, G. and Perfetti, R. (1994). "Generalised Cellular Neural Network for Novelty Detection". IEEE Transactions on Circuits and Systems I: Fundamental Theory and Applications, Vol. 41, No. 2, 187-190.

Masters, T. (1993). Practical Neural Network Recipes in C++. Academic Press Inc., Boston.

McCarthy, J. (1960). "Recursive Functions of Symbolic Expressions and Their Computation by Machine". Communications of the ACM, April 1960, 184-195.

McClelland, J. L. (1981). "Retrieving General and Specific Knowledge from Stored Knowledge of Specifics". Proceedings of the Third Annual Conference of the Cognitive Science Society, Berkeley.

McClelland, J. L. and Rumelhart, D. E. (1986). Parallel Distributed Processing: Psychological and Biological Models, 2. A Bradford Book: The MIT Press, Cambridge.

McCulloch, W. S. (1965). Embodiments of Mind. The MIT Press, Cambridge.

McDermott, D. (1985). "A Critique of Pure Reason", in M. Boden (Ed.), The Philosophy of Artificial Intelligence, Oxford University Press, New York, 1990, 206-230.

Mead, C. (1990). "Neuromorphic Electronic Systems", in Proceedings of the IEEE, 78, 10.

Miller, G. A., Galanter, E., and Pribram, K. H. (1960). Plans and the Structure of Behavior. New York, 18, 30.

Miller, N. E. and Carmona, A. (1967). "Modification of a visceral response, salivation in thirsty dogs, by instrumental training with water reward". Journal of Comparative and Physiological Psychology, 63(1), 1-6.

Minsky, M. and Papert, S. (1969). Perceptrons. The MIT Press, Cambridge.

Minsky, M. and Papert, S. (1988). "Epilog: the new connectionism", in Anderson, J. A., Pellionisz, A. and Rosenfeld, E. (eds.), Neurocomputing 2: Directions for Research, The MIT Press, Cambridge, 1990, 583-597.

Minsky, M. (1985). The Society of Mind. Touchstone: Simon and Schuster Inc., New York.

Neisser, U. (1968). "The Processes of Vision", in Contemporary Psychology: Readings from Scientific American, W. H. Freeman and Co., San Francisco, 124-131.

Neisser, U. (1987). Concepts and Conceptual Development: Ecological and Intellectual Factors in Categorisation. Cambridge University Press, Cambridge.

Nilsson, N. J. (1965). The Mathematical Foundations of Learning Machines. Morgan Kaufmann Publishers, San Mateo, Ca.

Oja, E. (1977). "Asymptotic solutions of a class of matrix differential equations arising in neural network modelling". International Journal of Systems Science, Vol. 8, No. 10, 1145-1161.

Pavlov, I. P. (1927). Conditioned Reflexes: An Investigation of the Physiological Activity of the Cerebral Cortex. Oxford University Press, London.

Perkel, D. H. and Bullock, T. H. (1968). Neurosciences Research Program Bulletin, 6, 221.

Pinsker, H., Kupfermann, I., Castellucci, V., and Kandel, E. R. (1970). "Habituation and dishabituation of the gill-withdrawal reflex in Aplysia". Science, 167, 1740-1742.

Popper, K. R. (1963). Conjectures and Refutations: The Growth of Scientific Knowledge. Harper Torchbooks, New York.

Premack, D. (1962). "Reversibility of the Reinforcement Relation". Science, 136, 255-257.

Purves, D. (1988). Body and Brain: A Trophic Theory of Neural Connections. Harvard University Press, Cambridge.

Pylyshyn, Z. (1989). "Computing in Cognitive Science", in M. Posner (Ed.), Foundations of Cognitive Science, Bradford: MIT Press, Cambridge, 53-91.

Quine, W. V. (1953). From a Logical Point of View. Harvard University Press, Cambridge.

Quine, W. V. (1969). "Natural Kinds", in Richard Boyd, Philip Gasper and J. D. Trout (eds.), The Philosophy of Science, A Bradford Book: The MIT Press, Cambridge, 159-170.

Quine, W. V. and Ullian, J. S. (1970). The Web of Belief. Random House, New York.

Quine, W. V. (1974). The Roots of Reference. Open Court Publ. Co., La Salle.

Quine, W. V. (1987). Quiddities: An Intermittently Philosophical Dictionary. The Belknap Press of Harvard University Press, Cambridge.

Rachlin, H. (1970). Introduction to Modern Behaviourism. W. H. Freeman & Co., San Francisco.

Rao, C. R. and Mitra, S. K. (1971). Generalised Inverse of Matrices and Its Applications. Wiley, New York.

Reichenbach, H. (1949). "On the Justification of Induction", in Herbert Feigl and Wilfrid Sellars (Eds.), Readings in Philosophical Analysis, Appleton-Century-Crofts, New York, 324-329.

Reiter, R. (1987). "Nonmonotonic Reasoning", in Howard Shrobe (Ed.), Exploring Artificial Intelligence, Morgan Kaufmann Publishers Inc., San Mateo, 1988.

Rescorla, R. A. and Wagner, A. R. (1972). "A theory of Pavlovian conditioning: Variations in the effectiveness of reinforcement and non-reinforcement", in A. H. Black and W. F. Prokasy (Eds.), Classical Conditioning II: Current Research and Theory, Appleton-Century-Crofts, New York.

Rühimalci, E.. Hall. L.. Eistola P. and Korppi-Tommola. T. ( 1976).

'Enhancement of AbnormaIity in Gamma Images Using Kohohen Filtersw. EXvope~~iJoumalofNudearMedIdne. Vol. 1. No. 4. 259-262.

Roediger, H. L. and Blaxton, T. A. (1987). "Effects of varying modality, surface features and retention interval on priming in word-fragment completion". Memory and Cognition, 15, 379-388.

Rosch, E. (1973). "On the internal structure of perceptual and semantic categories", in T. Moore (Ed.), Cognitive Development and the Acquisition of Language. Academic Press, New York.

Rosch, E. (1973b). "Natural Categories". Cognitive Psychology, 4, 328-350.

Rosch, E. (1978). "Principles of Categorisation", in E. Rosch and B. B. Lloyd (Eds.), Cognition and Categorisation, Lawrence Erlbaum Associates, Hillsdale.

Rosenblatt, F. (1958). "The perceptron: a probabilistic model for information storage and organisation in the brain". Psychological Review, 65, 386-408.

Rosenfield, I. (1988). The Invention of Memory. Basic Books Inc., New York.

Rumelhart, D. E. and Ortony, A. (1977). "The representation of knowledge in memory", in R. C. Anderson, R. J. Spiro and W. E. Montague (eds.), Handbook of Experimental Psychology, Wiley, New York.

Rumelhart, D. E., Smolensky, P., McClelland, J. L., and Hinton, G. E. (1986). "Schemata and Sequential Thought Processes in PDP Models", in McClelland and Rumelhart (eds.), Parallel Distributed Processing: Explorations, 2. A Bradford Book: The MIT Press, Cambridge, 7-57.

Rumelhart, D. E. and McClelland, J. L. (1986). Parallel Distributed Processing: Foundations, 1. A Bradford Book: The MIT Press, Cambridge.

Rumelhart, D. E. and McClelland, J. L. (1989). Parallel Distributed Processing: A Handbook of Models, Programs, and Exercises, 3. A Bradford Book: The MIT Press, Cambridge.

Rumelhart, D. E. and Zipser, D. (1985). "Feature Discovery by Competitive Learning". Cognitive Science, 9, 75-112.

Russell, B. (1945). A History of Western Philosophy. A Touchstone Book: Simon and Schuster, New York.

Sahakian, W. S. (1976). Learning: Systems, Models and Theories. Rand McNally College Publishing Co., Chicago.

Schmitt, S. (1969). Measuring Uncertainty. Addison-Wesley Publ. Co., Reading.

Selfridge, O. G. (1958). "Pandemonium: a paradigm for learning", in Mechanisation of Thought Processes: Proceedings of a Symposium Held at the National Physical Laboratory, London.

Simon, H. A. (1969). The Sciences of the Artificial. The MIT Press, Cambridge.

Simon, H. A. and Chase, W. G. (1973a). "Perception in Chess", in H. A. Simon (Ed.), Models of Thought, Yale University Press, New Haven, 1979, 386-403.

Simon, H. A. and Chase, W. G. (1973b). "The Mind's Eye in Chess", in H. A. Simon (Ed.), Models of Thought, Yale University Press, New Haven, 1979, 404-427.

Sinclair, J. D. (1981). The Rest Principle: A Neurophysiological Theory of Behaviour. Lawrence Erlbaum Associates, Hillsdale.

Skinner, B. F. (1938). The Behaviour of Organisms: An Experimental Analysis. Appleton-Century-Crofts, New York.

Skinner, B. F. (1988). "Reply to Harnad", in A. Charles Catania and Stevan Harnad (Eds.), The Selection of Behaviour, Cambridge University Press, Cambridge, 468-473.

Skyrms, B. (1986). An Introduction to Inductive Logic. Wadsworth Publishing Company, Belmont, California.

Smith, E. E. and Medin, D. L. (1981). Categories and Concepts. Harvard University Press, Cambridge.

Smolensky, P. (1989). "Connectionist Modelling: Neural Computation / Mental Connections", in Nadel, Cooper, Culicover and Harnish (Eds.), Neural Connections, Mental Computation, Bradford: The MIT Press, Cambridge, 1989, 49-67.

Smolensky, P. (1990). "Connectionism and the Foundations of AI", in Partridge and Wilks (eds.), The Foundations of Artificial Intelligence, Cambridge University Press, 1990, 306-326.

Smolensky, P. (1990b). "Tensor Product Variable Binding and the Representation of Symbolic Structures in Connectionist Systems", in Hinton (ed.), Connectionist Symbol Processing, Bradford: The MIT Press, 1990, 159-216.

Sobajic, D. (1988). Neural Nets for Control of Power Systems. Ph.D. Thesis, Computer Science Dept., Case Western Reserve University, Cleveland.

Soodak, R. E. (1994). "Simulation of Visual Cortex Development Under Lid-Suture Conditions". Biological Cybernetics, Vol. 70, No. 4, 303-309.

Sutton, R. S. and Barto, A. G. (1981). "Toward a modern theory of adaptive networks: Expectation and prediction". Psychological Review, 88, 135-171.

Taylor, C. (1964). The Explanation of Behaviour. Routledge and Kegan Paul, London.

Terrace, H. S. (1973). "Classical Conditioning", in J. A. Nevin (Ed.), The Study of Behaviour: Learning, Motivation, Emotion, and Instinct, Scott, Foresman and Co., Brighton, 71-112.

Thompson, R. F. and Spencer, W. A. (1966). "Habituation: A model phenomenon for the study of neuronal substrates of behaviour". Psychological Review, 73, 16-43.

Touretzky, D. S. (1990). "BoltzCONS: Dynamic Symbol Structures in a Connectionist Network", in G. E. Hinton (Ed.), Connectionist Symbol Processing, A Bradford Book: The MIT Press, Cambridge, 1990, 5-46.

von Bertalanffy, L. (1945). "Zu einer allgemeinen Systemlehre" ["Towards a General Systems Theory"]. Blätter für deutsche Philosophie, 18; reprinted in L. von Bertalanffy (Ed.), General System Theory, George Braziller Inc., New York, 1968, 54-86.

von Bertalanffy, L. (1968). General System Theory. George Braziller, New York.

Wan, E. A. (1994). "Time Series Prediction by Using a Connectionist Network with Internal Delay Lines", in A. S. Weigend and N. A. Gershenfeld (eds.), Time Series Prediction, Addison-Wesley Publ. Co., Reading, 195-218.

Whitehead, A. N. and Russell, B. (1925-7). Principia Mathematica. Cambridge University Press, Cambridge.

Wiener, N. (1948). Cybernetics: or Control and Communication in the Animal and the Machine. The MIT Press, Cambridge.

Widrow, B. and Hoff, M. (1960). "Adaptive Switching Circuits". IRE WESCON Convention Record, New York: IRE, 96-104.

Widrow, B. (1988). International Neural Network Society Address. Boston (Sept.).

Winston, P. H. (1977). Artificial Intelligence. Addison-Wesley Publishing Co., Reading.

Wittgenstein, L. (1922). Tractatus Logico-Philosophicus. Routledge and Kegan Paul Ltd., New York.

Wittgenstein, L. (1953). Philosophical Investigations. English translation by G. E. M. Anscombe. Basil Blackwell, Oxford; MacMillan Co., New York.

Yeo, D. B. (1994). "Is Behaviourism dead?", in Cognoscenti: Bulletin of the Toronto Cognitive Science Society, Toronto, 1994, No. 2, 1-15.

Young, J. Z. (1986). Philosophy and the Brain. Oxford University Press, Oxford.

Zaidi, Q. and Shapiro, A. G. (1993). "Adaptive orthogonalisation of opponent-colour signals". Biological Cybernetics, Vol. 69, No. 5-6, 415-428.

Zhuang, X., Huang, Y. and Chen, S. (1993). "Better Learning for Bidirectional Associative Memory". Neural Networks, Vol. 6, No. 8, 1131-1146.

Zuriff, G. E. (1985). Behaviourism: A Conceptual Reconstruction. Columbia University Press, New York.
