Toward Spinozist Robotics: Exploring the Minimal Dynamics of Behavioural Preference

Hiroyuki Iizuka and Ezequiel A. Di Paolo (2007). Toward Spinozist Robotics: Exploring the Minimal Dynamics of Behavioral Preference. Adaptive Behavior, 15(4), 359–376. DOI: 10.1177/1059712307084687

Published by SAGE Publications (http://www.sagepublications.com) on behalf of the International Society of Adaptive Behavior. The online version of this article can be found at: http://adb.sagepub.com/cgi/content/abstract/15/4/359



Toward Spinozist Robotics: Exploring the Minimal Dynamics of Behavioral Preference

Hiroyuki Iizuka1,2, Ezequiel A. Di Paolo1

1 Centre for Computational Neuroscience and Robotics, Department of Informatics, University of Sussex, Brighton, BN1 9QH, UK
2 Department of Media Architecture, Future University-Hakodate, 116-2 Kamedanakano-cho, Hakodate, Hokkaido, 041-8655, Japan

A preference is not located anywhere in the agent's cognitive architecture, but it is rather a constraining of behavior which is in turn shaped by behavior. Based on this idea, a minimal model of behavioral preference is proposed. A simulated mobile agent is modeled with a plastic neurocontroller, which holds two separate high dimensional homeostatic boxes in the space of neural dynamics. An evolutionary algorithm is used for creating a link between the boxes and the performance of two different phototactic behaviors. After evolution, the agent's performance exhibits some important aspects of behavioral preferences such as durability and transitions. This article demonstrates (1) the logical consistency of the multi-causal view by producing a case study of its viability and providing insights into its dynamical basis and (2) how durability and transitions arise through the mutual constraining of internal and external dynamics in the flow of alternating high and low susceptibility to environmental variations. Implications for modeling autonomy are discussed.

Keywords behavioral preference · homeostatic adaptation · dynamical systems approach to cognition · evolutionary robotics

1 Introduction

How does an embodied agent develop a stable behavioral preference such as a habit of movement, a certain posture, or a predilection for spicy food? Is this development largely driven by a history of environmental contingencies or is it endogenously generated? Kurt Goldstein (1934/1995) described preferred behavior as the realization of a reduced subset of all the possible performances available to an organism (in motility, perception, posture, etc.) which are characterized by a feeling of comfort and correctness, in contrast to non-preferred behavior which is often difficult and clumsy. Merleau-Ponty, following insights from Gestalt psychology, postulated that bodily habits are formed by resolving tensions along an intentional arc where the external situation solicits bodily responses and the meaning of these solicitations depends on the body's history and dynamics. The overall tendency is thus towards an optimum or maximal grip on the situation, for example, finding just the right distance to appreciate a painting (Merleau-Ponty, 1962, p. 153). In these views, the fact that a preferred behavior is observed more often would be derivative and not central to its definition (preferred behavior is often efficient but not necessarily optimal in any objective sense). Following this idea, we define a preference as the strength or commitment with which a behavioral choice is enacted, which is measurable in terms of its robustness to different kinds of perturbations (internal or external). Such a preference is typically sustained through time without necessarily being fully invariant, that is, in time it may develop or it may be transformed into a different preference. In order to understand such durable states from a dynamical systems perspective, it is therefore convenient to study under what conditions these preferences may change, since this will reveal more clearly the factors that contribute to their generation.

Copyright © 2007 International Society for Adaptive Behavior (2007), Vol 15(4): 359–376. DOI: 10.1177/1059712307084687

Correspondence to: Dr. Hiroyuki Iizuka, Department of Media Architecture, Future University-Hakodate, 116-2 Kamedanakano-cho, Hakodate, Hokkaido, 041-8655, Japan. E-mail: [email protected] Tel.: +44-1273-87-2948

The word preference has many higher-level cognitive connotations that may not be captured by this minimal definition. On the one hand, we recognize that the dynamical systems picture portrayed in this article does not do full justice to the richness of the concept (e.g., human preferences involve a complex interaction between habits, needs, cultural context and sometimes conflicting values). On the other hand, our purpose is precisely to explore the minimal dynamical properties that might be shared by many instances of preferred behaviors. We follow the directions of synthetic minimalism, which has been defended as a useful route towards clarifying complex ideas in cognitive science (Beer, 1999; Harvey, Di Paolo, Wood, Quinn, & Tuci, 2005). Our objective is to achieve such a clarification of the term preference and related terms such as disposition, tendency, commitment, conation, and so forth, using the language of dynamical systems.

Dynamical systems approaches to cognition have typically examined processes at the behavioral timescale such as discrimination, coordination, and learning (e.g., Beer, 2003; Kelso, 1995), and they have also been deployed to describe changes at developmental timescales (Thelen & Smith, 1994). But, of course, the strength of the approach lies in its potential to unify phenomena at a large range of timescales. In particular, behavioral preferences and their changes lie between the two scales just mentioned (the behavioral and developmental) and share properties with both of them. Goldstein (1934/1995) has argued that we cannot really find the originating factors of a preferred behavior purely in central or purely in peripheral processes, but that both the organism's internal dynamics and its whole situation participate in determining preferences.

In this view, it becomes clear that a preference is never going to be captured if it is modeled as an internal variable (typically a module called "Motivation") as in traditional and many modern approaches, but that a dynamical model needs to encapsulate the mutual constraining between higher levels of function, such as performance, and lower processes, such as neural dynamics (Varela & Thompson, 2003). A preference is not "located" anywhere in the agent's cognitive architecture, but it is rather a constraining of behavior which is in turn shaped by behavior. A goal of this work is to explore and possibly operationalize how this might be achieved in concrete terms suitable for further hypothesis generation.

Preferences as described here are not typically addressed in minimally cognitive or robotic models. Most artificial agents are designed so as to have no preferences, or rather a single preference: that of adequate performance. Straying from the assigned task into a different behavior chosen by the agent itself may be a sign of increased autonomy, but it is not a frequent or explicit goal in current robotics. In this article we present an exploratory model of behavioral preference with the objective of exploring the assertion of multi-causality. We consider that the minimal requirement for capturing the phenomenon of preference is a situation with two mutually exclusive options of behavioral choice. An agent should be able to perform either of these options; however, the choice should not be random, but stable and durable. The choice should not be invariant either, but should eventually switch (in order to study the factors that contribute to switching). There should be a correspondence between internal dynamical modes and stable behaviors. For this we use the methods of homeostatic adaptation to design not only the agent's performance but also to put additional requirements on the corresponding internal dynamics. We then examine the dynamical factors that play a role in sustaining and changing preferences.

2 A Spinozist Approach

In contrast to functional/computational approaches, a dynamical systems perspective on cognition makes it harder to conceptualize intentional terms (such as motivations, tendencies, goals, emotions, etc.) as functional states in the cognitive architecture of an agent (often implemented in computational modules; a practice some people refer to as "boxology"). However, there is not as yet a clear and generally accepted alternative way to deal with intentional concepts from the dynamical camp. Attempts are not lacking. For instance, Kelso (1995) has suggested that it is possible to describe intentional behavioral changes in terms of transitions between dynamical attractors which correspond to different behaviors. The proposed process would be achieved through the successive stabilization and destabilization of attractors. In this view, intentions emerge as the order parameters of self-organized bodily and neural dynamics leading to behavior when coupled with the environment. A similar idea has been proposed by Thelen and Smith (1994), who use the changes of attractor stability to explain the development of behaviors at both the developmental and behavioral timescales. In their view, the landscape of the stability changes depending on ontogenetic processes. Juarrero (1999) offers a related view where prior intentions are understood as the setting up of a dynamical landscape of attractors (e.g., through recursive self-organizing activity on the system's constraints) and proximate intentions imply the local selection of such dynamical alternatives. All these views share some problems, such as the problem of how the subject or agent to whom intentions might be attributed is itself dynamically constituted, that is, who is the agent that re-shapes a dynamical landscape so that we may speak of intentions as belonging to it. Nevertheless, the central intuition of dynamical re-shaping is appealing and worthy of expansion and clarification.

An important point shared by both intentional behavior for Kelso and Juarrero and developmental processes for Thelen and Smith is that they cannot be separated from the ongoing behavioral dynamics, that is, the system's engagement with the environment. Given that these are processes that may themselves alter the dynamics of interaction with the environment, the emerging picture is one of mutual modulation, or circular causality (Clark, 1997; Thompson & Varela, 2001), where intentional and developmental processes re-parameterize behavior and, in turn, interaction with the environment constrains and modulates intentions and development.

Dynamical views similar to those of Kelso and Juarrero have been explored in robotics. For instance, in the work of Ito, Noda, Hoshino, and Tani (2006) a humanoid robot shows transitions between two different ball-handling behaviors using a special kind of neurocontroller in which specific nodes are trained to have an association with each behavior. Transitions are demonstrated to occur when the robot is interrupted by a person, or some physical irregularity, changing the position of the ball. Although the functioning of the system may be conceived in terms of a re-organization of the attractor landscape, transitions occur through external perturbations. This, unfortunately, does not allow the evaluation of more spontaneous forms of switching in terms of the dynamical account of mutual constraining between interactive and neural levels that we want to explore in this article.

Following this idea, in this article we suggest that a way of approaching a study of preference is by embedding the circular causality between internal organization and interaction in an embodied agent through the application of homeostatically driven neural plasticity (further justified below). Even though this idea is not disconnected from those of Kelso, Juarrero, and Thelen and Smith, the view in our proposed approach differs from the attractor model in the following important sense.

Suppose that a dynamical system has two attractors, each of which corresponds to a different behavior. If a transition between them occurs, it could only be caused when noise or perturbations to the dynamics are strong enough to detach the trajectory from the current attractor. As Kelso discusses, this would not constitute a proper intentional change because the behavioral transition depends only on a random event and would not result in durable behavior as in the case of a preference. Conversely, the system cannot change attractors without noise, which means that in such a case it would produce the same behavior permanently.

Therefore, there are problems in explaining preferences in behavior as switching between attractors in the proper dynamical systems sense. It should be noted that we are not denying that noise and random events may play a role in triggering transitions, but insisting on the significance of transitions caused by internal factors, which can be seen as higher-level dynamics as in Kelso's and Thelen and Smith's explanations. A proper intention, motivation, or alteration of a preference should also be dependent on endogenous conditions, such as changes to the dynamical landscape itself as a result of a history of behavioral interactions with the environment in non-arbitrary ways. By explicitly implementing the circular causality between inner stability and external behavior in our model through the use of homeostatic mechanisms (as only one possibility for achieving this), important aspects of preferences such as durability and transitions between behaviors can be captured.

However, the choice of homeostatic adaptation to model the dynamics of preference still requires a stronger justification.

A view on preferences, dispositions, and tendencies that is amenable to a dynamical interpretation is Spinoza's doctrine of conatus or striving, developed in his Ethics (IIIp4–7). This view serves as an inspiration for how these issues are addressed in our model. We read in part III, proposition 6: "Each thing, as far as it can by its own power, strives to persevere in its being." There are several issues concerning this doctrine discussed by Spinoza scholars, especially the unargued proposition IIIp4, which states that "No thing can be destroyed except through an external cause" (Matson, 1977); one might think of a bomb as a counter-example. However, in the context of cognition a dynamical systems approach is highly compatible with views that seek to establish a continuity between life and mind (Bourgine & Stewart, 2004; Di Paolo, 2005; Jonas, 1966; Maturana & Varela, 1980; Stewart, 1992; Weber & Varela, 2002; Wheeler, 1997). The essence or being of a living system is its self-producing organization (autopoiesis), and it so happens that IIIp4 is indeed true of living systems at least at a minimal level of organization (Di Paolo, 2005), even though this is not such an obvious statement for more complex systems beyond bare autopoiesis; think of autoimmune diseases. Hence, we may assume that the striving doctrine is applicable to minimal cognitive systems as well, and so we can use this idea as a regulative principle informing our approach (complex cases such as culturally embedded human cognition may present special kinds of problems, such as the origin of conflicting values; we are not considering these possibilities here).

So the question of preference becomes the question of understanding an agent's changing conatus. For Spinoza, this would directly relate to the agent's current being. A hungry animal strives for food and a thirsty one for water. Conatus resolves in the interactional domain (actions and perceptions) a tension that originates in the internal constitutive domain (an internal need). To develop such ideas in dynamical terms we must define in the dynamical organization of the agent a condition of tension and satisfaction, where the first is defined as being in a state that leads to alterations in the system's organization (its essence or being in Spinozist terms) and the second as being in a state where the system's organization remains invariant. This closely follows Ashby's (1960) idea of an ultra-stable system. Conatus is therefore, on a first approximation, the interactive and internal striving of an ultra-stable system to remain in those states that satisfy the condition of invariant organization. This need not be interpreted teleologically, even though the terms tension and satisfaction have not been chosen innocently (for a discussion regarding the conatus doctrine and teleology see: Bennett, 1984, 1992; Curley, 1990). This view will be refined after discussing the results from our model.

Finally, in order to sharpen the contrast between the view we are proposing and previous ones, a relation to action selection should be mentioned. The action selection literature is concerned with how animals, robots, or simulated agents can solve the ongoing problem of choosing what to do next in order to achieve their objectives (Humphrys, 1997; Bryson, 2003). Agents are given several options of action units in advance and they must decide which one should be taken next and how they should be ordered and combined. In other words, in action selection models the sensorimotor coupling at the lower level is fixed and discretized in order to set the action units, and the aim is to decide how they should be chosen for the planning of the higher-level goals. Roughly, this means that planning as a higher-level description is separable from the ongoing actions as a lower-level description, although this may depend on how the action units are defined. It might be more precise to say that this kind of work tries to illuminate our planning intelligence in an ecological multi-objective context at the cost of simplifying (even removing) the effects of the sensorimotor dynamics on the system's changing organization. This can be seen even in work on embodied robotics which models intentional terms such as preferences, motivations, and goals as certain variables or "compartmentalized" functions named emotions or motivations (Breazeal (Ferrell), 1998; Velásquez, 1997). Our view is different from the idea expressed in most action selection models in that planning is not separable from actions (see Ogai and Ikegami, forthcoming). Producing actions based on sensorimotor coupling organizes and preserves plans, and in turn plans regulate the sensorimotor flows. It should be clear that our motivations are also different, although we would expect that future developments of our work may interact more closely with problems in action selection.

3 Homeostatic Adaptation in Neural Controllers

A salient feature of the homeostatic adaptation model proposed by Di Paolo (2000) is that local plastic adaptive mechanisms work only when neural activations move out of a prescribed region; an idea inspired by Ashby's homeostat (Ashby, 1960). Such a mechanism has been implemented in a neurocontrolled simulated vehicle evolved with a fitness function rewarding phototaxis and the maintenance of neural activations within the homeostatic region. The use of intermittent plasticity in combination with this selective pressure allows for the evolution of a novel kind of coupling between internal and environmental dynamics. Once the neurocontroller gives rise to behavioral coordination within a given environmental situation that results in internal stability, synaptic weight changes no longer happen. If the situation changes, such as in an inversion of the visual field or some other perturbation, this causes a breakdown of coordination, which means that the neural activations cannot in general be maintained within the homeostatic region. As this happens, the local adaptive mechanism is activated until it finds a new structure (synaptic weight values) which can sustain the activations within the homeostatic region and (very likely, though not necessarily) re-form the behavioral coordination. As a result, the agent can adapt to perturbations it has never experienced before.

This approach aims at creating a high dimensional bounded set, or box, in phase space that corresponds to the neural homeostasis often linked to suitable performance. The dynamics within the box is stable in that if trajectories go out of the bounds, the network's own configuration, by design, will change plastically until they come back into the box again. If trajectories remain within the box, the system's configuration no longer changes. This plasticity-dynamics relation makes it possible for coordination to occur (under suitable circumstances) between a higher function, phototactic behavior in this case, and the process that regulates the sensorimotor flows. If the behavior cannot be achieved, homeostatic adaptation attempts to find new sensorimotor flows. This is therefore a concrete example of the circular constraining between levels mentioned above.1

In the current context, we extend this property to create a model of behavioral preference. Our idea is that if the system holds two separate high dimensional boxes in the space of neural dynamics which are associated with performing different behaviors, for simplicity's sake phototaxis towards different light sources A and B, a preference could be formed by the dynamical transitions that select which box the dynamics go into and stay in. This provides one of our requirements for talking about preference, that of durability (bottom-up construction of the stability). Once a behavior is formed, due to the stability in a box, the system keeps performing that behavior while ignoring other behavioral possibilities. This plays the role of a spontaneous top-down constraint that regulates the sensorimotor flows. However, some disturbances might eventually cause a breakdown of the stability, and then another behavior can be reconstructed through the homeostatic adaptive mechanisms. Since, by design, the system has another region of high stability, the system will be likely to switch into it and then start enacting its other behavioral option. In this way, behavior can switch because of the corresponding transitions between the two boxes. We expect to see both spontaneous and externally-induced transitions from the viewpoint of the top-down and bottom-up construction or destruction of durable but impermanent dynamical modes. Here we find our second requirement, that of the possibility of transformation, or change in preference. Except for considerations of symmetry, the placement of these high-dimensional boxes is arbitrary in the present model (see discussion).

4 Model

Our proposed minimal model of behavioral preference extends the homeostatic adaptation model. The idea is implemented in a simulated mobile agent with a plastic neural controller containing two separated, high-dimensional homeostatic regions. The simulated robot is faced with two different kinds of light as mutually exclusive options of behavioral choice. It must visit one of them and this "choice" must correspond to the internal dynamics being contained in the corresponding homeostatic box. An evolutionary algorithm is used to design the neurocontroller. Agents are evaluated on each type of light separately, and on both types simultaneously, in which case one of them is presented as blinking. The agent must approach the non-blinking light (see below). After evolution, this cue is removed and the agent's preference is investigated by presenting two constant, non-blinking lights of each type simultaneously.

Agent. An agent is modeled as a simulated wheeled robot with a circular body of radius 4 units and two diametrically opposed motors. The motors can drive the agent backwards and forwards in a 2-D unlimited plane. Agents have a very small mass, so that the motor output directly determines the tangential velocity at the point of the body where the motor is located. The translational movement of the robot is calculated using the velocity of its center of mass, which is simply the sum of the two tangential velocities, and the rotational movement is calculated by dividing the difference of the two motor outputs by the diameter (leading to a maximum translation speed of 2.0 units and rotational speed of 0.25 radians at each Euler time step). The agent has two pairs of sensors for the two different light sources, A and B, mounted at angles of π/3 radians to the forward direction. The two lights do not interfere with each other and the model includes the shadows produced by the agent's body.
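The kinematics just described can be sketched as a short simulation step. This is a minimal illustration, not the authors' code; the function and variable names, and the sign convention for which wheel difference produces a positive turn, are our own assumptions.

```python
import math

BODY_RADIUS = 4.0  # body radius in model units, as stated above

def step_agent(x, y, theta, v_left, v_right, dt=0.1):
    """One Euler step of the agent's motion.

    As described in the text: the translational speed of the center of
    mass is the sum of the two tangential (wheel) velocities, and the
    rotational speed is their difference divided by the body diameter.
    The Euler time step of 0.1 matches the one used in the model.
    """
    v = v_left + v_right                              # translational speed (sum, per the text)
    omega = (v_right - v_left) / (2.0 * BODY_RADIUS)  # rotation: difference / diameter
    x += v * math.cos(theta) * dt
    y += v * math.sin(theta) * dt
    theta += omega * dt
    return x, y, theta
```

With equal motor outputs the agent translates in a straight line; with opposite outputs it rotates on the spot.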

Plastic controller. A fully connected continuous-time recurrent neural network (CTRNN; Beer, 1990) is used as the agent's controller. The time evolution of the states of neurons is expressed by:

τi ẏi = –yi + Σj=1..N wji zj(yj) + Ii,    zi(x) = 1 / (1 + e^(–x – bi))    (1)

where yi represents the cell potential of neuron i, zi is the firing rate, τi (range [0.4, 4]) is its time constant, bi (range [–3, 3]) is a bias term, and wji (range [–8, 8]) is the strength of the connection from neuron j to i. Ii represents the sensory input, which is given only to sensory neurons. The number of neurons, N, is set to 10 in this article, four of which are assigned as sensory neurons, that is, two neurons for each light sensor. The sensory input is calculated by multiplying the local light intensity by a gain parameter (range [0.01, 10]). The decay of the light intensity follows a sigmoid function in order to avoid the situation in which it becomes too strong when the agent is very close to the light. There is one effector neuron for controlling the activity of each motor. Similarly, the motor output is calculated from the firing rate of the effector neuron, which is mapped into the range [–1, 1] and is then multiplied by a gain parameter (range [0.01, 10]). All free parameters are determined genetically.
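Equation (1) can be integrated with the same Euler scheme used elsewhere in the model. The following is a minimal sketch of our own; the weight-indexing convention w[j][i], meaning the connection from neuron j to neuron i, follows the equation, and the function names are assumptions.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def ctrnn_step(y, w, tau, b, I, dt=0.1):
    """One Euler step of Equation (1):

    tau[i] * dy[i]/dt = -y[i] + sum_j w[j][i] * z_j + I[i],
    where z_j = sigmoid(y[j] + b[j]) is neuron j's firing rate.
    Returns the updated cell potentials and the firing rates.
    """
    n = len(y)
    z = [sigmoid(y[j] + b[j]) for j in range(n)]
    y_next = []
    for i in range(n):
        net = sum(w[j][i] * z[j] for j in range(n))  # weighted input from all neurons
        dy = (-y[i] + net + I[i]) / tau[i]
        y_next.append(y[i] + dt * dy)
    return y_next, z
```

With zero weights and zero input, each cell potential simply decays toward zero at a rate set by its time constant.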

A plastic mechanism allows for the lifetime modification of the connection weights between neurons. The homeostatic regions are described by a plasticity function of the firing rate of the post-synaptic neuron; this function determines the strength of change of all incoming weights. The plastic function is 0 in the homeostatic regions, which means no plasticity. To assign the different phototactic behaviors to different boxes in our extended model, two separated regions are arbitrarily set, corresponding to firing rates of [0.15, 0.4] and [0.6, 0.85] as shown in Figure 1, and for each neuron each region is arbitrarily assigned to phototaxis A or B at the start of the evolutionary run. To reduce bias, the upper (lower) region of half of the internal neurons is assigned to phototaxis A (B), and the other region to phototaxis B (A). These assignments remain the same for the evaluations during the evolutionary run. However, the scheme with two separate regions is not applied to input and output neurons because this would introduce biases in the input sensitivity as well as prescribe particular styles of movement. For input and output neurons, a function which has a single homeostatic region is applied (Figure 1b).

Weight change follows a Hebbian rule which also depends linearly on the firing rate of the pre-synaptic neuron and a learning rate parameter. Weights from neuron i to j are updated according to:

∆wij = ηij zi p(zj)    (2)

where zi and zj are the firing rates of pre- and post-synaptic neurons, respectively, ∆wij is the change per unit of time to wij, p(x) is the plastic function (see Figure 1), and ηij is a rate of change (range [0, 0.9]), which is genetically set for each connection. For simplicity, we restrict this parameter to a positive range, so that the product of this value and the plastic function always works in the direction of returning the flow into the homeostatic region. For example, if the firing rate of neuron j, zj, is between [0, 0.15] or [0.5, 0.6], the plastic function returns a positive value and the weight change is calculated by multiplying the firing rate of pre-synaptic neuron i and ηij, which are positive. This means that the weight change works to increase the firing rate zj. If the firing rate is between [0.4, 0.5] or [0.85, 1.0], the plastic function returns a negative value. In this case, the weight change works in the opposite direction to decrease the firing rate.
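The sign pattern of the plastic function for internal neurons, together with the weight update of Equation (2), can be sketched as follows. The homeostatic regions and the signs in each interval follow the text; the piecewise-linear ramp shape is our assumption, since the paper specifies the function only graphically (Figure 1a).

```python
def plastic_p(z):
    """Plastic facilitation p(z) for an internal neuron (cf. Figure 1a).

    Zero inside the two homeostatic regions [0.15, 0.4] and [0.6, 0.85];
    positive on [0, 0.15) and (0.5, 0.6) to push the firing rate up,
    negative on (0.4, 0.5] and (0.85, 1.0] to push it down. The linear
    ramps are assumed; only the signs are given explicitly in the text.
    """
    if 0.15 <= z <= 0.4 or 0.6 <= z <= 0.85:
        return 0.0                        # inside a homeostatic box: no plasticity
    if z < 0.15:
        return (0.15 - z) / 0.15          # below the lower box: increase firing rate
    if z <= 0.5:
        return -(z - 0.4) / 0.1           # above the lower box: decrease firing rate
    if z < 0.6:
        return (z - 0.5) / 0.1            # below the upper box: increase firing rate
    return -(z - 0.85) / 0.15             # above the upper box: decrease firing rate

def delta_w(eta, z_pre, z_post):
    """Equation (2): change per unit time of the weight from the pre-
    to the post-synaptic neuron, with eta in [0, 0.9]."""
    return eta * z_pre * plastic_p(z_post)
```

Because eta and the pre-synaptic firing rate are both non-negative, the sign of the update is always that of p(z_post), so plasticity always drives the post-synaptic firing rate back toward the nearest homeostatic box.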

This implementation of the plasticity rule is not the only possible way to realize neural homeostasis. Other studies show different forms of neural plasticity (Di Paolo, 2000; Williams, 2004). However, we use the simplest implementation for our current purposes, in that plasticity always works in the direction of stability within the boxes. The validity of stipulating the homeostatic regions in this way will be addressed later in the discussion.

The time evolution of agent’s navigation and plas-tic neural network are computed using a Euler methodwith a time step of 0.1.

4.1 Evolutionary Setup

A population of 60 agents is evolved using a rank-based genetic algorithm with elitism. All network parameters, wij, τi, bi, ηij and the gains, are represented by a real-valued vector ([0, 1]) which is decoded linearly to the range corresponding to the parameters (with the exception of the gain values, which are exponentially scaled). Crossover and vector mutation operators, which add a small random vector to the real-valued genotype (Beer, 1996), are used. The best six (10%) agents of the population are kept without change. Half of the remaining slots are filled by randomly mating agents from the previous generation according to rank, and the other half by mutated copies of agents from the previous generation, also selected according to rank.
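One generation of this scheme might be sketched as follows. The elitism (best six of 60) and the half-and-half split follow the text; the linear ranking weights, the one-point crossover, and the mutation magnitude are assumptions, since the paper does not specify them here.

```python
import random


def rank_select(sorted_pop):
    """Pick an individual with probability decreasing in rank
    (linear ranking; the exact selection pressure is an assumption)."""
    n = len(sorted_pop)
    weights = [n - i for i in range(n)]          # best individual gets weight n
    return random.choices(sorted_pop, weights=weights, k=1)[0]


def mutate(geno, sigma=0.05):
    """Vector mutation (Beer, 1996): add a small random vector and clip
    back to [0, 1]; sigma is a placeholder magnitude."""
    return [min(1.0, max(0.0, g + random.gauss(0.0, sigma))) for g in geno]


def crossover(a, b):
    """One-point crossover; the operator details are illustrative."""
    cut = random.randrange(1, len(a))
    return a[:cut] + b[cut:]


def next_generation(pop_sorted_by_fitness, elite=6):
    """pop_sorted_by_fitness: genotypes sorted best-first. Keep the elite
    unchanged; fill half the remainder with mated (and mutated) pairs and
    half with mutated copies, both selected by rank."""
    n = len(pop_sorted_by_fitness)
    new = [g[:] for g in pop_sorted_by_fitness[:elite]]
    for _ in range((n - elite) // 2):
        child = crossover(rank_select(pop_sorted_by_fitness),
                          rank_select(pop_sorted_by_fitness))
        new.append(mutate(child))
    while len(new) < n:
        new.append(mutate(rank_select(pop_sorted_by_fitness)))
    return new
```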

The agents are evaluated in four different situations: a single light A, a single light B, two-lights-A, and two-lights-B. The single light A (B) task consists of the serial presentation of eight distant light sources of type A (B), which the agent must approach in turn and remain close to. Only one source is presented at a time, for a period (called a trial) of random duration drawn from the interval [700, 1000] update steps. In contrast, in the two-lights-A and two-lights-B tasks, two different light sources, A and B, are presented simultaneously; one is blinking, and the other is constant. The agent gains fitness by approaching the latter. Our aim in putting in a blinking light as a dummy is to encourage the agent to get to the target in the presence of distracting stimuli that it must ignore. Otherwise, the agent would not get a chance to experience the simultaneous presence of both sources of stimuli and face the problem of picking one as the target.

The blinking light flickers with a 15% probability at every timestep. As in the single light task, the two-lights task consists of the serial presentation of eight pairs of distant light sources, which are π/2 apart from each other from the agent's point of view. The length of the trial is chosen in the same way as in the single light task. After a trial, the lights are extinguished and new ones appear at a random distance in [100, 150].

Figure 1 Plastic facilitation as a function of firing rate for (a) internal neurons, and (b) input/output neurons.

Each individual agent is tested for 12 independent runs in total, that is, three independent runs for each task. At the beginning of each run, the synaptic weights are reset to their initial values. Each run consists of eight trials, and only the last three of these are evaluated, in order not to penalize slow plastic changes.
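The presentation scheme can be sketched as follows. The text fixes the pair separation (π/2), the distance range [100, 150], the trial length [700, 1000] steps, and the 15% blink probability; the absolute bearing of each pair and the coordinate conventions below are assumptions.

```python
import math
import random


def new_trial(agent_heading):
    """Set up one two-lights trial: a pair of sources pi/2 apart as seen
    from the agent, each at a random distance in [100, 150], with a
    random trial length in [700, 1000] update steps. The base bearing
    of the pair is an assumption (the paper randomizes positions)."""
    duration = random.randint(700, 1000)
    base = agent_heading + random.uniform(0.0, 2.0 * math.pi)
    lights = []
    for k in range(2):
        bearing = base + k * (math.pi / 2.0)   # the pair is pi/2 apart
        dist = random.uniform(100.0, 150.0)
        lights.append((dist * math.cos(bearing), dist * math.sin(bearing)))
    return duration, lights


def blink_on(p=0.15):
    """The dummy light flickers with 15% probability per timestep."""
    return random.random() < p
```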

Fitness is calculated from three terms. FD corresponds to the proportional reduction between the final (Df) and initial (Di) distances to a target, 1 – Df/Di. Fp indicates the proportion of time that the agent spends within a distance of less than 15 from a target during a trial. FH represents the average score of neural homeostasis. For each timestep that a neuron fires homeostatically within the region corresponding to the target light, a counter is incremented by one. If it fires within the homeostatic region assigned to the non-target light, the counter is not incremented, and for all other regions the counter is incremented by 0.5. These counters are then averaged over all neurons and over the whole trial. Selecting for high FH will tend to create an association between each homeostatic region and the corresponding phototactic behavior. For each trial, the total fitness (FD + Fp)FH is calculated; the result is then averaged over all 12 runs.
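A sketch of the fitness terms just defined, assuming the per-trial aggregates (initial and final distance, time spent near the target, and the averaged homeostasis score) have already been collected during simulation:

```python
def homeostasis_score(region, target):
    """Per-neuron, per-timestep F_H counter: +1 if the neuron fires
    within the homeostatic region associated with the target light,
    0 if within the region for the non-target light, and +0.5 anywhere
    else. region is 'A', 'B', or None (outside both regions)."""
    if region == target:
        return 1.0
    if region is None:
        return 0.5
    return 0.0


def trial_fitness(d_init, d_final, time_near, trial_len, fh):
    """Total fitness for one trial: (F_D + F_p) * F_H, where
    F_D = 1 - d_final/d_init (reduction of distance to the target),
    F_p = fraction of the trial spent within distance 15 of the target,
    and fh is the averaged homeostasis score in [0, 1]."""
    f_d = 1.0 - d_final / d_init
    f_p = time_near / trial_len
    return (f_d + f_p) * fh
```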

5 Results

The two phototactic behaviors can be easily evolved; however, it is more difficult to obtain agents that are able to associate the behaviors with the two homeostatic regions and to maintain the internal dynamics within those regions. The evolved agents can be highly sensitive and dependent on the history of the interactions. Since our purpose in this article is to explore how the high-level concept of preference can be described in terms of neural and sensorimotor dynamics, and to demonstrate the logical consistency of a multi-causal view, we restrict our analysis to the in-depth study of a single successful agent. This will allow us to generate a more concrete hypothesis that further modeling (including statistical analysis across many runs) and specific empirical studies can investigate.

5.1 Basic Phototactic Behaviors

First, in order to check for long-term stability of the two phototactic behaviors and maintenance of the internal dynamics, the agent is tested on longer successions of lights (only eight were evaluated during evolution). In the case of interacting with a single light A, or two-lights-A (constant light A and blinking light B), the agent shows long-term stability (more than 100 lights) for phototaxis A and maintenance of the internal dynamics. On the other hand, in the case of interacting with a single light B, or two-lights-B (blinking light A and constant light B), the stability is less than in the former case. One or two internal neurons sometimes stay within the homeostatic region for light A even when approaching light B. Even so, this is not a major problem for our purposes, because there is still a difference in the stable states of the other neurons between the two types of phototaxis. In terms of behavioral patterns, the agent typically goes straight to the target light.

Due to the weaker stability of phototaxis B, the following experiments were run for fewer than 100 successive lights, which was long enough to show the preference.

5.2 Transitions

When lights are presented simultaneously during evolution, a cue telling the agent which light to visit is given by the difference between the lights (constant or blinking). A choice under this condition can be regarded as being environmentally pre-determined before the agent starts interacting. The agent is now tested in a situation (unseen during evolution) where no cue is given, that is, with both lights constant.

Figure 2 shows, for 100 consecutive trials, the final distances to both light sources at the end of each trial (a very short distance means the agent has approached the light at the end) and how long the internal dynamics stay within each homeostatic region on average over the whole trial. As can be seen, the agent always "chooses" one of the two lights; it never stays in between the two. This is not trivial, because the two-light situation has not been experienced by the agent, and it might have produced oscillations or deadlocks as a result of competition between sensory inputs. It is likely that the conditions for such behaviors are rare, however. It is also shown that the selection is neither random nor permanent, but durable or quasi-stable (only four changes are counted in this run of 100 presentations).

Examples of behavioral patterns when interacting with the first, the 10th, and the 30th pair of lights are shown in Figure 3. At the first pair, the agent takes a side trip while ignoring both lights. This is normal and corresponds to an initial period in which the plastic rules are settling the initially random weight values. The 10th and 30th presentations are typical examples of approaching patterns to light A or B while ignoring the other. These patterns are similar to those observed in the task situations during evolution (single lights, and two lights, one blinking).

Stopping plasticity. In order to confirm whether the plastic mechanism plays a role in the transition from one type of phototaxis to the other, plasticity is turned off at different points during the same sequence of 100 light presentations. The experiment is started with the same initial configuration as in Figure 2, and plasticity is turned off (weights remain at their current values) at the 20th or 50th presentation, before the transitions take place. Figure 4 shows the distance to both light sources and the proportion of homeostasis.
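The stopping-plasticity manipulation can be sketched as a freeze flag on the weight update; the matrix layout and the form of the plastic rule (eta * z_pre * p(z_post)) follow the earlier description, while the variable names are illustrative.

```python
def update_weights(w, z, eta, p, plastic=True, dt=0.1):
    """Apply the plastic rule to every connection unless plasticity has
    been switched off, in which case the weights stay at their current
    values (as in the stopping-plasticity experiment). w[i][j] is the
    weight from neuron i to j, z the firing rates, eta the per-connection
    rates of change, and p the plastic function."""
    if not plastic:
        return w
    n = len(w)
    return [[w[i][j] + dt * eta[i][j] * z[i] * p(z[j]) for j in range(n)]
            for i in range(n)]
```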

As can be seen, in both cases the agent's behavior does not switch in the period tested; it keeps approaching the light it preferred before plasticity was stopped. While the behaviors are sustained, the internal dynamics are also maintained within each region as much as before plasticity was stopped.

It should be noted that it is possible for a CTRNN to switch its internal dynamics into another region without changing the network weights (e.g., Phattanasri, Chiel, & Beer, 2007). If so, the agent's behavioral transitions might be happening without explicit synaptic plasticity. However, it is clear that, at least in the examples explored, non-plastic transitions are not common. After stopping the plastic mechanism, neither the behaviors nor the internal dynamics seem to change. This shows that the homeostatic adaptation works as required and that the homeostatic regions are associated with the phototactic behaviors as expected, even in a novel situation with both sources constantly emitting light.

Figure 2 Left: Final distance to each light at the end of trials on serial presentations of 100 pairs of constant lights. Right: Proportion of neurons that have stayed within the homeostatic region for each light, in correspondence with the trials on the left.

Figure 3 Spatial trajectories at the 1st, the 10th, and the 30th presentations in Figure 2.

5.3 What Makes a Preference Change?

It was shown that plasticity drastically affects the possibility of behavioral transitions, but this does not mean that the transitions are caused solely by factors internal to the agent. Generally, it is difficult to discuss causation in the context of complex embodied dynamical systems. However, in this section, we try to make a distinction between endogenous dynamics and externally driven interactions, in terms of susceptibility to the environment, in order to study the effect of different factors on the switch of preference.

5.3.1 Persistence of Preference We investigate the persistence of the preference for a light type. In this experiment, the positions of the two lights are swapped at some point during the approaching behavior. If the agent has a consistent preference, it should seek the light it prefers and re-approach the target in its new position. This procedure resembles the one applied by Beer (2003) to study the agent's commitment to a behavioral outcome in his work on minimal categorical discrimination.

Figure 4 Final distances and proportion of homeostatic neurons when plasticity is stopped after the 20th or the 50th presentation. Other settings are the same as in Figure 2.


The experiment was performed using the same agent as before, which has interacted with the same environment as in Figure 2 until the 50th presentation, at a point where it exhibits a preference for light A. In this trial, it takes approximately 750 timesteps for the agent to reach the target.

Starting from the same conditions, we investigated in detail the effect of a single position swap occurring at a timestep t during this trial. The approached light is recorded for all tested values of t. Figure 5 shows the final distance to the lights as a function of the time at which positions are swapped. It also shows, as an example, the agent's trajectory when the light sources were swapped at t = 150.

This trial is taken from a period where the agent has a preference for light A, which means that, if nothing happens, the agent will approach light A. The results show that swapping at earlier timesteps allows the agent to persist in its preference and approach the target light. Presumably, at earlier timesteps there is no big difference between the stimulus strengths of the two lights, and the agent can still "pick" which light to approach, following its preference. For position swaps occurring later, the change in environmental factors becomes stronger, and the agent will approach A or B depending on its current orientation. If positions are swapped after 350 timesteps, the agent is close enough to receive a very strong stimulus from light B after swapping, which seems to produce a transition, and so the agent's behavior is now to stay around light B. This single instance is enough to show that strong enough variations in environmental factors (e.g., orientation, strength of stimuli) can produce a change of preference in the agent's behavior. Interestingly, there is an indeterminate period around t = 340 where phototaxis A and B are mixed. We have not investigated this period in depth yet, but it could be expected that ambiguous environmental factors and/or internal dynamics produce a conflict.

Figure 5 Right: Distance to the lights (squares for light A and crosses for B) as a function of the time at which light positions are swapped. The agent has interacted until the 50th presentation shown in Figure 2. The distances are recorded after 800 timesteps. Left: Trajectory when the lights are swapped at 150 timesteps. The upper plot shows the original trajectory (lack of smoothness indicates sharp angular turns).


The swapping experiments show both the persistence of the preference and the influence of the environment. To study the persistence more clearly, another experiment is performed. The swapping experiments are reproduced as before, but in addition one of the two lights vanishes. The following three cases are tested: (B → A, A → x), (A → B, B → x), and (A → x, B → B). The distances to the only remaining light after 800 timesteps are shown in Figure 6.

The notation (B → A, A → x) indicates that, at time t, the position of light B is changed to that of light A and the original light A disappears. Therefore, the agent suddenly sees light B on the way to light A (and no other lights). As with the swapping experiments, if the agent is close enough to get a strong stimulus from light B, the agent changes its preference to light B and remains close to it, which can be seen in the corresponding plot. However, at earlier values of the swapping time t, the agent does not approach light B even if it is on the way to the position where A was. Notice that the behavior of approaching a single light has been explicitly selected during evolution. This is a situation where the agent should approach whatever single light is available. In spite of this, the agent does not approach the solitary light B. This is very strong evidence in support of an endogenous sustaining of the A preference.

In the case of (A → B, B → x), the agent keeps approaching light A, following its preference, even though the light is placed in a different position. This is expected. In contrast, in the case of (A → x, B → B), light A disappears and light B remains in the same position. Both in this and the previous case, only one light remains, at the original B position. The only difference is that the light is A in the previous case and B in this one. All other conditions, such as neural states and body direction, are the same. Therefore, the agent receives the same intensity from light B in the latter case as from A in the former, where the agent could eventually locate light A. This implies that the agent must now be able to sense light B. However, the agent does not approach it for most values of t. As in the first case, this result shows a strong persistence of the preference even when the external situation should be expected to change it.

5.3.2 Effects of Reducing External Variability Although the importance of endogenous factors in the formation and persistence of preference has been established, the internal dynamics are not independent of the history of environmental coupling. The agent creates its preferences through interactions. Random elements, such as the position of the lights with respect to the orientation of the agent, can have an effect on the probability of switching to a different behavior. To further show that environmental factors do affect the preference, the agent is tested in an environment with fewer fluctuations, where each new pair of lights always appears in the same position relative to the body: at a distance of 130 units on a bearing of π/4 to the right (left) of the agent's current heading for light A (light B). Under this condition, we counted how many transitions between preferences take place during 100 trials (presentations of a pair of lights). More than three successive visits to the same light type was defined as evidence of a sustained preference. On average over 100 independent runs, 0.25 transitions took place during 100 trials under this condition, in contrast to 1.16 transitions under the normal condition. A reduction in environmental fluctuations produces a stabilization of the sensorimotor flow, and the preference is then also stabilized. We should also note that the number of transitions is reduced, but not to zero, thus indicating that changes do occur due to the interaction between internal dynamics and environment even when external uncertainty (in light position) is removed.

Figure 6 Distances to the remaining light as a function of the timing at which one of the two lights vanishes (and positions are swapped). The graphs are made in the same way as in Figure 5. The notation above the plots is explained in the text.
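The transition count can be sketched as follows, assuming a transition is a change between successive sustained preferences; the exact counting convention is not spelled out in the text.

```python
def count_transitions(visits, min_run=4):
    """Count preference transitions in a sequence of visited light types
    (e.g., 'AAAABBBB...'). Following the text, more than three successive
    visits to the same type (min_run = 4) counts as a sustained
    preference; a transition is a change from one sustained preference
    to the next. The counting convention is an assumption."""
    runs = []
    for v in visits:
        if runs and runs[-1][0] == v:
            runs[-1][1] += 1
        else:
            runs.append([v, 1])
    sustained = [t for t, n in runs if n >= min_run]
    return sum(1 for a, b in zip(sustained, sustained[1:]) if a != b)
```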

A different test was carried out in an environment where the light sources remained fixed in the same place. We tested the agent in this configuration for a time corresponding to 100 consecutive presentations of lights, as in the above experiments. On average over 1,000 independent runs, transitions take place 0.039 times during this period. Again we obtain a drastic reduction of transitions by removing external variability. In this particular case, the reduction is achieved by the agent remaining close to a very strong source of stimulation, which in some sense seems to be the most difficult condition to switch away from. However, we notice that, as before, the number of transitions is not fully reduced to zero even in these extreme conditions.

5.3.3 Transitions Between High and Low Susceptibility to Perturbations The emerging picture is one where both endogenous dynamics and environmental factors play a role in the sustaining and the changing of a preference. Does it still make sense to ask whether the choice that an agent actually makes corresponds to a spontaneous or an externally driven "decision"? Put in these traditional terms, the answer is no. However, it is possible to capture in more detail the dynamical relation between the different factors in order to formulate a clearer distinction. This distinction is made operational by observing the agent's potential behaviors in different situations departing from the same initial state. If the agent "decides" to go to one of the lights by a preference that is endogenously sustained, its behavior must be robust to variations in environmental factors. In contrast, "decisions" that are highly affected by environmental variation can be attributed to the role played by external factors. We label the two possibilities respectively as strong and weak commitment to a choice.

Based on this idea, we select the agent's states (neural and bodily) corresponding to different times in Figure 2. For each selection of initial states, we record which light is approached by the agent as a function of different initial angular positions of the two lights (both placed at the same distance). This is the closest we can get in the present setting to a quantitative measure of preference. The results are shown by different shades of gray for the final destination in Figure 7. In the case of Figure 7a, in which the agent originally has a preference for light B, the "decision" is stable against the various initial positions of the lights. The agent robustly approaches light B for practically all the angular positions tested. Therefore, the "decision" to approach B does not depend on environmental variation in this case. (As demonstrated in the swapping experiments, in some of these cases the agent actively avoids the light that is presented directly in front of it and searches for the alternative light, even if its position is such that no stimulus from that light is directly impinging on the sensors.) The same is also true in the case of Figure 7c. Except for the small region where it selects light B, the agent approaches light A wherever else the lights are placed. In contrast, in Figures 7b and 7d the agent is rather "uncommitted", because the approached target changes depending on the lights' positions.
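The scan behind this measure can be sketched as a harness around a simulator call. Here run_trial is a hypothetical function assumed to reset the agent to the saved state, place the lights at the given initial angles, and return the approached light; the grid resolution is an assumption.

```python
import math


def preference_map(run_trial, n_angles=24, min_sep=math.pi / 2):
    """Scan initial angular positions of lights A and B (same distance),
    skip pairs whose angular difference is less than pi/2 (as in the
    paper), and tally which light the agent reaches.
    run_trial(angle_a, angle_b) -> 'A', 'B', or None."""
    tally = {"A": 0, "B": 0, None: 0}
    for i in range(n_angles):
        for j in range(n_angles):
            a = 2.0 * math.pi * i / n_angles
            b = 2.0 * math.pi * j / n_angles
            sep = abs((a - b + math.pi) % (2.0 * math.pi) - math.pi)
            if sep < min_sep:
                continue                      # ambiguous placement: removed
            tally[run_trial(a, b)] += 1
    total = tally["A"] + tally["B"] + tally[None]
    return {k: v / total for k, v in tally.items()}
```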

In order to see how this dependency of the "decision" on environmental variability changes over the history of interactions, the proportions of dark gray (light A) and light gray (light B) circles in the plots of Figure 7 are calculated (Figure 8) for different times corresponding to times in Figure 2 (which shows the actual choices taken). When the agent has a preference for light B, the proportion of B-circles at the beginning of this period is high: the agent effectively ignores light A and keeps approaching light B. It could be said that the agent's behavior is committed, in the sense that it does not depend on the environmental factors, as mentioned above. When the preference changes from light B to A, and for a while after that, the proportions stay around 0.5, which means that the agent does not have a strong commitment to which light should be selected as target. There is ample scope for environmental factors to alter the agent's behavior. Then, the choice of light A changes towards a more stable (or committed) dynamics.

We are not implying with these results that, during periods of weak environmental dependence, the endogenous dynamics are solely responsible for the agent's performance. In all cases, behavior is the outcome of a tightly coupled sensorimotor loop. It is clear that the mode of environmental dependence, whether weak or strong, changes over time, and that this is a property of the agent's own internal dynamics as well as of the history of interaction. During the periods of high susceptibility to external variations, the agent is highly responsive to environmental variability, resulting in less commitment towards a given target. In contrast, during periods of weak susceptibility, the consistent selection of a target is a consequence of low responsiveness to environmental variability.

The important point is that the autonomy of the agent's behavior can be seen as the flow of alternating high and low susceptibility, which is an emergent property of the homeostatic mechanism in this case (but might be the result of other mechanisms in general). There is nothing, apart from the flow of neural and sensorimotor dynamics, that represents a mode of commitment to one preference or another. There are no internal functional modules and no external instructions. Nevertheless, the existence of the different modes can be determined and measured. It should be made clear that this picture is quite in contrast with the idea that autonomy may simply be measured as how much behavior is determined internally versus how much is externally driven. Strong autonomy (in this context, the capability of defining one's own goals) is orthogonal to this issue, because all behavior is conditioned by both internal and external factors at all times. It is the mode of responsiveness to variations in such factors that can be described as committed or open, and it would be a property of strongly autonomous systems that they can transit between these modes (maybe in less contingent ways than this agent, e.g., in terms of needs, longer-term goals, etc.).

Figure 7 Light preference of the agent corresponding to the states at the (a) 20th, (b) 25th, (c) 50th, or (d) 95th presentation of lights in Figure 2, against different light positions. The horizontal and vertical axes indicate the initial angles of lights A and B relative to the agent's orientation, respectively. Light positions whose difference is less than π/2 are removed in order to better determine which light the agent is approaching. The dark gray circles show that the agent approaches light A, the light gray circles correspond to light B, and black shows that the agent approaches neither of the lights.


6 Discussion

Though minimal, the dynamical model discussed in this article exhibits some important aspects of behavioral preferences, such as durability and transitions, through the mutual constraining of internal and external dynamics. We hasten to say that the understanding of preferred behaviors is not exhausted by the dynamical properties explored in this article. In particular, preference in natural cognitive systems implies a capability to appreciate the value of a choice. The current model does not capture this extra level of complexity, though a dynamical approach to value-generation may indeed be possible (Di Paolo, 2005), and it may be close to the constraining of dynamical modes shown in the present model. Much less justice is done to the problem by traditional approaches in which preferences (or motivations, moods, etc.) are reified as internal variables, and with which the view presented here should be contrasted.

The model shows the effectiveness of the dynamical approach in allowing the posing of the right questions and suggesting solutions. It is possible to clearly formulate some constitutive properties of preference, and this specification immediately suggests a road towards a minimal design. In particular, it seems that what is necessary is a process that can link more than one mode of internal stable dynamical flow with the corresponding interactional dynamics. This link can successfully produce the property of persistence without permanence in our model. It also makes concrete the description of the mutual constraining between two levels (neural dynamics, agent/environment interactions). Consequently, the model lends support to Goldstein's and Merleau-Ponty's multi-causal view of preferred behavior by producing a case study of its viability and providing insights into its dynamical basis. We have shown the logical consistency of the view that the persistence of preferences and their transitions cannot be attributed to either internal or external factors alone. Yet there is a sense in which we can say that internal and interactive dynamics are implicated in generating stronger or weaker degrees of commitment.

The model allows us to make some observations. It is important to note that in several cases the agent keeps searching for a target light even when it is no longer detectable, and avoids approaching the alternative target that is present. Here it is possible to draw a parallel between the persistence of preference and the similar phenomenon of object permanence observed in infant experiments by Piaget (1954). According to Piaget, the concept of an object as something that has an ongoing existence independent of the observer is not immediately given. The lack of such a concept is his explanation of the famous A-not-B error, in which 7–12-month-old infants search for a toy not in the location they have just seen it being hidden, but in the location where they searched for it in the past. However, there are alternative, more parsimonious explanations. The perseverance observed in such cases may respond to simpler dynamical mechanisms (often described as motor memories; Thelen, Schoener, Scheier, & Smith, 2001). A recent study by Wood and Di Paolo (2007) actually demonstrates that the same homeostatic mechanisms used in the current study (two homeostatic regions corresponding to two behavioral options) are sufficient to reproduce the pattern of errors observed in infants and their disappearance with age, lending support to the idea that motor memories are at the basis of perseverance and providing a plausible sensorimotor explanation for the origins of the higher cognitive capacity of object permanence.

Figure 8 The proportion of dark gray (solid line: light A) and light gray (dotted line: light B) circles that appear in the plots of Figure 7, and others like them, corresponding to different times. The values on the horizontal axis correspond to the number of lights the agent has interacted with in Figure 2.

The method shows another use for homeostatic adaptation in combination with evolutionary techniques, in shaping both behavioral and internal requirements for the neural/body/interaction system. However, we may ask whether this is the simplest model of behavioral preference. It seems a priori a good idea to attempt the same experiments using non-plastic CTRNNs with homeostasis, or even without it. We predict that the latter case would be brittle, that is, largely driven by the environmental configurations, and so would show little or no persistence of preference. The other case, non-plastic CTRNNs with homeostasis, is less easy to predict. Systematic comparisons along these lines are planned in order to establish the role played by structural plasticity.

In analyzing the role of environmental factors (the position of lights) in the choice of target, we find that the agent presents different degrees of openness to environmental variability or, put differently, of commitment to a target goal. These assertions can be made operational in terms of the stability of the dynamics. The results allow us to formulate the following dynamical hypothesis: a switch in preference will take place not necessarily as specific internal variables acquire specific values, but rather as a result of changes in the stability landscape of the neural and sensorimotor dynamical flows between committed and open modes. In some modes, the landscape will be highly stable to environmental variability. Alternative choices and opportunities for behavioral change will not affect the behavioral and neural flow. The agent may even be "blind" to stimulations corresponding to these alternative behaviors. Such are the committed modes. In other cases, even if the actual behavior shows a stable trajectory towards a target, the sensitivity to environmental variability may be higher. These are the open or uncommitted modes, which may result, in the appropriate circumstances, in a change of preference. Between the two modes lies a spectrum of intermediate possibilities.

The results justify the choice of the Spinozist inspiration for the design of our model. This view allows us to pose the problem of preferred behavior in dynamical terms, and that of the change of preference in terms of a change of conatus. In turn, the operationalization of commitment proposed in this article feeds back into the task of understanding conatus dynamically. In this way, a dynamical systems approach to preferences (and associated cognitive phenomena such as decision making) looks for global dynamical properties at the internal and interactive levels to determine whether a behavior is preferred or not, and chosen with strong or weak commitment. Of course, in many cases the specific determination of preference carried out in this article (resetting the agent to a given state and altering its environment) may be hard or impossible to achieve. In such cases, alternative or derivative operationalizations will be required.

As a final note on autonomy, it is clear that in our model the achievement of committed or open modes of sensorimotor flow comes about through the history of interaction by the agent itself. However, the fact remains that its autonomy is severely limited by the arbitrary imposition of the two internal homeostatic regions. We believe that in reality the condition of using a strict region corresponding to zero plasticity may be relaxed, and that the dynamics may consist of moving gradients of plasticity and the spontaneous formation of highly stable regions where plastic change is small and in general points back into the same stable region. Designing an agent where such homeostatic regions are themselves the consequence of the agent’s own activity will be a further step towards strongly autonomous behavior. In a sense, such an agent will not only be switching spontaneously between a choice of externally provided goals, it will be creating its own goals as a consequence of its history of interactions.
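The relaxation suggested here, replacing a strict zero-plasticity box with a moving gradient of plasticity, can be sketched as follows. This is an illustrative construction under our own assumptions, not the mechanism of the paper's model: `plasticity` is a smooth sigmoid of the distance from a hypothetical homeostatic region, and weight drift is scaled by it, so neurons inside the region barely change while destabilized ones adapt.

```python
import numpy as np

rng = np.random.default_rng(0)

def plasticity(activation, centre=0.5, width=0.15, sharpness=30.0):
    # Smooth gradient of plasticity instead of a hard zero-plasticity
    # box: plasticity rises continuously as the activation leaves the
    # homeostatic region of half-width `width` around `centre`.
    distance = abs(activation - centre) - width
    return 1.0 / (1.0 + np.exp(-sharpness * distance))

def homeostatic_step(weights, activations, rate=0.05):
    # Each neuron scales random weight drift by its own plasticity:
    # in-region neurons stay nearly fixed, out-of-region ones change.
    p = np.array([plasticity(a) for a in activations])
    drift = rng.normal(0.0, 1.0, size=weights.shape)
    return weights + rate * drift * p[:, None]
```

Because the gradient is smooth, stable regions are not imposed as sharp boundaries; where plastic change happens to be small and self-reinforcing, a stable region can in principle form and move, which is closer to the spontaneous formation speculated about above.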

Note

1 The description of the homeostatic bounded sets as boxes could cause some confusion with the traditional perspective that we have criticized above, describing it as “boxology.” We would like to clarify that we use this term to refer to the practice of putting a cognitive function that belongs to the level of the whole agent into a computational module inside the agent’s architecture. This is different from creating areas (boxes) in the space of the agent’s internal dynamics that are both (1) generally stable and (2) linked to a particular behavior. Nothing is specified about the nature of this link in functional terms. In fact, the behavioral function and the internal organization are linked bi-directionally, since behavior will impact on the internal dynamics, provoking plastic changes, and the internal landscape will constrain the domain of possible behaviors. So, unlike in the boxology approach, where fixed function is a general feature, the functional nature of the dynamics within a box is not fixed.

Acknowledgments

We are very grateful to two anonymous reviewers for their helpful comments. This research was partially supported by the Japanese Ministry of Education, Science, Sports, and Culture, Grant-in-Aid for JSPS Fellows, 17-4443.

References

Ashby, W. R. (1960). Design for a brain: The origin of adaptive behaviour (2nd ed.). London: Chapman and Hall.

Beer, R. D. (1990). Intelligence as adaptive behavior: An experiment in computational neuroscience. San Diego: Academic Press.

Beer, R. D. (1996). Toward the evolution of dynamical neural networks for minimally cognitive behavior. In P. Maes, M. J. Mataric, J.-A. Meyer, J. B. Pollack, & S. W. Wilson (Eds.), From Animals to Animats 4: Proceedings of the 4th International Conference on Simulation of Adaptive Behavior (pp. 421–429). Cambridge, MA: MIT Press.

Beer, R. D. (1999). Arches and stones in cognitive architecture. Adaptive Behavior, 11, 299–305.

Beer, R. (2003). The dynamics of active categorical perception in an evolved model agent. Adaptive Behavior, 11, 209–243.

Bennett, J. (1984). A study of Spinoza’s Ethics. Indianapolis: Hackett Publishing.

Bennett, J. (1992). Spinoza and teleology: A reply to Curley. In E. Curley & P. F. Morean (Eds.), Spinoza, issues and directions (pp. 53–57). Leiden: E. J. Brill.

Bourgine, P., & Stewart, J. (2004). Autopoiesis and cognition. Artificial Life, 10, 327–346.

Breazeal (Ferrell), C. (1998). A motivational system for regulating human-robot interaction. In Proceedings of AAAI-98, Madison, WI (pp. 54–61).

Bryson, J. (2003). Action selection and individuation in agent based modelling. In D. L. Sallach & C. Macal (Eds.), The Proceedings of Agent 2003: Challenges of social simulation (pp. 317–330). Argonne: Argonne National Laboratory.

Clark, A. (1997). Being there. Cambridge, MA: MIT Press.

Curley, E. (1990). On Bennett’s Spinoza: The issue of teleology. In E. Curley & P. F. Morean (Eds.), Spinoza, issues and directions (pp. 39–52). Leiden: E. J. Brill.

Di Paolo, E. A. (2000). Homeostatic adaptation to inversion in the visual field and other sensorimotor disruptions. In J. Meyer, A. Berthoz, D. Floreano, H. Roitblat, & S. Wilson (Eds.), From Animals to Animats 6: Proceedings of the 6th International Conference on Simulation of Adaptive Behavior (pp. 440–449). Cambridge, MA: MIT Press.

Di Paolo, E. A. (2005). Autopoiesis, adaptivity, teleology, agency. Phenomenology and the Cognitive Sciences, 4, 429–452.

Goldstein, K. (1934/1995). The organism. New York: Zone Books.

Harvey, I., Di Paolo, E., Wood, R., Quinn, M., & Tuci, E. A. (2005). Evolutionary robotics: A new scientific tool for studying cognition. Artificial Life, 11, 79–98.

Humphrys, M. (1997). Action selection methods using reinforcement learning. Unpublished doctoral dissertation, University of Cambridge, UK.

Ito, M., Noda, K., Hoshino, Y., & Tani, J. (2006). Dynamic and interactive generation of object handling behaviors by a small humanoid robot using a dynamic neural network model. Neural Networks, 19, 323–337.

Jonas, H. (1966). The phenomenon of life: Towards a philosophical biology. Evanston: Northwestern University Press.

Juarrero, A. (1999). Dynamics in action: Intentional behavior as a complex system. Cambridge, MA: MIT Press.

Kelso, J. A. S. (1995). Dynamic patterns: The self-organization of brain and behavior. Cambridge, MA: MIT Press.

Matson, W. (1977). Death and destruction in Spinoza’s Ethics. Inquiry, 20, 403–417.

Maturana, H., & Varela, F. (1980). Autopoiesis and cognition: The realization of the living. Boston: Reidel.

Merleau-Ponty, M. (1962). Phenomenology of perception (C. Smith, Trans.). London: Routledge & Kegan Paul.

Ogai, Y., & Ikegami, T. (forthcoming). Microslip as a simulated artificial mind.


Phattanasri, P., Chiel, H., & Beer, R. (n.d.). The dynamics of associative learning in evolved model circuits. Manuscript submitted for publication.

Piaget, J. (1954). The construction of reality in the child. New York: Basic Books.

Stewart, J. (1992). Life = cognition: The epistemological and ontological significance of artificial life. In P. Bourgine & F. Varela (Eds.), Toward a practice of autonomous systems: Proceedings of the First European Conference on Artificial Life (pp. 475–483). Cambridge, MA: MIT Press.

Thelen, E., Schoener, G., Scheier, C., & Smith, L. (2001). The dynamics of embodiment: A dynamic field theory of infant perseverative reaching. Behavioral and Brain Sciences, 24, 1–86.

Thelen, E., & Smith, L. B. (1994). A dynamic systems approach to the development of cognition and action. Cambridge, MA: MIT Press.

Thompson, E., & Varela, F. (2001). Radical embodiment: Neural dynamics and conscious experience. Trends in Cognitive Sciences, 5, 418–425.

Varela, F., & Thompson, E. (2003). Neural synchrony and the unity of mind: A neurophenomenological perspective. In A. Cleeremans (Ed.), The unity of consciousness: Binding, integration, and dissociation (pp. 266–287). Oxford: Oxford University Press.

Velásquez, J. (1997). Modeling emotions and other motivations in synthetic agents. In Proceedings of AAAI-97 (pp. 10–15). Menlo Park: AAAI Press.

Weber, A., & Varela, F. (2002). Life after Kant: Natural purposes and the autopoietic foundations of biological individuality. Phenomenology and the Cognitive Sciences, 1, 97–125.

Wheeler, M. (1997). Cognition’s coming home: The reunion of life and mind. In P. Husbands & I. Harvey (Eds.), Proceedings of the Fourth European Conference on Artificial Life (pp. 10–19). Cambridge, MA: MIT Press.

Williams, H. (2004). Homeostatic plasticity in recurrent neural networks. In S. Schaal, A. Ijspeert, A. Billard, S. Vijayakumar, & J. Meyer (Eds.), From Animals to Animats 8: Proceedings of the 8th International Conference on the Simulation of Adaptive Behavior (pp. 344–353). Cambridge, MA: MIT Press.

Wood, R., & Di Paolo, E. (2007). New models for old questions: Evolutionary robotics and the “A not B” error. In F. Almeida e Costa, L. M. Rocha, & E. Costa (Eds.), Proceedings of the 9th European Conference on Artificial Life (pp. 1141–1150). Berlin: Springer-Verlag.

About the Authors

Hiroyuki Iizuka received a Ph.D. in multi-disciplinary sciences from the University of Tokyo, Japan, in 2004. Since 2005, he has been a research fellow of the Japan Society for the Promotion of Science. In 2005 and 2006, he was also a visiting research fellow at the Centre for Computational Neuroscience and Robotics at the University of Sussex. His research interests include embodied cognition, complex adaptive systems, and the origin of life.

Ezequiel Di Paolo studied physics in Buenos Aires and obtained an M.Sc. in nuclear engineering in 1994 from the Instituto Balseiro, Argentina. He did a D.Phil. at the University of Sussex on models and theories of social behavior. After a postdoc at the German National Research Center for Information Technology, GMD, in Sankt Augustin, he returned to Sussex in 2000, where he is now a reader in evolutionary and adaptive systems at the Centre for Computational Neuroscience and Robotics (CCNR). His research interests include adaptive behavior in natural and artificial systems, biological modeling, complex systems engineering, evolutionary robotics, psychology, and enactive cognitive science. Address: Department of Informatics, University of Sussex, Brighton, BN1 9QH, UK. E-mail: [email protected]
