

J.A. Jacko (Ed.): Human-Computer Interaction, Part II, HCII 2009, LNCS 5611, pp. 606–615, 2009. © Springer-Verlag Berlin Heidelberg 2009

Development of Symbiotic Brain-Machine Interfaces Using a Neurophysiology Cyberworkstation

Justin C. Sanchez1,2,3, Renato Figueiredo4, Jose Fortes4, and Jose C. Principe3,4

1 Department of Pediatrics 2 Department of Neuroscience

3 Department of Biomedical Engineering 4 Department of Electrical and Computer Engineering

University of Florida Gainesville, Florida

[email protected], [email protected], [email protected]

Abstract. We seek to develop a new generation of brain-machine interfaces (BMI) that enable both the user and the computer to engage in a symbiotic relationship where they must co-adapt to each other to solve goal-directed tasks. Such a framework would allow the possibility of real-time understanding and modeling of brain behavior and adaptation to a changing environment, a major departure from the offline learning, static models, or one-way adaptive models of conventional BMIs. Achieving a symbiotic architecture requires a computing infrastructure that can accommodate multiple neural systems, respond within the processing deadlines of sensorimotor information, and provide powerful computational resources for designing new modeling approaches. To address these issues we present our ongoing work in the development of a neurophysiology Cyberworkstation for BMI design.

Keywords: Brain-Machine Interface, Co-Adaptive, Cyberworkstation.

1 Introduction

Brain-machine interfaces (BMIs) offer tremendous promise as assistive systems for motor-impaired patients, as new paradigms for man-machine interaction, and as vehicles for the discovery and promotion of new computational principles for autonomous and intelligent systems. At the core of a motor BMI is an embedded computer programmed with information processing models of the motor system that dialogues directly and in real time with the user's sensorimotor commands derived from the nervous system. Thus, two key problems of BMI research are (1) to establish the proper experimental interactive paradigm between biological and machine intelligence and (2) to model how the brain plans and controls motion. Both the search for solutions to these problems and the solutions themselves require the development of new computational frameworks embedded in in-vivo contexts.

Many groups have conducted research in BMIs, and the approach has been strongly signal-processing based, with little concern for incorporating the design principles of the biologic system into the interface. The implementation path has taken either an unsupervised approach by finding causal relationships in the data [1], a supervised approach using (functional) regression [2], or more sophisticated methods of sequential estimation [3] to minimize the error between predicted and known behavior. These approaches are primarily data-driven techniques that seek out correlation and structure between the spatio-temporal neural activation and behavior. Once the model is trained, the procedure is to fix the model parameters for use in a test set, which assumes stationarity in the functional mapping. Some of the best-known models that have used this architecture in the BMI literature are the linear Wiener filter (FIR) [4, 5] and Population Vector [6], generative models [7-9], and nonlinear dynamic neural networks (time-delay or recurrent neural networks [10-12]); all assume that behavior can be captured by a static input-output model and that the spike train statistics do not change over time. While these models have been shown to work well in specific scenarios, they carry these strong assumptions with them and will likely not be feasible over the long term.
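As a concrete reference point for these data-driven decoders, a minimal Wiener (FIR) decoder can be fit by regularized least squares on tap-delayed firing rates. This is a generic sketch of the technique, not the exact implementation used in [4, 5]; the function names and the ridge term are illustrative choices.

```python
import numpy as np

def fit_wiener_decoder(rates, kinematics, n_taps=10, ridge=1e-3):
    """Least-squares FIR (Wiener) decoder: predicts kinematics at time t
    from the last n_taps bins of firing rates across all channels."""
    T, C = rates.shape
    # Tap-delay design matrix: row t holds rates[t-n_taps+1 .. t] flattened.
    X = np.zeros((T - n_taps + 1, n_taps * C))
    for i in range(n_taps):
        X[:, i * C:(i + 1) * C] = rates[i:T - n_taps + 1 + i]
    y = kinematics[n_taps - 1:]
    # Ridge-regularized normal equations for numerical stability.
    W = np.linalg.solve(X.T @ X + ridge * np.eye(X.shape[1]), X.T @ y)
    return W

def decode(rates, W, n_taps=10):
    """Apply a fitted Wiener decoder to a stream of firing rates."""
    T, C = rates.shape
    X = np.zeros((T - n_taps + 1, n_taps * C))
    for i in range(n_taps):
        X[:, i * C:(i + 1) * C] = rates[i:T - n_taps + 1 + i]
    return X @ W
```

Once W is fixed after training, the decoder is exactly the kind of static input-output map whose stationarity assumption the text questions.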

Fig. 1. Evolution of BMI modeling: (a) first-generation BMIs with strong static assumptions for both the model and user; (b) a data-driven approach that relaxes static modeling assumptions through the use of multiple models trained with an error response; (c) co-adaptive BMIs share a common goal between the user and model, allowing dynamic sharing of control

Figure 1 illustrates the ongoing evolution in BMI modeling. Conventional BMIs assume a static, single model to translate the spatio-temporal patterns of brain activity into a desired response available from an able user. In a test set, the model translates the intent of the user (also assumed to be static) into behavior without requiring body movements (Fig. 1a). BMIs of this type proved the possibility of translating thoughts into actions, but do not work reliably for multiple tasks in a clinical setting. Building upon these efforts, in the past 3 years we considered a biologically plausible architecture with multiple models specialized for parts of the trajectory, thus improving both reliability and accuracy (Fig. 1b). Our research revealed an imbalance between the roles of the user and the BMI. Using the desired response in training simplifies model building, but it also confines the BMI to the role of signal translator, which is incompatible with the dynamic nature of brain activity and of the tasks in the environment. To create a BMI capable of capturing new and complex tasks with minimal training and of accommodating changing brain activity, a co-adaptive system is necessary. The externally imposed information should be kept to a minimum, i.e. the system should learn how to evaluate actions and learn from mistakes independently (Fig. 1c). This led us to evaluate an alternate training paradigm.

We are particularly interested in developing an emergent system where the user and computer cooperatively seek to maximize goals while interacting with a complex, dynamical environment. Both the user and the computer are in a symbiotic relationship where they must co-adapt to each other to solve tasks. This behavior depends on a series of events or elemental procedures that promote specific brain or behavioral syntax, feedback, and repetition over time [13]; hence, the sequential evaluative process is always ongoing and adheres to strict timing and cooperative-competitive processes. With these processes, intelligent motor control and, more importantly, goal-directed behavior can be built with closed-loop mechanisms which continuously adapt to internal and external antecedents of the world, express intent through behavior in the environment, and evaluate the consequences of those behaviors to promote learning. Collectively these components form a Perception-Action Cycle (PAC), which plays a critical role in organizing behavior in the nervous system [14]. This form of adaptive behavior relies on continuous processing of sensory information that is used to guide a series of goal-directed actions. Most importantly, the entire process is regulated by external environmental and internal neurofeedback, which is used to guide the adaptation of computation and behavior. The PAC in goal-directed behavior provides several key concepts for the formation of a new framework for BMI. However, unlike the PAC that is central to an animal's interaction with the world, the PAC in a BMI will be distributed between the user and the computer, which we will call an agent because it is assistive in nature. Next, we introduce the prerequisites for modifying the PAC to incorporate two entities.
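The closed-loop mechanism described above can be summarized as a bare perception-action cycle skeleton. The four stage functions below are hypothetical placeholders for illustration, not the authors' implementation:

```python
def perception_action_cycle(sense, act, evaluate, adapt, n_cycles):
    """Minimal Perception-Action Cycle skeleton: sense the world, express
    intent through an action, evaluate the consequence of that action,
    and use the outcome to adapt before the next cycle."""
    outcomes = []
    for _ in range(n_cycles):
        state = sense()                    # internal/external antecedents
        action = act(state)                # intent expressed as behavior
        outcome = evaluate(state, action)  # consequence of the behavior
        adapt(outcome)                     # feedback guides adaptation
        outcomes.append(outcome)
    return outcomes
```

In a symbiotic BMI, the `act` and `adapt` stages would be split between the user's neural processes and the computer agent.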

Table 1. User-Neuroprosthetic Prerequisites for Co-Adaptation

Prerequisite        User                          Computer Agent
Representation      Brain States                  Environmental States
Valuation           Goal-Directed                 Goal-Directed
Action Selection    Neuromodulation               Competition
Outcome Measures    Internal Reward Expectation   Predicted Error
Learning            Reinforcement Based           Reinforcement Based
Co-Adaptation       Dynamic Brain Organization    Optimization of Parameters

2 Minimal Prerequisites for Co-adaptation

In order to symbiotically link artificial intelligent tools with neural systems, a new set of protocols must be derived to enable and empower dialogue between two seemingly different entities. A minimal set of six prerequisites, given in Table 1, describes the essential computation required to enact a symbiotic PAC. These prerequisites are based on concepts considered key in value-based decision making [15]. Unique to the development of intelligent BMIs is that the user and neuroprosthetic each have their own perspective on and contribution to each prerequisite, as described below.

Representation. Internal to the user, the spatio-temporal structure of neural activation forms a coding of intent for action in the external world. At any given moment, the neural code can be sampled as a brain state, defined as the vector of values (from all recording electrodes) that describes the operating point within the space of all possible state values. The syntax or sequence of brain states must be able to support a sufficiently rich computational repertoire and must encode a range of values with sufficient accuracy and discriminability. These brain states could contain either a local or a distributed representation, depending on where the signals are collected.

While the representation of brain states is embedded internally in the user, the representation of the BMI is embodied in the environment [16]. The BMI connection created from the brain state to the environment forms a channel for communication from the internal to the external world. In the external world, the state representation of the neuroprosthetic includes the sequence of sensory information about the environment that is relevant to goal-directed behavior. For example, the environmental state could be action oriented and update the position or velocity of the BMI. It is important to note that the state need not contain all the details about the environment, only a sufficiently rich summary of the important information that led to the current state. If the environmental state representation has this property, it can be considered Markov.

Valuation. Valuation is the process by which a system assigns value to actions and behavioral outcomes. For goal-directed motor behavior, we seek systems that compute with action-outcome sequences and assign high value to outcomes that yield desirable rewards. In the design of BMIs, it is desirable for the user and the BMI to be highly responsive to each other through the immediate update of value as soon as an outcome changes. This approach is very different from habitual valuation, which does not participate in continual self-analysis [17]. For BMI, one of the main computational goals is to develop real-time methods for coupling the valuation between the user and BMI for a variety of action-outcome associations.

Action Selection. To complete goal-directed behaviors, both the user and the neuroprosthetic tool must perform actions in the environment. The method of precisely timing and triggering particular actions that support the task objective depends on whether internal or external representations are used. For the user, action selection is performed through the intentional and transient excitation or inhibition (neuromodulation) of neural activity that is capable of supporting a subset of actions. This functional relationship between neuromodulation and the signaling of actions defines the process of action selection. It is expected that under the influence of intelligent BMIs, the primary motor cortex will undergo functional reorganization during motor learning [18-20]. This reorganization in action selection is due in part to how the neuroprosthetic tool synergistically performs action selection in the external environment. Computationally, choosing one action from a set of actions can be implemented through competition, where actions compete with each other to be activated. Using a set of discriminant functions, the action or actions with the highest values can be declared the winner and selected for use in the goal-directed behavior.
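The discriminant-function competition described above reduces to a winner-take-all over per-action scores. A minimal sketch follows; the linear form of the discriminants is an assumption made for illustration, not a detail given in the text:

```python
import numpy as np

def select_action(state, weights, biases):
    """Competitive action selection: each candidate action i scores the
    current state with a linear discriminant g_i(s) = w_i . s + b_i;
    the highest-scoring action wins the competition and is executed."""
    scores = weights @ state + biases
    return int(np.argmax(scores)), scores
```

Any family of discriminant functions (linear, kernel-based, neural) could replace the linear scores; only the argmax competition is essential to the scheme.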


Outcome Measures. To determine the success of the goal-directed behavior, the user and neuroprosthetic tool have different measures of outcome. The prediction error, as its name implies, is the consequence of uncertainty in goal achievement and can be linked either directly to an inherent characteristic of the environment or to internal representations of reward in the user. Reward expectation of the user is expressed in reward centers of the brain [21] and evaluates the states of the environment in terms of an increase or decrease in the probability of earning reward. During the cycles of the PAC, the reward expectation of the user can be modulated by the novelty and type of environmental conditions encountered. Ideally, the goal of intelligent BMIs is to create synergies in both outcome measures so that the user's expectations are proportional to the prediction error of the neuroprosthetic tool.
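One standard way to formalize the interplay between reward expectation and prediction error is a temporal-difference update, where the prediction error drives the change in expectation. This is offered as an illustrative formulation, not a quotation of the authors' method:

```python
def update_expectation(value, reward, next_value, alpha=0.1, gamma=0.9):
    """Update a stored reward expectation using the prediction error:
    delta > 0 means the outcome was better than expected, raising the
    expectation; delta < 0 lowers it. alpha is a learning rate and
    gamma discounts the expectation of the next state."""
    delta = reward + gamma * next_value - value   # prediction error
    return value + alpha * delta, delta
```

In a symbiotic BMI, delta is the quantity the computer agent can compute directly, while the user's reward expectation evolves through the brain's own reward circuitry.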

Learning. Because of the rich interactions between the user and the neuroprosthetic, the system cannot learn as a fixed input-output mapper (as in conventional BMIs); it has to be a state-dependent system that utilizes experience. Throughout this process it develops a model of the world, which in BMIs will include a model of the interaction with the user. Reinforcement Learning (RL) is a computational framework for goal-based learning and decision-making. Learning through interaction with the environment distinguishes this method from other learning paradigms. There have been many developments in the machine learning RL paradigm [22-25], which originated from the theory of optimal control in Markov Decision Processes [26]. One of its strengths is the ability to learn which control actions will maximize reward given the state of the environment [27]. Traditionally, RL is applied to design optimal controllers because it learns how a computer agent (CA) should apply control actions at its interface with an environment in order to maximize rewards earned over time [26]. The CA is initially naive but learns through interactions, eventually developing an action-selection strategy that achieves goals to earn rewards. However, in symbiotic BMIs, where computational models are conjoined with neurophysiology in real time, it may be possible to acquire the knowledge to bring more realism into the theory of reinforcement learning [28].
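A tabular Q-learning loop on a toy environment illustrates the CA's trajectory from naive to goal-directed. The chain task below is a made-up stand-in for the BMI environment, not the authors' experimental setup:

```python
import numpy as np

def q_learning(n_states, n_actions, step, episodes=200, alpha=0.2,
               gamma=0.95, epsilon=0.1, seed=0):
    """Tabular Q-learning: the agent starts naive (Q = 0) and learns an
    action-selection strategy purely from reward feedback delivered by
    `step(state, action) -> (next_state, reward, done)`."""
    rng = np.random.default_rng(seed)
    Q = np.zeros((n_states, n_actions))
    for _ in range(episodes):
        s, done = 0, False
        while not done:
            if rng.random() < epsilon:              # explore
                a = int(rng.integers(n_actions))
            else:                                   # exploit, random tie-break
                best = np.flatnonzero(Q[s] == Q[s].max())
                a = int(rng.choice(best))
            s2, r, done = step(s, a)
            target = r + gamma * Q[s2].max() * (not done)
            Q[s, a] += alpha * (target - Q[s, a])   # learn from the error
            s = s2
    return Q

def chain_step(s, a, n=5):
    """Toy chain world: action 1 moves right, action 0 moves left;
    reaching the rightmost state earns the only reward and ends the trial."""
    s2 = min(s + 1, n - 1) if a == 1 else max(s - 1, 0)
    return s2, (1.0 if s2 == n - 1 else 0.0), s2 == n - 1
```

After training, the learned Q values favor moving toward the rewarded state even though the agent was never told the goal explicitly, mirroring the initially naive CA described in the text.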

Co-Adaptation. Here we distinguish mere adaptability from cooperative co-adaptation, which refers to a much deeper symbiotic relationship between the user and the computational agent, who share control to reach common goals, enabling the possibility of continual evolution of the interactive process. This approach must go beyond the simple combination of neurobiological and computational models, because such a combination does not elucidate the relationship between the linked, heterogeneous responses of neural systems and behavioral outcomes. A co-adaptive BMI must also consider the interactions that influence the net benefits of behavioral, computational, and physiological strategies. First, adaptive responses in the co-adaptive BMI will likely occur at different spatial and temporal scales. Second, through feedback, the expression of neural intent continuously shapes the computational model while the behavior of the BMI shapes the user. The challenge is to define appropriate BMI architectures to determine the mechanistic links between neurophysiologic levels of abstraction and behavior, and to understand the evolution of neuroprosthetic usage. The details of how co-adaptive BMIs are engineered through reinforcement learning are presented next.


3 Development of a Cyberworkstation to Test Co-adaptive BMIs

With the prerequisites of co-adaptive BMIs defined, the computational and engineering design challenge becomes one of making architectural choices and integrating both the user and computer into a cooperative structure. Toward this goal we have an ongoing collaboration to develop a Cyberworkstation (CW) which consolidates software and hardware resources across three collaborating labs at the University of Florida: the Neuroprosthetics Research Group (NRG), the Advanced Computing and Information Systems (ACIS) Lab, and the Computational NeuroEngineering Laboratory (CNEL) [29]. This consolidation couples local hardware control with powerful remote processing. Figure 2 illustrates this concept with a closed-loop BMI experiment in which neural signals are sampled from behaving animals (at NRG) and sent across the network for computing. A variety of BMI models then process these data on a pool of servers (at ACIS). Model results are aggregated and used to control robot movement (at NRG).

Fig. 2. Infrastructure for online and offline analysis

The CW was engineered to meet the following specifications for symbiotic BMI applications:

1. Real-time operation that scales with biological responses (<100 ms)
2. Parallel processing capability for multi-input, multi-output models
3. Large memory for neurophysiologic data streams and model parameterization
4. Customizable signal processing models
5. Integrated analysis platform for visualization of modeling and physiological parameters
6. Data warehousing
7. Simple user interfaces

These application-side requirements in turn create information technology (IT) requirements:

1. High-performance and real-time computation
2. Resource multiplexing (virtualization)
3. Real-time communication
4. Template-based framework-workflow
5. Data streaming and storage


Our research has been directed at developing middleware that can aggregate and reserve the necessary resources, create appropriate parallel execution environments, and guarantee the necessary Quality-of-Service (QoS) in computation and communication to meet response deadlines while providing the capabilities listed above. In effect, the middleware deploys an on-demand neuroscience research test-bed which consolidates distributed software and hardware resources to support time-critical and resource-demanding computing. Additionally, scientists are provided a powerful interface to dynamically manage and analyze experiments.

Online experimenting is based on a real-time, closed-loop system consisting of three key components: in-vivo data acquisition from multielectrode arrays implanted in the brain, parallel computational modeling for neural decoding, and prosthetic control. Figure 2 shows the entire CW, including each of the three components. Neural signals are sampled from behaving animals (at NRG) and sent across the network for computing. A variety of BMI models (designed to investigate principles of neural coding) process these data in parallel on a pool of physical and virtual servers (at ACIS). Their results are aggregated and used to control robot movement (at NRG).

This CW infrastructure supports both online (left side of Fig. 2) and offline (right side of Fig. 2) experiments. In the online scenario, the CW supports the real-time computation needed for in-vivo experiments, while in the offline scenario, the resources required in past experiments are recreated so that the saved data can be replayed and analyzed. To accommodate both types of experiments, we developed a flexible closed-loop control template consisting of three main phases: data acquisition and flow, network transfer, and motor control model processing. Each template phase is described next.

Data Acquisition and Flow. A local program (PentusaData) samples neural signals from a multi-channel digital signal processing (DSP) device. The DSP contains buffers which store estimated firing rates from electrodes implanted in behaving animals at the NRG laboratory. After acquiring neural activity from the brain, PentusaData generates an input data structure and sends it over the network to the ACIS processing cluster.
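The input data structure is not specified in detail in the text. The general pattern — per-channel rate estimates packed into a flat binary message with a small header — can be sketched as follows, using an entirely hypothetical wire format (sequence number, channel count, then float32 rates):

```python
import struct
import numpy as np

def make_input_packet(spike_counts, bin_ms=100, seq=0):
    """Pack one bin of per-channel firing-rate estimates into a flat
    binary message for network transfer. Hypothetical layout: a uint32
    sequence number, a uint32 channel count, then float32 rates (Hz)."""
    rates = np.asarray(spike_counts, dtype=np.float32) * (1000.0 / bin_ms)
    header = struct.pack("<II", seq, rates.size)
    return header + rates.tobytes()

def parse_input_packet(payload):
    """Inverse of make_input_packet: recover the sequence number and rates."""
    seq, n = struct.unpack_from("<II", payload, 0)
    rates = np.frombuffer(payload, dtype=np.float32, offset=8, count=n)
    return seq, rates
```

A fixed little-endian layout like this keeps the sender (DSP side) and receiver (cluster side) decoupled from each other's host architectures.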

Once the data reaches ACIS, MPI processes handle it. The master process (GlobalBMI) manages and distributes data to worker processes (LocalBMI) that execute motor control models. These processes communicate locally to exchange data and to determine the responsibility of each motor control model toward the final output. Each worker model implementation comprises both computation of motor commands from the neural signal and continuous model adaptation. Based on the responsibility assigned to each model [30], all of the worker model results are aggregated and sent to the GlobalBMI, which creates the output data structure to send back to NRG. The local program (PentusaData) then completes the robot motor task based on the motor command in the output data structure.
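Responsibility-based aggregation in the spirit of [30] can be sketched as a soft-max weighting over recent model errors; the temperature beta and the choice of error signal are illustrative assumptions, not details given in the text:

```python
import numpy as np

def aggregate_outputs(predictions, errors, beta=5.0):
    """Aggregate worker-model motor commands by 'responsibility': models
    with lower recent prediction error receive exponentially more weight
    (a soft-max over negative error). `predictions` is (n_models, dims),
    `errors` is (n_models,); returns the blended command and the weights."""
    w = np.exp(-beta * np.asarray(errors))
    w = w / w.sum()                       # normalize responsibilities
    return w @ np.asarray(predictions), w
```

With this scheme, a model specialized for the current part of the trajectory dominates the output, while poorly matched models are softly suppressed rather than switched off.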

This CW sampled 32 electrodes (up to 96 neurons on the DSPs used in NRG) in 2.7 ms on average, with a standard deviation of 5.59 ms (accomplishing this required parallelization of the DSP). Expanding this to 192 neurons is now possible with the latest TDT software component (SortSpike3); this would double the amount of necessary sampling but should still consume less than 10% of a 100 ms computing budget. The theoretical maximum number of electrodes the CW could support involves a tradeoff between the other processes in the computation budget and network transfer speed.

Network Transfer. Communication and control of the CW require the three subsystems (PentusaData, GlobalBMI, and LocalBMI) to perform all of their computation within the 100 ms deadline for each behavioral action. Since the NRG and ACIS sites are approximately 500 m apart, the overhead and limitations of network communication need to be addressed. Using the TCP protocol to transfer data can achieve reliability via transport-layer timeout and retransmission, but it is difficult to control these mechanisms with the desired recovery policies and timing constraints [ref]. Instead, the CW implements a custom reliable transfer mechanism upon the faster but less reliable UDP protocol. The middleware on the data acquisition server monitors the elapsed time after sending out the data. A timeout occurs when it detects that it is unlikely to get the computation results back in time to meet the deadline. It then stores the failed data sample in a circular first-in-first-out (FIFO) buffer and starts a new closed-loop cycle by polling for the next sample. Benchmarking of the closed-loop communication protocol revealed that the round trip could be reliably completed within 5 ms.
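The deadline policy described above — send, wait up to the deadline, and on timeout stash the failed sample in a bounded FIFO before moving on — can be sketched as follows. The class and its injected `send`/`recv` callables are illustrative; the actual middleware is not reproduced here:

```python
import collections
import time

class DeadlineLoop:
    """Sketch of the closed-loop deadline policy: send a sample, wait up
    to `deadline_s` for the result; on timeout, append the failed sample
    to a bounded (circular) FIFO and proceed to the next cycle. `send`
    and `recv` are injected so this could be backed by a UDP socket."""
    def __init__(self, send, recv, deadline_s=0.1, fifo_len=64):
        self.send, self.recv = send, recv
        self.deadline_s = deadline_s
        self.failed = collections.deque(maxlen=fifo_len)  # circular FIFO

    def cycle(self, sample):
        self.send(sample)
        t0 = time.monotonic()
        while time.monotonic() - t0 < self.deadline_s:
            result = self.recv()          # non-blocking; None if nothing yet
            if result is not None:
                return result
        self.failed.append(sample)        # missed the deadline
        return None
```

Dropping a late result and moving on is the key design choice: in a closed sensorimotor loop, a stale motor command is worse than no command at all.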

Model Computation. The concurrent execution of many BMI models in the CW requires a tremendous amount of computing power and storage capacity. Thus, the system needs to efficiently aggregate resources for model computation and to integrate with onsite (local) signal sampling and robotic movement. The CW computes these model sets in parallel using MPI. Parallel computation is conducted upon resources provided through virtualization, which allows many virtual machines (VMs) to be created on a single physical machine and to transparently share its computing and storage resources. Each VM can be customized with the necessary execution environment, including operating system and libraries, to support seamless deployment of a BMI model. Multiple models can run concurrently on dedicated VMs, with resources dynamically provisioned according to model demands and timing requirements.
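The fan-out/fan-in pattern here — the same neural input dispatched to every model, results gathered in order for aggregation — can be sketched with a thread pool standing in for the CW's MPI-over-VMs deployment (this is only a structural analogy, not a substitute for MPI's distributed execution):

```python
from concurrent.futures import ThreadPoolExecutor

def run_models_parallel(models, neural_input):
    """Dispatch one neural input to every decoding model concurrently and
    gather the per-model results in submission order (a stand-in for the
    CW's LocalBMI worker processes running on virtual machines)."""
    with ThreadPoolExecutor(max_workers=len(models)) as pool:
        futures = [pool.submit(m, neural_input) for m in models]
        return [f.result() for f in futures]
```

In the real system, each worker would be a separate MPI rank on its own VM, and the gather step would feed the responsibility-weighted aggregation at the master.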

For offline experiments, virtual machines allow the model computation to share resources with other jobs in an isolated manner. These jobs are managed via a cluster management system that provides queuing and scheduling of job executions on the virtual machines. Because online experiments have stricter timing requirements for the model computation, a set of physical resources must be reserved in advance to prepare a cluster of virtual machines for the online computation. Existing offline jobs on these reserved resources can be transparently suspended or relocated during the online experiment by suspending or migrating their VMs.

4 Conclusion

We have presented progress here towards a new framework to seamlessly bridge computational resources with in-vivo experimentation in order to build symbiotic BMIs operating in dynamic environments. Online and offline BMI experiments can be performed on it in a closed-loop manner that includes in-vivo data acquisition, reliable network transfer, and parallel model computation. With this computational infrastructure we seek to deploy next-generation decoding algorithms that incorporate the dynamic responses of both the brain and a computer agent working toward goals.

This solution for distributed BMIs could lay the groundwork for scalable middleware techniques that, in the long run, can support increasingly elaborate neurophysiologic research test beds in which subjects carry out more complex tasks. These advanced test beds will be essential for the development and optimization of computational components that can be implemented on future workstations with multiple multi-core processors which, collectively, will be able to provide the necessary resources for deeper study of BMI neural coding and function.

References

1. Buzsáki, G.: Rhythms of the Brain. Oxford University Press, New York (2006)
2. Kim, S.P., Sanchez, J.C., Rao, Y.N., Erdogmus, D., Principe, J.C., Carmena, J.M., Lebedev, M.A., Nicolelis, M.A.L.: A Comparison of Optimal MIMO Linear and Nonlinear Models for Brain-Machine Interfaces. J. Neural Engineering 3, 145–161 (2006)
3. Brown, E.N., Kass, R.E., Mitra, P.P.: Multiple Neural Spike Train Data Analysis: State-of-the-art and Future Challenges. Nature Neuroscience 7, 456–461 (2004)
4. Serruya, M.D., Hatsopoulos, N.G., Paninski, L., Fellows, M.R., Donoghue, J.P.: Brain-machine interface: Instant neural control of a movement signal. Nature 416, 141–142 (2002)
5. Wessberg, J., Stambaugh, C.R., Kralik, J.D., Beck, P.D., Laubach, M., Chapin, J.K., Kim, J., Biggs, S.J., Srinivasan, M.A., Nicolelis, M.A.L.: Real-time prediction of hand trajectory by ensembles of cortical neurons in primates. Nature 408, 361–365 (2000)
6. Helms Tillery, S.I., Taylor, D.M., Schwartz, A.B.: Training in cortical control of neuroprosthetic devices improves signal extraction from small neuronal ensembles. Reviews in the Neurosciences 14, 107–119 (2003)
7. Moran, D.W., Schwartz, A.B.: Motor cortical representation of speed and direction during reaching. Journal of Neurophysiology 82, 2676–2692 (1999)
8. Taylor, D.M., Tillery, S.I.H., Schwartz, A.B.: Direct cortical control of 3D neuroprosthetic devices. Science 296, 1829–1832 (2002)
9. Wu, W., Black, M.J., Gao, Y., Bienenstock, E., Serruya, M., Donoghue, J.P.: Inferring hand motion from multi-cell recordings in motor cortex using a Kalman filter. In: SAB Workshop on Motor Control in Humans and Robots: On the Interplay of Real Brains and Artificial Devices, University of Edinburgh, Scotland, pp. 66–73 (2002)
10. Chapin, J.K., Moxon, K.A., Markowitz, R.S., Nicolelis, M.A.: Real-time control of a robot arm using simultaneously recorded neurons in the motor cortex. Nature Neuroscience 2, 664–670 (1999)
11. Gao, Y., Black, M.J., Bienenstock, E., Wu, W., Donoghue, J.P.: A quantitative comparison of linear and non-linear models of motor cortical activity for the encoding and decoding of arm motions. In: The 1st International IEEE EMBS Conference on Neural Engineering, Capri, Italy (2003)
12. Sanchez, J.C., Kim, S.P., Erdogmus, D., Rao, Y.N., Principe, J.C., Wessberg, J., Nicolelis, M.A.L.: Input-output mapping performance of linear and nonlinear models for estimating hand trajectories from cortical neuronal firing patterns. In: International Workshop on Neural Networks for Signal Processing, Martigny, Switzerland, pp. 139–148 (2002)
13. Calvin, W.H.: The emergence of intelligence. Scientific American 9, 44–51 (1990)
14. Fuster, J.M.: Upper processing stages of the perception-action cycle. Trends in Cognitive Sciences 8, 143–145 (2004)
15. Rangel, A., Camerer, C., Montague, P.R.: A framework for studying the neurobiology of value-based decision making. Nature Reviews Neuroscience 9, 545–556 (2008)
16. Edelman, G.M., Mountcastle, V.B.: The Mindful Brain: Cortical Organization and the Group-Selective Theory of Higher Brain Function. MIT Press, Cambridge (1978)
17. Dayan, P., Niv, Y., Seymour, B., Daw, N.D.: The misbehavior of value and the discipline of the will. Neural Networks 19, 1153–1160 (2006)
18. Jackson, A., Mavoori, J., Fetz, E.E.: Long-term motor cortex plasticity induced by an electronic neural implant. Nature 444, 56–60 (2006)
19. Kleim, J.A., Barbay, S., Nudo, R.J.: Functional reorganization of the rat motor cortex following motor skill learning. Journal of Neurophysiology 80, 3321–3325 (1998)
20. Rioult-Pedotti, M.S., Friedman, D., Hess, G., Donoghue, J.P.: Strengthening of horizontal cortical connections following skill learning. Nature Neuroscience 1, 230–234 (1998)
21. Schultz, W.: Multiple reward signals in the brain. Nature Reviews Neuroscience 1, 199–207 (2000)
22. Sutton, R.S., Barto, A.G.: Reinforcement Learning: An Introduction. MIT Press, Cambridge (1998)
23. Doya, K., Samejima, K., Katagiri, K., Kawato, M.: Multiple model-based reinforcement learning. Neural Computation 14, 1347–1369 (2002)
24. Rivest, F., Bengio, Y., Kalaska, J.: Brain inspired reinforcement learning. In: NIPS, Vancouver, Canada (2004)
25. Jong, N., Stone, P.: Kernel-based models for reinforcement learning. In: Proc. ICML Workshop on Kernel Machines for Reinforcement Learning, Pittsburgh, PA (2006)
26. Sutton, R.S., Barto, A.G.: Reinforcement Learning: An Introduction. MIT Press, Cambridge (1998)
27. Wörgötter, F., Porr, B.: Temporal sequence learning, prediction, and control: a review of different models and their relation to biological mechanisms. Neural Computation 17, 245–319 (2005)
28. DiGiovanna, J., Mahmoudi, B., Fortes, J., Principe, J.C., Sanchez, J.C.: Co-adaptive Brain-Machine Interface via Reinforcement Learning. IEEE Transactions on Biomedical Engineering (Special Issue on Hybrid Bionics) (2008) (in press)
29. Zhao, M., Rattanatamrong, P., DiGiovanna, J., Mahmoudi, B., Figueiredo, R.J., Sanchez, J.C., Principe, J.C., Fortes, J.C.: BMI Cyberworkstation: Enabling Dynamic Data-Driven Brain-Machine Interface Research through Cyberinfrastructure. In: IEEE International Conference of the Engineering in Medicine and Biology Society, Vancouver, Canada, pp. 646–649 (2008)
30. Wolpert, D., Kawato, M.: Multiple paired forward and inverse models for motor control. Neural Networks 11, 1317–1329 (1998)