An FPGA platform for on-line topology exploration of spiking neural networks
Andres Upegui*, Carlos Andres Pena-Reyes, Eduardo Sanchez
Logic Systems Laboratory, Swiss Federal Institute of Technology, IN-Ecublens, 1015 Lausanne, Switzerland
Received 20 October 2003; revised 9 July 2004; accepted 19 August 2004
Available online 15 September 2004
Abstract
In this paper we present a platform for evolving spiking neural networks on FPGAs. Embedded intelligent applications require both high
performance, so as to exhibit real-time behavior, and flexibility, to cope with the adaptivity requirements. While hardware solutions offer
performance, and software solutions offer flexibility, reconfigurable computing arises between these two types of solutions providing a trade-
off between flexibility and performance. Our platform is described as a combination of three parts: a hardware substrate, a computing engine,
and an adaptation mechanism. We also present results on the performance and synthesis of the neural network implementation on an
FPGA.
© 2004 Elsevier B.V. All rights reserved.
Keywords: Neural hardware; Spiking neuron; Evolvable hardware; Topology evolution; Dynamic reconfiguration; FPGA
1. Introduction
Living organisms, from microscopic bacteria to giant
sequoias, including animals such as butterflies and humans,
have successfully survived on Earth for millions of years.
If one had to propose a single key to explain such success,
it would certainly be adaptation. Two types of
adaptation can be identified in living organisms: at the species
level and at the individual level. Adaptation at species level,
also known as evolution [1,2], refers to the capability of a
given species to adapt to an environment by means of
natural selection and reproduction. Adaptation at the
individual level, also known as learning [3], refers to
behavioural changes in an individual, produced by interact-
ing with an environment.
Although several artificial approaches have been extensively
explored by researchers, in contrast with nature, adaptation
has been very elusive to human technology.
0141-9331/$ - see front matter © 2004 Elsevier B.V. All rights reserved.
doi:10.1016/j.micpro.2004.08.012
* Corresponding author. Tel.: +41 21 693 67 14; fax: +41 21 693 37 05.
E-mail addresses: [email protected] (A. Upegui), carlos.pena@epfl.ch (C.A. Pena-Reyes), [email protected] (E. Sanchez).
Among other
properties, adaptivity makes artificial neural networks
(ANNs) one of the most common techniques for machine
learning. Adaptivity refers to the modification performed to
an ANN in order to allow it to execute a given task. Several
types of adaptive methods might be identified according to
the modification done. The most common methods modify
either the synaptic weights [4] and/or the topology [5–7].
Synaptic-weight modification is the most widely used
approach, as it provides a relatively smooth search space.
On the other hand, topology modification alone yields a
highly rugged search-space landscape (i.e. small changes
to the network may result in very different performance),
and, even though such adaptation techniques broadly
explore the space of computational capabilities of the
network, it is very difficult to converge to a solution.
A hybrid of both methods could achieve better
performance, because the weight-adaptation method con-
tributes to smooth the search space, making it easier to find a
solution. Growing [5], pruning [6], and evolutionary
algorithms (EAs) [7] are adaptive methods widely used to
modify an ANN topology that, in association with weight
modification, may converge to a solution.
Microprocessors and Microsystems 29 (2005) 211–223
www.elsevier.com/locate/micpro
We propose, thus,
a hybrid method where an adaptation of the structure is done
by modifying the network topology, allowing the explora-
tion of different computational capabilities. The evaluation
of these capabilities is done by weight-learning, finding in
this way a solution for the problem at hand.
However, topology modification has a high compu-
tational cost. Besides the fact that weight learning can be
time-consuming, it would be multiplied by the number of
topologies that are being explored. Under these conditions,
on-line applications would be unfeasible, unless enough
knowledge of the problem is available to restrict the
search space to small topology modifications.
A part of the problem can be solved with a hardware
implementation, which highly reduces the execution time as
the evaluation of the network is performed with the neurons
running in parallel. However, a complexity problem
remains: while on software, additional neurons and
connections imply just some extra loops, on hardware
there is a finite area (resources) that limits the number of
neurons that can be placed on a network. This is due to the
fact that each neuron has a physical existence that occupies
a given area and that each connection implies a physical
wire connecting two neurons. Moreover, if an
exploration of topologies is done, the physical resources
(connections and neurons) for the most complex possible
networks must be allocated in advance, even if the final
solution is less complex. This fact makes connectivity a
very important issue, since a connection matrix for a large
number of neurons is considerably resource-consuming.
Current Field Programmable Gate Arrays (FPGAs) allow
tackling this resource availability problem thanks to their
dynamic partial reconfiguration (DPR) feature [8], which
allows reusing internal logic resources. This feature makes it
possible to dynamically reconfigure some physical logic units
while the circuit remains operational, reducing the hardware
requirements and optimizing the number of neurons and the
connectivity resources.
In this paper we propose a reconfigurable hardware
platform using DPR, which tackles the ANN topology-
search problem. Section 2 presents an introduction to the
bio-inspired techniques used in our platform. In Section 3
we present a brief description of FPGAs and, more precisely,
of dynamic partial reconfiguration. Section 4 presents a
description of the full platform. In Section 5 we describe the
hardware substrate necessary to support our platform. In
Section 6 we discuss the implementation of a GA on our
hardware platform. In Section 7 we present a spiking neuron
model exhibiting a reduced connectionism schema and low
hardware resources requirements. In Section 8 we present
some preliminary results: a simulation of a network solving
a problem of frequency discrimination, and its respective
FPGA implementation as a validation for the network.
Section 9 contains a discussion about the possibilities and
limitations of the platform, and gives some directions for
further work. Finally, Section 10 concludes.
2. Background: bio-inspired techniques
Nature has always stimulated the imagination of humans,
but it is only very recently that technology has begun to allow
the physical implementation of bio-inspired systems. These are
man-made systems whose architectures and emergent
behaviours resemble the structure and behaviour
of biological organisms [9]. Artificial neural networks
(ANNs), evolutionary algorithms (EAs), and
fuzzy logic (FL) are the main representatives of a new,
different approach to artificial intelligence. Names like
‘computational intelligence’, ‘soft computing’, ‘bio-
inspired systems’, or ‘natural computing’ among others,
are used to denominate the domain involving these and
other related techniques. Whatever the name, these
techniques exhibit the following features: (1) their role
models in different extents are natural processes such as
evolution, learning, or reasoning; (2) they are intended to be
tolerant of imprecision, uncertainty, partial truth, and
approximation; (3) they deal mainly with numerical
information processing using little or no explicit knowledge
representation. We present below a brief description of
ANNs and EAs, and the hybrid between them: Evolutionary
ANNs (EANNs).
2.1. Artificial neural networks
As said by Haykin: ‘A neural network is a massively
parallel distributed processor made up of simple processing
units, which has a natural propensity for storing experiential
knowledge and making it available for use. It resembles the
human brain in two respects: (1) Knowledge is acquired
through a learning process. (2) Synaptic weights are used to
store the knowledge.’ [4]. Among other features, ANNs
provide nonlinearity (an ANN made up of nonlinear neurons
has a natural ability to realize nonlinear input–output
functions), are universal approximators (ANNs can approxi-
mate input–output functions to any desired degree of
accuracy, given an adequate computational complexity),
are adaptable (adjustable synaptic weights and network
topology, can adapt to its operating environment and track
statistical variations), are fault tolerant (an ANN has the
potential to be fault-tolerant, or capable of robust perform-
ance, in the sense that its performance degrades gradually under
adverse operating conditions), and intend to be neuro-
biologically plausible (neurobiologists look to neural net-
works as a research tool for the interpretation of neurobio-
logical phenomena. By the same token, engineers look to the
human brain for new ideas to solve difficult problems) [4].
In other terms, an artificial neural network is a system
that learns to map a function from an input vector to an
output vector. It consists on a set of simple units which are
called artificial neurons. Each neuron has an internal state
which depends on its own input vector. From this state the
neuron maps an output that is sent to other units through
parallel connections. Each connection has a synaptic weight
that multiplies the signal travelling through it. So, the final
output of the network is a function of the inputs and the
synaptic weights of the ANN.
In general learning deals with adjusting these synaptic
weights, but some algorithms modify also the network
architecture—i.e. the network connectionism or the neuron
model. Three main types of learning algorithms are
identified: supervised, unsupervised and reinforcement
learning. In supervised learning, the desired output from
the network is known in advance, so modifications are done
in order to reduce the resulting error. It is often used for data
classification and non-linear control. In unsupervised
learning modifications depend on correlations among the
input data, so the network is intended to identify these
correlations without knowing them in advance. It is used for
clustering, pattern recognition, and reconstruction of
corrupted data, among others. Finally, in reinforcement
learning, modifications are done based on a critic’s score,
which indicates how well the ANN performs, but there is no
explicit knowledge of the desired solution. It is often used in
systems that interact with an environment such as robot
navigation and games (e.g. backgammon, chess).
2.2. Evolutionary algorithms
Evolutionary computation makes use of a metaphor of
natural evolution according to which a problem plays the role
of an environment wherein lives a population of individuals,
each representing a possible solution to the problem. The
degree of adaptation of each individual to its environment is
expressed by an adequacy measure known as the fitness
function. The phenotype of each individual, i.e. the candidate
solution itself, is generally encoded in some manner into its
genome (genotype). Evolutionary algorithms potentially
produce progressively better solutions to the problem. This
is possible thanks to the constant introduction of new
‘genetic’ material into the population, by applying so-called
genetic operators which are the computational equivalents of
natural evolutionary mechanisms.
The archetypal evolutionary algorithm proceeds as
follows: an initial population of individuals, P(0), is
generated at random or heuristically. Every evolutionary
step t, known as a generation, the individuals in the current
population, P(t), are decoded and evaluated according to
some predefined quality criterion, referred to as the fitness.
Then, a subset of individuals, P′(t), known as the mating
pool, is selected to reproduce according to their fitness.
Thus, high-fitness (‘good’) individuals stand a better chance
of ‘reproducing,’ while low-fitness ones are more likely to
disappear.
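The archetypal loop above can be sketched in a few lines of Python. The OneMax toy fitness, tournament selection, genome length, and all numeric parameters below are illustrative assumptions, not details of the platform described in this paper.

```python
import random

random.seed(42)  # deterministic run for this sketch

GENOME_LEN = 8
POP_SIZE = 20

def fitness(genome):
    # Placeholder quality criterion: count of 1-bits (the "OneMax" toy problem).
    return sum(genome)

def select(population):
    # Tournament of 3: high-fitness individuals stand a better chance of reproducing.
    return max(random.sample(population, 3), key=fitness)

def crossover(a, b):
    # One-point crossover, the computational analogue of recombination.
    cut = random.randrange(1, GENOME_LEN)
    return a[:cut] + b[cut:]

def mutate(genome, rate=0.05):
    # Bit-flip mutation introduces new 'genetic' material into the population.
    return [1 - g if random.random() < rate else g for g in genome]

# P(0): random initial population
population = [[random.randint(0, 1) for _ in range(GENOME_LEN)]
              for _ in range(POP_SIZE)]

for generation in range(50):
    population = [mutate(crossover(select(population), select(population)))
                  for _ in range(POP_SIZE)]

best = max(population, key=fitness)
```

After a few dozen generations the population converges toward the all-ones genome, illustrating how selection and variation progressively produce better solutions.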
As they combine elements of directed and stochastic
search, evolutionary techniques exhibit a number of
advantages over other search methods. First, they usually
need a smaller amount of knowledge and fewer assumptions
about the characteristics of the search space. Second, they
are less prone to get stuck in local optima. Finally, they
strike a good balance between exploitation of the best
solutions, and exploration of the search space.
Among the applications, we can find topics as diverse as
molecular biology, analogue and digital circuit synthesis,
robot control, etc.
2.3. Evolutionary artificial neural networks
Adaptation refers to a system’s ability to undergo
modifications according to changing circumstances, thus
ensuring its continued functionality. In this context, learning
and evolution are two fundamental forms of adaptation.
Evolutionary artificial neural networks refer to a special
class of artificial neural networks in which evolution is
applied as another form of adaptation in substitution of, or in
addition to, learning. Evolutionary algorithms are used to
perform various tasks, such as connection weight training or
initialization, architecture design, learning rule adaptation,
and input feature selection. Some of these approaches are
examined below.
– Evolution of connection weights. In this strategy, evolution
replaces learning algorithms in the task of
minimizing the neural network error function. Global
search, conducted by evolution, allows overcoming the
main drawback presented by gradient-descent-based
algorithms which often get trapped in local minima. It
is also useful for problems in which an error-gradient is
difficult to compute or estimate. This approach has been
widely used as reflected by the numerous references
presented by Yao [7].
– Evolution of architectures. The architecture of an
artificial neural network refers to its topological
structure. Architecture design is crucial since an under-
sized network may not be able to perform a given task
due to its limited capability, while an oversized one may
overlearn noise in the training data and exhibit poor
generalization ability. Constructive and destructive
algorithms for automatic design of architectures are
susceptible to becoming trapped at structural local
optima. Research on the architectural evolution of neural
networks has concentrated mainly on the design of
connectivity [10–12].
– Evolution of learning rules. The design of training
algorithms used to adjust connection weights depends on
the type of architectures under investigation. It is
desirable to develop an automatic and systematic way
to adapt the learning rule to an architecture and to the
task to be performed. Research into the evolution of
learning rules is important not only in providing an
automatic way of optimizing learning rules and in
modelling the relationship between learning and evol-
ution, but also in modelling the creative process since
newly evolved learning rules can deal with a complex
and dynamic environment. Representative advances of
this research are [13,14].
Fig. 1. Design layout with two reconfigurable modules. (From Ref. [8]).
3. Dynamic partial reconfiguration on FPGAs
FPGAs are programmable logic devices that permit the
implementation of digital systems. They provide an array of
logic cells that can be configured to perform a given
function by means of a configuration bitstream. This
bitstream is generated by a software tool, and usually
contains the configuration information for all the internal
components. Some FPGAs allow performing partial recon-
figuration (PR), where a reduced bitstream reconfigures
only a given subset of internal components. Dynamic Partial
Reconfiguration (DPR) is done while the device is active:
certain areas of the device can be reconfigured while other
areas remain operational and unaffected by the reprogram-
ming [8]. For the Xilinx FPGA families Virtex, Virtex-E,
Virtex-II, Spartan-II and Spartan-IIE there are three
documented styles to perform DPR: small bits manipulation
(SBM); multi-column PR with independent designs (ID); and
multi-column PR with communication between designs (CBD).
Under the SBM style, the designer manually edits low-
level changes. Using the FPGA Editor the designer can
change the configuration of several kinds of components
such as: look-up-table equations, internal RAM contents,
I/O standards, multiplexers, flip-flop initialization and reset
values. After editing the changes, a bitstream can be
generated, containing only the differences between the
before and the after designs. For complex designs, SBM
is error-prone due to the low-level editing and the lack
of automation in the generation of the bitstreams.
ID and CBD allow the designer to split the whole system
into modules. For each module, the designer must generate
the configuration bitstream starting from an HDL descrip-
tion and going through the synthesis, mapping, placement,
and routing procedures, independently of other modules.
Placement and timing constraints are set separately for each
module and for the whole system. Some of these modules
may be reconfigurable and others fixed (see Fig. 1). A
complete initial bitstream is generated for the fixed and
initial reconfigurable modules. Partial bitstreams are
generated for each reconfigurable module.
The difference between these two styles of reconfigura-
tion is that CBD allows the inter-connection of modules
through a special bus macro, while ID does not. This bus
macro guarantees that, each time partial reconfiguration is
performed, the routing channels between modules remain
unchanged, avoiding contentions inside the FPGA and
keeping correct connections between modules. While ID
is of limited use for neural-network implementation because
it does not support communication among modules, CBD is
well suited for implementing layered network topologies
where each layer corresponds to a module.
CBD has some placement constraints, among which: (1)
the size and the position of a module cannot be changed; (2)
input–output blocks (IOBs) are exclusively accessible by
contiguous modules; (3) reconfigurable modules can com-
municate only with neighbour modules, and it must be done
through bus macros (see Fig. 1); and (4) no global signals
are allowed (e.g. global reset), with the exception of clocks
that use a different bitstream and routing channels [8].
4. Description of the platform
The proposed platform consists of three parts: a hardware
substrate, a computation engine, and an adaptation mech-
anism. Each of them can be addressed separately; however,
they are tightly coupled.
The hardware substrate supports the computation
engine. It must also provide the flexibility for allowing
the adaptation mechanism to modify the engine. Maximum
flexibility could be reached with a software specification of
the full system, however, computation with neural networks
is a task that is inherently parallel, and microprocessor-
based solutions perform poorly as compared to their
hardware counterparts. FPGAs provide high performance
for parallel computation and enhanced flexibility compared
to application specific integrated circuits (ASIC), constitut-
ing the best candidate for our hardware substrate.
The computation engine constitutes the problem solver
of the platform. We have chosen spiking neurons given their
low implementation cost in FPGA architectures [15–18],
but other neuron models could also be considered. Other
computational techniques are not excluded, such as fuzzy
systems, filters, or simple polynomial functions.
The adaptation mechanism provides the possibility to
modify the function described by the computational part.
Two types of adaptation are allowed: structural adaptation
and local learning. The first type is very natural given
the hardware substrate that we present, and consists of a
modular structural exploration, where different module
combinations are tested, as described in Section 6. This
principle applies also for any kind of computational
technique and can be implemented using different search
algorithms such as swarm optimization [19]. The second
type of adaptation depends directly on the computational
technique used as it is specific for each one of them and
refers to the type of adaptation that does not modify the
physical topology. For our system, implemented with neural
networks, it refers to synaptic-weight learning which
implies modifying only the contents of a memory. For
neural network implementations it could also refer to
module-restricted growing and pruning techniques where
neurons might be enabled or disabled. In the same way, for
other computation methods, it must refer to adaptation
techniques specific for the given method.
5. Hardware substrate
A hardware substrate is required to support our platform.
It must provide good performance for real-time applications
and enough flexibility to allow topology exploration. The
substrate must provide a mechanism to test different
possible topologies in a dynamic way, to change con-
nectionism, and to allow a wide-enough search space.
Application specific integrated circuits (ASICs) provide
very high performance, but their flexibility for topology
exploration risks being reduced to a connection matrix, given
the complexity of an ASIC design. Microprocessors offer
high degrees of flexibility, but in networks with a large
number of neurons computed sequentially, execution time
could be very long, making them unsuitable for real-time
applications. Programmable logic devices appear as the best
solution providing high performance thanks to their hard-
ware specificity, and a high degree of flexibility given their
dynamic partial reconfigurability.
Under the constraints presented in Section 3 for DPR, we
propose a hardware substrate that contains two fixed and one
or more reconfigurable modules. Fixed modules constitute
the codification and de-codification modules. The codifica-
tion module, placed at the left side of the FPGA (referring to
the schema in Fig. 1), receives signals from the real world
and codifies them as inputs for the neural network. This
codification may be a frequency or phase coding for spiking
neurons, or a discrete or continuous coding for perceptron
neurons. In the same way, the de-codification module,
positioned at the right side of the FPGA, interprets the
outputs from the network to provide output signals.
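As an illustration of what the codification module might compute, the sketch below maps a normalized analog value to a regular spike train whose rate grows with the value. The window length and maximum rate are illustrative assumptions, not parameters from the paper.

```python
def frequency_encode(value, n_steps=100, max_rate=0.5):
    """Map a normalized input value in [0, 1] to a binary spike train.

    The spike rate is proportional to the value, up to max_rate spikes
    per time step (illustrative parameters).
    """
    assert 0.0 <= value <= 1.0
    # Inter-spike period in time steps; no spikes at all for a zero input.
    period = max(1, round(1.0 / (value * max_rate))) if value > 0 else None
    spikes = [0] * n_steps
    if period is not None:
        for t in range(0, n_steps, period):
            spikes[t] = 1
    return spikes

# A larger input value yields more spikes in the same window.
low = sum(frequency_encode(0.2))
high = sum(frequency_encode(0.9))
```

A de-codification module would perform the inverse operation, e.g. counting output spikes over a window to recover a rate.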
Reconfigurable modules contain the neural network;
each one of them can contain any component or set of
components of the network, such as neurons, layers,
connection matrices, and arrays of them. Different possible
configurations must be available for each module, allowing
different possible combinations of configurations for the
network. A search algorithm should be responsible for finding
the best combination of these configurations, specifi-
cally a GA in our case, as presented in Section 6.
6. Our proposed on-line evolving ANN
Three main types of evolutionary ANN approaches
might be identified: evolution of synaptic weights, evolution
of learning rules, and evolution of topologies as summarized
in the exhaustive review done by Yao [7]. Evolution of
synaptic weights (learning by evolution) is far more time-
consuming than other learning algorithms. Evolution of
learning rules (learning to learn), where one searches for an
optimal learning rule, could be of further interest for our
methodology. Topology evolution is the most interesting as
it allows the exploration of a wider search space and,
combined with weight learning, is a powerful problem
solver.
The flexibility of DPR fits topology evolution well. The main
consequence of the aforementioned features of DPR is a
modular structure, where each module communicates solely
with its neighbouring modules through a bus macro (Fig. 1).
This structure matches well with a layered neural-network
topology, where each reconfigurable module contains a
network layer. Inputs and outputs of the full network should
be previously fixed, as well as the number of layers and the
connectivity among them (number and direction of connec-
tions). While each layer can have whatever kind of internal
connectivity, connections among them are fixed and
restricted to neighbour layers.
For each module, there exists a pool of different possible
configurations. Each configuration may contain a layer
topology (i.e. a certain number of neurons with a given
connectivity). As illustrated in Fig. 2, each module can be
configured with different layer topologies, provided that
they offer the same external view (i.e. the same inputs and
outputs).
Fig. 2. Layout of the reconfigurable network topology.
Several generic layer configurations are generated
to obtain a library of layers, which may be used for different
applications.
A GA [20,21] is responsible for determining which
configuration bitstream is downloaded to the FPGA. The
GA considers a full network as an individual (Fig. 3). For
each application the GA may find the combination of layers
that best solves the problem. Input and output fixed modules
contain the required logic to code and decode external
signals and to evaluate the fitness of the individual
depending on the application (the fitness could also be
evaluated off-chip).
As in any GA, the phenotype is mapped from the
genome, in this case the combination of layers for a
network. Each module has a set of possible configurations,
and an index is assigned to each configuration; the
genome is composed of a vector of these indexes. The
genome length for a network with n modules, and c(i)
possible configurations for the ith module (with i = 1, 2, ..., n),
is given by L = Σ(i=1..n) l(i). For a binary genome
encoding l(i) = log2(c(i)), while for a positive-integer
encoding l(i) = 1.
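The genome-length formula can be checked with a short sketch. The module counts below are hypothetical examples, not the configuration used in the paper; the ceiling handles configuration counts that are not powers of two.

```python
import math

def genome_length(configs_per_module, binary=True):
    """Genome length L = sum of l(i) over modules.

    For a binary encoding, l(i) = ceil(log2(c(i))) bits per module;
    for a positive-integer encoding, each module contributes one gene.
    """
    if binary:
        return sum(math.ceil(math.log2(c)) for c in configs_per_module)
    return len(configs_per_module)

# Hypothetical network: 3 reconfigurable modules with 4, 8, and 2
# possible layer configurations, respectively.
c = [4, 8, 2]
L_binary = genome_length(c)                 # 2 + 3 + 1 = 6 bits
L_integer = genome_length(c, binary=False)  # 3 integer genes
```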
Fig. 3. Evolution of a layered neural network. The genome uses a binary codification. The genome maps an individual, a neural network in this case. When a measure of the fitness is obtained, a new individual can be tested, and so on. When the full population is tested, the genetic operators can be applied and the calculation of the fitness restarted.
7. Neural model
Most neuron models, such as perceptron or radial basis
functions, use continuous values as inputs and outputs,
processed using logistic, gaussian or other continuous
functions [4,5]. In contrast, biological neurons process
spikes: as a neuron receives input spikes by its dendrites, its
membrane potential increases following a post-synaptic
response. When the membrane potential reaches a certain
threshold value, the neuron fires, and generates an output
pulse through the axon. The best known biological model is
the Hodgkin and Huxley model (H&H) [22], which is
based on ion current activities through the neuron
membrane.
The most biologically plausible models are not the best
suited for computational implementations. This is the
reason why other simplified approaches are needed [23].
The leaky integrate and fire (LI&F) model [24,25] is
based on a current integrator, modelled as a resistance and
a capacitor in parallel. Differential equations describe the
voltage given by the capacitor charge, and when a certain
voltage is reached the neuron fires. The spike response
model order 0 (SRM0) [24,25] offers a response that
resembles that of the LI&F model, with the difference
that the membrane potential is expressed in terms of
kernel functions [24] instead of differential equations.
Spiking-neuron models process discrete values repre-
senting the presence or absence of spikes; this fact allows
a simple connectionism structure at the network level and
a striking simplicity at the neuron level. However,
implementing models like SRM0 and LI&F on digital
hardware is largely inefficient, wasting many hardware
resources and exhibiting a large latency due to the
implementation of kernels and numeric integrations. This
is why a functional hardware-oriented model is necessary
to achieve fast architectures at a reasonable chip area cost.
7.1. The proposed neuron model
Our simplified integrate-and-fire model [26], like standard
spiking models, uses the following five concepts: (1)
membrane potential; (2) resting potential; (3) threshold
potential; (4) postsynaptic response; and (5) after-spike
response (see Fig. 4). A spike is represented by a pulse. The
model is implemented as a Moore finite state machine. Two
states, operational and refractory, are allowed.
During the operational state, the membrane potential is
increased (or decreased) each time a pulse is received by an
excitatory (or inhibitory) synapse, and then it decreases (or
increases) with a constant slope until it reaches the
resting value. If a pulse arrives when a previous postsyn-
aptic potential is still active, its action is added to the
previous one.
Fig. 4. Response of the model to a train of input spikes, and the Moore state machine that describes such a response.
The membrane potential dynamics is described by

u(t) = u(t-1) - K(u(t-1)) + Σ(i=1..n) Wi·si(t-1),  with  K(u(t)) = k1 if u(t) > Urest, and K(u(t)) = -k2 otherwise    (1)
where u(t) is the membrane potential at time t, Urest is the
constant resting potential, n is the number of inputs to the
neuron, Wi is the synaptic weight for input i, si(t) is the input
spike at input i and at time t, and k1 and k2 are positive
constants that determine, respectively, the decreasing and
increasing slopes.
When the firing condition is fulfilled (i.e. potential ≥
threshold), the neuron fires, the potential takes on a
hyperpolarization value called the after-spike potential, and
the neuron then passes to the refractory state.
After firing, the neuron enters in a refractory period in
which it recovers from the after-spike potential to the resting
potential. Two kinds of refractoriness are allowed: absolute
and partial. Under absolute refractoriness, input spikes are
ignored, and the membrane potential is given by
u(t) = u(t-1) + k2    (2)
Under partial refractoriness, the effect of input spikes is
attenuated by a constant factor. The membrane potential in
this case would be expressed as
u(t) = u(t-1) + k2 + [Σ(i=1..n) Wi·si(t-1)] / a    (3)
where a is a constant positive integer, and determines the
attenuation factor.
Fig. 5. Hebbian learning windows. When neuron n3 fires at tf3, the learning windows of neurons n1 and n2 are disabled and enabled, respectively. At time tf3, synaptic weight W13 is decreased by the learning algorithm, while W23 is increased.
The refractory state determines the time needed by a
neuron to recover from firing. This time is
completed when the membrane potential reaches the resting
potential, and the neuron comes back to the operational
state.
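As a functional sketch, the two-state Moore machine of Eqs. (1)-(3) can be simulated in software as follows. All parameter values (weights, threshold, slopes, after-spike potential) are illustrative assumptions, not values taken from the FPGA implementation.

```python
class SpikingNeuron:
    """Two-state (operational/refractory) integrate-and-fire neuron sketch."""
    OPERATIONAL, REFRACTORY = 0, 1

    def __init__(self, weights, u_rest=0, u_thresh=100, u_after=-40,
                 k1=1, k2=2, attenuation=None):
        self.w = weights            # synaptic weights Wi
        self.u_rest = u_rest        # resting potential
        self.u_thresh = u_thresh    # threshold potential
        self.u_after = u_after      # after-spike (hyperpolarization) potential
        self.k1, self.k2 = k1, k2   # decreasing / increasing slopes
        self.att = attenuation      # None = absolute refractoriness; else 'a' of Eq. (3)
        self.u = u_rest
        self.state = self.OPERATIONAL

    def step(self, spikes):
        """Advance one time step; spikes is a 0/1 input vector. Returns 1 on firing."""
        stimulus = sum(w * s for w, s in zip(self.w, spikes))
        if self.state == self.OPERATIONAL:
            # K(u) of Eq. (1): decay toward the resting potential.
            leak = self.k1 if self.u > self.u_rest else -self.k2
            self.u = self.u - leak + stimulus
            if self.u >= self.u_thresh:       # firing condition
                self.u = self.u_after
                self.state = self.REFRACTORY
                return 1
        else:
            # Refractory: recover toward the resting potential, Eqs. (2)-(3).
            self.u += self.k2
            if self.att is not None:          # partial refractoriness: attenuated inputs
                self.u += stimulus // self.att
            if self.u >= self.u_rest:
                self.u = self.u_rest
                self.state = self.OPERATIONAL
        return 0

# Driving the neuron with a constant excitatory input makes it fire periodically.
neuron = SpikingNeuron(weights=[30])
output = [neuron.step([1]) for _ in range(40)]
```

With these toy parameters the neuron integrates up to threshold, fires, spends 20 steps recovering from the after-spike potential, and then fires again, reproducing the periodic response sketched in Fig. 4.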
Our model simplifies some features with respect to SRM0
and LI and F, in particular, the post-synaptic response. The
way in which several input spikes are processed affects the
system dynamics: under the presence of two simultaneous
input spikes, SRM0 performs a linear superposition of post-
synaptic responses, while our model, in a similar way as LI
and F, adds the synaptic weights to the membrane potential.
Even though our model is less biologically plausible than
SRM0 and LI and F, it is still functionally similar.
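The two operating states and the update rules of eqs. (1)–(3) can be condensed into a short behavioural sketch. The following Python is an illustrative reimplementation, not the authors' HDL; the default parameter values are placeholders rather than the Table 1 setup, and absolute refractoriness is assumed:

```python
# Behavioural sketch of the simplified spiking neuron (Section 7.1).
def make_neuron(weights, u_rest=32, theta=128, u_spike=18, k1=1, k2=1):
    state = {"u": u_rest, "refractory": False}

    def step(spikes):
        """Advance one time slice; spikes[i] is s_i(t-1). Returns True on firing."""
        u = state["u"]
        if state["refractory"]:
            # Eq. (2): inputs ignored, potential recovers with slope k2
            u = min(u + k2, u_rest)
            state["refractory"] = u < u_rest
            state["u"] = u
            return False
        # Eq. (1): leak K(u) toward the resting potential, plus weighted inputs
        leak = k1 if u > u_rest else -k2
        u = u - leak + sum(w for w, s in zip(weights, spikes) if s)
        if u >= theta:                 # firing condition
            state["u"] = u_spike       # after-spike (hyperpolarization) value
            state["refractory"] = True
            return True
        state["u"] = u
        return False

    return step
```

A single strong input drives the neuron through one full fire/refractory cycle, e.g. `fire = make_neuron([200]); fire([1])` fires, and the next time slice is silent.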
7.2. Learning
Weight learning is an issue that has not been fully solved for spiking neuron models. Several learning rules have been explored by researchers, with Spike-Timing-Dependent Plasticity (STDP), a type of hebbian learning, being one of the most studied [24,25]. In general, hebbian learning modifies
Fig. 6. Proposed hardware neuron. (a) External view. (b) Architecture.
the synaptic weight Wij, considering the simultaneity of the
firing times of the pre- and post-synaptic neurons i and j.
Herein we will describe a simplified implementation of
hebbian learning oriented to digital hardware. Two func-
tions are added to the neuron model: active-window and
learning.
The active-window function determines whether the
learning-window of a given neuron is active or not (Fig. 5)
maintaining a value of 1 during a certain time after the generation of a spike by a neuron ni. The function aw_i is given by

$$aw_i(t) = \mathrm{step}(t - t_i^f) - \mathrm{step}\big(t - (t_i^f + w)\big) \qquad (4)$$

where t_i^f is the firing time of ni and w is the size of the
learning window. This window allows the receptor neuron
(nj) to determine the synaptic weight modification (DWij)
that must be done.
The learning function modifies the synaptic weights of
the neuron, performing the hebbian learning (Fig. 5). Given
a neuron ni with k inputs, when a firing is performed by ni
the learning modifies the synaptic weights W_ij (with j = 1, 2, …, k) as follows

$$W_{ij}(t) = W_{ij}(t-1) + \Delta W_{ij}(t), \qquad \Delta W_{ij}(t) = \alpha \cdot aw_j(t) - \beta \qquad (5)$$

where α is the learning rate and β is the decay rate. Both α and β are positive constants such that α > β.
These two functions, active-window and learning,
increase the amount of interneuron connectivity as they
imply, respectively, one extra output and k extra inputs for a
neuron (Fig. 6a).
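The active-window and learning functions can be sketched directly from eqs. (4) and (5). This Python fragment is an interpretation, not the hardware learning module; the constants (learning rate, decay rate, window size, weight bounds) are illustrative values only:

```python
# Sketch of the active-window (eq. 4) and hebbian weight update (eq. 5).
ALPHA, BETA = 6, 4        # learning rate alpha > decay rate beta
W_SIZE = 16               # learning-window size w
W_MIN, W_MAX = -32, 127   # weight bounds kept by the learning module

def active_window(t, t_fire):
    """aw_j(t): 1 during the w time slices following neuron j's spike at t_fire."""
    return 1 if t_fire is not None and t_fire <= t < t_fire + W_SIZE else 0

def hebbian_update(weights, t, fire_times):
    """Eq. (5), applied when the post-synaptic neuron fires at time t;
    fire_times[j] is the last firing time of pre-synaptic neuron j."""
    new_weights = []
    for w_ij, t_f in zip(weights, fire_times):
        dw = ALPHA * active_window(t, t_f) - BETA
        new_weights.append(max(W_MIN, min(W_MAX, w_ij + dw)))   # keep bounded
    return new_weights
```

For example, if the post-synaptic neuron fires at t = 20, a pre-synaptic neuron that fired at t = 15 (window still open) has its weight increased, while one whose last spike falls outside the window only decays.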
Fig. 7. Layout of the network implemented on hardware.
7.3. The proposed neuron on hardware
Several hardware implementations of spiking neurons
have been developed on analog and digital circuits [15–18,
25]. Analog electronic neurons achieve post-synaptic responses very similar to those of their biological counterparts; however, analog circuits are difficult to set up and debug. On the other hand, digital spiking neurons tend to be less biologically plausible, but easier to set up, debug, scale, and train, among other features. Additionally, these models can
be rapidly prototyped and tested thanks to configurable logic
devices such as FPGAs.
The hardware implementation of our neuron model is
illustrated in Fig. 6. The neuron is basically composed
of: (1) a control unit; (2) a memory containing the parameters; (3) logic resources to compute the membrane potential; (4) two modules performing the learning; and (5) logic resources to interface input and output spikes. The control unit is a finite state machine with two states: operational and refractory; absolute refractoriness is implemented in our neuron. The computation of a time slice (iteration) is triggered by a pulse at the input clk_div, and takes a number of clock cycles that depends on the number of inputs to the neuron. The synaptic weights are stored in a memory, which is swept by a counter. In the presence of an input spike, the corresponding weight is enabled to be added to the membrane potential. Likewise, the decreasing and increasing slopes (for the post-synaptic and after-spike responses, respectively) are stored in the memory.
Although the number of inputs to the neuron is
parameterizable, increasing the number of inputs implies
raising both the area cost and the latency of the system.
Indeed, the area cost depends highly on the memory size, which itself depends on the number of inputs to the neuron (e.g. the 32×9 neuron of Fig. 4 has a memory size of 32×9 bits, where the 32 positions correspond to 30 input weights plus the increasing and decreasing slopes; 9 bits is the arbitrarily chosen data-bus size). The time required for computing a time slice is equivalent to the number of inputs plus one, i.e. 30 inputs plus either the increasing or the decreasing slope.
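The sizing rules just stated can be written as two helper formulas. These are our reading of the text, not code from the paper: n weights plus the two slope parameters occupy the memory, and a time slice takes n + 1 clock cycles:

```python
# Back-of-the-envelope sizing implied by the text (an interpretation).
def neuron_memory_bits(n_inputs, bus_width=9):
    positions = n_inputs + 2          # n weights + increasing/decreasing slopes
    return positions * bus_width

def time_slice_cycles(n_inputs):
    return n_inputs + 1               # all inputs plus one slope per slice
```

For the 30-input neuron this gives 32 memory positions of 9 bits (288 bits) and 31 cycles per time slice, matching the figures above.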
The dark blocks on Fig. 6, active-window and learning
module, perform the learning on the neuron. The active-
window block consists of a counter that is triggered when an output spike is generated and stops when a certain value, the learning window, is reached. The output aw_out takes the value logic-1 if the counter is active and logic-0 otherwise.
The learning module performs the synaptic weight
learning described above. This module computes the change
to be applied to the weights (DW), maintaining them
bounded. At each clock cycle the module computes the new
weight for the synapse pointed by the COUNTER signal;
however, these new weights are stored only if an output
spike is generated by the current neuron.
8. Experimental setup and results
The experimental setup consists of two parts: a Matlab® simulation of a spiking neural network (Section 8.1), and
its respective validation on an FPGA (Section 8.2).
8.1. Network description and simulation
A frequency discriminator is implemented in order to test the capability of the learning network to solve, in an unsupervised manner, a problem with dynamic characteristics.
Using the 30-input neuron described in Section 7.3, we
implement a layered neural network with three layers,
fulfilling the constraints required for the on-line evolution
implementation described in Section 6. Each layer contains
10 neurons and is internally fully connected. Additionally, each layer provides outputs to the preceding and the following layers, and receives outputs from them, as described in Fig. 7.
For the sake of modularity, each neuron has 30 inputs: 10
from its own layer, 10 from the preceding one, and 10 from
the next one.
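The wiring rule above can be enumerated in a few lines. The sketch below is one way to list each neuron's fan-in; how the boundary layers fill their unused inputs is our assumption (here they are simply left unconnected):

```python
# Fan-in of a neuron in the layered network: 10 inputs from its own layer,
# 10 from the preceding layer, and 10 from the following one.
def neuron_inputs(layer, n_layers=3, layer_size=10):
    sources = []
    for src in (layer - 1, layer, layer + 1):
        if 0 <= src < n_layers:                       # skip non-existent layers
            sources += [(src, i) for i in range(layer_size)]
    return sources
```

A middle-layer neuron thus sees all 30 connections, while a boundary-layer neuron sees only 20.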
To present the patterns to the network, the encoding
module takes into account the following considerations: (1)
we use nine inputs at layer 1 to introduce the pattern; (2) the
patterns consist of two sinusoidal waveforms with different
periods; (3) the waveforms are normalized and discretized
to nine levels; (4) every three time slices (iterations) a spike
is generated at the input corresponding to the value of the
discretized signal (Fig. 8).
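The four encoding considerations can be sketched as follows. This is an illustrative reconstruction: the sine shape, phase, and the level-to-line mapping are our assumptions, since the paper only specifies the periods, the nine levels, and the three-slice spike interval:

```python
import math

# Sketch of the encoding module: normalize the waveform to [0, 1],
# discretize it to nine levels, and every three time slices emit a spike
# on the input line matching the current level.
def encode(period, n_slices, levels=9, spike_every=3):
    spikes = []                         # one 9-element spike vector per slice
    for t in range(n_slices):
        vec = [0] * levels
        if t % spike_every == 0:
            x = (math.sin(2 * math.pi * t / period) + 1) / 2  # normalize
            vec[min(int(x * levels), levels - 1)] = 1          # discretize
        spikes.append(vec)
    return spikes
```

Encoding a waveform of period 43 therefore produces exactly one spike on one of the nine lines every third time slice, and none in between.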
The simulation setup takes into account the constraints
imposed by the hardware implementation. Table 1 presents
the parameter setup for the neuron model and for the
learning modules. Initial weights are integer numbers
generated randomly from 0 to 127.
Different combinations of two signals are presented as
shown in Fig. 8. In order to help the unsupervised
learning—described in Section 7.2—to separate the signals,
they are presented as follows: during the first 6000 time
slices the signal is swapped every 500 time slices, leaving,
between them, an interval of 100 time slices, where no input
Fig. 8. Neural activity on a learned frequency discriminator. The lowest nine lines are the input spikes to the network: two waveforms with periods of 43 and 133 time slices are presented. The next 10 lines show the neuron activity at layer 1, and the following lines show the activity of layers 2 and 3. After 10,000 time slices a clear separation can be observed at the output layer, where neurons fire in the presence of only one of the waveforms.
Table 2
Signal periods presented to the network. Period units are time slices

Period 1   Period 2   Separation
40         100        Yes
43         133        Yes
47         73         No
47         91         Yes
50         100        Yes
73         150        No
73         190        Yes
101        133        Yes
115        190        No
133        170        No
133        190        No
spike is presented. Then, this interval between the signals is
removed, and these latter are swapped every 500 time slices.
Several combinations of signals with different periods
are presented to the network. Five tries are allowed for each
combination. Some of the signals are correctly separated at
least once, while others are not, as shown in Table 2. It must be noticed that the range of periods that are separable depends highly on the way in which the data are presented to the network (encoding module). In our case, we are
generating a spike every three time slices; however, if
higher (or lower) frequencies are expected to be processed,
spikes must be generated at higher (or lower) rates. The
period range is also affected by the dynamic characteristics of the neuron, i.e. the after-spike potential and the increasing
and decreasing slopes. They determine the membrane-
potential response after input and output spikes, playing a
fundamental role on the dynamic response of the full
network.
Table 1
Set-up parameters for each neuron and for the hebbian learning

Neuron parameters                 Learning parameters
Resting potential         32      Learning rate              6
Threshold potential      128      Decay rate                 4
After-spike potential     18      Weight upper bound       127
Increasing slope           1      Weight lower bound       −32
Decreasing slope           1      Learning window size w    16
Potential lower bound   −128
8.2. The network on hardware
The same neural network described above is implemented on a relatively small FPGA to validate the network execution. We work with a Spartan II xc2s200 FPGA from Xilinx Corp., with a maximum capacity of 200,000 logic gates. This FPGA has a matrix of 28×42 CLBs (configurable logic blocks), each of them composed of two slices, which contain the logic where the functions are implemented, for a total of 2352 slices. The xc2s200 is the largest device of the low-cost Spartan II FPGA family. Other FPGA families, such as Virtex II, offer up to 40 times more logic resources.
Table 3
Synthesis results for a neuron, a layer, and a network

Unit synthesized                             Number of CLB slices   FPGA percentage
A neuron (30 inputs)                             53                    2.25
A layer (10 neurons)                            500                   21.26
A network (three layers, without modules)      1273                   54.21
A network (three layers, modular design)       1500                   63.78
We implemented the 30-input neuron described in
Section 7.3 with a data bus of 11 bits; the memory, however, keeps its width of 9 bits. The data bus is wider than the memory in order to prevent transitory overflow in arithmetic operations. Synthesis results for
different implementations of these neurons can be found
in [15]. The area requirement is very low compared to other
more biologically-plausible implementations (e.g. Ros et al.
[18] use 7331 Virtex-E CLB slices for two neurons).
However, a quantitative performance comparison with this or other implementations is hardly possible, given the absence of a standard criterion for measuring performance. Several criteria, besides the minimum error achieved on different possible problems, might be taken into account, such as: execution speed, learning speed, size of the neuron, generalization ability,
possibility of learning on-chip or off-chip, possibility of
learning on-line or off-line, biological plausibility, etc.
Table 3 presents the synthesis results for a neuron, a
layer, and the whole network with and without modular
design. Note that a layer of 10 neurons takes fewer slices than 10 independent neurons, thanks to synthesis optimization. The same should apply to the whole network. However, when the network is modular it is not possible to simplify the implementation, given that each layer has clearly defined boundaries on the circuit and cannot be merged with neighbouring modules.
To test the design, the sequence of input spikes (i.e. after
the encoding stage) is stored in a memory block. The
hardware network, both in simulation and on-chip, exhibits
similar behaviour to that of its Matlab counterpart: clear
frequency discrimination is obtained at the output of the
network, as some outputs generate spikes only in response to a given input frequency.
The system achieves a clock speed of up to 54.4 MHz. The latency of a time slice is 64 clock cycles, which means that the duration of a time slice can go down to 1.17 µs. The neuron was implemented with a latency of 64 clock cycles to allow it to interact with larger neurons with up to 62 inputs, guaranteeing uniformity in the spike duration. However, given that this particular network uses only 30-input neurons, the latency could be reduced to 32 clock cycles. This latency reduction may also slightly increase the operating frequency of the system, since it implies some reduction of the logic resources.
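The time-slice figure follows directly from the clock speed and latency; a one-line check:

```python
# 64 clock cycles per time slice at 54.4 MHz
f_clk_hz = 54.4e6
cycles_per_slice = 64
slice_duration_us = cycles_per_slice / f_clk_hz * 1e6   # ≈ 1.176 µs
```

Halving the latency to 32 cycles would, at the same clock, halve the slice duration to roughly 0.59 µs.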
9. Further work
Our promising results have incited us to engage in further
investigation of this approach. We are currently pursuing
five lines of research: (1) improvement of adaptivity
techniques, introducing in particular, novel and more
effective learning algorithms; (2) enhancement of the
computation engine, providing other options for post-
synaptic potential response; (3) refining the implementation
of the platform in order to allow on-chip evolution; (4)
introducing interpretability options using fuzzy logic; and
(5) implementing more challenging applications. Below we
develop briefly each issue.
9.1. Adaptivity
Hebbian learning has proven not to be the best learning technique for embedded applications, given its unsupervised nature. However, as in nature, it can be the basis for implementing other, more advanced, learning strategies. Reinforcement learning tends to be the most suitable type of learning for systems adapting to real-world environments. A hybrid between hebbian and reinforcement learning could be implemented in our system by adding dopaminergic signals. In the same way, a hybrid between supervised and hebbian learning could prove adequate for some applications.
There are also other adaptivity approaches to be explored at the evolutionary level. Coevolutionary algorithms [27] model the interaction between several species, where each species evolves separately but its fitness is affected by the interaction with the other species. In our system, each module could be considered as a separate evolving species.
9.2. On-chip evolution
Current trends in Systems-On-Chip have led to the
commercialization of FPGAs containing powerful hardwired microprocessors, as is the case for the Virtex II Pro FPGA containing a PowerPC. Given this availability, why run a GA on a PC instead of executing it on a high-performance processor inside the very device being reconfigured? If such high performance is not required, other lower-cost solutions are available, such as soft processor cores (not hardwired, but implemented in the FPGA logic cells).
9.3. Post-synaptic potential response
Other types of post-synaptic responses may be considered, such as the ones presented by Maass [23]. A type-A neuron uses a post-synaptic response like the one of Fig. 9(a), which could provide lower computing capabilities and lower resource requirements for the FPGA implementation. In the same way, the post-synaptic response of a type-B neuron (Fig. 9(b)) could improve computation, while requiring more logic resources.

Fig. 9. Post-synaptic potential responses for neurons (a) type-A and (b) type-B.
9.4. Interpretability
Many human tasks may benefit from, and sometimes
require, decision explanation systems. Among them one can
cite diagnosis, prognosis, and planning. However, neural
networks produce outputs without providing any insight on
the underlying reasoning mechanism. Fuzzy inference
systems provide a formalism to represent knowledge in a
way that resembles human communication and reasoning.
Moreover, their layered structure, somewhat similar to that of neural networks, makes them well suited to map onto our modular architecture.
9.5. Application
Frequency discrimination is just the first step toward a more general field: signal processing. Challenging
applications such as electroencephalography (EEG) and
electrocardiography (ECG) signal analysis, and speech
recognition, are target applications that could benefit from
embedded smart artefacts that adapt by themselves to different users. These are the types of problems for which, given their complexity, it is not easy to determine the best architecture to solve them, and the evolution of a neural network could provide the required flexibility to search for a correct solution.
10. Conclusions
We have presented a platform defined by three parts: a
hardware substrate, a computation engine, and an adaptation
mechanism. We presented each of these three parts and how
they can be merged. We described the platform design,
simulation and validation, and proposed options to apply
different computation and adaptation techniques on our
platform.
The present work proposes a trade-off between flexibility and performance by means of reconfigurable computing; much work nevertheless remains to be done. The validation of the proposed architecture is presented, for clarity, as a software simulation (Matlab). However, it must be noticed that we have implemented the full platform on an FPGA board as a stand-alone system.
As computation engine we presented a functional spiking
neuron model suitable for hardware implementation. The
proposed model neglects several characteristics from
biological and software oriented models. Nevertheless, it
keeps its functionality and is able to solve a relatively
complex task like temporal pattern recognition. Since the
neuron model is highly simplified, the lack of representation
power of single neurons must be compensated by a higher
number of neurons, which in terms of hardware resources
could be a reasonable trade-off considering the architectural
simplicity allowed by the model.
In the case of the frequency discriminator implementation, the use of hebbian learning alone, given its unsupervised nature, proves effective but not efficient. This is due to the nature of the problem, i.e. a classification problem with the desired output known in advance, where a supervised algorithm would certainly perform better. Although solutions were found for a given set of frequencies, we consider that better solutions could be found with an adequate number of neurons. While hebbian learning remains useful for some classification tasks, it proves inaccurate for other applications.
Spiking-neuron models seem to be the best choice for this kind of implementation, given their low hardware and connectivity requirements [15–18], while keeping good computational capabilities compared to other neuron models [25].
Likewise, layered topologies, which are among the most
commonly used, seem to be the most suitable for our
implementation method. However, other types of topologies
are still to be explored.
A simple GA is proposed as the adaptation mechanism; however, different search techniques could be applied within our methodology. GAs constitute one of the most generic, simple, and well-known techniques; nevertheless, we are convinced that they are not the best one: they do not take into account information that could be useful to optimize the network, such as the direction of the error.
References
[1] S.J. Gould, The Structure of Evolutionary Theory, Belknap Press of
Harvard University Press, Cambridge, MA, 2002.
[2] M. Ridley, Evolution, 3rd ed., Blackwell Publishers, Oxford, 2004.
[3] T.M. Mitchell, Machine Learning, McGraw-Hill, New York, 1997.
[4] S. Haykin, Neural Networks, A Comprehensive Foundation, 2nd ed.,
Prentice-Hall, New Jersey, 1999.
[5] A. Perez-Uribe, Structure-adaptable digital neural networks, PhD
Thesis, Lausanne, EPFL, 1999.
[6] R. Reed, Pruning algorithms—a survey, IEEE Transactions on Neural
Networks 4 (1993) 740–747.
[7] X. Yao, Evolving artificial neural networks, Proceedings of the IEEE
87 (1999) 1423–1447.
[8] Xilinx Corp., XAPP 290: Two Flows for Partial Reconfiguration:
Module Based or Small Bits Manipulations, 2002.
[9] C.G. Langton, Artificial Life: An Overview, MIT Press, Cambridge,
MA, 1995.
[10] H.A. Abbass, Speeding up backpropagation using multiobjective
evolutionary algorithms, Neural Computation 15 (2003) 2705–2726.
[11] M. Husken, C. Igel, M. Toussaint, Task-dependent evolution of
modularity in neural networks, Connection Science 14 (2002) 219–
229.
[12] C. Igel, M. Kreutz, Operator adaptation in evolutionary computation
and its application to structure optimization of neural networks,
Neurocomputing 55 (2003) 347–361.
[13] D. Floreano, J. Urzelai, Evolutionary robots with on-line self-
organization and behavioral fitness, Neural Networks 13 (2000)
431–443.
[14] Y. Niv, D. Joel, I. Meilijson, E. Ruppin, Evolution of reinforcement
learning in uncertain environments: a simple explanation for complex
foraging behaviors, Adaptive Behavior 10 (2002) 5–24.
[15] A. Upegui, C.A. Pena-Reyes, E. Sanchez, A hardware implementation
of a network of functional spiking neurons with hebbian learning,
presented at BioAdit—International Workshop on Biologically
Inspired Approaches to Advanced Information Technology, Lau-
sanne, 2004.
[16] D. Roggen, S. Hofmann, Y. Thoma, D. Floreano, Hardware spiking
neural network with run-time reconfigurable connectivity, presented
at Fifth NASA/DoD Workshop on Evolvable Hardware (EH 2003),
2003.
[17] O. Torres, J. Eriksson, J.M. Moreno, A. Villa, Hardware optimization
of a novel spiking neuron model for the POEtic tissue, Artificial
Neural Nets Problem Solving Methods, Part II, 2687 (2003) 113–120.
[18] E. Ros, R. Agis, R.R. Carrillo, E.M. Ortigosa, Post-synaptic time-
dependent conductances in spiking neurons: FPGA implementation of
a flexible cell model, Artificial Neural Nets Problem Solving
Methods, Pt Ii 2687 (2003) 145–152.
[19] J.F. Kennedy, R.C. Eberhart, Y. Shi, Swarm Intelligence, Morgan
Kaufmann Publishers, San Francisco, 2001.
[20] M.D. Vose, The Simple Genetic Algorithm: Foundations and Theory,
MIT Press, Cambridge, MA, 1999.
[21] D.E. Goldberg, Genetic Algorithms in Search, Optimization, and
Machine Learning, Addison-Wesley, Reading, MA, 1989.
[22] A.L. Hodgkin, A.F. Huxley, A quantitative description of membrane
current and its application to conduction and excitation in nerve,
Journal of Physiology-London 117 (1952) 500–544.
[23] W. Maass, Networks of spiking neurons: the third generation of neural
network models, Neural Networks 10 (1997) 1659–1671.
[24] W. Gerstner, W. Kistler, Spiking Neuron Models. Single Neurons,
Populations, Plasticity, Cambridge University Press, Cambridge,
2002.
[25] W. Maass, C. Bishop, Pulsed Neural Networks, The MIT Press,
Cambridge, MA, 1999.
[26] A. Upegui, C.A. Pena-Reyes, E. Sanchez, A functional spiking neuron
hardware oriented model, Computational Methods in Neural Model-
ing, Pt 1 2686 (2003) 136–143.
[27] C.A. Pena-Reyes, Coevolutionary fuzzy modeling. PhD Thesis,
Lausanne, EPFL, 2002, pp. 148.
Andres Upegui is a PhD student at the Swiss Federal Institute of
Technology (EPFL), Lausanne, Switzerland. He obtained a diploma on
Electronic Engineering in 2000 from the Universidad Pontificia
Bolivariana (UPB), Medellín, Colombia. He was a member of the UPB microelectronics research group from 2000 to 2001. From 2001 to 2002 he
did the Graduate School on Computer Science at the EPFL, and then he
joined the Logic Systems Laboratory (LSL) as PhD student. His
research interests include reconfigurable computing, bio-inspired
techniques and processor architectures.
Carlos Andres Pena-Reyes received a diploma in Electronic
Engineering from the Universidad Distrital ‘Francisco José de Caldas’, Bogotá, Colombia, in 1992. He finished postgraduate studies on
Industrial Automation at the Universidad del Valle, Cali, Colombia in
1997 and on Computer Science at the Swiss Federal Institute of
Technology at Lausanne—EPFL, in 1998. His PhD Thesis from the
EPFL was nominated to the prize ‘EPFL 2002 for the best thesis’. He was
Assistant Instructor at the Universities Javeriana and Autonoma in Cali,
Colombia in 1995 and lecturer at the University of Lausanne,
Switzerland in 2003. His research interests include computational
intelligence-based modelling techniques, in particular hybrid
approaches.
Eduardo Sanchez received a diploma in Electrical Engineering from
the Universidad del Valle, Cali, Colombia, in 1975, and a PhD from the
Swiss Federal Institute of Technology in 1985. Since 1977, he has been
with the Department of Computer Science at the Swiss Federal Institute
of Technology, Lausanne, where he is currently a Professor in the Logic
Systems Laboratory, engaged in teaching and research. He also holds a
professorship at the Ecole d’Ingenieurs du Canton de Vaud, University
of Applied Sciences of Western Switzerland. His chief interests include
computer architecture, VLIW processors, reconfigurable logic, and
evolvable hardware. Dr Sanchez was co-organizer of the inaugural
workshop in the field of bio-inspired hardware systems, the proceedings
of which are titled Towards Evolvable Hardware (Springer-Verlag,
1996).