Receptive field optimisation and supervision of a fuzzy spiking neural network


Neural Networks 24 (2011) 247–256. doi:10.1016/j.neunet.2010.11.008


Cornelius Glackin ∗, Liam Maguire, Liam McDaid, Heather Sayers
University of Ulster, Faculty of Engineering, School of Computing and Intelligent Systems, Magee Campus, Londonderry, BT48 7JL, Northern Ireland, United Kingdom

Article info

Article history: Received 8 December 2008; Received in revised form 19 November 2010; Accepted 30 November 2010

Keywords: Clustering methods; Receptive fields; Evolutionary algorithms; Spiking neural network; Supervised learning

Abstract

This paper presents a supervised training algorithm that implements fuzzy reasoning on a spiking neural network. Neuron selectivity is facilitated using receptive fields that enable individual neurons to be responsive to certain spike train firing rates and behave in a similar manner as fuzzy membership functions. The connectivity of the hidden and output layers in the fuzzy spiking neural network (FSNN) is representative of a fuzzy rule base. Fuzzy C-Means clustering is utilised to produce clusters that represent the antecedent part of the fuzzy rule base that aid classification of the feature data. Suitable cluster widths are determined using two strategies: subjective thresholding and evolutionary thresholding respectively. The former technique typically results in compact solutions in terms of the number of neurons, and is shown to be particularly suited to small data sets. In the latter technique a pool of cluster candidates is generated using Fuzzy C-Means clustering and then a genetic algorithm is employed to select the most suitable clusters and to specify cluster widths. In both scenarios, the network is supervised but learning only occurs locally as in the biological case. The advantages and disadvantages of the network topology for the Fisher Iris and Wisconsin Breast Cancer benchmark classification tasks are demonstrated and directions of current and future work are discussed.

© 2010 Elsevier Ltd. All rights reserved.

1. Introduction

The history of neural network research is characterised by a progressively greater emphasis paid to biological plausibility. The evolution of neuron modelling with regard to the complexity of computational units can be classified into three distinct generations (Maass, 1997). The third generation of neuron modelling (spiking neurons) is based on the realisation that the precise mechanism by which biological neurons encode and process information is poorly understood. In particular, biological neurons communicate using action potentials, also known as spikes or pulses. The spatio-temporal distribution of spikes in biological neurons is believed to ‘hold the key’ to understanding the brain’s neural code (DeWeese & Zador, 2006).

There exists a multitude of spiking neuron models that can be employed in spiking neural networks (SNNs). The models range from the computationally efficient on the one hand to the biologically accurate on the other (Izhikevich, 2004); the former are typically of the integrate-and-fire variety and the latter are of the Hodgkin–Huxley type. All the models in this range exploit time as a resource in their computations but vary significantly in the number and kinds of neuro-computational features that they can model (Izhikevich, 2004). The extensive amount and variety of

∗ Corresponding author. Tel.: +44 2871 375249; fax: +44 2871 375470. E-mail address: [email protected] (C. Glackin).

neuron models exist in acknowledgement of the fact that there is a trade-off between the individual complexity of spiking neurons and computational intensity.

In addition to the variety of neuron models, biological neurons can have two different roles to play in the flow of information within neural circuits. These two roles are excitatory and inhibitory respectively. Excitatory neurons are responsible for relaying information whereas inhibitory neurons locally regulate the activity of excitatory neurons. There is also experimental evidence to suggest that the interaction between these two types of neuron is responsible for synchronisation of neuron firing in the cortex (Börgers & Kopell, 2003). Ongoing physiological experiments continue to illuminate the underlying processes responsible for the complex dynamics of biological neurons.

The degree to which these complex dynamics are modelled in turn limits the size and computational power of SNNs. Therefore, it is imperative to determine which biological features improve computational capability whilst enabling efficient description of neuron dynamics. Ultimately neuro-computing seeks to implement learning in a human fashion. In any kind of algorithm where human expertise is implicit, fuzzy IF-THEN rules can provide a language for describing this expertise (Zadeh, 1965). In this paper, the rationale for the distribution of biologically-inspired computational elements is prescribed by the implementation of fuzzy IF-THEN rules. This rationale will demonstrate how strictly biological models of neurons, synapses and learning can be assembled in a network topology using fuzzy reasoning. Two benchmark classification datasets are used to demonstrate the capabilities of the


topology. The benchmarks are the Fisher Iris and Wisconsin Breast Cancer datasets.

In Section 2, unsupervised and supervised learning methods, dynamic synapses and receptive fields (RF) are reviewed. A brief discussion of how fuzzy reasoning can provide a basis for structuring the network topology in such a way that these various elements are suitably utilised follows. Section 3 introduces a generic network topology and outlines the specific models and algorithms used to implement fuzzy reasoning. Section 4 is concerned with pre-processing of benchmark data, Fuzzy C-Means clustering and thresholding to determine cluster size. Experimental results and remarks for the complex non-linear Iris classification problem using a subjective cluster thresholding approach are presented in Section 5, and results from the evolutionary optimisation of the thresholding technique for the Wisconsin Breast Cancer dataset are presented in Section 6. A discussion of generalisation and the main contribution of the work are outlined in Section 7, and lastly conclusions and future research directions are presented in Section 8.

2. Review

In this section, unsupervised and supervised learning methods, dynamic synapses and RFs are reviewed. Modelling synapses is an essential aspect of accurate representation of real neurons, and one of the key mechanisms to reproducing the plethora of neuro-computational features in SNNs. Learning in all generations of neural networks involves the changing of synaptic weights in the network in order for the network to ‘learn’ some input–output mapping. From a biologically plausible point of view synaptic modification in spiking neurons should be based on the temporal relationship between pre- and post-synaptic neurons, in accordance with Hebbian principles (Hebb, 1949). In fact, Hebbian learning and its ability to induce long-term potentiation (LTP) or depression (LTD) provides the basis for most forms of learning in SNNs. Hebbian learning gains great computational power from the fact that it is a local mechanism for synaptic modification but also suffers from global stability problems as a consequence (Abbott & Nelson, 2000).

2.1. Unsupervised learning

There are several learning algorithms that can be used to evoke LTP or LTD of synaptic weights. STDP (Bi & Poo, 1998; Markram, Lübke, Frotscher, & Sakmann, 1997) is a re-establishment of Hebb’s causality condition (Hebb, 1949) of strengthening the weight associated with a presynaptic neuron only if there is a high probability that it caused a postsynaptic spike, and weakening the connection otherwise. More specifically, the STDP learning rule dictates that long-term strengthening of the synaptic efficacy occurs when a pre-synaptic spike (AP) precedes a post-synaptic one. Synaptic weakening occurs with the reverse temporal order of pre- and postsynaptic spikes. The stability of STDP can be ensured by placing limits on the strengths of individual synapses, and a multiplicative form of the rule introduces an adaptive aspect to learning, resulting in progressively smaller weight updates as learning progresses.

Bienenstock, Cooper and Munro’s model (BCM) (Bienenstock, Cooper, & Munro, 1982) compares correlated pre- and post-synaptic firing rates to a threshold in order to decide whether to induce LTP or LTD. The threshold slides as a function of the post-synaptic firing rate in order to stabilise the model. Despite criticism for its lack of biological basis (Abbott & Nelson, 2000), BCM has been demonstrated to be related to STDP (Izhikevich & Desai, 2003). In particular, by restricting the number of pairings of pre- and post-synaptic spikes included in the STDP rule, the BCM rule can be emulated using STDP.

BCM and STDP are of course unsupervised learning algorithms, and as such they do not obviously lend themselves to applications requiring a specific goal definition, since this requires supervised learning.

2.2. Supervised learning

There are several methodologies to date for implementing supervised learning in SNNs:
• SpikeProp (Gradient Estimation) (Bohte, Kok, & La Poutré, 2002).
• Statistical approach (Pfister, Barber, & Gerstner, 2003).
• Linear algebra formalisms (Carnell & Richardson, 2005).
• Evolutionary Strategy (Belatreche, Maguire, McGinnity, & Wu, 2003).
• Synfire Chains (Sougné, 2000).
• Supervised Hebbian Learning (Legenstein, Naeger, & Maass, 2005; Ruf & Schmitt, 1997).
• Remote Supervision (Kasiński & Ponulak, 2005).

For a detailed review see Kasiński and Ponulak (2006).

SpikeProp (Bohte et al., 2002) is a gradient descent training algorithm for SNNs that is based on backpropagation. The discontinuous nature of spiking neurons causes problems with gradient descent algorithms, but SpikeProp overcomes this issue by only allowing each neuron to fire once and by training the neurons to fire at a desired time. However, if weight updates leave the neuron in a state such that it will not fire, the algorithm cannot restore the neuron to firing for any new input pattern. Additionally, since each neuron is only allowed to fire once, the algorithm can only be used in a time-to-first-spike coding scheme, which means that it cannot learn patterns consisting of multiple spikes.

By employing a probabilistic approach to the Hebbian interaction between pre- and post-synaptic firing, it is possible to produce a likelihood that is a smooth function of its parameters (Pfister et al., 2003). The aim of this, of course, is that this allows gradient descent to be applied to the changing of synaptic efficacies. This statistical approach employs STDP-like learning windows and an injected teacher current. Consequently, the method has been described (Kasiński & Ponulak, 2006) as a probabilistic version of Supervised Hebbian learning (Legenstein et al., 2005; Ruf & Schmitt, 1997). Experiments with this approach have been limited to networks consisting of only two spikes, so it is difficult to know how robust the technique would be for larger networks.

Linear algebra formalisms involving definitions of inner product, orthogonality and projection operations for spike time series form the backbone of Carnell and Richardson’s work (Carnell & Richardson, 2005). The Gram–Schmidt process is used to find an orthogonal basis for the input time series subspace, and this is then used to find the subspace for the desired output. A batch style iterative process is described that seeks to then minimise the error between target and actual outputs by projecting the error into the input subspace. The Liquid State Machine (LSM) (Maass, Nätschlager, & Markram, 2002) is used for the experiments. Successful training is dependent on the variability of input spikes, but since the training requires batch learning the method is unsuitable for online learning (Carnell & Richardson, 2005).

Evolutionary strategies (ES) have been applied as a form of supervision for SNNs (Belatreche et al., 2003). ES differ from genetic algorithms in that they rely solely on the mutation operator. The accuracy of the resulting SNN provides the basis for determining the fitness function and the ES population was shown to produce convergence to an optimal solution. The learning capabilities of the ES were tested with the XOR and Iris benchmark classification problems. The Spike Response Model was used to model the spiking neurons in a fully connected feed-forward topology. A limitation of this approach is that only the time-to-first-spike is considered by the ES (Belatreche et al., 2003). Additionally, as with all evolutionary algorithms the evolutionary process is time-consuming and this renders them unsuitable for online learning.

A synfire chain (SFC) (Sougné, 2000) is a feed-forward multi-layered topology (chain) in which each pool of neurons in the


chain (subnet) must fire simultaneously to raise the potential of the next pool of neurons enough so that they can fire. STDP and an additional non-Hebbian term form the basis of the learning rule. The learning rule is designed to regulate the firing activity and to ensure that the waves of synchronous firing make it to the output of the network. The ability of the network to learn an input–output mapping is sensitive to the number and diversity of connection delays. As with many existing supervised SNN learning algorithms this technique relies on time-to-first-spike encoding, and as such cannot learn patterns involving multiple spikes.

Supervised Hebbian Learning (SHL) (Legenstein et al., 2005; Ruf & Schmitt, 1997) is arguably the most biologically plausible supervised SNN learning algorithm. SHL simply seeks to ensure that an output neuron fires at the desired time, with the inclusion of a ‘teaching’ signal. Since the teaching signal consists of intracellular synaptic currents, supervision may be envisioned as supervision by other neurons. It has been proven that SHL can learn to reproduce the firing patterns of uncorrelated Poisson spike trains. Thus it can learn patterns involving multiple spikes. However, it suffers from the limitation that even after the goal firing pattern has been reached, SHL will continue to change the weights. Thus constraints must be added to the learning rule to ensure stability (Legenstein et al., 2005).

The Remote Supervision Method (ReSuMe) is closely related to SHL but manages to avoid its drawbacks (Kasiński & Ponulak, 2005). The ‘remote’ aspect comes from the fact that teaching signals are not delivered as currents to the learning neuron (as with SHL). Instead a teaching signal and STDP-like Hebbian correlation are employed to co-determine the changes in synaptic efficacy. Two STDP-like learning windows are used to update synaptic weights. The first window increases the weight whenever there is a temporal correlation between the input and the desired output (teaching signal). The second learning window is an anti-STDP window which decreases the weight based on the correlation between the input and the actual output. ReSuMe has thus far only been applied to LSM networks and has demonstrated that it can learn patterns involving multiple spikes to a high degree of accuracy (Kasiński & Ponulak, 2005). Irrespective of the particular neuron model used, the learning process converges quickly and in a stable manner.

Of all the supervised learning algorithms discussed, the ReSuMe approach is perhaps the most efficient. However, synaptic efficacy changes on a much shorter time-scale in biological neurons as well as over the longer time-scale of learning. Hence synapses should be modelled as dynamic, not static, entities.

2.3. Dynamic synapses

Synaptic efficacy changes over very short time-scales as well as over the longer time-scale of training. The rate at which synaptic efficacy changes is determined by the supply of synaptic resources such as neuro-transmitters and the number of receptor sites. Dynamic synapse models are typically either deterministic (Tsodyks, Pawelzik, & Markram, 1998) or probabilistic (del Castillo & Katz, 1954). However, whichever model is preferred, it is important that the modelled magnitude of the post-synaptic response (PSR) changes in response to pre-synaptic activity (Fuhrmann, Segev, Markram, & Tsodyks, 2002). Furthermore, biological neurons have synapses that can either facilitate or depress synaptic transmission of spikes (Tsodyks et al., 1998).

Modelling the dynamics of limited synaptic resources makes neurons selective (sensitive) to particular spike frequencies. The filtering effects of dynamic synapses occur because there is a frequency of pre-synaptic spike trains that optimises the post-synaptic output (Nätschlager & Maass, 2001; Thomson, 1997). A likely explanation for this specificity of frequency is that for certain pre-synaptic spike train frequencies the synapse will not run out of resources whereas for another it probably will. Between these two pre-synaptic spike frequencies there will be an optimum state where the post-synaptic spike frequency is maximised. This means that certain neurons and synapses can potentially be targeted by specific frequencies of pre-synaptic spike trains. This phenomenon has been described as ‘preferential addressing’ (Nätschlager & Maass, 2001).

The dynamic synapse model used in this research has the following kinetic equations (Tsodyks et al., 1998):

\frac{dx}{dt} = \frac{z}{\tau_{rec}} - U_{SE}\,x(t_{sp}) \quad (1)

\frac{dy}{dt} = -\frac{y}{\tau_{in}} + U_{SE}\,x(t_{sp}) \quad (2)

\frac{dz}{dt} = \frac{y}{\tau_{in}} - \frac{z}{\tau_{rec}}. \quad (3)

The three differential equations describe the recovered, active and inactive states respectively. Each pre-synaptic spike arriving at time t_sp activates a fraction (U_SE) of synaptic resources, which then quickly inactivate with a time constant of a few milliseconds (τ_in) and recover slowly with a time constant of τ_rec. These equations are used to calculate the post-synaptic current, which is taken to be proportional to the fraction of resources in the active state:

I_{syn}(t) = A_{SE}\,y(t). \quad (4)

The post-synaptic membrane potential is calculated using a leaky integrate-and-fire (LIF) passive membrane function given by:

\tau_{mem}\frac{dv}{dt} = -v + R_{in}I_{syn}(t) \quad (5)

where τ_mem is the membrane time constant, v is the membrane potential and R_in is the input resistance. The choice of parameters in the dynamic synapse model determines the type of synaptic response. The hidden layer neurons in the FSNN are connected to the input layer using facilitating synapses. Facilitating synapses are marginally more complicated to implement than depressing synapses because they involve the inclusion of another differential equation (Tsodyks et al., 1998):

\frac{dU_{SE}}{dt} = -\frac{U_{SE}}{\tau_{facil}} + U_1(1 - U_{SE})\,\delta(t - t_{sp}) \quad (6)

where τ_facil is the facilitation time constant and U_1 is the initial value of U_SE. For facilitating synapses, the equation describes the variable rate of consumption of synaptic resources.

Depressing synapses model the fact that biological neurons often only respond to the first few spikes in a spike train before running out of synaptic resources. These synapses are implemented using the dynamic synapse model (Tsodyks et al., 1998) by replacing Eq. (6) with a constant value for U_SE. The action potential with these types of synapses is only significantly high in magnitude for a very short interval. Depressing synapses have been described as being coincidence detectors (Pantic, Torres, & Kappen, 2003) since they often require synchrony of post-synaptic potentials in order to cause firing of a post-synaptic neuron.
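To make the interplay of Eqs. (1)–(6) concrete, the following sketch numerically integrates the dynamic synapse model and the LIF membrane of Eq. (5). It is not the authors' MATLAB implementation; the integration scheme (simple Euler steps) and all parameter values, apart from the roughly 900 pA value of A_SE suggested later in the paper, are illustrative assumptions.

import numpy as np

def simulate_synapse(spike_times, facilitating=True, T=1.0, dt=1e-4,
                     tau_rec=0.8, tau_in=3e-3, tau_facil=0.5, U1=0.2,
                     A_SE=900e-12, tau_mem=30e-3, R_in=1e9, v_thresh=10e-3):
    """Euler integration of the dynamic synapse (Eqs. (1)-(6)) driving a LIF
    neuron (Eq. (5)). A_SE follows the ~900 pA suggestion in Section 3.2;
    the other parameter values are illustrative assumptions."""
    steps = int(T / dt)
    spikes_in = np.zeros(steps, dtype=bool)
    for t in spike_times:
        spikes_in[int(t / dt)] = True
    x, y, z = 1.0, 0.0, 0.0      # recovered, active and inactive fractions
    U_SE = U1                    # utilisation of synaptic efficacy
    v, v_peak = 0.0, 0.0         # membrane potential and its maximum
    out_spikes = []
    for k in range(steps):
        if spikes_in[k]:
            if facilitating:
                U_SE += U1 * (1.0 - U_SE)        # delta-pulse term of Eq. (6)
            used = U_SE * x                      # resources consumed by the spike
            x -= used                            # spike-triggered terms of
            y += used                            # Eqs. (1) and (2)
        # continuous terms of Eqs. (1)-(3) and (6)
        x += (z / tau_rec) * dt
        z += (y / tau_in - z / tau_rec) * dt
        y += (-y / tau_in) * dt
        if facilitating:
            U_SE += (-U_SE / tau_facil) * dt
        I_syn = A_SE * y                         # Eq. (4)
        v += ((-v + R_in * I_syn) / tau_mem) * dt    # Eq. (5)
        v_peak = max(v_peak, v)
        if v >= v_thresh:                        # fire and reset
            out_spikes.append(k * dt)
            v = 0.0
    return out_spikes, v_peak

# a regular 30 Hz input spike train lasting one second
train = np.arange(0.0, 1.0, 1.0 / 30.0)
spikes, v_peak = simulate_synapse(train, facilitating=True)
print(f"{len(spikes)} output spikes, peak membrane potential {v_peak * 1e3:.2f} mV")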

Constructing a network of neurons using synapses that operate at different frequency bands is desirable from the perspective of promoting neuron selectivity and richness of information flow. However, it is particularly difficult to tune dynamic synapse models to operate at specific frequency bands by changing the various model parameters. One way to guarantee that synapses are responsive to certain spike train firing rates is with the use of RFs.


Fig. 1. Generic FSNN topology.

2.4. Receptive fields

As far back as 1953, experiments with retinal ganglion cells in the frog showed that the cell’s response to a spot of light grew as the spot grew, until some threshold had been reached (Barlow, 1953). The part of the visual world that can influence the firing of a neuron is referred to as the RF of the neuron (Rieke, 1997). In Barlow’s work (Barlow, 1953), it was demonstrated that a spot of light within the centre of the RF produces excitation of the neuron, whereas when the spot of light is larger than the RF or outside the RF inhibition occurs.

The implications for SNNs are that RFs can be used in conjunction with neuron models to promote feature selectivity and hence enhance the ‘richness’ of information flow. It should be pointed out that RFs in this research behave in a similar manner to membership functions and, unlike the visual and auditory RFs found in the biology, the RFs in this research are being utilised to process purely numerical data.

3. FSNN (fuzzy SNN) topology

Biological neuron dynamics are determined by the relationships between spike trains, synaptic resources, post-synaptic currents and membrane potentials. Neuron selectivity can be further strengthened using RFs. The dilemma is in the way in which all these various elements can be combined in a logical way resulting in SNNs that provide insight in turn into the biological neuron’s code, and are useful from an engineering perspective. Biological neurons obviously implement a form of human reasoning. Human reasoning is fuzzy in nature and involves a much higher level of knowledge representation (Zadeh, 1965). Fuzzy rules are typically defined in terms of linguistic hedges, e.g. low, high, excessive, reduced etc. Taking a cue from fuzzy reasoning, the aim of this paper is to demonstrate how the components necessary to define a fuzzy rule in turn dictate the distribution of the various biologically plausible computational elements in a SNN. Fuzzy IF-THEN rules are of the form:

IF (x_1 is A_1) AND ⋯ AND (x_N is A_N) THEN (y is Z) \quad (7)

where x_1–x_N represent the network inputs, A_1–A_N represent hidden layer RFs and y is the network output. Fig. 1 shows the FSNN topology.

At first glance it seems like any other fully connected feed-forward topology. However, each layer uses various computational elements to manage the information flow and implement fuzzy reasoning.

The following subsections will outline the components in each of the layers in the three-layer SNN.

3.1. Input layer

The function of the input neurons is to simply encode feature data into an appropriate frequency range. Spike trains are then generated from the data using a linear encoding scheme. There is always a significant amount of pre-processing of input data that is required in order to convert typically numerical feature data into temporal spike train data. Here the encoding scheme takes the frequency data points and converts them into an inter-spike interval (ISI) which is then used to create linear input spike trains. Thus each data point is encoded to a particular firing rate, which is used to generate an input spike train with a constant ISI. It is then the task of the network to alter the flow of information depending on the encoded firing rate of the input data.
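As an illustration of this encoding step, the sketch below (an assumed rendering, not the authors' code) maps a feature value linearly onto a firing rate and builds the corresponding constant-ISI spike train; the [10, 40] Hz range used later in the paper is taken as the target frequency range.

import numpy as np

def encode_feature(value, v_min, v_max, f_min=10.0, f_max=40.0, duration=1.0):
    """Linearly map a numerical feature value onto a firing rate in
    [f_min, f_max] Hz and return a spike train with a constant
    inter-spike interval (ISI). The duration is an illustrative choice."""
    rate = f_min + (value - v_min) / (v_max - v_min) * (f_max - f_min)
    isi = 1.0 / rate                        # constant inter-spike interval
    return np.arange(0.0, duration, isi)    # spike times in seconds

# e.g. a sepal length of 5.8 cm when that Iris feature spans [4.3, 7.9] cm
spikes = encode_feature(5.8, v_min=4.3, v_max=7.9)
print(f"{len(spikes)} spikes, ISI = {spikes[1] - spikes[0]:.3f} s")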

3.2. Hidden layer

All of the synapses in the FSNN are dynamic. There are facilitating synapses between the input and hidden layers and depressing synapses between the hidden and output layers. Gaussian RFs are placed at every synapse between the input and the hidden neurons.

Since the input data to the FSNN is encoded linearly as spike trains of particular firing rates, the RFs utilised in this work need to be selective to these differences in firing rate. In the biology, RFs usually operate in the spatial domain, and relay spike trains of various frequencies depending on where an input stimulus is presented in relation to the position of the RF. In this work, data is encoded into firing rates, the RFs are selective to input firing rate and as such are conceptualised as operating in the input spatial domain. In this way, the RFs provide gain control for hidden layer neurons by scaling the synaptic response to input spike trains.

The RFs determine where an input firing rate f_i is in relation to the central operating firing rate of the RF, F_O. The weight is then scaled by an amount ϕ_ij when calculating the PSR. It is straightforward for the synapse to measure the input firing rate of the RF by measuring the ISI of the linear encoded spike trains. This whole process relates to the ‘IF (x_i is A_i)’ part of the fuzzy rule, where x_i is the input and A_i represents the RF. In this way the RF makes the synapse frequency selective in such a way that the numerous parameters in the dynamic synapse model do not have to be carefully tuned. This is aided by the fact that the synaptic weights of the hidden layer neurons are fixed and uniform in such a way that scaling of the weight by ϕ_ij produces the desired variation in firing rate response of the hidden layer neurons. This is further facilitated by keeping the frequency range of encoded spike trains sufficiently small, [10, 40] Hz in this work. Although it should be understood that the weights do not need to be carefully tuned, A_SE values (see Eq. (4)) in the vicinity of 900 pA should suffice for most applications.

Employing facilitating synapses for this purpose makes the hidden layer synapses respond like a non-linear filter. A single hidden layer containing facilitating synapses can approximate all filters that can be characterised by the Volterra series (Maass & Sontag, 2000). Like the Taylor series, the Volterra series can be used to approximate the non-linear response of a system to a given input. The Taylor series takes into account the input to the system at a particular time, whereas the Volterra series takes into account the input at all other times. Such a filtering capability has been recently reported in the literature. It is thought that the interplay between facilitation and depression is responsible for this capability, with low-pass (Dittman, Kreitzer, & Regehr, 2000), high-pass (Fortune & Rose, 2001), and band-pass (Abbott & Regehr, 2004) filtering effects all having been observed.

Spiking neurons sum the post-synaptic potentials to calculate the membrane potential. The function of each hidden layer neuron is to impose the remaining part of the antecedent fuzzy IF-THEN


Fig. 2. The conjunctive ‘AND’ part of the fuzzy rule using excitatory/inhibitory receptive fields.

rule, namely the conjunctive ‘AND’. This is not straightforward since simply summing the post-synaptic potentials is tantamount to performing a disjunctive ‘OR’. The collective aim of the RFs connecting to each hidden layer neuron is to only allow spikes to filter through to the output layer when all of the input frequencies presented at each synapse are within the Gaussian RFs. This can be ensured by making the RF excitatory within the RF and inhibitory outside, as with the centre-surround RFs observed in retinal ganglion cells (Glackin, McDaid, Maguire, & Sayers, 2008a, 2008b). In terms of biological plausibility, this gating property of RFs has been observed (Tiesinga, Fellous, Salinas, José, & Sejnowski, 2004). The degree of synchrony of inhibitory synaptic connections causes some RFs to act as gates, transmitting spikes when synchrony is low, and preventing transmission of spikes when synchrony is high (Tiesinga et al., 2004). This has been observed in vitro by injecting currents into rat cortical neurons using a dynamic clamp and modifying the amount of inhibitory jitter. Of course, the gating property itself is unlikely to be as clear cut as traditional ‘AND’ gates; instead it is likely to be fuzzy in itself, ranging between ‘OR’ and ‘AND’ on a continuous scale. In this paper, the RFs are designed to be conjunctive ‘AND’ gates. The reason for this is that there is currently no rationale for the use of any other types of gates in terms of clustering and because there are constraints on the number of RFs that can be implemented on the computer. To produce the ‘AND’ gates, the excitatory (positive) part of the Gaussian RF scales the weight in the range [0, 1], and the inhibitory (negative) part of the RF scales the weight in the range [0, (1 − N)], where N is defined as the number of input neurons. In this way, if even one input firing rate lies outside the RF, the resultant inhibitory post-synaptic potential will have sufficient magnitude to ‘cancel out’ the post-synaptic potentials from all the other synapses connecting to the same hidden layer neuron. Linear spikes encoded with firing rate x_i are routed to the RFs located at synapses w_ij. Each synapse w_ij contains a centre-surround RF of the kind shown in Fig. 2.

The RFs are characterised by a central excitatory Gaussian with inhibitory surround. The activation of the RF, ϕ_ij, scales the weight w_ij according to the position of the input firing rate x_i and the operating firing rate of the RF, f_ij. More formally, the activation ϕ_ij of the RF on the synapse w_ij is given by:

\varphi_{ij}(x_i - f_{ij}) =
\begin{cases}
e^{-\frac{(x_i - f_{ij})^2}{2\sigma_{ij}^2}} & \text{if } e^{-\frac{(x_i - f_{ij})^2}{2\sigma_{ij}^2}} > 0.01 \\
1 - N & \text{if } e^{-\frac{(x_i - f_{ij})^2}{2\sigma_{ij}^2}} < 0.01
\end{cases} \quad (8)

where x_i is the firing rate of the linear input spike train from input i, N is the size of the input layer, and f_ij is the operating firing rate of the RF. σ²_ij is the width parameter of the RF located on synapse w_ij. The Gaussian RFs are asymptotic at ϕ_ij = 0, therefore an arbitrary tolerance of 0.01 on ϕ_ij is used to sharply implement the inhibitory part of the RF and prevent calculation of excessively small floating point numbers in MATLAB.
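Eq. (8) translates directly into code. The following sketch is an illustrative Python rendering (the original work used MATLAB); the weight, firing rates and RF parameters in the usage example are arbitrary.

import numpy as np

def rf_activation(x_i, f_ij, sigma_ij, n_inputs):
    """Centre-surround receptive field activation of Eq. (8): excitatory
    Gaussian inside the RF, a flat inhibitory value of (1 - N) outside,
    using the paper's 0.01 cut-off on the Gaussian."""
    g = np.exp(-((x_i - f_ij) ** 2) / (2.0 * sigma_ij ** 2))
    return g if g > 0.01 else 1.0 - n_inputs

# scale a fixed hidden-layer weight by the RF activation (illustrative values)
w_ij = 1.0
for rate in (12.0, 20.0, 35.0):                 # input firing rates in Hz
    phi = rf_activation(rate, f_ij=20.0, sigma_ij=4.0, n_inputs=4)
    print(f"{rate:5.1f} Hz -> phi = {phi:+.3f}, scaled weight = {w_ij * phi:+.3f}")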

Fig. 3. Illustration of the learning properties of ReSuMe (remote supervision). The learning rules are exponential functions that modify the synaptic weights according to the time difference between the desired spike time and input spike time (left plot); and according to the time difference between the output spike time and the input spike time (right plot).

3.3. Output layer

The aim of the depressing synapses connecting the hidden layer to the output layer is to produce spikes in the output in a regimented stable manner. It is then the task of the ReSuMe supervised learning algorithm (Kasiński & Ponulak, 2005) to associate the hidden layer neurons to the output layer neurons. Fig. 3 illustrates the ReSuMe process. The learning windows are similar to STDP (W^d) and anti-STDP (W^out). The parameters s^d and s^out denote the time delays (t^d − t^in) and (t^out − t^in) respectively.

A modified version of the ReSuMe algorithm was implemented using differential equations that enable the algorithm to be executed in a one-pass manner, in much the same way that Song, Miller, and Abbott (2000) implemented the additive STDP rule. The equations used are shown in Eq. (9):

s^{out}\frac{dW^{out}}{dt} = -W^{out} \quad \text{and} \quad s^{d}\frac{dW^{d}}{dt} = -W^{d}. \quad (9)

In this way the fuzzy inferencing is performed between the hidden layer (antecedents) and the output layer (consequents).
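A simplified, assumption-laden sketch of the ReSuMe update implied by Fig. 3 is given below: one STDP-like window strengthens a synapse when its input spikes precede a desired (teacher) spike, and an anti-STDP window weakens it when they precede an actual output spike. The exponential window shape follows the figure, but the amplitude, time constant and pairing scheme are illustrative choices rather than the authors' one-pass implementation.

import numpy as np

def resume_update(w, input_spikes, desired_spikes, output_spikes,
                  a=0.05, tau=0.01):
    """Simplified ReSuMe weight update: the W^d window increases w for input
    spikes preceding a *desired* spike, and the W^out (anti-STDP) window
    decreases w for input spikes preceding an *actual* output spike.
    Amplitude a and time constant tau are illustrative assumptions."""
    dw = 0.0
    for t_in in input_spikes:
        for t_d in desired_spikes:
            s = t_d - t_in
            if s > 0:
                dw += a * np.exp(-s / tau)      # W^d window (potentiation)
        for t_out in output_spikes:
            s = t_out - t_in
            if s > 0:
                dw -= a * np.exp(-s / tau)      # W^out window (depression)
    return w + dw

w = resume_update(0.5,
                  input_spikes=[0.010, 0.030],
                  desired_spikes=[0.012],
                  output_spikes=[0.035])
print(f"updated weight: {w:.4f}")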

In summary, the presented FSNN topology provides a rationale for the use of RFs, excitatory and inhibitory neurons, as well as facilitating and depressing synapses. The next section of the paper describes how such a network may be used to solve two complex non-linear benchmark classification problems.

4. Pre-processing the data

To test the validity of the approach presented in this paper, two well-known benchmark classification datasets were utilised. The two benchmarks are the Fisher Iris (Fisher, 1936) and the Wisconsin Breast Cancer (Asuncion & Newman, 2007) datasets. The two datasets have been used repeatedly for testing the classification capabilities of numerous numerical, statistical and machine learning algorithms, and the plethora of benchmark results make them invaluable for refining new approaches.

4.1. Fisher Iris data

The Iris classification problem (Fisher, 1936) is well-known in the field of pattern recognition. The dataset contains 3 classes of 50 instances each, where each class corresponds to a species of Iris plant. The 3 species of plant are Iris Setosa, Iris Versicolour, and Iris Virginica. In the interests of clarity the three classes shall be referred to as Class 1, Class 2, and Class 3 respectively. Class 1 is linearly separable from Classes 2 and 3.


Classes 2 and 3 are not linearly separable and make Iris a complex non-linear classification problem. The lack of sufficient training data, a problem which is exacerbated by partitioning for cross-validation tests for generalisation, makes training, and particularly generalisation, problematic for the Iris dataset. With a training set of 90 data points (2/3 of the data), 5 misclassifications in the training set will result in less than 95% accuracy, and with 60 data points in the testing set (the remaining 1/3 of the data), 4 misclassifications result in sub-95% testing accuracy. In this way, classification problems involving larger data sets are ‘more forgiving’.

4.2. Wisconsin breast cancer data

The Wisconsin Breast Cancer Dataset (Asuncion & Newman, 2007) is another well-known benchmark classification problem. The Wisconsin breast cancer dataset contains 699 instances, with 458 benign (65.5%) and 241 (34.5%) malignant cases. The dataset has only 6 samples with missing data and these samples were removed. Each instance is described by 9 features with an integer value in the range 1–10 and a class label.

4.3. Positioning the receptive fields

The task of determining the number, position and spread of RFs (membership functions) is an important step in tuning a fuzzy system. This is because the maximum number of possible rules governing the fuzzy inferencing process is determined by R = M^N, where R is the number of rules, M is the number of membership functions, and N is the number of inputs. For the Iris classification task there are 4 inputs (4 features), therefore the number of rules (number of hidden layer neurons) is given by R = M^4. This means that the number of possible rules grows rapidly as the number of membership functions increases. This phenomenon is known as ‘rule explosion’. There are many methodologies for optimising RF placement. Methodologies include statistical techniques and clustering algorithms (Abdelbar, Hassan, Tagliarini, & Narayan, 2006). For this dataset, Fuzzy C-Means (FCM) clustering was used to calculate the positions for RF placement that facilitate separation of the input feature data into their respective classes.

Fuzzy clustering is distinct from hard clustering algorithms such as K-means in that a data point may belong to more than one cluster at the same time. FCM clustering was first developed by Dunn in 1973 (Dunn, 1973) and later refined by Bezdek in 1981 (Bezdek, 1981). FCM is better at avoiding local minima than K-means but it is still susceptible to the problem in some cases. The algorithm for FCM is an iterative procedure that aims to minimise the following objective function:

J_m = \sum_{i=1}^{N}\sum_{j=1}^{C} u_{ij}^{m}\,\|x_i - c_j\|^2, \quad (1 \le m < \infty) \quad (10)

where m refers to the degree of fuzziness, and should be a real number greater than 1. It is commonly set to 2 in the absence of experimentation or domain knowledge. u_ij is the membership of any data point x_i in the cluster j, c_j is the centre of the cluster j, and ‖·‖ represents any norm that measures the similarity (distance) between the data point x_i and the cluster centre. Through iteration of the algorithm, the membership is updated with the following functions, until the termination criterion ε is met:

u_{ij} = \frac{1}{\sum_{k=1}^{C}\left(\frac{\|x_i - c_j\|}{\|x_i - c_k\|}\right)^{\frac{2}{m-1}}}, \qquad
c_j = \frac{\sum_{i=1}^{N} u_{ij}^{m}\,x_i}{\sum_{i=1}^{N} u_{ij}^{m}}. \quad (11)

The stopping criterion is given by:

\max_{ij}\left\{\left|u_{ij}^{(k+1)} - u_{ij}^{(k)}\right|\right\} < \varepsilon \quad (12)

where ε is in the interval [0, 1], and k indexes the iteration steps.

The Iris training set was clustered with FCM in a five-dimensional sense (4 features and the class data). One advantage of FCM is that it can cope when clustering the input data with the class data without altering the class information very much in the resulting clusters. This was the case with the Iris data, for the most part resulting in clusters associated with a particular class. Of course, since FCM is a fuzzy clustering technique, data samples are allowed to belong to multiple classes with varying degrees of membership. Careful selection of appropriate cluster widths can ensure, where possible, that hidden layer clusters are associated with a single class. MATLAB’s FCM algorithm was used to perform the clustering.
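A minimal NumPy re-implementation of the FCM iteration in Eqs. (10)–(12) is sketched below for reference; the authors used MATLAB's FCM routine, so the initialisation, tolerance and fuzziness value m = 2 here are illustrative assumptions.

import numpy as np

def fuzzy_c_means(X, n_clusters, m=2.0, eps=1e-5, max_iter=100, seed=0):
    """Fuzzy C-Means clustering (Eqs. (10)-(12)). X is (N, d); returns the
    cluster centres (C, d) and the membership matrix U of shape (N, C)."""
    rng = np.random.default_rng(seed)
    N = X.shape[0]
    U = rng.random((N, n_clusters))
    U /= U.sum(axis=1, keepdims=True)            # memberships sum to 1 per point
    for _ in range(max_iter):
        Um = U ** m
        centres = (Um.T @ X) / Um.sum(axis=0)[:, None]            # Eq. (11), c_j
        dist = np.linalg.norm(X[:, None, :] - centres[None, :, :], axis=2)
        dist = np.fmax(dist, 1e-12)                                # avoid division by zero
        inv = dist ** (-2.0 / (m - 1.0))
        U_new = inv / inv.sum(axis=1, keepdims=True)               # Eq. (11), u_ij
        if np.max(np.abs(U_new - U)) < eps:                        # Eq. (12)
            U = U_new
            break
        U = U_new
    return centres, U

# e.g. cluster the four Iris features plus the class label (a 5-D clustering):
# X = np.column_stack([features, labels]); centres, U = fuzzy_c_means(X, 10)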

4.4. Thresholding

The FCM program returns the cluster positions and a fuzzy membership matrix U for all the data samples in the training set. By setting appropriate thresholds on the fuzzy memberships to each cluster, it is possible to determine which data samples should be within each cluster and which should be excluded. Care should be taken with the thresholding to ensure that each data sample belongs to at least one cluster. Otherwise the network’s hidden layer will ignore the data sample. Fig. 4 shows an example of thresholding the fuzzy membership matrix U obtained using MATLAB’s FCM clustering algorithm.

The figure shows U on the left and the thresholds on the clusters on the right. The membership values are obtained by calculating the Euclidean distance between the data points (1–9 in this simple example) and the clusters (1–6). The Euclidean distances are then scaled by the sum of the Euclidean distances in each column so that the total membership is equal to 1. Of course Euclidean distance is not the only distance measure that can be used to determine the distance between a cluster and a data point. Several other distance measures were investigated (including Manhattan and Mahalanobis distances, to mention but a few) but subjectively Euclidean was preferred for its simplicity and ease of thresholding. Once the thresholding has determined which data samples should belong to each cluster (shaded in grey in Fig. 4), the variances for each feature can be calculated from the features of each included data sample. These feature variances are then used to specify the widths of each Gaussian RF.
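The thresholding and width-calculation step can be expressed compactly, as in the following sketch. It is an illustrative interpretation, not the authors' code: memberships are compared against per-cluster thresholds, and the per-feature variance of the data points captured by each cluster provides the Gaussian RF widths.

import numpy as np

def rf_widths_from_thresholds(U, X, thresholds):
    """U: (n_points, n_clusters) fuzzy membership matrix, X: (n_points,
    n_features) feature data, thresholds: (n_clusters,). Returns, for each
    cluster, the per-feature variances used as Gaussian RF widths (sigma^2),
    or None for clusters that capture no data point."""
    widths = []
    for j, t in enumerate(thresholds):
        members = U[:, j] >= t                 # data points kept by this cluster
        widths.append(X[members].var(axis=0) if members.any() else None)
    # every data point should belong to at least one cluster
    uncovered = ~np.any(U >= thresholds[None, :], axis=1)
    if uncovered.any():
        print(f"warning: {uncovered.sum()} data points excluded by thresholding")
    return widths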

5. Fisher Iris classification results (subjective thresholding technique)

The simple example shown in Fig. 4 illustrates the thresholding technique for a small number of data points and clusters, but for larger sets optimisation of the thresholds is problematic. The subjective nature of the thresholding technique means that often there is a trade-off between the classification of data points crisply into particular classes and the number of data points included in the classification. Fig. 5 shows the hidden layer output after thresholding has been performed for the Iris training dataset.

The original ordering of the training data is used for clarity, with each successive 30 s interval corresponding to classes 1, 2 and 3. The order of the data does not affect the processing of the hidden layer dynamic synapses since synaptic resources are re-initialised after processing each data point. The hidden layer neurons have also been put in order of class. As can be seen from Fig. 5, there are 3, 3 and 4 clusters (and hence hidden neurons) associated with classes 1, 2 and 3 respectively. The resultant classification


Fig. 4. Intuitive thresholding technique.

Fig. 5. Hidden layer processing of Fisher Iris training data.

of the data is clear to see from the figure; however, there are some anomalies. The thresholding technique could not exclusively associate hidden neurons 5 and 6 to class 2 data points. Similarly, hidden neurons 9 and 10 could not be exclusively associated with class 3 data points without the hidden layer excluding some points altogether. In all, there were 9 data points (10% of the training set) that were not uniquely associated with one particular class. This means that 10% of the data in the training set is unresolved in terms of the FSNN associating the data correctly with the class data. It is the task of the ReSuMe supervised training regime to resolve this error.

5.1. ReSuMe training

The first step in implementing the ReSuMe training was to specify the supervisory spike trains to be used. The ReSuMe algorithm was ‘forgiving’ in this respect, producing good convergence for a wide range of supervisory frequencies. Therefore supervisory spike trains of a 50 Hz frequency were delivered to the appropriate class output neuron whenever the data sample belonged to that class. The supervisory signal for the other class neurons for the given data sample was of zero frequency. The learning windows were equal and opposite in magnitude producing a maximum possible

weight update of 0.05. Each training data sample is deemed correctly classified when it produces the maximum number of output spikes at the correct output class neuron. The training resulted in 2/90 misclassified data samples (97.7% accuracy). Once training was completed, all weights were then fixed and the unseen testing data (60 samples) were presented to the network. During the testing phase 3/60 data points were misclassified (95.0% accuracy), showing good generalisation.

6. Wisconsin breast cancer classification results (evolutionary threshold optimisation)

When producing clusters for the Iris dataset, the class information was included in the clustering procedure as described in Section 4.3. For the Wisconsin dataset it was desired to cluster the feature data in ignorance of the class information. Clustering without prior knowledge of class information is problematic because data points may be close together with regards to feature data but differ solely with regards to class information. Therefore any clusters generated may contain data points belonging to both classes. The task of the thresholding technique is complicated by this cluster ‘fuzziness’. It was discovered that the subjective thresholding approach was time-consuming with regards to larger datasets. Preliminary experiments with FCM indicated that in order to achieve objective values comparable with the Iris dataset (with data scaled into the same frequency range), more clusters would be required to be generated. The larger number of thresholds accompanying these clusters further complicates the optimisation of the clusters. With MATLAB’s FCM algorithm the number of clusters must be specified in advance. However, specifying too many can create ‘clusters’ that contain only one data point; obviously these types of clusters could have disastrous consequences for generalisation. It is desirable therefore to use an optimisation technique to determine the required number of clusters, and perform the thresholding for problems that involve larger datasets such as the Wisconsin Breast Cancer dataset.

A Genetic Algorithm (GA) was used to find a good solution to the thresholding problem for the Wisconsin dataset, thus optimising the configuration of cluster sizes. GAs are particularly useful for finding good solutions for problems with high dimensional parameter spaces (Goldberg, 1989; Holland, 1975). In the typical


Fig. 6. GA flow of operation.

GA considered in this research, a genotype encodes the solution to the problem. The genotype is constructed from the individual cluster thresholds (alleles), and then a fitness function determines the merit of each solution to the problem by assigning a fitness value to each genotype. In this way, by utilising thresholds as the basis for the hidden layer RF configuration, individual chromosomes in the GA are compact and as such well-suited to evolutionary optimisation (Cantu-Paz & Kamath, 2005). The GA developed is of the indirect topological variety (Miller, Todd, & Hegde, 1989), since chromosomes only contain information about part of the structure of the FSNN.

The following equations define fitness. The membership matrix U returned from the FCM algorithm is first augmented with the threshold vector T (genotype) to give:

U = \left[\begin{array}{ccc|c}
u_{1,1} & \cdots & u_{1,n} & T_1 \\
\vdots & u_{i,j} & \vdots & \vdots \\
u_{M,1} & \cdots & u_{M,n} & T_M
\end{array}\right] \quad (13)

where M is the size of the hidden layer (number of clusters) and n is the number of data points in the training set. A membership index matrix S is constructed according to the following rules:

s_{i,j} = \begin{cases} 1 & (u_{i,j} \ge T_i) \\ 0 & (u_{i,j} < T_i). \end{cases} \quad (14)

The elements of the thresholded-membership matrix V are then calculated using:

v_{i,j} = u_{i,j}\,s_{i,j}. \quad (15)

It is known from the training data which elements of v_{i,j} belong to which class, therefore we define v^{(q)} as being an element of V belonging to class q. The fitness of each allele can now be evaluated:

F_i = \begin{cases}
1 - \dfrac{\sum_{j=1}^{n} v^{(1)}_{i,j}}{\sum_{j=1}^{n} v^{(1)}_{i,j} + \sum_{j=1}^{n} v^{(2)}_{i,j}} & \text{if } \sum_{j=1}^{n} v^{(1)}_{i,j} > \sum_{j=1}^{n} v^{(2)}_{i,j} \\[2ex]
1 - \dfrac{\sum_{j=1}^{n} v^{(2)}_{i,j}}{\sum_{j=1}^{n} v^{(1)}_{i,j} + \sum_{j=1}^{n} v^{(2)}_{i,j}} & \text{if } \sum_{j=1}^{n} v^{(1)}_{i,j} < \sum_{j=1}^{n} v^{(2)}_{i,j} \\[2ex]
1 & \text{if } \sum_{j=1}^{n} v^{(1)}_{i,j} = \sum_{j=1}^{n} v^{(2)}_{i,j} \\[1ex]
0 & \text{if } \sum_{j=1}^{n} v^{(1)}_{i,j} = \sum_{j=1}^{n} v^{(2)}_{i,j} = 0.
\end{cases} \quad (16)

Finally the total fitness of each genotype is given by:

F = \sum_{i=1}^{M} F_i. \quad (17)

According to Eq. (16) high fitness (a fitness value of zero) is awarded to those thresholds that crisply classify the data, whereas thresholds that classify data points belonging to both classes result in a fitness value close to 1 (low fitness), as do thresholds that do not classify the data at all. In this way the GA does not tell the clusters which class they belong to, it simply rewards crispness of classification.
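Eqs. (13)–(17) map directly onto the fitness evaluation sketched below. This is an illustrative re-implementation rather than the authors' code: U is the (clusters × data points) membership matrix, labels holds the class (1 or 2) of each training point, and T is the genotype of thresholds.

import numpy as np

def genotype_fitness(U, labels, T):
    """Fitness of a threshold vector T, following Eqs. (13)-(17). Lower is
    better: an allele scores 0 when its cluster captures points of a single
    class (or nothing at all, per the last case of Eq. (16)) and approaches 1
    as the two classes become balanced."""
    labels = np.asarray(labels)
    S = (U >= T[:, None]).astype(float)        # Eq. (14): membership index matrix
    V = U * S                                  # Eq. (15): thresholded memberships
    total = 0.0
    for i in range(U.shape[0]):                # one allele (cluster) at a time
        v1 = V[i, labels == 1].sum()
        v2 = V[i, labels == 2].sum()
        if v1 == 0.0 and v2 == 0.0:
            f_i = 0.0                          # empty cluster
        elif v1 == v2:
            f_i = 1.0                          # perfectly mixed: worst case
        else:
            f_i = 1.0 - max(v1, v2) / (v1 + v2)    # Eq. (16)
        total += f_i                           # Eq. (17)
    return total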

As well as this objective, there are several constraints that need to be applied to the GA. The thresholds that are randomly initialised or mutated must be in the range [0, 1] as numbers outside this range have no meaning in terms of fuzzy membership. Additionally, every data point must be processed by at least one cluster to ensure that data is not ignored by the hidden layer of the network. The FCM algorithm is first used to generate 100 cluster candidates. A population of 100 randomly generated solutions is evolved over 100 generations under the control of genetic operators. The principal operators are selection (preferential sampling), cross-over (recombination) and mutation. Fig. 6 illustrates the flow of operation of the GA.

A population of 100 individuals is initialised in the range [0, 1], and every individual in the population is evaluated according to Eqs. (13)–(17). Ranking of the individuals is then performed according to fitness. The two individuals with the highest fitness are termed elite individuals and are retained for at least one more generation. A cross-over fraction of 0.8 is used to determine that the next 78 individuals (0.8 of the population of non-elite individuals) are used for recombination. Recombination or reproduction is performed by randomly selecting multiple cross-over points according to the uniform distribution. Then the remaining 20 individuals are mutated. Constrained mutation is employed by necessity as values for individual thresholds have no meaning outside the range [0, 1], since this is the range of fuzzy membership of data points to clusters. Thus the next generation is constructed and the algorithm iterates until the target fitness has been reached, or the fitness has not improved by a significant amount for 20 generations, or the maximum number of generations has been reached. Fig. 7 shows a plot of the mean fitness against the fittest individual at every generation.
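For completeness, a schematic version of this generational loop is sketched below. The elitism count, cross-over fraction and constrained mutation follow the description above, but the selection scheme and stopping test are simplified assumptions, not the authors' implementation.

import numpy as np

def run_ga(fitness, n_genes, pop_size=100, generations=100,
           n_elite=2, crossover_fraction=0.8, seed=0):
    """Minimal GA with elitism, uniform crossover and constrained mutation.
    Genes (thresholds) stay in [0, 1]; lower fitness is better."""
    rng = np.random.default_rng(seed)
    pop = rng.random((pop_size, n_genes))
    for _ in range(generations):
        scores = np.array([fitness(ind) for ind in pop])
        pop = pop[np.argsort(scores)]           # rank: best individual first
        if fitness(pop[0]) == 0.0:              # target fitness reached
            break
        n_cross = int(crossover_fraction * (pop_size - n_elite))
        children = []
        for _ in range(n_cross):                # uniform crossover of two parents
            p1, p2 = pop[rng.integers(0, pop_size // 2, size=2)]
            mask = rng.random(n_genes) < 0.5
            children.append(np.where(mask, p1, p2))
        for _ in range(pop_size - n_elite - n_cross):   # constrained mutation
            parent = pop[rng.integers(0, pop_size // 2)].copy()
            parent[rng.integers(0, n_genes)] = rng.random()   # stays in [0, 1]
            children.append(parent)
        pop = np.vstack([pop[:n_elite], np.array(children)])
    scores = np.array([fitness(ind) for ind in pop])
    return pop[np.argmin(scores)]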

As can be seen from the figure, the mean fitness approaches the best fitness when the GA approximates its best solution. It should be noted, however, that since the GA initialises its thresholds randomly it is important to perform multiple runs.

The hidden layer output resulting from fold 3 of the training data is shown in Fig. 8.

The data is in class order for clarity and the classification from the hidden layer is clearly shown. The hidden layer neurons


Fig. 7. GA best and mean fitness for Wisconsin breast cancer training data (fold 3).

Fig. 8. Hidden layer output for Wisconsin breast cancer training set (fold 3).

(plotted vertically in the figure) are active in the first approximately two thirds of the data (Class 1) or in the final third. The degree of activity of the hidden layer neurons is shown by the density of the spikes. It is clear to see for each hidden layer neuron which class it is classifying, although there is of course a certain amount of fuzziness when a particular hidden layer neuron produces spikes across the whole range of input data (but favours one class over another in terms of density of spikes). Additionally, as can be seen from the figure, the output of only 41 of the original clusters is presented. The reason for this is that the GA sets the cluster sizes of the other 59 clusters to 0; therefore they are removed from the implementation of the clusters in the hidden layer.

6.1. ReSuMe training

As with the Iris problem, once the hidden layer RFs have been configured it is the task of the ReSuMe supervised learning to associate hidden layer neurons to output class neurons. The difference between the Subjective and Evolutionary approaches is that the latter is by definition an automated technique and as such cross-validation (CV) can be utilised to test for generalisation. Thus five-fold CV was implemented and the results are shown in Table 1.

Table 1
Wisconsin results.

Fold                  1      2      3      4      5      Average (%)   Standard deviation
Number of clusters    35     40     41     41     43
Training (%)          98.5   98.5   98.8   98.5   97.5   98.4          0.45
Testing (%)           94.8   95.9   97.8   97.0   97.0   96.5          1.04

7. Discussion

Leave-one-out tests have been criticised as a basis for accuracy evaluation, with the conclusion that CV is more reliable (Kohavi, 1995). However, CV tests are also not ideal. Theoretically about 2/3 of results should be within a single standard deviation from the average, and 95% of results should be within two standard deviations, so in a 10-fold CV you should very rarely see results that are better or worse than two standard deviations. The main criticism of generalisation tests is that running them several times produces different solutions. Therefore, the search for the best generalisation estimator is still an open problem (Dietterich, 1998; Nadeau & Bengio, 2003). Even the best accuracy and variance estimation is not sufficient, since performance cannot be characterised by a single number. It would be much better to provide full Receiver Operating Characteristic (ROC) curves. However, while ROC curves are considered to be superior it is useful to at least use generalisation tests that aid comparison to previous work on the benchmark datasets presented in this paper.

The presented FSNN topology is biologically based in a number of ways. As with all SNNs the topology uses the temporal encoding of data into spike trains. The topology contains facilitating excitatory and inhibitory dynamic synapses (hidden layer) that act as non-linear filters. There is even biological evidence to suggest that the gating phenomenon implemented in its purest sense in this work (i.e. as pure AND gates) exists in the biology, in particular when there is synchrony between inhibitory connections (Tiesinga et al., 2004). However, it is unclear to what extent the formation of RFs in the biology is evolved or learned; at best the mechanisms governing their use are poorly understood. Certainly there are structural aspects of RFs that are likely evolved (for example retinal ganglion cells and photoreceptors) and there is evidence that RFs can adapt to new stimuli. However, study of this phenomenon is limited by experimental technique and development, and it is fair to say the field has considerable scope for further development.

8. Conclusions and future work

This paper presents a detailed review of relevant publications on existing training algorithms and associated biological models. To underpin the FSNN topology, results for the well-known Fisher Iris and Wisconsin Breast Cancer classification problems are presented. The FSNN topology provides a rationale for the assembly of biological components such as excitatory and inhibitory neurons, facilitating and depressing synapses, and RFs. In particular, the major contribution of this paper is how RFs may be configured in terms of excitation and inhibition to implement the conjunctive AND of the antecedent part of a fuzzy rule.

Currently, the work relies on the use of FCM clustering as a strategy for combating the rule-explosion phenomenon of fuzzy logic systems (FLS). Although fuzzy reasoning provides a rationale for the deployment of biological models, there are as yet no biological models for the positioning and size of RFs. Typically in FLSs this is done using purely computational techniques such as gradient-descent algorithms or intuitively. Often membership functions are


chosen for the smoothness of control surfaces or for other reasons; it is as much art as science and very problem specific. Perhaps fortunately, neuroscientists are studying RF formation more intensely than ever before and it seems plausible that a biological alternative to using such computational clustering algorithms may exist. Ideally, it would be preferable for the FSNN to determine the number, placement and spread of fuzzy clusters, without relying on external statistical or clustering techniques, in reflection of the biology. For this reason future work, in recognition of the growing interest in biological RFs, will involve the development of dynamic RFs. The ultimate aim is to develop a biologically plausible FSNN that tunes itself using strictly biological principles, and in that regard this work represents a significant contribution.

References

Abbott, L., & Nelson, S. (2000). Synaptic plasticity: taming the beast. Nature Neuroscience, 3, 1178–1183.
Abbott, L. F., & Regehr, W. G. (2004). Synaptic computation. Nature, 431(7010), 796–803.
Abdelbar, A., Hassan, D., Tagliarini, G., & Narayan, S. (2006). Receptive field optimization for ensemble encoding. Neural Computing & Applications, 15, 1–8.
Asuncion, A., & Newman, D. J. (2007). UCI machine learning repository. http://www.ics.uci.edu/~mlearn/MLRepository.html.
Barlow, H. (1953). Summation and inhibition in the frog’s retina. The Journal of Physiology, 119, 69.
Belatreche, A., Maguire, L., McGinnity, T., & Wu, Q. (2003). A method for the supervised training of spiking neural networks. IEEE Cybernetics Intelligence — Challenges and Advances (CICA), 39–44.
Bezdek, J. (1981). Pattern recognition with fuzzy objective function algorithms. Norwell, MA: Kluwer Academic Publishers.
Bienenstock, E., Cooper, L., & Munro, P. (1982). Theory for the development of neuron selectivity: orientation specificity and binocular interaction in visual cortex. Journal of Neuroscience, 2, 32–48.
Bi, G., & Poo, M. (1998). Synaptic modifications in cultured hippocampal neurons: dependence on spike timing, synaptic strength, and postsynaptic cell type. Journal of Neuroscience, 18(24), 10464–10472.
Bohte, S., Kok, J., & La Poutré, H. (2002). Error-backpropagation in temporally encoded networks of spiking neurons. Neurocomputing, 48, 17–37.
Börgers, C., & Kopell, N. (2003). Synchronization in networks of excitatory and inhibitory neurons with sparse, random connectivity. Neural Computation, 15, 509–538.
Cantu-Paz, E., & Kamath, C. (2005). An empirical comparison of evolutionary algorithms and neural networks for classification problems. IEEE Transactions on Systems, Man and Cybernetics, 35(5), 915–927.
Carnell, A., & Richardson, D. (2005). Linear algebra for time series of spikes. In Proceedings of the 13th European symposium on artificial neural networks (pp. 363–368).
del Castillo, J., & Katz, B. (1954). Quantal components of the end-plate potential. The Journal of Physiology, 124, 560–573.
DeWeese, M., & Zador, A. (2006). Neurobiology: efficiency measures. Nature, 439, 936–942.
Dietterich, T. (1998). Approximate statistical tests for comparing supervised classification learning algorithms. Neural Computation, 10, 1895–1923.
Dittman, J. S., Kreitzer, A. C., & Regehr, W. G. (2000). Interplay between facilitation, depression, and residual calcium at three presynaptic terminals. Journal of Neuroscience, 20(4), 1374–1385.
Dunn, J. (1973). A fuzzy relative of the ISODATA process and its use in detecting compact well-separated clusters. Cybernetics and Systems, 3, 32–57.
Fisher, R. (1936). The use of multiple measurements in taxonomic problems. Annals of Eugenics, 7, 179–188.
Fortune, E. S., & Rose, G. J. (2001). Short-term synaptic plasticity as a temporal filter. Trends in Neurosciences, 24(7), 381–385.
Fuhrmann, G., Segev, I., Markram, H., & Tsodyks, M. (2002). Coding of temporal information by activity-dependent synapses. Journal of Neurophysiology, 87, 140–148.
Glackin, C., McDaid, L., Maguire, L., & Sayers, H. (2008a). Implementing fuzzy reasoning on a spiking neural network. In Proceedings of the 18th international conference on artificial neural networks. Part II (pp. 258–267).
Glackin, C., McDaid, L., Maguire, L., & Sayers, H. (2008b). Classification using a fuzzy spiking neural network. In The 2008 UK workshop on computational intelligence (UKCI). http://www.cci.dmu.ac.uk/conferences/ukci2008/papers/Classification-using-a-Fuzzy-Spiking-Neural-Network.pdf.
Goldberg, D. (1989). Genetic algorithms in search, optimization and machine learning. Boston, MA, USA: Addison-Wesley Longman Publishing Co Inc.
Hebb, D. (1949). The organization of behavior: a neuropsychological theory. New York: Wiley.
Holland, J. (1975). Adaptation in natural and artificial systems. MI: Michigan Press.
Izhikevich, E. (2004). Which model to use for cortical spiking neurons? IEEE Transactions on Neural Networks, 15, 1063–1070.
Izhikevich, E. M., & Desai, N. S. (2003). Relating STDP to BCM. Neural Computation, 15, 1511–1523.
Kasiński, A., & Ponulak, F. (2005). Experimental demonstration of learning properties of a new supervised learning method for the spiking neural networks. In Proceedings of the 15th international conference on artificial neural networks, Vol. 3696 (pp. 145–153).
Kasiński, A., & Ponulak, F. (2006). Comparison of supervised learning methods for spike time coding in spiking neural networks. International Journal of Applied Mathematics and Computer Science, 16, 101.
Kohavi, R. (1995). A study of cross-validation and bootstrap for accuracy estimation and model selection. In International joint conference on artificial intelligence, Vol. 14 (pp. 1137–1145).
Legenstein, R., Naeger, C., & Maass, W. (2005). What can a neuron learn with spike-timing-dependent plasticity? Neural Computation, 17, 2337–2382.
Maass, W. (1997). Networks of spiking neurons: the third generation of neural network models. Neural Networks, 10, 1659–1671.
Maass, W., Nätschlager, T., & Markram, H. (2002). Real-time computing without stable states: a new framework for neural computation based on perturbations. Neural Computation, 14, 2531–2560.
Maass, W., & Sontag, E. (2000). Neural systems as nonlinear filters. Neural Computation, 12, 1743–1772.
Markram, H., Lübke, J., Frotscher, M., & Sakmann, B. (1997). Regulation of synaptic efficacy by coincidence of postsynaptic APs and EPSPs. Science, 275(5297), 213–215.
Miller, G., Todd, P., & Hegde, S. (1989). Designing neural networks using genetic algorithms. In Proceedings of the third international conference on genetic algorithms (pp. 379–384).
Nadeau, C., & Bengio, Y. (2003). Inference for the generalization error. Machine Learning, 52(3), 239–281.
Nätschlager, T., & Maass, W. (2001). Finding the key to a synapse. Advances in Neural Information Processing Systems, 138–144.
Pantic, L., Torres, J., & Kappen, H. (2003). Coincidence detection with dynamic synapses. Network: Computation in Neural Systems, 14, 17–33.
Pfister, J., Barber, D., & Gerstner, W. (2003). Optimal Hebbian learning: a probabilistic point of view. Lecture Notes in Computer Science, 92–98.
Rieke, F. (1997). Spikes: exploring the neural code. MIT Press.
Ruf, B., & Schmitt, M. (1997). Learning temporally encoded patterns in networks of spiking neurons. Neural Processing Letters, 5, 9–18.
Song, S., Miller, K., & Abbott, L. (2000). Competitive Hebbian learning through spike-timing-dependent synaptic plasticity. Nature Neuroscience, 3, 919–926.
Sougné, J. (2000). A learning algorithm for synfire chains. In Connectionist models of learning, development and evolution. Proceedings of the sixth neural computation and psychology workshop (p. 23).
Thomson, A. (1997). Synaptic interactions in neocortical local circuits: dual intracellular recordings in vitro. Cerebral Cortex, 7, 510–522.
Tiesinga, P., Fellous, J., Salinas, E., José, J., & Sejnowski, T. (2004). Inhibitory synchrony as a mechanism for attentional gain modulation. Journal of Physiology, 98, 296–314.
Tsodyks, M., Pawelzik, K., & Markram, H. (1998). Neural networks with dynamic synapses. Neural Computation, 10, 821–835.
Zadeh, L. (1965). Fuzzy sets. Information and Control, 8, 338–353.